Quick install of Spark and Scala on Windows 7 in Standalone mode

November 5, 2015

I wanted an easy way of developing Scala and Spark code on my local Windows machine. These steps got it working.

1. Download and install java: http://java.com/en/download/

2. Install Scala:

  • Download the latest binary from http://www.scala-lang.org/download/
  • Locate and copy the installation directory (by default something like ‘C:\Program Files (x86)\scala’)
  • Create a scala environment variable:
    • Open the start menu, type ‘envi’ and select ‘Edit environment variables for your account’
    • In the box that comes up, there are two lists and two sets of buttons. Click the top ‘New’ button (under ‘User variables for <your name>’) and enter:
      • Variable Name: SCALA_HOME
      • Variable Value: <the installation directory you copied above>
      • Click OK
    • Now back on the environment variables main box
      • Double click the ‘PATH’ variable in the top box.
      • In the Variable Value box, scroll to the end, and add “;%SCALA_HOME%\bin” (without the quotes)
      • Hit OK
    • Click OK to close the environment variables box.
    • Test that it has worked by opening a Command Prompt window and entering the command ‘scala’. If a Scala prompt comes up with no errors, you did it right. If not, go back and check each step.
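Once the prompt is up, you can paste a quick sanity check. Any valid Scala works here; this snippet just prints the running version and confirms the REPL evaluates expressions:

```scala
// Paste at the scala> prompt: print the installed Scala version,
// then evaluate a small expression to confirm the REPL works.
println(s"Scala version: ${util.Properties.versionNumberString}")
println((1 to 10).sum)  // prints 55
```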

3. Install Spark

  • Go to http://spark.apache.org/downloads.html
  • In the form
    • Keep the default (latest) spark release
    • For package type, select the top one that says ‘Pre-built for Hadoop 2.X and later’
    • Download the archive
  • It comes as a tgz file, so you will need to install 7-zip or some other tool to decompress and unarchive it.
    • If using 7-zip, open the downloaded tgz.
    • Keep double-clicking into the file until you see a normal folder, named something like ‘spark-1.5.1-bin-hadoop2.6’
    • At that point, select the folder and extract it somewhere, like C:\Spark
  • Update the environment variables:
    • Navigate to where you extracted the folder, and copy the location path (something like C:\Spark\spark-1.5.1-bin-hadoop2.6)
    • Open the environment variables as above for scala, and add a new variable:
      • Variable Name: SPARK_HOME
      • Variable Value: <the location path you copied above>
    • Now add “;%SPARK_HOME%\bin” to the end of the PATH variable, as before.
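If you prefer the Command Prompt to the GUI, the SPARK_HOME variable can also be set with `setx` (the path below is an example; adjust it to wherever you actually extracted Spark):

```shell
:: Records SPARK_HOME as a user environment variable. Only newly opened
:: Command Prompt windows will see it. The path here is an assumption --
:: use your actual extraction location.
setx SPARK_HOME "C:\Spark\spark-1.5.1-bin-hadoop2.6"
```

It is safer to keep editing PATH through the GUI as described above, since `setx PATH "%PATH%;..."` expands the combined system-plus-user PATH and writes it all back into the user variable.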

4. Install winutils

Spark expects winutils.exe (part of Hadoop) to exist with this setup; otherwise you will receive an error.

  • Download the executable from here: http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe
  • Create another new folder, e.g. C:\Spark\winutils. Copy this location path.
  • Create another folder inside that one called ‘bin’, e.g. C:\Spark\winutils\bin
  • Copy winutils.exe into the bin folder.
  • Go to your environment variables, and create a new variable:
    • Variable Name: HADOOP_HOME
    • Variable Value: <the location path you copied above>
  • There is no need to update the PATH variable this time!
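The winutils steps above can be condensed into a few Command Prompt commands (the paths are assumptions, matching the example locations used in this post; the download location of winutils.exe will depend on your browser settings):

```shell
:: Create the folder layout Spark expects, copy in winutils.exe
:: (assumed here to be in your Downloads folder), and record
:: HADOOP_HOME as a user environment variable.
mkdir C:\Spark\winutils\bin
copy %USERPROFILE%\Downloads\winutils.exe C:\Spark\winutils\bin\
setx HADOOP_HOME "C:\Spark\winutils"
```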

5. Test the installation

  • Open a Command Prompt, and enter ‘spark-shell’. Spark should start up, and after a few seconds, you should be presented with a Scala prompt!
  • While this prompt is open, visit http://localhost:4040/ in your browser, and you should see the Spark UI
  • Starting from a command prompt, you can also run an example to check the setup is working:
    • Enter “run-example SparkPi 10”
    • Spark will run, with lots of logging output.
    • When it has finished, look back through the logs; a dozen or so entries from the end, you should see a line that says something like:
      • Pi is roughly 3.140840
    • It doesn’t matter if your answer is different, just that you have an answer.
    • Congratulations, your installation is working. Time to get developing!
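To get a feel for what the SparkPi example is computing, here is the same Monte Carlo estimate sketched in plain Scala, with no cluster involved. This is an illustrative local sketch, not Spark's actual implementation: SparkPi distributes these random samples across executors, which is also why your printed value differs from run to run.

```scala
// Monte Carlo estimate of Pi: throw random points at the unit square
// and count how many land inside the quarter circle. The fraction
// inside approximates Pi/4.
val n = 100000
val rng = new scala.util.Random(42)  // fixed seed, for reproducibility
val inside = (1 to n).count { _ =>
  val x = rng.nextDouble()
  val y = rng.nextDouble()
  x * x + y * y <= 1.0
}
println(f"Pi is roughly ${4.0 * inside / n}%f")
```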