Machine Learning with Scala Quick Start Guide

Configuring the programming environment

I am assuming that Java is already installed on your machine and that JAVA_HOME is set. I am also assuming that your IDE has the Maven plugin installed. If so, create a Maven project and add the following project properties to its pom.xml file:

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <java.version>1.8</java.version>
    <jdk.version>1.8</jdk.version>
    <spark.version>2.3.0</spark.version>
</properties>

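In the preceding properties tag, I specified the Spark version (2.3.0), but you can adjust it to match your setup. Each Spark dependency picks up this value through ${spark.version}, so the version only has to be changed in one place. The _2.11 suffix in each artifactId is the Scala binary version the artifact was built for, so keep it consistent across all the Spark dependencies. Then add the following dependencies to the pom.xml file: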

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-graphx_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-yarn_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-network-shuffle_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-flume_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-csv_2.11</artifactId>
        <version>1.3.0</version>
    </dependency>
</dependencies>

If everything goes smoothly, Maven downloads all the required JAR files and adds them to the project as Maven dependencies. Alright! We can now start writing the code.
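Before moving on, it can help to confirm that the dependencies resolve and that Spark actually starts. The following is a minimal sketch, not code from the book: the object name EnvironmentCheck and the helper createSession are hypothetical, and it assumes that your IDE or build is already set up to compile Scala sources (for example, through a Scala plugin). It starts Spark in local mode, prints the Spark version, and runs one tiny job.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical smoke test for the Maven setup; names are illustrative only.
object EnvironmentCheck {

  // Builds a SparkSession that runs inside this JVM.
  // "local[*]" uses all available cores, so no cluster is needed.
  def createSession(): SparkSession =
    SparkSession.builder()
      .appName("EnvironmentCheck")
      .master("local[*]")
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = createSession()
    try {
      // Should match the value of <spark.version> in pom.xml.
      println(s"Spark version: ${spark.version}")

      // Sum the ids 1..100 to confirm that a job can actually execute.
      val total = spark.range(1, 101).selectExpr("sum(id)").first().getLong(0)
      println(s"Sum of 1..100 = $total") // 5050
    } finally {
      // Always release the local Spark resources.
      spark.stop()
    }
  }
}
```

If this runs and prints the version you set in the properties tag, the environment is ready. A ClassNotFoundException or an unresolved import at this point usually means that a dependency was not downloaded or that the Scala suffix of an artifact does not match the Scala version used to compile the project.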