I am new to Spark and am trying to run this example: https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionSummaryExample.java

I know it's a basic question, but I have searched for a long time and still cannot solve it. I am using NetBeans. The following imports show errors, and the IDE asks me to search for a dependency:

import org.apache.spark.ml.classification.BinaryLogisticRegressionTrainingSummary;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

The only dependency I could find is https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.10/1.0.0, but the errors remain after adding it.

Can anyone tell me where I can find the right Maven dependencies? Thanks! This is the dependency I tried:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.0.0</version>
    <scope>runtime</scope>
</dependency>

My pom:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>BigDataLab02</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>0.20.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.2.0</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.4</version>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <mainClass>com.mycompany.bigdatalab02.Demo</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
</project>

Can you add your other Spark dependencies too? - koiralo
Please share more: the error message or stack trace. The full pom.xml would not hurt either. - Vladislav Varslavans
@ShankarKoirala Yes, but my problem is that I don't know which dependencies I should add... - Shin Yu Wu
@VladislavVarslavans I added my pom.xml above. I copied it from my lab review session and added the one dependency I found. The error message says "Search Dependency at Maven Repository for org.xxxxxx", but when I click it, there is no dependency I can add. - Shin Yu Wu
But what exactly is that org.xxxxxx? Please share the full error message or a screenshot. - Vladislav Varslavans

1 Answer

You need to add:

<properties>
  <spark.version>2.1.1</spark.version>
  <scala.version>2.11.8</scala.version>
  <scala.compat.version>2.11</scala.compat.version>
</properties>

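directly under the <project> element of your pom.xml file. If the file already has a <properties> block, put these three entries inside it rather than adding a second block.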

Then add:

<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.compat.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.compat.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_${scala.compat.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>

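to the <dependencies> section, replacing the spark-mllib_2.10 1.0.0 entry. That entry cannot satisfy the imports for two reasons: the runtime scope keeps the jar off the compile classpath, and Spark 1.0.0 predates both the org.apache.spark.ml package and SparkSession, which only exists from Spark 2.0 onward.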
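
Once the project has been rebuilt, one quick way to confirm that the Spark classes are actually visible is a small classpath check. This is only a sketch: the class name ClasspathCheck and the helper isOnClasspath are made up for illustration, and the list of class names is simply taken from the imports in the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Spark): reports which classes can be loaded.
public class ClasspathCheck {

    // Returns true if the named class can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the imports in the question.
        List<String> required = Arrays.asList(
                "org.apache.spark.ml.classification.LogisticRegression",
                "org.apache.spark.sql.SparkSession",
                "org.apache.spark.sql.Dataset");
        for (String name : required) {
            System.out.println((isOnClasspath(name) ? "found    " : "MISSING  ") + name);
        }
    }
}
```

Keep in mind that with the provided scope the classes are available at compile time but may be absent when the program is launched, depending on how the IDE or Maven builds the runtime classpath. If the imports compile but this check prints MISSING, switching the Spark dependencies to the default compile scope for local runs is probably the simplest thing to try.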