
I am trying to create a WordCount project using Spark and Java in Eclipse, on Cloudera running through VMware. The Java version is 1.7 and the Spark version is 2.0.0. The code inside the "JavaWordCount.java" class in the project is as follows:

    package com.vishal.wc;

    import scala.Tuple2;

    import org.apache.hadoop.hive.ql.exec.spark.session.SparkSession;
    import org.apache.spark.api.java.JavaRDD;

    public class JavaWordCount {

        public static final Pattern SPACE = Pattern.compile(" ");

        public static void main(String[] args) throws Exception {

            if (args.length < 2) {
                System.err.println("Usage: JavaWordCount <InputFile> <OutputFile>");
                System.exit(1);
            }

            SparkSession spark = SparkSession.builder().appName("JavaWordCount").getOrCreate();
            JavaRDD<String> lines = spark.read().textFile(args[0]).javaRDD();

            JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
                public Iterator<String> call(String s) {
                    return Arrays.asList(s.split(" ")).iterator();
                }
            });

            JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
                public tuple2<String, Integer> call(String s) {
                    return new tuple2<>(s, 1);
                }
            });

            JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
                public Integer call(Integer i1, Integer i2) {
                    return i1 = i2;
                }
            });

            counts.saveAsTextFile(args[1]);
            spark.stop();
        }
    }

There were errors because no Spark jars had been added. I added the jars from Spark-2.0.0-bin-hadoop-2.7.tgz to the build path, but the errors are almost the same. The errors are given below:

Description Resource    Path    Location    Type
FlatMapFunction cannot be resolved to a type    JavaWordCount.java  /SparkProject/src/com/vishal/wc line 26 Java Problem
Function2 cannot be resolved to a type  JavaWordCount.java  /SparkProject/src/com/vishal/wc line 44 Java Problem
Iterator cannot be resolved to a type   JavaWordCount.java  /SparkProject/src/com/vishal/wc line 28 Java Problem
JavaPairRDD cannot be resolved to a type    JavaWordCount.java  /SparkProject/src/com/vishal/wc line 32 Java Problem
JavaPairRDD cannot be resolved to a type    JavaWordCount.java  /SparkProject/src/com/vishal/wc line 42 Java Problem
PairFunction cannot be resolved to a type   JavaWordCount.java  /SparkProject/src/com/vishal/wc line 32 Java Problem
The method builder() is undefined for the type SparkSession JavaWordCount.java  /SparkProject/src/com/vishal/wc line 22 Java Problem
The method flatMap(FlatMapFunction<String,U>) in the type AbstractJavaRDDLike<String,JavaRDD<String>> is not applicable for the arguments (new FlatMapFunction<String,String>(){})  JavaWordCount.java  /SparkProject/src/com/vishal/wc line 26 Java Problem
The method mapToPair(PairFunction<String,K2,V2>) in the type AbstractJavaRDDLike<String,JavaRDD<String>> is not applicable for the arguments (new PairFunction<String,String,Integer>(){})  JavaWordCount.java  /SparkProject/src/com/vishal/wc line 32 Java Problem
The method read() is undefined for the type SparkSession    JavaWordCount.java  /SparkProject/src/com/vishal/wc line 24 Java Problem
The method stop() is undefined for the type SparkSession    JavaWordCount.java  /SparkProject/src/com/vishal/wc line 52 Java Problem
tuple2 cannot be resolved to a type JavaWordCount.java  /SparkProject/src/com/vishal/wc line 35 Java Problem
tuple2 cannot be resolved to a type JavaWordCount.java  /SparkProject/src/com/vishal/wc line 37 Java Problem

Please help.

Do you have all the right import statements at the top? FlatMapFunction is a member of the org.apache.spark.api.java.function package, which you haven't imported. Similar issues may exist for the other unresolved types. - 01000001
Try restarting Eclipse. - Vikrame

1 Answer


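You need to import the missing classes, as shown below. Two further fixes are needed. First, replace the org.apache.hadoop.hive.ql.exec.spark.session.SparkSession import with org.apache.spark.sql.SparkSession; the Hive class is a different type, which is why builder(), read() and stop() are reported as undefined. Second, write Tuple2 with a capital T, because Java is case-sensitive and the class you imported is scala.Tuple2.

    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.regex.Pattern;

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.api.java.function.PairFunction;
    import org.apache.spark.sql.SparkSession;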

Eclipse provides the shortcut Ctrl+Shift+O (Organize Imports) to add all missing imports.
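
One more problem that the compiler will not report: the reducer in the question returns i1 = i2, which assigns rather than adds, so the counts would probably not be summed once the project builds. The sketch below is plain Java with no Spark classes; the Reducer interface and the LocalWordCountSketch class are hypothetical stand-ins (for Function2 and for the flatMap, mapToPair and reduceByKey steps), used only to show the difference between the posted reducer and one that returns i1 + i2.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the Spark pipeline; no Spark classes are used here.
final class LocalWordCountSketch {

    // Hypothetical stand-in for Spark's Function2<Integer, Integer, Integer>.
    interface Reducer {
        Integer call(Integer i1, Integer i2);
    }

    // As posted in the question: assigns i2 to i1 and returns it, so nothing is summed.
    static final Reducer AS_POSTED = new Reducer() {
        public Integer call(Integer i1, Integer i2) {
            return i1 = i2;
        }
    };

    // Intended behaviour: add the two partial counts.
    static final Reducer SUMMING = new Reducer() {
        public Integer call(Integer i1, Integer i2) {
            return i1 + i2;
        }
    };

    // Mimics flatMap + mapToPair + reduceByKey over a single line of text.
    static Map<String, Integer> countWords(String line, Reducer reducer) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        Iterator<String> words = Arrays.asList(line.split(" ")).iterator();
        while (words.hasNext()) {
            String word = words.next();
            Integer previous = counts.get(word);
            // Each occurrence contributes 1, like new Tuple2<>(s, 1) in the question.
            counts.put(word, previous == null ? Integer.valueOf(1) : reducer.call(previous, 1));
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "to be or not to be";
        System.out.println(countWords(line, AS_POSTED)); // {be=1, not=1, or=1, to=1}
        System.out.println(countWords(line, SUMMING));   // {be=2, not=1, or=1, to=2}
    }
}
```

In the Spark job itself, the corresponding change is to return i1 + i2 from the call method passed to reduceByKey.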