0
votes

I'm very new with big data and spark and here is how I'm trying to get a spark session

SparkConf conf = new SparkConf().setMaster("local").setAppName("SaavnAnalyticsProject");
sparkSession = SparkSession.builder().config(conf).getOrCreate();

This is the error I'm getting

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 20/02/03 02:29:40 INFO SparkContext: Running Spark version 2.3.0 Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Joiner.on(C)Lcom/google/common/base/Joiner; at org.apache.hadoop.metrics2.lib.UniqueNames.(UniqueNames.java:44) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:41) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:36) at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:120) at org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:236) at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464) at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2464) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2464) at org.apache.spark.SparkContext.(SparkContext.scala:292) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486) at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930) at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921) at saavnAnalytics.SaavnAnalyticsMain.main(SaavnAnalyticsMain.java:55)

Here is my pom.xml

 <properties>
 <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.target>1.8</maven.compiler.target>
        <maven.compiler.source>1.8</maven.compiler.source>
    </properties>
  <dependencies>
    <dependency>
        <!-- Apache Spark main library -->  
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.11</artifactId>
        <version>2.3.0</version>
    </dependency>
        <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-annotations</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
      <version>1.7.4</version>
    </dependency>
    </dependencies>
    <build>
   <plugins>
        <!-- Maven Shade Plugin -->
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <version>2.3</version>
          <executions>
             <!-- Run shade goal on package phase -->
            <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
             <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                    <mainClass>SaavnAnalytics.SaavnAnalyticsMain</mainClass>
            </transformer>
              </transformers>
            <filters>             
            <filter>               
            <artifact>*:*</artifact>               
            <excludes>                 
            <exclude>META-INF/*.SF</exclude>                 
            <exclude>META-INF/*.DSA</exclude>                 
            <exclude>META-INF/*.RSA</exclude>               
            </excludes>             
            </filter>           
            </filters>         
            </configuration>
            </execution>
          </executions>
        </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>

I've added google-collect-0.5.jar and com.google.collections.jar explicitly.

Any idea where I'm going wrong?

1
it's often problem in spark apps containing external dependencies witch depends on guava. Spark conflicts with them. Best decision is using shading, for example like here: stackoverflow.com/a/59747646/6770614Boris Azanov
@Boris Thank you for your suggestion but I'm already using shades. If you scroll down through my pom.xml you will find the dependencies. So back to square one any idea what could be the reason?Zeus07

1 Answers

1
votes

try to modify your pom.xml plugins section:

<plugins>
    <!-- Maven Shade Plugin -->
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.3</version>
        <executions>
            <!-- Run shade goal on package phase -->
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <relocations>
                        <relocation>
                            <pattern>com.google.common</pattern>
                            <shadedPattern>shade.com.google.common</shadedPattern>
                        </relocation>
                    </relocations>
                    <transformers>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <mainClass>SaavnAnalytics.SaavnAnalyticsMain</mainClass>
                        </transformer>
                    </transformers>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
            </execution>
        </executions>
    </plugin>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
            <source>1.8</source>
            <target>1.8</target>
        </configuration>
    </plugin>
</plugins>