I pasted the code from Sandy Ryza's repository (linked below) to make sure I wasn't mistyping anything, but I still get a runtime error: the job stops and produces the output below.

Note that all of these operations work fine on other simple RDDs; the problem seems to occur only with the MEDLINE data.

https://github.com/sryza/aas/blob/master/ch07-graph%2Fsrc%2Fmain%2Fscala%2Fcom%2Fcloudera%2Fdatascience%2Fgraph%2FRunGraph.scala

The exceptions start specifically when execution reaches the val topicCounts line below. A similar error is thrown if I try the examples from the book (the ones that use elem.label and elem.attributes, which are not in his code).

val topics: RDD[String] = medline.flatMap(mesh => mesh)
val topicCounts = topics.countByValue()

/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/bin/java -agentlib:jdwp=transport=dt_socket,address=127.0.0.1:55166,suspend=y,server=n -Dfile.encoding=UTF-8 -classpath "/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/lib/ant-javafx.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/lib/dt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/lib/javafx-mx.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/lib/jconsole.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/lib/sa-jdi.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/lib/tools.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/deploy.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/javaws.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/jfxswt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/management-agent.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/plugin.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/ext/jfxrt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/l
ib/ext/nashorn.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/Users/bilal/src/spark-first/target/classes:/Users/bilal/.ivy2/cache/org.scala-lang/scala-library/jars/scala-library-2.11.6.jar:/Users/bilal/.ivy2/cache/org.scala-lang/scala-reflect/jars/scala-reflect-2.11.6.jar:/Users/bilal/.m2/repository/org/scala-lang/scala-library/2.11.6/scala-library-2.11.6.jar:/Users/bilal/.m2/repository/org/apache/spark/spark-core_2.11/1.2.1/spark-core_2.11-1.2.1.jar:/Users/bilal/.m2/repository/com/twitter/chill_2.11/0.5.0/chill_2.11-0.5.0.jar:/Users/bilal/.m2/repository/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.jar:/Users/bilal/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar:/Users/bilal/.m2/repository/com/esotericsoftware/minlog/minlog/1.2/minlog-1.2.jar:/Users/bilal/.m2/repository/org/objenesis/objenesis/1.2/objenesis-1.2.jar:/Users/bilal/.m2/repository/com/twitter/chill-java/0.5.0/chill-java-0.5.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-client/2.2.0/hadoop-client-2.2.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-common/2.2.0/hadoop-common-2.2.0.jar:/Users/bilal/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/Users/bilal/.m2/repository/org/apache/commons/commons-math/2.1/commons-math-2.1.jar:/Users/bilal/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/Users/bilal/.m2/repository/commons-io/commons-io/2.1/commons-io-2.1.jar:/Users/bilal/.m2/repository/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar:/Users/bilal/.m2/repository/commons-lang/commons-lang/2.5/commons-lang-2.5.jar:/Users/bilal/.m2/repository/commons-configuration/commons-configuration/1.6/commons-conf
iguration-1.6.jar:/Users/bilal/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/Users/bilal/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/Users/bilal/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/Users/bilal/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/Users/bilal/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.8.8/jackson-core-asl-1.8.8.jar:/Users/bilal/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.8.8/jackson-mapper-asl-1.8.8.jar:/Users/bilal/.m2/repository/org/apache/avro/avro/1.7.4/avro-1.7.4.jar:/Users/bilal/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-auth/2.2.0/hadoop-auth-2.2.0.jar:/Users/bilal/.m2/repository/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar:/Users/bilal/.m2/repository/org/tukaani/xz/1.0/xz-1.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.2.0/hadoop-hdfs-2.2.0.jar:/Users/bilal/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-app/2.2.0/hadoop-mapreduce-client-app-2.2.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-common/2.2.0/hadoop-mapreduce-client-common-2.2.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.2.0/hadoop-yarn-client-2.2.0.jar:/Users/bilal/.m2/repository/com/google/inject/guice/3.0/guice-3.0.jar:/Users/bilal/.m2/repository/javax/inject/javax.inject/1/javax.inject-1.jar:/Users/bilal/.m2/repository/aopalliance/aopalliance/1.0/aopalliance-1.0.jar:/Users/bilal/.m2/repository/com/sun/jersey/jersey-test-framework/jersey-test-framework-grizzly2/1.9/jersey-test-framework-grizzly2-1.9.jar:/Users/bilal/.m2/repository/com/sun/jersey/jersey-test-framework/jersey-test-framework-core/1.9/jersey
-test-framework-core-1.9.jar:/Users/bilal/.m2/repository/javax/servlet/javax.servlet-api/3.0.1/javax.servlet-api-3.0.1.jar:/Users/bilal/.m2/repository/com/sun/jersey/jersey-client/1.9/jersey-client-1.9.jar:/Users/bilal/.m2/repository/com/sun/jersey/jersey-grizzly2/1.9/jersey-grizzly2-1.9.jar:/Users/bilal/.m2/repository/org/glassfish/grizzly/grizzly-http/2.1.2/grizzly-http-2.1.2.jar:/Users/bilal/.m2/repository/org/glassfish/grizzly/grizzly-framework/2.1.2/grizzly-framework-2.1.2.jar:/Users/bilal/.m2/repository/org/glassfish/gmbal/gmbal-api-only/3.0.0-b023/gmbal-api-only-3.0.0-b023.jar:/Users/bilal/.m2/repository/org/glassfish/external/management-api/3.0.0-b012/management-api-3.0.0-b012.jar:/Users/bilal/.m2/repository/org/glassfish/grizzly/grizzly-http-server/2.1.2/grizzly-http-server-2.1.2.jar:/Users/bilal/.m2/repository/org/glassfish/grizzly/grizzly-rcm/2.1.2/grizzly-rcm-2.1.2.jar:/Users/bilal/.m2/repository/org/glassfish/grizzly/grizzly-http-servlet/2.1.2/grizzly-http-servlet-2.1.2.jar:/Users/bilal/.m2/repository/org/glassfish/javax.servlet/3.1/javax.servlet-3.1.jar:/Users/bilal/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar:/Users/bilal/.m2/repository/asm/asm/3.1/asm-3.1.jar:/Users/bilal/.m2/repository/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar:/Users/bilal/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar:/Users/bilal/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar:/Users/bilal/.m2/repository/stax/stax-api/1.0.1/stax-api-1.0.1.jar:/Users/bilal/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar:/Users/bilal/.m2/repository/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar:/Users/bilal/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:/Users/bilal/.m2/repository/org/codehaus/jackson/jackson-jaxrs/1.8.3/jackson-jaxrs-1.8.3.jar:/Users/bilal/.m2/repository/org/codehaus/jackson/jackson-xc/1.8.3/jackson-xc-1.8.3.jar:/Users/bilal/.m2/repository/com/sun/jersey/contribs/jer
sey-guice/1.9/jersey-guice-1.9.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-yarn-server-common/2.2.0/hadoop-yarn-server-common-2.2.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-shuffle/2.2.0/hadoop-mapreduce-client-shuffle-2.2.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.2.0/hadoop-yarn-api-2.2.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-core/2.2.0/hadoop-mapreduce-client-core-2.2.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-yarn-common/2.2.0/hadoop-yarn-common-2.2.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.2.0/hadoop-mapreduce-client-jobclient-2.2.0.jar:/Users/bilal/.m2/repository/org/apache/hadoop/hadoop-annotations/2.2.0/hadoop-annotations-2.2.0.jar:/Users/bilal/.m2/repository/org/apache/spark/spark-network-common_2.11/1.2.1/spark-network-common_2.11-1.2.1.jar:/Users/bilal/.m2/repository/org/apache/spark/spark-network-shuffle_2.11/1.2.1/spark-network-shuffle_2.11-1.2.1.jar:/Users/bilal/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/Users/bilal/.m2/repository/commons-codec/commons-codec/1.3/commons-codec-1.3.jar:/Users/bilal/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:/Users/bilal/.m2/repository/org/apache/curator/curator-recipes/2.4.0/curator-recipes-2.4.0.jar:/Users/bilal/.m2/repository/org/apache/curator/curator-framework/2.4.0/curator-framework-2.4.0.jar:/Users/bilal/.m2/repository/org/apache/curator/curator-client/2.4.0/curator-client-2.4.0.jar:/Users/bilal/.m2/repository/org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.jar:/Users/bilal/.m2/repository/jline/jline/0.9.94/jline-0.9.94.jar:/Users/bilal/.m2/repository/com/google/guava/guava/14.0.1/guava-14.0.1.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/jetty-plus/8.1.14.v20131031/jetty-plus-8.1.14.v20131031.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/orbit/javax.transaction/1.1.1.v20
1105210645/javax.transaction-1.1.1.v201105210645.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/jetty-webapp/8.1.14.v20131031/jetty-webapp-8.1.14.v20131031.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/jetty-xml/8.1.14.v20131031/jetty-xml-8.1.14.v20131031.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/jetty-servlet/8.1.14.v20131031/jetty-servlet-8.1.14.v20131031.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/jetty-jndi/8.1.14.v20131031/jetty-jndi-8.1.14.v20131031.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/orbit/javax.mail.glassfish/1.4.1.v201005082020/javax.mail.glassfish-1.4.1.v201005082020.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/orbit/javax.activation/1.1.0.v201105071233/javax.activation-1.1.0.v201105071233.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/jetty-security/8.1.14.v20131031/jetty-security-8.1.14.v20131031.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/jetty-util/8.1.14.v20131031/jetty-util-8.1.14.v20131031.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/jetty-server/8.1.14.v20131031/jetty-server-8.1.14.v20131031.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/jetty-continuation/8.1.14.v20131031/jetty-continuation-8.1.14.v20131031.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/jetty-http/8.1.14.v20131031/jetty-http-8.1.14.v20131031.jar:/Users/bilal/.m2/repository/org/eclipse/jetty/jetty-io/8.1.14.v20131031/jetty-io-8.1.14.v20131031.jar:/Users/bilal/.m2/repository/org/apache/commons/commons-lang3/3.3.2/commons-lang3-3.3.2.jar:/Users/bilal/.m2/repository/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.jar:/Users/bilal/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/Users/bilal/.m2/repository/org/slf4j/slf4j-api/1.7.5/slf4j-api-1.7.5.jar:/Users/bilal/.m2/repository/org/slf4j/jul-to-slf4j/1.7.5/jul-to-slf4j-1.7.5.jar:/Users/bilal/.m2/repository/org/slf4j/jcl-ov
er-slf4j/1.7.5/jcl-over-slf4j-1.7.5.jar:/Users/bilal/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/Users/bilal/.m2/repository/org/slf4j/slf4j-log4j12/1.7.5/slf4j-log4j12-1.7.5.jar:/Users/bilal/.m2/repository/com/ning/compress-lzf/1.0.0/compress-lzf-1.0.0.jar:/Users/bilal/.m2/repository/org/xerial/snappy/snappy-java/1.1.1.6/snappy-java-1.1.1.6.jar:/Users/bilal/.m2/repository/net/jpountz/lz4/lz4/1.2.0/lz4-1.2.0.jar:/Users/bilal/.m2/repository/org/roaringbitmap/RoaringBitmap/0.4.5/RoaringBitmap-0.4.5.jar:/Users/bilal/.m2/repository/commons-net/commons-net/2.2/commons-net-2.2.jar:/Users/bilal/.m2/repository/org/spark-project/akka/akka-remote_2.11/2.3.4-spark/akka-remote_2.11-2.3.4-spark.jar:/Users/bilal/.m2/repository/org/spark-project/akka/akka-actor_2.11/2.3.4-spark/akka-actor_2.11-2.3.4-spark.jar:/Users/bilal/.m2/repository/com/typesafe/config/1.2.1/config-1.2.1.jar:/Users/bilal/.m2/repository/io/netty/netty/3.8.0.Final/netty-3.8.0.Final.jar:/Users/bilal/.m2/repository/org/spark-project/protobuf/protobuf-java/2.5.0-spark/protobuf-java-2.5.0-spark.jar:/Users/bilal/.m2/repository/org/uncommons/maths/uncommons-maths/1.2.2a/uncommons-maths-1.2.2a.jar:/Users/bilal/.m2/repository/org/spark-project/akka/akka-slf4j_2.11/2.3.4-spark/akka-slf4j_2.11-2.3.4-spark.jar:/Users/bilal/.m2/repository/org/json4s/json4s-jackson_2.11/3.2.10/json4s-jackson_2.11-3.2.10.jar:/Users/bilal/.m2/repository/org/json4s/json4s-core_2.11/3.2.10/json4s-core_2.11-3.2.10.jar:/Users/bilal/.m2/repository/org/json4s/json4s-ast_2.11/3.2.10/json4s-ast_2.11-3.2.10.jar:/Users/bilal/.m2/repository/com/thoughtworks/paranamer/paranamer/2.6/paranamer-2.6.jar:/Users/bilal/.m2/repository/org/scala-lang/scalap/2.11.0/scalap-2.11.0.jar:/Users/bilal/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.3.1/jackson-databind-2.3.1.jar:/Users/bilal/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.3.0/jackson-annotations-2.3.0.jar:/Users/bilal/.m2/repository/com/fasterxml/jackson/core/jackson-c
ore/2.3.1/jackson-core-2.3.1.jar:/Users/bilal/.m2/repository/org/apache/mesos/mesos/0.18.1/mesos-0.18.1-shaded-protobuf.jar:/Users/bilal/.m2/repository/io/netty/netty-all/4.0.23.Final/netty-all-4.0.23.Final.jar:/Users/bilal/.m2/repository/com/clearspring/analytics/stream/2.7.0/stream-2.7.0.jar:/Users/bilal/.m2/repository/com/codahale/metrics/metrics-core/3.0.0/metrics-core-3.0.0.jar:/Users/bilal/.m2/repository/com/codahale/metrics/metrics-jvm/3.0.0/metrics-jvm-3.0.0.jar:/Users/bilal/.m2/repository/com/codahale/metrics/metrics-json/3.0.0/metrics-json-3.0.0.jar:/Users/bilal/.m2/repository/com/codahale/metrics/metrics-graphite/3.0.0/metrics-graphite-3.0.0.jar:/Users/bilal/.m2/repository/org/tachyonproject/tachyon-client/0.5.0/tachyon-client-0.5.0.jar:/Users/bilal/.m2/repository/org/tachyonproject/tachyon/0.5.0/tachyon-0.5.0.jar:/Users/bilal/.m2/repository/org/spark-project/pyrolite/2.0.1/pyrolite-2.0.1.jar:/Users/bilal/.m2/repository/net/sf/py4j/py4j/0.8.2.1/py4j-0.8.2.1.jar:/Users/bilal/.m2/repository/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar:/Users/bilal/.m2/repository/org/apache/spark/spark-mllib_2.11/1.2.1/spark-mllib_2.11-1.2.1.jar:/Users/bilal/.m2/repository/org/apache/spark/spark-streaming_2.11/1.2.1/spark-streaming_2.11-1.2.1.jar:/Users/bilal/.m2/repository/org/apache/spark/spark-sql_2.11/1.2.1/spark-sql_2.11-1.2.1.jar:/Users/bilal/.m2/repository/org/apache/spark/spark-catalyst_2.11/1.2.1/spark-catalyst_2.11-1.2.1.jar:/Users/bilal/.m2/repository/org/scala-lang/scala-compiler/2.11.2/scala-compiler-2.11.2.jar:/Users/bilal/.m2/repository/org/scala-lang/modules/scala-xml_2.11/1.0.2/scala-xml_2.11-1.0.2.jar:/Users/bilal/.m2/repository/org/scala-lang/modules/scala-parser-combinators_2.11/1.0.2/scala-parser-combinators_2.11-1.0.2.jar:/Users/bilal/.m2/repository/org/scala-lang/scala-reflect/2.11.2/scala-reflect-2.11.2.jar:/Users/bilal/.m2/repository/com/twitter/parquet-column/1.6.0rc3/parquet-column-1.6.0rc3.jar:/Users/bilal/.m2/repository/com/twitter/parqu
et-common/1.6.0rc3/parquet-common-1.6.0rc3.jar:/Users/bilal/.m2/repository/com/twitter/parquet-encoding/1.6.0rc3/parquet-encoding-1.6.0rc3.jar:/Users/bilal/.m2/repository/com/twitter/parquet-generator/1.6.0rc3/parquet-generator-1.6.0rc3.jar:/Users/bilal/.m2/repository/com/twitter/parquet-hadoop/1.6.0rc3/parquet-hadoop-1.6.0rc3.jar:/Users/bilal/.m2/repository/com/twitter/parquet-format/2.2.0-rc1/parquet-format-2.2.0-rc1.jar:/Users/bilal/.m2/repository/com/twitter/parquet-jackson/1.6.0rc3/parquet-jackson-1.6.0rc3.jar:/Users/bilal/.m2/repository/org/jblas/jblas/1.2.3/jblas-1.2.3.jar:/Users/bilal/.m2/repository/org/scalanlp/breeze_2.11/0.10/breeze_2.11-0.10.jar:/Users/bilal/.m2/repository/org/scalanlp/breeze-macros_2.11/0.3.1/breeze-macros_2.11-0.3.1.jar:/Users/bilal/.m2/repository/com/github/fommil/netlib/core/1.1.2/core-1.1.2.jar:/Users/bilal/.m2/repository/net/sourceforge/f2j/arpack_combined_all/0.1/arpack_combined_all-0.1.jar:/Users/bilal/.m2/repository/net/sf/opencsv/opencsv/2.3/opencsv-2.3.jar:/Users/bilal/.m2/repository/com/github/rwl/jtransforms/2.4.0/jtransforms-2.4.0.jar:/Users/bilal/.m2/repository/org/spire-math/spire_2.11/0.7.4/spire_2.11-0.7.4.jar:/Users/bilal/.m2/repository/org/spire-math/spire-macros_2.11/0.7.4/spire-macros_2.11-0.7.4.jar:/Users/bilal/.m2/repository/com/cloudera/datascience/spark-book-parent/1.0.0/spark-book-parent-1.0.0.jar:/Applications/IntelliJ IDEA 14.app/Contents/lib/idea_rt.jar" com.cloudera.datascience.graph.RunMedGraph Connected to the target VM, address: '127.0.0.1:55166', transport: 'socket' Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 15/08/03 19:06:48 INFO Remoting: Starting remoting 15/08/03 19:06:49 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:55169] hello citdata Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, 
localhost): org.xml.sax.SAXParseException; lineNumber: 126; columnNumber: 19; XML document structures must start and end within the same entity.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1436)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.endEntity(XMLDocumentFragmentScannerImpl.java:903)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.endEntity(XMLDocumentScannerImpl.java:563)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.endEntity(XMLEntityManager.java:1394)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1757)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(XMLEntityScanner.java:490)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2718)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:333)
    at scala.xml.factory.XMLLoader$class.loadXML(XMLLoader.scala:41)
    at scala.xml.XML$.loadXML(XML.scala:60)
    at scala.xml.factory.XMLLoader$class.loadString(XMLLoader.scala:60)
    at scala.xml.XML$.loadString(XML.scala:60)
    at com.cloudera.datascience.graph.RunMedGraph$$anonfun$2.apply(MedGraph.scala:71)
    at com.cloudera.datascience.graph.RunMedGraph$$anonfun$2.apply(MedGraph.scala:71)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:245)
    at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Disconnected from the target VM, address: '127.0.0.1:55166', transport: 'socket'

1 Answer

The problem turned out to be in Sandy's sample code. I raised it as an issue: https://github.com/sryza/aas/issues/42

Basically, the loadMedline function uses incorrect start and end tag keys (MedlineCitation instead of MedlineCitationSet).
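
To illustrate why mismatched start and end tags surface as a SAXParseException, here is a minimal sketch. It does not use Spark or the book's loadMedline; the object name TruncatedRecordDemo, the helper parses, and the sample strings are hypothetical. It assumes only that a record handed to the parser was cut off before its closing tag, which is what wrong tag keys can produce, and it calls the JDK's SAX parser directly (the same parser scala.xml.XML.loadString delegates to in the stack trace above).

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.InputSource
import org.xml.sax.helpers.DefaultHandler
import scala.util.Try

// Hypothetical demo object, not part of the book's code.
object TruncatedRecordDemo {
  // Runs the JDK SAX parser over the string.
  // Success(()) means the string is well-formed XML; Failure wraps the parse error.
  def parses(xml: String): Try[Unit] = Try {
    val parser = SAXParserFactory.newInstance().newSAXParser()
    // DefaultHandler rethrows fatal errors, so a broken record ends up as a Failure.
    parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler)
  }

  def main(args: Array[String]): Unit = {
    // A complete record: opening and closing tags match.
    val complete = "<MedlineCitation><PMID>1</PMID></MedlineCitation>"
    // The same record cut off before its closing tag.
    val truncated = "<MedlineCitation><PMID>1</PMID>"

    println(parses(complete).isSuccess)  // true
    println(parses(truncated).isSuccess) // false
    // Print the parser's message for the truncated record.
    parses(truncated).failed.foreach(e => println(e.getMessage))
  }
}
```

Spark transformations are lazy, so in the question the XML parsing inside the map only runs once an action such as countByValue() starts the job. That is consistent with the exception appearing at the topicCounts line rather than where medline is defined.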