0
votes

Hi, my requirement is to build analytics from http://10.3.9.34:9900/messages: pull the data from that endpoint, store it in the HDFS location /user/cloudera/flume, and then create analytics reports from HDFS using Tableau or the Hue UI. I tried the code below in the Scala console (spark-shell) on CDH 5.5, but I am unable to fetch data from the HTTP link.

import org.apache.spark.SparkContext
val dataRDD = sc.textFile("http://10.3.9.34:9900/messages")
dataRDD.collect().foreach(println)
dataRDD.count()
dataRDD.saveAsTextFile("/user/cloudera/flume")

I get the error below in the Scala console:

java.io.IOException: No FileSystem for scheme: http
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2623)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2637)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2680)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2662)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:379)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)


1 Answer

2
votes

You can't use an HTTP endpoint as input to sc.textFile; the path needs to be on a file system that Hadoop supports, such as HDFS, S3 or the local file system.

You would need a separate process that pulls data from this endpoint, perhaps using something like Apache NiFi, and lands it on a file system where you can then use it as input to Spark.
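For a small, one-off pull, a rough sketch of the same idea can be done directly in spark-shell: fetch the response on the driver with plain Scala I/O, then hand the lines to Spark. This is only an illustration, not a replacement for a proper ingestion tool. It assumes the endpoint returns UTF-8 text that fits in driver memory, and `fetchLines` is a hypothetical helper name introduced here.

```scala
import scala.io.Source

// Hypothetical helper: read the whole HTTP response on the driver.
// This bypasses Hadoop's FileSystem API, which has no handler for "http".
// Assumes UTF-8 text that is small enough to hold in driver memory.
def fetchLines(url: String): List[String] = {
  val source = Source.fromURL(url, "UTF-8")
  try source.getLines().toList finally source.close()
}

val lines = fetchLines("http://10.3.9.34:9900/messages")

// `sc` is the SparkContext that spark-shell creates for you.
val dataRDD = sc.parallelize(lines)
println(dataRDD.count())

// Note: saveAsTextFile fails if the output directory already exists.
dataRDD.saveAsTextFile("/user/cloudera/flume")
```

This only captures whatever the endpoint returns at the moment of the call. If the messages keep arriving, a scheduled or streaming ingestion process such as the one described above is likely the better fit.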