0 votes

I have installed Spark and Hadoop in standalone mode on Ubuntu in VirtualBox for learning purposes. I am able to run normal Hadoop MapReduce operations on HDFS without using Spark. But when I run the code below in spark-shell,

scala> val file = sc.textFile("hdfs://localhost:9000/in/file")
scala> file.count()

I get "input path does not exist." error. The core-site.xml has fs.defaultFS with value hdfs://localhost:9000. If I give localhost without the port number, I get "Connection refused" error as it is listening on default port 8020. Hostname and localhost are set to loopback addresses 127.0.0.1 and 127.0.1.1 in etc/hosts. Kindly let me know how to resolve this issue. Thanks in advance!

2
Try this in a terminal: hadoop fs -ls hdfs://localhost:9000/in/. Is the file available? – WoodChopper

2 Answers

1 vote

I am able to read and write into HDFS using

"hdfs://localhost:9000/user/<user-name>/..."

Thank you for your help.
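
For completeness, the full round trip in spark-shell would look something like this (a sketch along the same lines; <user-name> is a placeholder for your HDFS user, and in/file is the path from the question):

scala> val file = sc.textFile("hdfs://localhost:9000/user/<user-name>/in/file")
scala> file.count()
scala> file.saveAsTextFile("hdfs://localhost:9000/user/<user-name>/out")

saveAsTextFile creates the out directory in HDFS and writes the RDD into it as part files.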

0 votes

Your configuration is probably alright, but the file is missing or in an unexpected location...
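
One way to verify which filesystem spark-shell actually resolves (a minimal sketch; sc is the SparkContext that spark-shell creates for you):

scala> sc.hadoopConfiguration.get("fs.defaultFS")

If this prints file:/// instead of hdfs://localhost:9000, Spark is not picking up your core-site.xml, and the relative hdfs:/// paths below will not work either.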

1) try:

sc.textFile("hdfs://in/file")
sc.textFile("hdfs:///user/<USERNAME>/in/file")

with USERNAME=hadoop, or your own username.

2) try, from the command line (outside spark-shell), to access that directory/file:

hdfs dfs -ls /in/file
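
If that listing comes back empty or errors out, the file was likely never uploaded to HDFS in the first place. A minimal sketch to create the directory and copy a local file in (the local file.txt name is just an example):

hdfs dfs -mkdir -p /in
hdfs dfs -put file.txt /in/file
hdfs dfs -ls /in

After that, the sc.textFile("hdfs://localhost:9000/in/file") call from the question should find the input path.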