I am running a parser script that parses .txt files from a local directory. These files have now been moved to an HDFS cluster, and I would like to configure PyCharm to access the HDFS cluster. Can someone assist me with this?
1 Answer
I would like to configure my PyCharm to access the HDFS cluster
That depends on what type of access you mean. For basic HDFS CLI operations, you can shell out to the hadoop command with the os module:
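# Not tested
import os

# Local scratch file used for the round trip
f = "{}/tmp.txt".format(os.getcwd())
cmds = [
    "touch {}".format(f),                                   # create an empty local file
    "hadoop fs -copyFromLocal {} /user/$USER/".format(f),   # upload it to HDFS
    "rm -fv {}".format(f),                                  # delete the local copy
    "hadoop fs -copyToLocal /user/$USER/tmp.txt $PWD/",     # download it again
]
for cmd in cmds:
    os.system(cmd)
# The file should be back in the working directory
assert os.path.exists(f)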
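The same idea can be sketched with subprocess, which passes the arguments as a list instead of a shell string and raises if a command fails. This is only a sketch and has not been run against a real cluster: hdfs_cmd and run_hdfs are hypothetical helper names, the paths in the usage comments are placeholders, and it assumes the hadoop binary is on the PATH.

```python
import subprocess


def hdfs_cmd(*args):
    """Build the argument list for a `hadoop fs` call (hypothetical helper)."""
    return ["hadoop", "fs", *args]


def run_hdfs(*args, runner=subprocess.run):
    """Run a `hadoop fs` command; with subprocess.run a non-zero exit raises.

    `runner` is injectable so the helper can be exercised without a cluster.
    """
    return runner(hdfs_cmd(*args), check=True)


# Example usage (placeholder paths, assumes `hadoop` is on the PATH):
# run_hdfs("-copyFromLocal", "tmp.txt", "/user/me/")
# run_hdfs("-copyToLocal", "/user/me/tmp.txt", ".")
```

Unlike os.system, this does not go through a shell, so $USER and $PWD would not be expanded; pass real paths instead.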
But if you're looking for more granular control, you'll want a library such as pyarrow (or similar).
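As a rough sketch of the pyarrow route: pyarrow.fs.HadoopFileSystem can open an HDFS file as a binary stream, and the parsing logic can read from that stream instead of a local file. Everything specific here is an assumption: count_nonblank_lines is a hypothetical stand-in for the real parser, parse_from_hdfs is a made-up name, and the code assumes pyarrow is installed and the Hadoop client libraries (libhdfs, CLASSPATH) are set up on the machine running the interpreter. It has not been tested against a real cluster.

```python
def count_nonblank_lines(stream):
    """Stand-in parser: count non-blank lines in a binary file-like object."""
    # Reads the whole file into memory, which is fine for a sketch.
    text = stream.read().decode("utf-8")
    return sum(1 for line in text.splitlines() if line.strip())


def parse_from_hdfs(path, host="default", port=0):
    """Open `path` on HDFS via pyarrow and feed it to the parser.

    host="default" asks libhdfs to use fs.defaultFS from the Hadoop config;
    replace host and port with the namenode address if that is not set up.
    """
    from pyarrow import fs  # imported lazily so the parser works without pyarrow

    hdfs = fs.HadoopFileSystem(host, port)
    with hdfs.open_input_stream(path) as stream:
        return count_nonblank_lines(stream)
```

A call such as parse_from_hdfs("/user/me/tmp.txt"), with a placeholder path, would then take the place of opening the local file in the parser.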
The built-in open() will not work on HDFS paths - OneCricketeer