I am trying to load Google Cloud Storage files to on premise Hadoop cluster. I developed a workaround (program) to download files on local EdgeNode and distcp
to Hadoop. But this seems two-way workaround and not much impressive. I have gone through few websites (links1, link2) which summarizes using Hadoop Google Cloud Storage connector for such process and need infrastructure level configuration which is not possible in all cases.
Is there any way to copy files directly from Cloud Storage to Hadoop programmatically using Python or Java.
distcp
will require infrastructure level configuration. I am not sure it will be permitted. Do you have sample example using Spark? – Sandeep Singh