I need to convert a .json file into a format that can be loaded into AWS Redshift. I followed these articles to proceed:

https://aws.amazon.com/blogs/big-data/simplify-querying-nested-json-with-the-aws-glue-relationalize-transform/

https://github.com/aws-samples/aws-glue-samples/blob/master/examples/join_and_relationalize.md

As part of the instructions in both articles, a DevEndpoint notebook must be launched. I was able to create it; however, I am unable to run any queries because I cannot find a script editor, as seen below.

[Screenshot: the notebook page with no script editor visible]

Am I missing any configuration?

I need to transform these JSON files, and I am not even halfway there.
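
For reference, here is roughly the script I am trying to run, adapted from the Relationalize article (the database, table, and bucket names below are placeholders for my own):

# Adapted from the AWS Glue Relationalize examples; names are placeholders.
from awsglue.context import GlueContext
from awsglue.transforms import Relationalize
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# Read the crawled JSON table from the Glue Data Catalog.
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="mydatabase",
    table_name="myjsontable")

# Flatten the nested JSON into a collection of relational tables.
dfc = Relationalize.apply(
    frame=dyf,
    staging_path="s3://my-bucket/glue-staging/",
    name="root")

# Write each resulting table out as CSV, ready for a Redshift COPY.
for name in dfc.keys():
    glueContext.write_dynamic_frame.from_options(
        frame=dfc.select(name),
        connection_type="s3",
        connection_options={"path": "s3://my-bucket/output/" + name},
        format="csv")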


1 Answer


How did you set up the dev endpoint? Is it an AWS Glue provided DevEndpoint server, or a locally set up notebook? I can help with your issue if you provide more information.

In the meantime, for help setting up a local development environment and the Zeppelin notebook, please refer to the AWS Glue documentation on local development and the guide on setting up Zeppelin on Windows.

Once you have set up the Zeppelin notebook, establish an SSH tunnel to the dev endpoint (using the AWS Glue DevEndpoint URL) so the notebook can reach the Data Catalog, crawlers, etc., as well as the S3 bucket where your data resides. Then you can write your Python scripts in the Zeppelin notebook and run them from there, as sketched below.
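
For example, once the tunnel is up, a minimal sanity check from a Zeppelin %pyspark paragraph could look like the following (the database and table names are placeholders; use the ones your crawler created):

# Minimal sanity check from a Zeppelin %pyspark paragraph.
# Assumes the SSH tunnel to the Glue DevEndpoint is already established.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# "mydatabase"/"myjsontable" are placeholders for your crawler's output.
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="mydatabase",
    table_name="myjsontable")

print(dyf.count())   # confirms the catalog and S3 data are reachable
dyf.printSchema()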

You can use the dev instance provided by Glue instead, but you may incur additional costs for it (EC2 instance charges).

Environment settings (updated in response to comments):

JAVA_HOME=E:\Java7\jre7
Path=E:\Python27;E:\Python27\Lib;E:\Python27\Scripts;
PYTHONPATH=E:\spark-2.1.0-bin-hadoop2.7\python;E:\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip;E:\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip
SPARK_HOME=E:\spark-2.1.0-bin-hadoop2.7

Change the drive name and folders accordingly. Let me know if any further help is needed.
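
As a quick check that these settings are picked up, you can run something like this from a Python shell (the expected values match the example paths above; adjust to yours):

import os
# Should print the Spark folder configured above, e.g. E:\spark-2.1.0-bin-hadoop2.7
print(os.environ.get("SPARK_HOME"))

# This import fails if PYTHONPATH is missing the py4j/pyspark zips.
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
print(sc.version)   # expect 2.1.0 for the paths above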

Regards