3 votes

I am new to Amazon Web Services and tried to run an application on Amazon EMR.

These are the steps I followed:

1) Created the Hive script, which contains a CREATE TABLE statement, a LOAD DATA statement that loads a file into the table, and a SELECT * FROM query (a rough sketch of the script is shown after these steps).

2) Created the S3 bucket and uploaded the objects into it: the Hive script and the file to load into the table.

3) Then created the job flow (using the sample Hive program), giving the script, input, and output paths (like s3n://bucketname/script.q, s3n://bucketname/input.txt, s3n://bucketname/out/). I didn't create the out directory; I think it will get created automatically.

4) The job flow then started to run, and after some time I saw the states STARTING, BOOTSTRAPPING, RUNNING, and SHUT DOWN.

5) During the SHUT DOWN state, it terminated automatically, showing a FAILED status.
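
For reference, a rough sketch of what the script looks like is below. The table name and column layout here are hypothetical placeholders; only the S3 paths are the ones from step 3:

    -- create the table (hypothetical name and columns)
    CREATE TABLE my_table (col1 STRING, col2 STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

    -- load the uploaded file from S3 into the table
    LOAD DATA INPATH 's3n://bucketname/input.txt' INTO TABLE my_table;

    -- query the table
    SELECT * FROM my_table;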

Afterwards, on S3, I didn't see the out directory. How do I see the output? I only saw directories like daemons, nodes, etc.

Also, how do I see the data in HDFS on Amazon EMR?

1
I just had the same problem; pretty painful after a massive job. Unfortunately, I let the job auto-terminate upon completion. Were you able to track down your data and/or the reason it failed? – Dolan Antenucci

1 Answer

2 votes

The output path that you specified in step 3 should contain your results (from your description, that is s3n://bucketname/out/).
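
If you want to double-check that path without clicking through the S3 console, one option is roughly the following (a sketch, assuming you have a Hive session open on a running cluster; Hive's dfs command just passes through to hadoop fs):

    -- list whatever the job wrote to the output path
    dfs -ls s3n://bucketname/out/;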

If it doesn't, something went wrong with your Hive script. If your Hive job failed, you will find information about the failure/exception in the jobtracker log, which lives under <s3 log location>/daemons/<master instance name>/hadoop-hadoop-jobtracker-<some Amazon internal IP>.log

Only one file in your logs directory will have its S3 key in the above format. This file will contain any exceptions that may have happened. You probably want to concentrate on the bottom end of the file.
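
If you'd rather not download the whole log, something like this should work from a Hive session on a running cluster (again a sketch; dfs passes through to hadoop fs, and the bracketed parts need to be replaced with your actual log location):

    -- print the last kilobyte of the jobtracker log, which is usually where the exception shows up
    dfs -tail <s3 log location>/daemons/<master instance name>/hadoop-hadoop-jobtracker-<some Amazon internal IP>.log;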