I'm new to AWS and Hive, and I'm trying to use Hive to analyze the Google Ngrams data. I tried to save a table as a tab-delimited text file in an S3 bucket, but now I don't know how to view or download it to check whether my job executed correctly.
The query I used to create the table was:
CREATE EXTERNAL TABLE test_table2 (
    gram string,
    year int,
    occurrences bigint,
    pages bigint,
    books bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://mybucket/sub-bucket/test-table2.txt';
I then filled the table with data:
INSERT OVERWRITE TABLE test_table2
SELECT
    gram,
    year,
    occurrences,
    pages,
    books
FROM
    eng1m_5grams_normed
WHERE
    gram = 'early bird gets the worm';
The query ran without errors, and I think everything worked correctly. However, when I navigate to my bucket in the S3 Management Console, test-table2.txt appears as a folder containing a bunch of files. These files have long hexadecimal names and are 0 bytes in size.
Is this just the text file represented as a directory? Is there a way I can view or download the file to see if my query worked? I tried to make the directory public so I could download it, but the download button in the "Actions" dropdown menu is still greyed out.
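For reference, one sanity check that does not depend on the S3 console is to read the table back from Hive itself. This is only a sketch: it assumes the same Hive session and the table name used above, and the LIMIT value is arbitrary. An empty result would suggest the WHERE clause matched no rows, which would be consistent with the 0-byte files.

```sql
-- Read back whatever the INSERT wrote (assumes test_table2 as defined above).
SELECT * FROM test_table2 LIMIT 10;

-- Count the rows that were written; 0 would mean the filter matched nothing.
SELECT COUNT(*) FROM test_table2;
```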