I am trying to create an EMR cluster on AWS with the CLI command below, but the cluster is not created with consistent view enabled, and the server-side encryption flag is not set either (fs.s3.consistent and fs.s3.enableServerSideEncryption are both false in emrfs-site.xml). What's wrong?
```bash
aws emr create-cluster \
--name "reporting-aws-cli-temp" \
--instance-type m1.medium \
--service-role EMR_DefaultRole \
--instance-count 2 \
--ami-version 3.3.1 \
--ec2-attributes SubnetId=subnet-111111,KeyName=someKey,InstanceProfile=server-role \
--log-uri s3://some-logs \
--emrfs SSE=true,Consistent=true,RetryPeriod=3,Args=[fs.s3.serverSideEncryptionAlgorithm=AES256]
```
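To check which EMRFS settings actually reached the cluster, the values can be read back from emrfs-site.xml on the master node. Below is a minimal bash sketch. The helper name get_hadoop_prop is hypothetical. It assumes the usual Hadoop layout of property blocks, each with one name element and one value element. The file path in the usage comment is also an assumption; adjust it for your AMI.

```bash
#!/usr/bin/env bash
# Print the <value> of a named property from a Hadoop-style XML config file
# such as emrfs-site.xml. Minimal sketch: assumes each <name> and <value>
# element opens and closes on a single line (the usual Hadoop layout).
get_hadoop_prop() {
  local file="$1" want="$2"
  awk -v want="$want" '
    /<name>/ {
      # Remember the most recent property name seen.
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
    }
    /<value>/ && n == want {
      # This value belongs to the requested property: strip the tags, print it.
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$file"
}

# Usage (the path is an assumption; point it at the emrfs-site.xml on your master node):
# for p in fs.s3.consistent fs.s3.enableServerSideEncryption; do
#   echo "$p = $(get_hadoop_prop /home/hadoop/conf/emrfs-site.xml "$p")"
# done
```

If a property is absent from the file, the helper prints nothing.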
The second part of my question is as follows.
Problem statement: The CSV data we want to analyze will be posted periodically from AWS EC2 instances (servers) to an Amazon S3 bucket, and we will use Hive to read the data from that bucket and analyze it. The data I post to Amazon S3 needs to be encrypted, so Hive has to decrypt each file before analyzing it.
Current state: We are able to periodically post the file to S3 in three separate ways:
- As a plain CSV file, which we can download and read
- Encrypting the data with a key on the client side and then uploading the file (reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html)
- Uploading the file with server-side encryption (SSE-S3) enabled (reference: http://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html)
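The SSE-S3 upload path can be sketched with the AWS CLI as follows. This is a minimal sketch under stated assumptions. The function name upload_sse_s3 is hypothetical, as are the bucket, key and file names. It uploads with --sse AES256 and then uses s3api head-object to ask S3 which server-side encryption it recorded for the object.

```bash
#!/usr/bin/env bash
# Upload a CSV to S3 with SSE-S3 (AES256), then print the encryption S3 reports for it.
# Minimal sketch: function, bucket, key and file names are hypothetical placeholders.
upload_sse_s3() {
  local src="$1" bucket="$2" key="$3"
  # --sse AES256 asks S3 to encrypt the object at rest with S3-managed keys (SSE-S3).
  aws s3 cp "$src" "s3://${bucket}/${key}" --sse AES256
  # head-object returns the object's metadata; ServerSideEncryption holds the algorithm.
  aws s3api head-object --bucket "$bucket" --key "$key" \
    --query ServerSideEncryption --output text
}

# Example (hypothetical names):
# upload_sse_s3 report.csv my-reporting-bucket csv/dt=2015-01-01/report.csv
```

For an SSE-S3 object, the second command should print AES256.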
We have also created an EMR cluster on AWS with Hive (version 0.13.0) installed. We were able to create external tables, add partitions pointing to the plain CSV data, and read and run simple analysis on it.

Where we are stuck: if the data is encrypted with either client-side or server-side encryption, how can Hive decrypt the data in a file before reading it?
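The partition setup that works for the plain CSV data can be sketched as a small bash helper around hive -e. All of the following are hypothetical assumptions: the helper name add_s3_partition, a table named reports that already exists as an external table partitioned by a dt string column, and files laid out under one S3 prefix per date (dt=YYYY-MM-DD).

```bash
#!/usr/bin/env bash
# Register one S3 prefix as a partition of an external Hive table.
# Minimal sketch: table name, dt partition column and S3 layout are hypothetical.
# The table is assumed to exist already, created with something like:
#   CREATE EXTERNAL TABLE reports (col1 STRING, col2 STRING)
#   PARTITIONED BY (dt STRING)
#   ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
#   LOCATION 's3://my-reporting-bucket/csv/';
add_s3_partition() {
  local table="$1" dt="$2" base="$3"
  # Drop a trailing slash from the base URI so the partition location is well-formed.
  base="${base%/}"
  # IF NOT EXISTS makes re-running the helper for the same date harmless.
  hive -e "ALTER TABLE ${table} ADD IF NOT EXISTS PARTITION (dt='${dt}') LOCATION '${base}/dt=${dt}/';"
}
```

For example, add_s3_partition reports 2015-01-01 s3://my-reporting-bucket/csv would register s3://my-reporting-bucket/csv/dt=2015-01-01/ as the dt='2015-01-01' partition.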