No space left on device exception, amazon EMR medium instances and S3

Question

I am running a MapReduce job on Amazon EMR which creates 40 output files, about 130MB each. The 9 last reduce tasks fail with a "No space left on device" exception. Is this a matter of a false configuration of the cluster? The job runs without a problem with fewer input files, fewer output files and fewer reducers. Any help will be much appreciated. Thanks! Full stacktrace below:

Error: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.security.DigestOutputStream.write(DigestOutputStream.java:148)
at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.write(MultipartUploadOutputStream.java:135)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:60)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:83)
at org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)
at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:105)
at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:111)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

EDIT

I did some further attempts but unfortunately I am still getting errors. I thought I might not have enough memory on my instances because of the replication factor mentioned in the comment below, so I tried with large instead of medium instances that I was experimenting with until now. But I got another exception this time:

Error: java.io.IOException: Error closing multipart upload
at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadMultiParts(MultipartUploadOutputStream.java:207)
at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.close(MultipartUploadOutputStream.java:222)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:106)
at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:111)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:558)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:637)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.util.concurrent.ExecutionException:       com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received. (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; 
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)

The result is that only about 70% of the expected output files are produced, the remaining reduce tasks fail. Then I tried uploading a large file to my S3 bucket in case there wasn't enough memory there but that doesn't seem to be the problem.

I am using the aws Elastic MapReduce service. Any ideas?

vefthym vefthym · Accepted Answer · 2014-09-16T15:00:41

The problem means that there is no space to store the output (or temporary output) of your MapReduce job.

Some things to check are:

Have you deleted unnecessary files from HDFS? Run hadoop dfs -ls / command to check the files stored on HDFS. (In case you use a Trash, make sure you empty it, too.)
Do you use compression to store the output (or temporary output) of your jobs? You can do so by setting as output format the SequenceFileOutputFormat, or by setting setCompressMapOutput(true);
What is the replication factor? By default it is set to 3, but if there is a space issue, you can risk to set it to 2, or 1, in order to make your program run.

It could be an issue that some of your reducers output a significantly larger amount of data than others, so check your code, too.

No space left on device exception, amazon EMR medium instances and S3

2 Answers