1
votes

I am trying to use S3DistCp to combine some small files from one S3 folder into another S3 folder. The script looks something like this:

elastic-mapreduce --jobflow j-33EDUGSQCN0PZ --jar \
/home/hadoop/lib/emr-s3distcp-1.0.jar \
--args '--src,s3://li-test/data, \
--dest,s3://li-test/result, \
--groupBy,[0-9]*,\
--targetSize,128'

But I get the following java.lang.RuntimeException. Any help is appreciated. Thanks!

Exception in thread "main" java.lang.RuntimeException: Argument \ --dest doesn't match.
    at emr.hbase.options.Options.parseArguments(Options.java:75)
    at emr.hbase.options.Options.parseArguments(Options.java:57)
    at com.amazon.elasticmapreduce.s3distcp.S3DistCp$S3DistCpOptions.<init>(S3DistCp.java:124)
    at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:545)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:13)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

3 Answers

1
votes

After decompiling the emr-s3distcp-1.0.jar stored in the EMR cluster's /home/hadoop/lib folder, I found that the Java code expects the parameters in this style:

--src=s3://BUCKET-NAME/139kb-input --dest=s3://BUCKET-NAME/139kb-output

The specific difference between this line and the documentation is the use of '=' instead of ',' between the argument and its value.

This is the if statement the Java code uses:

if (argument.length() >= this.arg.length() + 1 && argument.substring(0, this.arg.length() + 1).equals(this.arg + "="))

where this.arg is "--src" and argument is "--src=s3://BUCKET-NAME/139kb-input"

CAVEAT: This was the case when creating the step through the web interface as a custom JAR. Creating the step from the command line works with ',' as the documentation says, rather than '='.
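To make the condition in this answer concrete, here is a minimal shell sketch that emulates the same prefix test: an argument matches only if it begins with the option name followed immediately by '='. The function name matches_option is hypothetical and is not part of the jar or the CLI; it only mirrors the decompiled Java check.

```shell
#!/bin/sh
# Hypothetical helper (not part of S3DistCp): succeeds when "$2" starts with
# "$1=", mirroring the decompiled Java condition quoted in this answer.
matches_option() {
    case "$2" in
        "$1="*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Try the '=' form, the ',' form, and the token from the error message.
for argument in \
    '--src=s3://BUCKET-NAME/139kb-input' \
    '--src,s3://BUCKET-NAME/139kb-input' \
    '\ --dest'
do
    if matches_option '--src' "$argument"; then
        printf 'match:    %s\n' "$argument"
    else
        printf 'no match: %s\n' "$argument"
    fi
done
```

Only the first argument passes the test. Anything in front of the option name, such as a stray backslash or space, makes the prefix comparison fail.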

0
votes

It seems to be a silly mistake. The following is being read as an S3DistCp command-line option:

\ --dest

So instead of breaking your command across several lines, why not give the command as follows:

elastic-mapreduce --jobflow j-33EDUGSQCN0PZ --jar /home/hadoop/lib/emr-s3distcp-1.0.jar --args '--src,s3://li-test/data, --dest,s3://li-test/result, --groupBy,[0-9]*,--targetSize,128'
0
votes

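The error message says "\ --dest doesn't match", which means the \ is being treated as part of the argument. Inside single quotes the shell does not treat a backslash followed by a newline as a line continuation, so both characters are passed through literally. Try this: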

elastic-mapreduce --jobflow j-33EDUGSQCN0PZ --jar \
/home/hadoop/lib/emr-s3distcp-1.0.jar \
--args '--src,s3://li-test/data, --dest,s3://li-test/result, --groupBy,[0-9]*, --targetSize,128'
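The quoting behaviour can be checked locally without touching EMR. The sketch below assumes the --args value is split on commas and that the pieces are not trimmed; that is an assumption about the CLI, not something confirmed here. The helper show_tokens is hypothetical, and the strings are shortened versions of the ones in the question.

```shell
#!/bin/sh
# Hypothetical helper: print each comma-separated token of "$1" in brackets,
# without trimming. Assumption: the --args value is split on commas only.
show_tokens() {
    (
        set -f      # disable globbing so patterns such as [0-9]* stay literal
        IFS=,
        for token in $1; do
            printf '[%s]\n' "$token"
        done
    )
}

# Single quotes keep the backslash and the newline literally.
broken='--src,s3://li-test/data, \
--dest,s3://li-test/result'

# Same arguments with no backslash, no newline, and no space after the commas.
fixed='--src,s3://li-test/data,--dest,s3://li-test/result'

echo 'broken:'
show_tokens "$broken"
echo 'fixed:'
show_tokens "$fixed"
```

In the broken case the third token starts with a space and a backslash and continues with --dest on the next line, which lines up with the "\ --dest" in the error message. If the CLI really does not trim tokens, a space left after a comma would also end up inside the token, so writing the list with no spaces at all is the safer form.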