Code is here: github link

Error is:

ren: null at []]: java.lang.IllegalArgumentException: Could not find implementing class for org.apache.lucene.analysis.tokenattributes.OffsetAttribute

at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)

at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)

at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)

at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)

at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)

at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

Caused by: java.lang.IllegalArgumentException: Could not find implementing class for org.apache.lucene.analysis.tokenattributes.OffsetAttribute

at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.getClassForInterface(AttributeSource.java:94)

at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.createAttributeInstance(AttributeSource.java:67)

at org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:276)

at org.apache.lucene.analysis.standard.StandardTokenizer.<init>(StandardTokenizer.java:171)

at datafu.pig.text.lucene.NGramTokenize.exec(NGramTokenize.java:48)

at datafu.pig.text.lucene.NGramTokenize.exec(NGramTokenize.java:33)

at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)

at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDataBag(POUserFunc.java:374)

at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:309)

... 9 more

1 Answer

This appears to be a packaging issue. When I built datafu, org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl was not in datafu-1.2.1-SNAPSHOT.jar.

Lucene's AttributeSource looks up the implementation class for each attribute interface at runtime, so datafu-1.2.1-SNAPSHOT.jar must package org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl in addition to the org.apache.lucene.analysis.tokenattributes.OffsetAttribute interface.
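To illustrate why a missing Impl class only fails at runtime, here is a minimal sketch of a name-based lookup. As far as I know, the default attribute factory in this Lucene version appends "Impl" to the interface name and loads that class reflectively, but treat that as an assumption. The DemoAttribute, DemoAttributeImpl and OrphanAttribute types are hypothetical stand-ins, not Lucene classes.

```java
public class AttributeLookupSketch {
    // Hypothetical attribute interface with a matching *Impl class.
    public interface DemoAttribute {}
    public static class DemoAttributeImpl implements DemoAttribute {}

    // Hypothetical attribute interface whose *Impl class was never packaged.
    public interface OrphanAttribute {}

    // Assumed convention: implementation class name = interface name + "Impl".
    // Nothing references the Impl class statically; it is loaded by name.
    public static Class<?> findImpl(Class<?> iface) {
        try {
            return Class.forName(iface.getName() + "Impl", true, iface.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(
                "Could not find implementing class for " + iface.getName());
        }
    }

    public static void main(String[] args) {
        // Resolves, because DemoAttributeImpl is on the classpath.
        System.out.println(findImpl(DemoAttribute.class).getName());
        try {
            findImpl(OrphanAttribute.class);
        } catch (IllegalArgumentException e) {
            // Same shape of failure as in the stack trace above.
            System.out.println(e.getMessage());
        }
    }
}
```

Because the Impl class is only ever named in a string, a tool that collects dependencies by following class references has nothing to follow.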

You will likely run into this problem with other attribute classes as well.
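One way to find the other affected classes up front is to scan the built jar for attribute interfaces that have no matching Impl entry. This is only a sketch: the class name MissingAttributeImpls is made up, and it assumes every *Attribute interface in the tokenattributes package has an implementation named *AttributeImpl in the same package, which may not hold for every Lucene version.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class MissingAttributeImpls {
    static final String PKG = "org/apache/lucene/analysis/tokenattributes/";

    // Returns the expected *Impl.class entries that are absent from the given entry names.
    public static List<String> missingImpls(Collection<String> entryNames) {
        TreeSet<String> names = new TreeSet<>(entryNames); // sorted for stable output
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            if (name.startsWith(PKG) && name.endsWith("Attribute.class")) {
                String impl = name.substring(0, name.length() - ".class".length()) + "Impl.class";
                if (!names.contains(impl)) {
                    missing.add(impl);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MissingAttributeImpls <path-to-jar>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            List<String> names = jar.stream().map(JarEntry::getName).collect(Collectors.toList());
            missingImpls(names).forEach(System.out::println);
        }
    }
}
```

Running it against datafu-1.2.1-SNAPSHOT.jar should print one line per Impl class that still needs to be packaged, under the naming assumption above.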

From what I understand, autojar follows explicit class references at compile time to determine what goes in the final jar. That is why it does not pick up the attribute Impl classes, which are resolved at runtime.

I'm not familiar enough with autojar to suggest a fix, but if there is a way to explicitly include classes, you should include org.apache.lucene.analysis.tokenattributes.*Impl.