1
votes

I use the following command at the command line to cluster data with Mahout's k-means algorithm:

mahout kmeans -i /vect_out/tfidf-vectors/ -c /out_canopy -o /out_kmeans -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -cd 1.0 -x 20 -cl

Here /out_canopy is the directory containing the clusters created by Mahout canopy clustering. It contains a clusters-0 directory, which in turn contains a directory named _logs and a file named part-r-00000.

However, it keeps reporting the following error:

java.lang.IllegalStateException: No clusters found. Check your -c path.
at org.apache.mahout.clustering.kmeans.KMeansMapper.setup

2 Answers

0
votes

Are you sure that /out_canopy is a directory? Did you try:

file /out_canopy

It seems there may be a typo, and you meant to write just out_canopy (without the leading slash) or something similar ...

0
votes

This is a particularly vexing problem.

1. Swallow IllegalStateExceptions thrown by removeShutdownHook in FileSystem. The javadoc states:

    public boolean removeShutdownHook(Thread hook)
    Throws:
    IllegalStateException - If the virtual machine is already in the process of shutting down 

So if we are getting this exception, it MEANS we are already in the process of shutting down, so we CANNOT, try as we may, remove the shutdown hook. If Runtime had a method such as Runtime.isShutdownInProgress(), we could check it before the removeShutdownHook call. As it stands, there is no such method. In my opinion, this would be a good patch regardless of the needs of this JIRA.

2. Do not send SIGTERMs from the NM to the MR-AM in the first place. Rather, we should expose a mechanism for the NM to politely tell the AM that it is no longer needed and should shut down as soon as possible. Even then, if an admin were to kill the MRAppMaster with a SIGTERM, the JobHistory would be lost, defeating the purpose of 3614.
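
Option 1 above could look roughly like the following. This is only a minimal sketch using the plain java.lang.Runtime API. The class name ShutdownHookRemoval and the helper removeHookQuietly are hypothetical names chosen for illustration, and this is not the actual FileSystem code.

```java
public class ShutdownHookRemoval {

    /**
     * Hypothetical helper: tries to remove a shutdown hook and swallows the
     * IllegalStateException that Runtime throws once shutdown has begun.
     * Returns true only if the hook was registered and has been removed.
     */
    public static boolean removeHookQuietly(Thread hook) {
        try {
            return Runtime.getRuntime().removeShutdownHook(hook);
        } catch (IllegalStateException e) {
            // The JVM is already shutting down, so the hook can no longer
            // be removed; there is nothing useful left to do here.
            return false;
        }
    }

    public static void main(String[] args) {
        Thread hook = new Thread(() -> System.out.println("hook ran"));
        Runtime.getRuntime().addShutdownHook(hook);

        // First call removes the registered hook, second finds nothing to remove.
        System.out.println(removeHookQuietly(hook)); // prints "true"
        System.out.println(removeHookQuietly(hook)); // prints "false"
    }
}
```

Because the hook is removed before the JVM exits, "hook ran" is never printed in this run. A false return value means either that the hook was not registered or that shutdown is already in progress; a caller that needs to distinguish the two would have to catch the exception itself.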