I'm trying to train the Stanford NER classifier to identify specific things in text databases. I have made a new .prop file and a training file, and I get results, but they are the same default results I would get if I ran the classifier without any training. Is there anything I can do to fix this?
This is my code:
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.StringUtils;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Properties;

public class NLP_train {
    public static void main(String[] args) throws IOException {
        Properties props = StringUtils.propFileToProperties("C:/Users/Admin/Desktop/trainingfile.prop");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // read some text in the text variable
        File inputFile = new File("C:/Users/Admin/Desktop/target.txt");
        // create an empty Annotation just with the given text
        Annotation document = new Annotation(IOUtils.slurpFileNoExceptions(inputFile));
        // run all Annotators on this text
        pipeline.annotate(document);
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            // traversing the words in the current sentence
            // a CoreLabel is a CoreMap with additional token-specific methods
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                // this is the text of the token
                String word = token.get(CoreAnnotations.TextAnnotation.class);
                // this is the POS tag of the token
                String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                // this is the NER label of the token
                String ne = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                System.out.println(String.format("Print: word: [%s] pos: [%s] ne: [%s]", word, pos, ne));
            }
        }
    }
}
Here is my .prop file:
trainFile = C:/Users/Admin/Desktop/trainingfile.tsv
serializeTo = C:/Users/Admin/Desktop/ner-model.ser.gz
map = word=0,answer=1
useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
useDisjunctive=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
# the next 4 deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
And an excerpt of my training file:
The 0
Type Radar
347G Radar
`` 0
Rice 0
Bowl 0
'' 0