I am using the OpenNLP Java API to tokenize sentences. It splits each sentence on whitespace, so every word becomes a separate token.
Is there any way to skip tokenization for certain specific words or phrases?
For example, OpenNLP tokenizes the sentence "A quick brown fox jumping over the lazy dog" as:
a
quick
brown
fox
jumping
over
the
lazy
dog
I want to skip tokenization for the phrases "quick brown fox" and "lazy dog" so that each one stays a single token. The expected output is:
a
quick brown fox
jumping
over
the
lazy dog
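One possible workaround, sketched here under assumptions rather than as an OpenNLP feature, is to let the tokenizer split as usual and then merge any run of tokens that matches a phrase from a "protected" list. In the sketch below the class `PhraseMerger` and its method `mergePhrases` are hypothetical names. `sentence.split("\\s+")` stands in for the tokenizer output so the example runs without OpenNLP on the classpath. With OpenNLP you would pass the `String[]` returned by something like `WhitespaceTokenizer.INSTANCE.tokenize(sentence)` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhraseMerger {

    /**
     * Hypothetical helper: merges consecutive tokens that exactly match one of
     * the given phrases (case-sensitive) into a single token, trying longer
     * phrases first so the longest match wins.
     */
    static List<String> mergePhrases(String[] tokens, List<String> phrases) {
        // Split each protected phrase into the word sequence the tokenizer would produce.
        List<String[]> phraseTokens = new ArrayList<>();
        for (String p : phrases) {
            phraseTokens.add(p.trim().split("\\s+"));
        }
        // Longest phrases first, e.g. "quick brown fox" before "quick brown".
        phraseTokens.sort((a, b) -> Integer.compare(b.length, a.length));

        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matchedLength = 0;
            for (String[] phrase : phraseTokens) {
                if (matchesAt(tokens, i, phrase)) {
                    matchedLength = phrase.length;
                    break;
                }
            }
            if (matchedLength > 0) {
                // Re-join the matched tokens with a single space into one token.
                result.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + matchedLength)));
                i += matchedLength;
            } else {
                result.add(tokens[i]);
                i++;
            }
        }
        return result;
    }

    /** Returns true if the phrase's words occur in tokens starting at index start. */
    private static boolean matchesAt(String[] tokens, int start, String[] phrase) {
        if (start + phrase.length > tokens.length) {
            return false;
        }
        for (int k = 0; k < phrase.length; k++) {
            if (!tokens[start + k].equals(phrase[k])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String sentence = "A quick brown fox jumping over the lazy dog";
        // Stand-in for the OpenNLP tokenizer output (assumption: a whitespace split).
        String[] tokens = sentence.split("\\s+");
        List<String> merged = mergePhrases(tokens, List.of("quick brown fox", "lazy dog"));
        merged.forEach(System.out::println);
    }
}
```

Running `main` prints `A`, `quick brown fox`, `jumping`, `over`, `the` and `lazy dog` on separate lines. Because matching is exact and case-sensitive, a tokenizer that splits off punctuation (for example "dog." into "dog" and ".") still matches, but differently cased phrases would need normalizing first.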