1
votes

I want to find out the intent of a sentence. To do that, I need to find out which entity a pronoun refers to.

Consider the example

My name is Rushabh. I live in Pune.

In the second sentence, 'I' refers to Rushabh. How can I find this out using Python?


4 Answers

8
votes

In NLP this is called coreference resolution. There is a Python package for it called neuralcoref (source on GitHub).

1
votes

I am working on a similar problem, related to tweets. A tweet can mention several people, and I need to find out which pronoun refers to whom so that I can replace it with the noun and classify the sentiment accordingly.

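import spacy

# Load an English pipeline. The old 'en' shortcut is not available in spaCy 3+,
# so the small model is named explicitly
# (install it with: python -m spacy download en_core_web_sm).
nlp = spacy.load('en_core_web_sm')
sent = "Modi is a great leader.He has made India proud. Rahul Gandhi is naive . He is not fit to be prime minister."
doc = nlp(sent)

# Tokens that act as the nominal subject of a clause.
sub_toks = [tok for tok in doc if tok.dep_ == "nsubj"]
print(sub_toks)

# Base noun phrases found by the parser.
nc = list(doc.noun_chunks)
print(nc)

# Collect [text, token index, POS tag] for every proper noun and pronoun.
l = []
for i, token in enumerate(doc):
    if token.pos_ in ('PROPN', 'PRON'):
        l.append([token.text, i, token.pos_])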

This gave me a list of the details I wanted, but I still need a less computationally expensive way to implement my idea: I have over 50K tweets, and running multiple loops per sentence will take ages.

0
votes

In general, most heuristic approaches keep a focus within a stack. The stack is a list of entities that might be referred to anaphorically (anaphora resolution is the topic whose surface you are scratching here). This list may contain nouns, pronouns, and abstract entities such as events. Leaving aside abstract anaphora, which is the most complicated application area of all, you check for congruence (grammatical agreement in person and number) between the pronoun and the candidates: first the focus (the most recent or most prominent entity in the stack), then the rest of the stack from the most recent entry backwards, and you take the best match. Don't forget to update the stack, that is, to remove entities that were not subsequently referred to.

It is quite normal to get bad results, since this is still an active field of research and no application will give you perfect recall and precision. If you get 70 % correct, you can already count that as a success.
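The stack-and-agreement heuristic described in this answer could be sketched roughly as follows, using only the standard library. Everything named here is an assumption made for illustration: the `PRONOUNS` table, the hand-written `lexicon` of known entities with their features, the `max_age` pruning window, and the function names. Gender is checked in addition to number, which goes slightly beyond the agreement features mentioned above. In practice the entities and their features would come from a tagger or NER step, not a hard-coded dictionary.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed agreement features for third-person pronouns: (number, gender).
PRONOUNS = {
    "he": ("sg", "m"), "him": ("sg", "m"), "his": ("sg", "m"),
    "she": ("sg", "f"), "her": ("sg", "f"),
    "it": ("sg", "n"), "its": ("sg", "n"),
    "they": ("pl", None), "them": ("pl", None), "their": ("pl", None),
}


@dataclass
class Entity:
    name: str
    number: str
    gender: Optional[str]
    last_seen: int  # index of the sentence that last mentioned the entity


def agrees(pronoun_feats, entity):
    """True if the pronoun and the entity agree in number (and gender, if any)."""
    number, gender = pronoun_feats
    return number == entity.number and (gender is None or gender == entity.gender)


def resolve(text, lexicon, max_age=2):
    """Return (sentence_index, pronoun, antecedent) triples.

    `lexicon` maps entity names to (number, gender); it is a hypothetical
    stand-in for a real entity recogniser.
    """
    names = sorted(lexicon, key=len, reverse=True)  # longest names first
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, names)) + r")\b|\b([A-Za-z]+)\b"
    )
    stack = []     # last element is the current focus
    resolved = []
    for idx, sentence in enumerate(re.findall(r"[^.!?]+", text)):
        # Update the stack: drop entities not referred to recently.
        stack = [e for e in stack if idx - e.last_seen <= max_age]
        for match in pattern.finditer(sentence):
            name, word = match.group(1), match.group(2)
            if name:
                # A (re-)mentioned entity becomes the new focus.
                stack = [e for e in stack if e.name != name]
                stack.append(Entity(name, *lexicon[name], idx))
            elif word.lower() in PRONOUNS:
                feats = PRONOUNS[word.lower()]
                # Search from the focus backwards for the first agreeing entity.
                for entity in reversed(stack):
                    if agrees(feats, entity):
                        entity.last_seen = idx
                        stack.remove(entity)
                        stack.append(entity)  # the antecedent becomes the focus
                        resolved.append((idx, word, entity.name))
                        break
    return resolved


lexicon = {"Modi": ("sg", "m"), "Rahul Gandhi": ("sg", "m"), "India": ("sg", "n")}
text = ("Modi is a great leader.He has made India proud. "
        "Rahul Gandhi is naive . He is not fit to be prime minister.")
print(resolve(text, lexicon))
# [(1, 'He', 'Modi'), (3, 'He', 'Rahul Gandhi')]
```

A pronoun with no agreeing candidate is simply left unresolved. This toy version only ever picks the most recent agreeing entity, so it will fail on many real sentences, which is in line with the accuracy expectations mentioned above.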

If you need more precision, try to find a corpus on anaphora resolution and train a machine learning model on it. Today this approach is more popular and more promising anyway, especially when it learns from older heuristic methods like the one above.

-1
votes

That is a rather vague question, but you might search for POS (part-of-speech) tagging within natural language processing. Here is an example.