I am trying to use re.findall to get all of the capitalized words and abbreviations in a string. I have worked out regular expressions that find each kind individually, but when I combine the two, I get back tuples that each contain an empty string alongside the item I wanted to find.
Here is the regular expression that does not work as expected. I imagine it's a quick fix that I am just unaware of:
x = re.findall(r"([A-Z][A-Za-z]+\.?)|(\b[A-Z](?:[\.&]?[A-Z]){2,}\b)", txt)  # every tuple contains an extra ""
Edit:
I am currently using this as my test case:
"USA. U.S.A America."
This is my output:
[('USA.', ''), ('', 'U.S.A'), ('America.', '')]
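For reference, the tuples appear because re.findall returns one tuple per match whenever the pattern contains more than one capturing group, and the group that did not take part in a match comes back as an empty string. Below is a minimal sketch of two possible ways around this, using the same test string. The variable names `pattern`, `matches` and `matches_iter` are only illustrative, and the character class is written as `[.&]` on the assumption that the escaped dot inside the brackets was not essential.

```python
import re

txt = "USA. U.S.A America."

# Option 1: make both alternatives non-capturing with (?:...).
# With no capturing groups, findall returns the full matched strings.
pattern = r"(?:[A-Z][A-Za-z]+\.?)|(?:\b[A-Z](?:[.&]?[A-Z]){2,}\b)"
matches = re.findall(pattern, txt)
print(matches)  # ['USA.', 'U.S.A', 'America.']

# Option 2: keep the capturing groups and read the whole match
# (group 0) from each match object produced by finditer.
grouped = r"([A-Z][A-Za-z]+\.?)|(\b[A-Z](?:[.&]?[A-Z]){2,}\b)"
matches_iter = [m.group(0) for m in re.finditer(grouped, txt)]
print(matches_iter)  # ['USA.', 'U.S.A', 'America.']
```

Option 1 is probably the smaller change if the groups are never needed separately. Option 2 may be preferable if you later want to tell which alternative produced each match.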
txt? – Niel Godfrey Ponciano