I have a Unicode string in which I replace each '\r' character with '<\p>', and then I pass the result to BeautifulSoup for parsing.
If I print the string after the replacement, I can see that the replacement worked. But when I pass the string to BeautifulSoup, it escapes the angle brackets as &lt; and &gt; instead of treating them as a tag. Why is that?
It seems to have something to do with encoding, but I am not sure what.
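To test the encoding hypothesis, here is a small Python 3 sketch. The helper name `encode_after_replace` is hypothetical, and it uses `str.replace` instead of `re.sub` because, since Python 3.7, `re.sub` rejects an unknown escape such as `\p` in the replacement string. It checks two things: UTF-8 encoding leaves ASCII characters like `<`, `\` and `>` byte-for-byte unchanged, and `encode()` returns a new object rather than modifying the string in place, so a bare `fileString.encode('utf-8')` call does not change `fileString`.

```python
def encode_after_replace(text):
    """Hypothetical helper: replace carriage returns, then UTF-8 encode."""
    # "<\\p>" is the four characters < \ p > (a literal backslash, as in the question)
    replaced = text.replace("\r", "<\\p>")
    # encode() builds a new bytes object; `replaced` itself is left untouched
    encoded = replaced.encode("utf-8")
    return replaced, encoded


replaced, encoded = encode_after_replace("first line\rsecond line")
print(type(replaced).__name__)                     # → str (encoding did not alter it)
print(encoded == b"first line<\\p>second line")    # → True (same ASCII bytes)
print(encoded.decode("utf-8") == replaced)         # → True (lossless round trip)
```

If the encoded bytes still contain `<\p>` unchanged, encoding is probably not what alters the markup; the change more likely happens when BeautifulSoup parses or prints the document.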
Here is the code that does the replacement:
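import re
from bs4 import BeautifulSoup

fileString.encode('utf-8')  # note: encode() returns a new object; this result is discarded
fileString = re.sub('\r', "/<\p>", fileString)  # replace every carriage return
fileString.encode('utf-8')  # again, the encoded result is not assigned anywhere
htmlTag = BeautifulSoup(fileString, from_encoding='utf-8')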
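A possible explanation, sketched with Python 3's standard-library `html.parser` (the parser BeautifulSoup uses when you pass `"html.parser"` as its second argument; lxml or html5lib may differ in detail, and this sketch does not check them): a closing tag is written with a forward slash, `</p>`. With a backslash, `<\p>` is not a tag at all, so the parser reports the `<` as ordinary text, and BeautifulSoup escapes `<` and `>` in text as `&lt;` and `&gt;` when it prints the document. `TagLogger`, `parse_events` and `text_only` are hypothetical names used only for this illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records what the parser sees as tags vs. text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_events(markup):
    """Feed markup through html.parser and return (kind, value) events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return parser.events


def text_only(events):
    """Join every chunk the parser reported as plain text."""
    return "".join(value for kind, value in events if kind == "data")


# Backslash version from the question: no end tag is reported, and the
# characters < \ p > stay in the text, where BeautifulSoup would escape them.
bad = parse_events("<p>first line<\\p>second line")
print([e for e in bad if e[0] != "data"])
print(text_only(bad))

# Forward-slash version: the parser reports a real end tag for <p>.
good = parse_events("<p>first line</p>second line")
print([e for e in good if e[0] != "data"])
print(text_only(good))
```

Only the forward-slash version produces an `end` event for `p`; in the backslash version the four characters `<\p>` survive as plain text.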