I have text that mixes English words and Chinese characters, and I would like to convert it so that the English stays unchanged and each Chinese character is replaced by its XML/HTML-style numeric entity.
For example, the following mixture of English words, numbers and Chinese characters:
Title: 目录.doc
Level: 1
PageNumber: 1
Begin
Title: 1 C语言概述
Level: 1
PageNumber: 13
Begin
Title: 1.1 C语言的发展过程
Level: 2
PageNumber: 13
Begin
Title: 1.2 当代最优秀的程序设计语言
would be turned into the following, with each Chinese character replaced by its XML/HTML-style numeric entity:
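Title: &#30446;&#24405;.doc
Level: 1
PageNumber: 1
Begin
Title: 1 C&#35821;&#35328;&#27010;&#36848;
Level: 1
PageNumber: 13
Begin
Title: 1.1 C&#35821;&#35328;&#30340;&#21457;&#23637;&#36807;&#31243;
Level: 2
PageNumber: 13
Begin
Title: 1.2 &#24403;&#20195;&#26368;&#20248;&#31168;&#30340;&#31243;&#24207;&#35774;&#35745;&#35821;&#35328;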
Is there a way to do this in Python?
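To make the goal concrete, here is a minimal sketch of the kind of conversion I mean, assuming Python 3 and the text already loaded as a str. It relies on the built-in "xmlcharrefreplace" codec error handler; the function name to_numeric_entities is just a placeholder of mine. Note that this replaces every non-ASCII character, not only Chinese ones, which may or may not be what is wanted.

```python
def to_numeric_entities(text):
    """Replace each non-ASCII character with a decimal &#NNNNN; reference."""
    # Encoding to ASCII fails for Chinese characters; the built-in
    # 'xmlcharrefreplace' handler substitutes &#<code point>; for each one.
    # Decoding back gives an ordinary str containing only ASCII.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("Title: 目录.doc"))  # Title: &#30446;&#24405;.doc
print(to_numeric_entities("Level: 1"))        # Level: 1 (ASCII is untouched)
```

Existing ampersands in the text are not escaped by this, so it is only a sketch for input that has no markup of its own.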
Is it also possible to convert the Chinese characters into their Unicode code points or UTF-8 encoded bytes?
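As a rough illustration of what I mean by the Unicode/UTF-8 form, here is a small sketch, again assuming Python 3. The helper name describe_non_ascii and the output layout (code point as U+XXXX, UTF-8 bytes as hex) are my own choices for illustration, not a required format.

```python
def describe_non_ascii(text):
    """Return (char, code point, UTF-8 bytes in hex) for each non-ASCII char."""
    rows = []
    for ch in text:
        if ord(ch) > 127:  # skip plain ASCII letters, digits and punctuation
            code_point = "U+{:04X}".format(ord(ch))
            # In Python 3, iterating over bytes yields integers.
            utf8_hex = " ".join("{:02X}".format(b) for b in ch.encode("utf-8"))
            rows.append((ch, code_point, utf8_hex))
    return rows


for row in describe_non_ascii("目录.doc"):
    print(row)
# ('目', 'U+76EE', 'E7 9B AE')
# ('录', 'U+5F55', 'E5 BD 95')
```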
Thanks in advance!