3
votes

I've been wondering about the encoding of text extracted with IFilter.

IFilter::GetText() returns the text as WCHAR*, but what if the file is encoded in ASCII? What about Unicode encodings such as UTF-8 or UTF-16?

As I see it, either IFilter converts the extracted text to a single encoding (in which case, which encoding is it?), or it doesn't, in which case how do I know which encoding the text is in?


1 Answer

2
votes

The output text is UTF-16 (everything in Windows that uses WCHAR is UTF-16). There is no way to query the encoding of the input data through IFilter; if you need it, you would have to analyze the data yourself.
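If you do need to guess the encoding of the original file, one common starting point is to look for a byte-order mark (BOM) at the start of the raw bytes. The following is only a minimal sketch of that idea in portable C++: the names `BomEncoding` and `detect_bom` are hypothetical and not part of the IFilter API, and it assumes you have already read the file's first bytes into a buffer yourself. Many files (plain ASCII, most UTF-8) have no BOM at all, so an `Unknown` result says nothing definite and further heuristics would be needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodings that can be recognized from a byte-order mark (BOM).
enum class BomEncoding { Unknown, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Hypothetical helper: inspects the leading bytes of a raw file buffer
// and reports which BOM, if any, is present.
BomEncoding detect_bom(const std::vector<std::uint8_t>& b) {
    const std::size_t n = b.size();

    // UTF-32 must be checked before UTF-16, because the UTF-32LE BOM
    // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE). This means a
    // UTF-16LE file whose first character is U+0000 would be misreported,
    // which is one reason this is a heuristic only.
    if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
        return BomEncoding::Utf32LE;
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
        return BomEncoding::Utf32BE;
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return BomEncoding::Utf8;
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return BomEncoding::Utf16LE;
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return BomEncoding::Utf16BE;

    // No BOM found: the data may be ASCII, BOM-less UTF-8, a legacy code
    // page, or binary. The BOM check alone cannot tell these apart.
    return BomEncoding::Unknown;
}
```

You would call `detect_bom` on the first few bytes of the file, independently of the filter. The text you get back from GetText() is UTF-16 either way.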