search or convert octal sequences

Question

I have uncompressed a PDF file with pdftk and I am trying to edit it in Emacs using regexps.

The problem is that this file has accented characters and Emacs displays them as octal sequences: e.g. \340 for à. To edit this file I have two possibilities (at least I think so).

a) Apply an encoding such that Emacs will display actual accented characters and not their octal equivalent. Vim already displays accented characters properly;

b) Search octal sequences with regexps.

As for a), I have tried (set-buffer-file-coding-system 'utf-8-dos), (set-buffer-file-coding-system 'utf-8-unix), (set-buffer-file-coding-system 'raw-text) without success.

As for b), after applying set-buffer-file-coding-system, I am able to search incrementally for the octal sequences with C-q ... RET, but I am unable to do what I really need: replacing strings. In fact, C-q ... RET does not match octal sequences when using M-% or C-M-%. C-x 8 `... doesn't work either.

Thanks in advance. Antonio

Newbie here, hope it is possible to post links. Anyway I just created a one line test file: filedropper.com/test_16 . In Emacs have a look at line 47 and note how you can manually replace \340 with à, save and reopen it in your PDF viewer. — antonio
A single high-bit octal character is most certainly not UTF-8. Try with CP1252 or perhaps CP850. — tripleee

Thomas · Accepted Answer · 2012-11-26T03:34:47

Try the following key-sequence in the buffer visiting the PDF file:

C-x RET r character-coding RET

This will revisit the file using the character-encoding you specify.
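
The same revert can also be done from Lisp, for instance with M-: (eval-expression). This is only a minimal sketch, assuming the file is Latin-1, which matches the \340 for à example in the question; latin-1 is an assumption here, and cp1252 from the comments is another candidate:

```elisp
;; Assumption: the text in the PDF is Latin-1 (ISO-8859-1).
;; Swap in another single-byte coding system (e.g. 'cp1252) if needed.

;; Re-read the visited file, decoding its bytes as Latin-1.
;; Emacs may ask for confirmation before reverting.
(revert-buffer-with-coding-system 'latin-1)

;; Sanity check: the single byte \340 decodes to à under Latin-1.
(decode-coding-string (unibyte-string #o340) 'latin-1)  ; => "à"
```

Once the buffer is decoded this way, the accented characters are ordinary characters and can be typed directly into M-% or C-M-%.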

Alternatively, to specify the character encoding before visiting a file, type

C-x RET c character-coding RET

immediately before typing C-x C-f.
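
In Lisp, this prefix command roughly corresponds to binding coding-system-for-read around the command that visits the file. A sketch under stated assumptions: latin-1 is a guess at the right coding system, and the file name is hypothetical:

```elisp
;; Assumption: the file should be decoded as Latin-1.
;; "uncompressed.pdf" is a hypothetical file name used for illustration.
(let ((coding-system-for-read 'latin-1))
  ;; Visit the file; its bytes are decoded with the coding system bound above.
  (find-file "uncompressed.pdf"))
```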

See the documentation for more details.
