How do I remove ï»¿ from the beginning of a file?

157

votes

I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿

PHP removes all whitespace, so a random ï»¿ in the middle of the code messes up the entire thing. As I mentioned, I can't actually see these characters when I open the file in gedit, so I can't remove them very easily.

I googled the problem, and there is clearly something wrong with the file encoding, which makes sense being as I've been shifting the files around to different Linux/Windows servers via ftp and rsync, with a range of text editors. I don't really know much about character encoding though, so help would be appreciated.

If it helps, the file is being saved in UTF-8 format, and gedit won't let me save it in ISO-8859-15 format (the document contains one or more characters that cannot be encoded using the specified character encoding). I tried saving it with Windows and Linux line endings, but neither helped.

phputf-8character-encodingbyte-order-markmojibake

This appears to solve the problem. 95isalive.com/expression/index.html – user928609

Somebody strip us off the BOM – David Heffernan

stackoverflow.com/q/20778921/995714 – phuclv

158

votes

Three words for you:

Byte Order Mark (BOM)

That's the representation for the UTF-8 BOM in ISO-8859-1. You have to tell your editor to not use BOMs or use a different editor to strip them out.

To automatize the BOM's removal you can use awk as shown in this question.

As another answer says, the best would be for PHP to actually interpret the BOM correctly, for that you can use mb_internal_encoding(), like this:

 <?php
   //Storing the previous encoding in case you have some other piece 
   //of code sensitive to encoding and counting on the default value.      
   $previous_encoding = mb_internal_encoding();

   //Set the encoding to UTF-8, so when reading files it ignores the BOM       
   mb_internal_encoding('UTF-8');

   //Process the CSS files...

   //Finally, return to the previous encoding
   mb_internal_encoding($previous_encoding);

   //Rest of the code...
  ?>

25

votes

Open your file in Notepad++. From the Encoding menu, select Convert to UTF-8 without BOM, save the file, replace the old file with this new file. And it will work, damn sure.

23

votes

In PHP, you can do the following to remove all non characters including the character in question.

$response = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $response);

20

votes

For those with shell access here is a little command to find all files with the BOM set in the public_html directory - be sure to change it to what your correct path on your server is

Code:

grep -rl $'\xEF\xBB\xBF' /home/username/public_html

and if you are comfortable with the vi editor, open the file in vi:

vi /path-to-file-name/file.php

And enter the command to remove the BOM:

set nobomb

Save the file:

wq

11

votes

BOM is just a sequence of characters ($EF $BB $BF for UTF-8), so just remove them using scripts or configure the editor so it's not added.

From Removing BOM from UTF-8:

#!/usr/bin/perl
@file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);

I am sure it translates to PHP easily.

6

votes

I don't know PHP, so I don't know if this is possible, but the best solution would be to read the file as UTF-8 rather than some other encoding. The BOM is actually a ZERO WIDTH NO BREAK SPACE. This is whitespace, so if the file were being read in the correct encoding (UTF-8), then the BOM would be interpreted as whitespace and it would be ignored in the resulting CSS file.

Also, another advantage of reading the file in the correct encoding is that you don't have to worry about characters being misinterpreted. Your editor is telling you that the code page you want to save it in won't do all the characters that you need. If PHP is then reading the file in the incorrect encoding, then it is very likely that other characters besides the BOM are being silently misinterpreted. Use UTF-8 everywhere, and these problems disappear.

6

votes

For me, this worked:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

If I remove this meta, the ï»¿ appears again. Hope this helps someone...

3

votes

You can use

vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'

Replacing with awk seems to work, but it is not in place.

2

votes

grep -rl $'\xEF\xBB\xBF' * | xargs vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'

2

votes

I had the same problem with the BOM appearing in some of my PHP files (ï»¿ï»¿).

If you use PhpStorm you can set at hotkey to remove it in Settings -> IDE Settings -> Keymap -> Main Menu - > File -> Remove BOM.

2

votes

In Notepad++, choose the "Encoding" menu, then "Encode in UTF-8 without BOM". Then save.

See Stack Overflow question How to make Notepad to save text in UTF-8 without BOM?.

2

votes

Open the PHP file under question, in Notepad++.

Click on Encoding at the top and change from "Encoding in UTF-8 without BOM" to just "Encoding in UTF-8". Save and overwrite the file on your server.

1

votes

Same problem, different solution.

One line in the PHP file was printing out XML headers (which use the same begin/end tags as PHP). Looks like the code within these tags set the encoding, and was executed within PHP which resulted in the strange characters. Either way here's the solution:

# Original
$xml_string = "&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?&gt;";

# fixed
$xml_string = "<" . "?xml version=\"1.0\" encoding=\"UTF-8\"?" . ">";

1

votes

If you need to be able to remove the BOM from UTF-8 encoded files, you first need to get hold of an editor that is aware of them.

I personally use E Text Editor.

In the bottom right, there are options for character encoding, including the BOM tag. Load your file, deselect Byte Order Marker if it is selected, resave, and it should be done.

Alt text http://oth4.com/encoding.png

E is not free, but there is a free trial, and it is an excellent editor (limited TextMate compatibility).

1

votes

You can open it by PhpStorm and right-click on your file and click on Remove BOM...

1

votes

Here is another good solution for the problem with BOM. These are two VBScript (.vbs) scripts.

One for finding the BOM in a file and one for KILLING the damned BOM in the file. It works pretty fine and is easy to use.

Just create a .vbs file, and paste the following code in it.

You can use the VBScript script simply by dragging and dropping the suspicious file onto the .vbs file. It will tell you if there is a BOM or not.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' find_BOM.vbs
' ====================
' Kleines Hilfsmittel, welches das BOM finden soll
'
 Const UTF8_BOM = "ï»¿"
 Const UTF16BE_BOM = "þÿ"
 Const UTF16LE_BOM = "ÿþ"
 Const ForReading = 1
 Const ForWriting = 2
 Dim fso
 Set fso = WScript.CreateObject("Scripting.FileSystemObject")
 Dim f
 f = WScript.Arguments.Item(0)
 Dim t
 t = fso.OpenTextFile(f, ForReading).ReadAll
 If Left(t, 3) = UTF8_BOM Then
     MsgBox "UTF-8-BOM detected!"
 ElseIf Left(t, 2) = UTF16BE_BOM Then
     MsgBox "UTF-16-BOM (Big Endian) detected!"
 ElseIf Left(t, 2) = UTF16LE_BOM Then
     MsgBox "UTF-16-BOM (Little Endian) detected!"
 Else
     MsgBox "No BOM detected!"
 End If

If it tells you there is BOM, go and create the second .vbs file with the following code and drag the suspicios file onto the .vbs file.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' kill_BOM.vbs
' ====================
' Kleines Hilfmittel, welches das gefundene BOM löschen soll
'
Const UTF8_BOM = "ï»¿"
Const ForReading = 1
Const ForWriting = 2
Dim fso
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Dim f
f = WScript.Arguments.Item(0)
Dim t
t = fso.OpenTextFile(f, ForReading).ReadAll
If Left(t, 3) = UTF8_BOM Then
    fso.OpenTextFile(f, ForWriting).Write (Mid(t, 4))
    MsgBox "BOM gelöscht!"
Else
    MsgBox "Kein UTF-8-BOM vorhanden!"
End If

The code is from Heiko Jendreck.

1

votes

In PHPStorm, for multiple files and BOM not necessarily at the beginning of the file, you can search \x{FEFF} (Regular Expression) and replace with nothing.

0

votes

Same problem, but it only affected one file so I just created a blank file, copy/pasted the code from the original file to the new file, and then replaced the original file. Not fancy but it worked.

0

votes

Use Total Commander to search for all BOMed files:

Elegant way to search for UTF-8 files with BOM?

Open these files in some proper editor (that recognizes BOM) like Eclipse.
Change the file's encoding to ISO (right click, properties).
Cut ï»¿ from the beginning of the file, save
Change the file's encoding back to UTF-8

...and do not even think about using n...d again!

0

votes

I had the same problem. The problem was because one of my php files was in utf-8 (the most important, the configuaration file which is included in all php files).

In my case, I had 2 different solutions which worked for me :

First, I changed the Apache Configuration by using AddDefaultCharsetDirective in configuration files (or in .htaccess). This solution forces Apache to use the correct encodage.

AddDefaultCharset ISO-8859-1

The second solution was to change the bad encoding of the php file.

0

votes

Copy the text of your filename.css file.
Close your css file.
Rename it filename2.css to avoid a filename clash.
In MS Notepad or Wordpad, create a new file.
Paste the text into it.
Save it as filename.css, selecting UTF-8 from the encoding options.
Upload filename.css.

0

votes

This works for me!

def removeBOMs(fileName):
     BOMs = ['ï»¿',#Bytes as CP1252 characters
    'þÿ',
    'ÿþ',
    '^@^@þÿ',
    'ÿþ^@^@',
    '+/v',
    '÷dL',
    'Ýsfs',
    'Ýsfs',
    '^Nþÿ',
    'ûî(',
    '„1•3']
     inputFile = open(fileName, 'r')
     contents = inputFile.read()
     for BOM in BOMs:
         if not BOM in contents:#no BOM in the file...
             pass
         else:
             newContents = contents.replace(BOM,'', 1)
             newFile = open(fileName, 'w')
             newFile.write(newContents)
             return None

-3

votes

Check on your index.php, find "... charset=iso-8859-1" and replace it with "... charset=utf-8".

Maybe it'll work.

How do I remove ï»¿ from the beginning of a file?

23 Answers