1
votes

I have read several other answers regarding how to export a table to .csv with UTF8 encoding (no BOM). I found code which almost works for me, see below.

My problem is that the table contains swedish characters (ÅÄÖ), and when the .csv-file is opened these are lost to what looks like an incorrect charset. I found a workaround which is to open the .csv-file in Notepad, save, and then open it in Excel. The workaround makes Excel display the letters properly, but I would prefer not to have the extra step. Can the code below be modified so that the charset is not lost?

Option Explicit

Sub CSVFileAsUTF8WithoutBOM()
Dim SrcRange As Range
Dim CurrRow As Range
Dim CurrCell As Range
Dim CurrTextStr As String
Dim ListSep As String
Dim FName As Variant
Dim UTFStream As Object
Dim BinaryStream As Object

' ADO Constants
Const adTypeBinary = 1 ' The stream contains binary data
Const adTypeText = 2 ' The stream contains text data (default)
Const adWriteLine = 1 ' write text string and a line separator (as defined by the LineSeparator property) to the stream.
Const adModeReadWrite = 3 ' Read/write
Const adLF = 10 ' Line feed only - default is carriage return line feed (adCRLF)
Const adSaveCreateOverWrite = 2 ' Overwrites the file with the data from the currently open Stream object, if the file already exists

' Open this workbook location
ChDrive Left(ThisWorkbook.Path, 1)
ChDir ThisWorkbook.Path

' ask for file name and path
  FName = Application.GetSaveAsFilename("", "CSV File (*.csv), *.csv")

' prepare UTF-8 stream
  Set UTFStream = CreateObject("adodb.stream")
  UTFStream.Type = adTypeText
  UTFStream.Mode = adModeReadWrite
  UTFStream.Charset = "UTF-8"
  UTFStream.LineSeparator = adLF
  UTFStream.Open

  'set field separator
  ListSep = ";"
  'set source range with data for csv file
  If Selection.Cells.Count > 1 Then
    Set SrcRange = Selection
  Else
    Set SrcRange = ActiveSheet.UsedRange
  End If

  For Each CurrRow In SrcRange.Rows
    CurrTextStr = ""
    For Each CurrCell In CurrRow.Cells
      CurrTextStr = CurrTextStr & Replace(CurrCell.Value, """", """""") & ListSep
    Next
    'remove ListSep after the last value in line
    While Right(CurrTextStr, 1) = ListSep
      CurrTextStr = Left(CurrTextStr, Len(CurrTextStr) - 1)
    Wend
    'add line to UTFStream
    UTFStream.WriteText CurrTextStr, adWriteLine ' Writes character data to a text Stream object
  Next

  'skip BOM
  UTFStream.Position = 3 ' sets or returns a long value that indicates the current position (in bytes) from the beginning of a Stream object

  'copy UTFStream to BinaryStream
  Set BinaryStream = CreateObject("adodb.stream")
  BinaryStream.Type = adTypeBinary
  BinaryStream.Mode = adModeReadWrite
  BinaryStream.Open ' Opens a Stream object

  'Strips BOM (first 3 bytes)
  UTFStream.CopyTo BinaryStream ' Copies a specified number of characters/bytes from one Stream object into another Stream object

  UTFStream.Flush ' Sends the contents of the Stream buffer to the associated underlying object
  UTFStream.Close ' Closes a Stream object

  'save to file
  BinaryStream.SaveToFile FName, adSaveCreateOverWrite
  BinaryStream.Flush ' Sends the contents of the Stream buffer to the associated underlying object
  BinaryStream.Close ' Closes a Stream object

End Sub
1
Did you mean Notepad++? Because saving the file with Notepad did nothing for me.DecimalTurn
No, on the contrary opening in Notepad++ does nothing for me while doing it in Windows Notepad fixes the file somehow.David Hasselberg
Based on your response to my answer, I feel like knowing the name of the other software you are trying to use the CSV with would be an important information to add to your question.DecimalTurn

1 Answers

1
votes

EDIT:

Based on your comment, I realize that what you initially wanted was to keep the information about the character encoding inside the file without having a BOM.

The problem with this question (as you realized it) is that the BOM is actually what normally contains the information about the character encoding and putting this information anywhere else in the file doesn't really make sense.

So, your code is actually perfect for the task at hand. What needs to be changed is how the CSV file is imported/opened by the software you want to use.

When the file has no BOM, a software reading the file has to guess the character encoding.

In general, if the software you use doesn't support BOM and doesn't guess correctly, there should at least be a way to customize the behavior of the import/open command so that you can specify the character encoding (seems like you actually found it).

Original answer:

For some reason, Excel has a hard time to guess the character encoding when opening a UTF-8 encoded CSV file when you just double-clicking the file. You have to help it a little...

Instead of opening it directly, you could load the CSV content to a new workbook by using the (legacy) Text Import Wizard and selecting the UTF-8 character set (65001) during import if Excel is not able to figure it out by itself.

If you were to record a macro while doing it and make it into a sub procedure, you could have something like this:

Sub OpenCSV(FullFileName As String)

    Dim wb As Workbook
    Set wb = Workbooks.Add
    
    Dim ws As Worksheet
    Set ws = wb.Sheets(1)

    With ws.QueryTables.Add(Connection:= _
        "TEXT;" & FullFileName, Destination:=Range( _
        "$A$1"))
        .Name = "CSV_Open"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .TextFilePromptOnRefresh = False
        .TextFilePlatform = 65001
        .TextFileStartRow = 1
        .TextFileParseType = xlDelimited
        .TextFileTextQualifier = xlTextQualifierDoubleQuote
        .TextFileConsecutiveDelimiter = False
        .TextFileTabDelimiter = True
        .TextFileSemicolonDelimiter = False
        .TextFileCommaDelimiter = False
        .TextFileSpaceDelimiter = False
        .TextFileColumnDataTypes = Array(1)
        .TextFileTrailingMinusNumbers = True
        .Refresh BackgroundQuery:=False
    End With
    
End Sub

Other suggestion

If you really want to be able to double-click the file instead of using the Text Import Wizard or running a macro, you could always create a VBA event procedure in an add-in or PERSONAL.XSLB running every time a workbook is opened.

If it detects that the file that was just opened is a CSV file, it could close it and "reopen" it using the code above.

Extra: Of interest: there is a question here about how to change the default character encoding that Excel uses.