I have a Rails application that accepts CSV file uploads. While developing the feature locally on my Mac, I got an "invalid byte sequence in UTF-8" error when trying to parse the uploaded file with Ruby's standard-library CSV class.
So, after doing some research and reading answers to similar questions on Stack Overflow, I used a gem (namely CharDet) to sniff out the character encoding, and then specified that encoding when opening the file with the CSV library. That solved all my problems, and life was good.
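    require 'csv'
    # Read the raw upload and let CharDet guess its encoding
    content = File.read(fullpath)
    self.file_encoding = CharDet.detect(content)['encoding']
    # Re-open the file with the detected encoding and grab the lowercased headers
    CSV.table(fullpath, :encoding => file_encoding, :header_converters => :downcase).headers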
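To make the error concrete: "invalid byte sequence in UTF-8" means the bytes being parsed are labelled UTF-8 but are not valid UTF-8. Below is a minimal sketch, assuming a hypothetical file saved as ISO-8859-1 (Latin-1); the sample data, the temp file, and the choice of ISO-8859-1 are illustrative assumptions, not details of the actual upload. It shows those bytes failing as UTF-8, then being read with Ruby's `"external:internal"` encoding form, which transcodes to UTF-8 while the file is read. It assumes a reasonably recent Ruby and its bundled csv library.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data: a small CSV saved as ISO-8859-1 (Latin-1).
# In Latin-1, "é" is the single byte 0xE9 and "ü" is the single byte 0xFC.
latin1_bytes = "name,city\nJos\xE9,Z\xFCrich\n".dup.force_encoding(Encoding::BINARY)

# The same bytes labelled as UTF-8 are invalid: 0xE9 opens a multi-byte
# UTF-8 sequence that the following "," does not continue. Scanning such a
# string is what produces "invalid byte sequence in UTF-8".
as_utf8 = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?     # => false

rows = nil
Tempfile.open(['upload', '.csv']) do |f|
  f.binmode
  f.write(latin1_bytes)
  f.flush
  # "ISO-8859-1:UTF-8" means: the file's bytes are Latin-1, transcode them
  # to UTF-8 while reading. The source encoding is assumed here; in practice
  # it would come from a detector such as CharDet.
  rows = CSV.read(f.path, headers: true, encoding: 'ISO-8859-1:UTF-8')
end

puts rows.first['name']          # => José
puts rows.first['name'].encoding # => UTF-8
```

If the guessed source encoding is wrong, transcoding either raises or silently produces the wrong characters, so the detected encoding is only as good as the detector's guess.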
But then I deployed this code to the production Linux environment and got the same "invalid byte sequence in UTF-8" errors again. What a mystery (to me, anyway)! After spending quite some time trying to resolve the error, I tried removing the code that specified the encoding when opening the file. Miraculously, that fixed the problem in production, but now local development on the Mac is broken.
Keep in mind that in both cases I'm uploading the same file from the same browser. Does anyone have any insight into what is going on here?
By the way, the Ruby versions are close but not identical: the Mac runs Ruby 1.9.3-p0 and the Linux server runs 1.9.2-p180. The app is on Rails 3.2.6.
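Since identical code reads the same upload differently on the two machines, it may help to compare each process's encoding defaults, not just the Ruby version. The sketch below is a diagnostic, not a fix, and uses only the standard library; the temp file name is arbitrary. Running it inside the app on both machines (for example via `rails runner`) would also capture any encoding settings the app applies at boot.

```ruby
require 'tempfile'

# Environment-dependent settings that can make identical code read the
# same bytes differently. Compare the output from both machines.
puts "RUBY_VERSION:              #{RUBY_VERSION}"
puts "Encoding.default_external: #{Encoding.default_external}"
puts "Encoding.default_internal: #{Encoding.default_internal.inspect}"
puts "LANG / LC_ALL:             #{ENV['LANG'].inspect} / #{ENV['LC_ALL'].inspect}"

# File.read in text mode does not inspect the bytes: it tags the result with
# Encoding.default_external (transcoding to Encoding.default_internal if set).
probe_encoding = nil
Tempfile.open('encoding-probe') do |f|
  f.write('plain ASCII text')
  f.flush
  probe_encoding = File.read(f.path).encoding
end
puts "File.read returns:         #{probe_encoding}"
```

A difference in `default_external` or the locale variables between the Mac and the server would mean `File.read` and `CSV` label the same bytes differently on each machine.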