Edit: git does not alter character encoding. I am leaving this question up to share knowledge and help others avoid the same mistake.


The context: My company uses an SVN repository, and I use git-svn as a client to interact with it. All text files in the project are (and must be) encoded with the Windows default encoding (cp-....). I use Git Extensions, and sometimes the command line, to drive git.

What I did: Over the last 3 days I worked on a new feature and made a number of local commits. Finally I squashed all these commits into a single one using an interactive rebase, then used git svn dcommit to push everything to the SVN repository as a single commit.

What happened then: A colleague told me that, after my commit, all accents were messed up in the files I had modified and in the new files. I had already committed text files with accents to the same repository with the same git + svn installation, and this is the first time I have faced this issue.

My investigation: I opened the files with Notepad++ and tried the most common encodings (including the Windows default and UTF-8) to view them. None of them displayed the accents properly, and different accented characters were always rendered as the same sequence of strange glyphs.

The temporary workaround: I quickly created a revert commit with Git Extensions and dcommitted it.

The question: My company's SVN repository is OK again, but I now have the following two problems to solve:

  1. Understand what happened with the characters with accents
  2. Retrieve my work from the SVN history and commit it properly (if possible, without manually reviewing every accented character)
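
For the second point, here is a rough sketch of how the damaged spots could be located without reading every file by hand. It assumes the damage shows up as the UTF-8 encoding of U+FFFD (the bytes EF BF BD), which may not match every case. The function name `find_damaged_lines` and the `*.txt` pattern are made up for illustration.

```python
from pathlib import Path

# Assumption: each destroyed accent was saved as U+FFFD encoded in UTF-8.
REPLACEMENT_BYTES = "\ufffd".encode("utf-8")  # b'\xef\xbf\xbd'


def find_damaged_lines(root, patterns=("*.txt",)):
    """Yield (path, line_number) for every line containing REPLACEMENT_BYTES.

    Files are read as raw bytes, so no encoding guess is involved.
    """
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            if not path.is_file():
                continue
            lines = path.read_bytes().splitlines()
            for number, line in enumerate(lines, start=1):
                if REPLACEMENT_BYTES in line:
                    yield path, number
```

Running `for path, number in find_damaged_lines("."): print(path, number)` in the working copy would list the lines to fix. The original accents cannot be rebuilt from those bytes, so the clean text has to come from an earlier, undamaged revision.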

Can anybody provide some clues? (I'm rather new to git.)

Do you mean that the contents of your text files were changed, not the paths? (I ask because, as far as I know, git-svn treats files as byte arrays.) What version of git-svn do you use? - Dmitry Pavlenko
Yes, it's the content of the files that was changed during the operation, not the paths. I update whenever a new version comes out, but I'm not at work right now. I'll tell you the exact version numbers of git and Git Extensions as soon as I can. - Samuel Rossille
When git-svn dcommits changes to the repository does the following: - Dmitry Pavlenko
Sorry, pressing Enter here just posts the comment, I didn't know. Either the interactive rebase or git-svn has spoiled the files. You can check by creating a temporary branch (git checkout <commit-id>; git checkout -b tmpbranch) at the commit that was the latest before you performed the interactive rebase (you can find old commit ids using the "git reflog" command), and redoing that interactive rebase under the same circumstances. After that, check whether your files are OK. Please let me know whether it is a git-svn or a rebase problem. - Dmitry Pavlenko
Git doesn't destroy objects during operations; it just adds new objects and updates references. Objects are destroyed only when the garbage collector runs (it is often called implicitly, but by default it doesn't prune all unreachable objects). Git keeps all objects reachable from references and the reflog. Even unreachable objects are (by default) not collected for about 30 days, unless you explicitly call "git prune", "git gc --prune" or something similar. - Dmitry Pavlenko

1 Answer

And now let's reveal the painful truth (painful for my ego, not for git users): I did mess with the accents, not git.

I could have just deleted the question, which wrongly suggests that git can mess up accents. But considering the number of upvotes, I think a lot of people make the same mistake I did, so I have chosen to answer my own question to set the record straight, and maybe help people in the same situation:

  1. Git does not touch any characters other than line breaks.
  2. I broke the accents before committing, and I did not notice because I did not pay enough attention. I had edited some of the files with Eclipse. Eclipse did not recognize the encoding, and on save all the accents were replaced by a weird byte sequence. That's all.
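
To illustrate the second point, here is a minimal sketch of one plausible way an editor can destroy accents on save. It assumes the file is cp1252 and that the editor reads and writes it as UTF-8. I have not verified that this is exactly what Eclipse did, and the sample text is made up.

```python
# Sketch only. Assumption: a cp1252 file is opened and saved as UTF-8.
original = "café, élève, à bientôt"
on_disk = original.encode("cp1252")  # the bytes the project expects

# The editor guesses UTF-8. A lone cp1252 accent byte is not valid UTF-8,
# so each one is turned into U+FFFD, the replacement character.
as_seen_by_editor = on_disk.decode("utf-8", errors="replace")

# On save, every U+FFFD is written as the same three bytes EF BF BD,
# whichever accent was there before. The original letter is lost.
after_save = as_seen_by_editor.encode("utf-8")

print(after_save)                           # raw bytes now in the file
print(after_save.count(b"\xef\xbf\xbd"))    # number of destroyed accents

# Viewing the saved file as cp1252 shows the same three glyphs for each accent.
viewed_as_cp1252 = after_save.decode("cp1252")
print(ascii(viewed_as_cp1252))
```

This would match the symptom in the question: no encoding displays the accents, and different accents all show up as the same sequence of glyphs. Git only stored the bytes it was given.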

Thanks again to Dmitry Pavlenko for giving me indications on how to investigate this problem.

+1 to "git reflog"

Happy accent fixing ;=)