34
votes

Since clang-format is a tool that only reformats code, is it possible that its formatting could break working code, or at least change how it works? Is there some kind of contract that it will not (or cannot) change how code works?

We have a lot of code that we want to format with clang-format, which means many lines of code will change. Not having to review every single line that changed only because of clang-format would greatly simplify this process.

I would say that clang-format will not change how code works. On the other hand, I am not 100% sure that this can be guaranteed.

7
If clang-format broke valid code it'd be pretty useless, don't you think? - EOF
@EOF, agreed, but can you guarantee that clang-format is bug-free? :) - rici
Are the executables the same? - Weather Vane
Also, I think @WeatherVane's question is the key here. If you compile the original and the reformatted programs and the resulting executables are the same, then you can be confident that there were no semantic changes. Although personally I find it unlikely that you would ever find a bug of this nature in clang-format, if you are working in a context which requires extraordinary paranoia, that test would be simple enough to perform. - rici
And if your project is a header-only library, or header files are shipped as well, then checking the binaries won't be enough; you also need tests with good coverage... Which is why we test our software at various levels ;) - Karoly Horvath

7 Answers

72
votes

Short answer: YES.


The clang-format tool has a -sort-includes option. Changing the order of #include directives can definitely change the behavior of existing code, and may break existing code.

Since the corresponding SortIncludes option is set to true by several of the built-in styles, it might not be obvious that clang-format is going to reorder your includes.

MyStruct.h:

struct MyStruct {
    uint8_t value;
};

original.c:

#include <stdint.h>
#include <stddef.h>
#include "MyStruct.h"

int main (int argc, char **argv) {
    struct MyStruct s = { 0 };
    return s.value;
}

Now let's say we run clang-format -style=llvm original.c > restyled.c.

restyled.c:

#include "MyStruct.h"
#include <stddef.h>
#include <stdint.h>

int main(int argc, char **argv) {
  struct MyStruct s = {0};
  return s.value;
}

Due to the reordering of the header files, I get the following error when compiling restyled.c:

In file included from restyled.c:1:
./MyStruct.h:2:5: error: unknown type name 'uint8_t'
    uint8_t value;
    ^
1 error generated.

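However, this issue should be easy to work around. It's unlikely that you have order-dependent includes like this, but if you do, you can fix the problem by putting a blank line between groups of headers that require a specific order. With the LLVM style used here, clang-format sorts each block of consecutive #include directives separately and never moves an include across a non-#include line. (This is the IncludeBlocks: Preserve behavior; the Merge and Regroup settings do combine blocks across blank lines, so the blank-line trick depends on that option.)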

fixed-original.c:

#include <stdint.h>
#include <stddef.h>

#include "MyStruct.h"

int main (int argc, char **argv) {
    struct MyStruct s = { 0 };
    return s.value;
}

fixed-restyled.c:

#include <stddef.h>
#include <stdint.h>

#include "MyStruct.h"

int main(int argc, char **argv) {
  struct MyStruct s = {0};
  return s.value;
}

Note that stdint.h and stddef.h were still reordered since their includes are still "grouped", but that the new blank line prevented MyStruct.h from being moved before the standard library includes.


However...

If reordering your #include directives breaks your code, you should probably do one of the following anyway:

  1. Explicitly include the dependencies for each header in the header file. In my example, I'd need to include stdint.h in MyStruct.h.

  2. Add a comment line between the include groups that explicitly states the ordering dependency. Remember that any non-#include line should break up a group, so comment lines work as well. The comment line in the following code also prevents clang-format from including MyStruct.h before the standard library headers.

alternate-original.c:

#include <stdint.h>
#include <stddef.h>
// must come after stdint.h
#include "MyStruct.h"

int main (int argc, char **argv) {
    struct MyStruct s = { 0 };
    return s.value;
}
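
For option 1 above, a minimal sketch of a self-contained MyStruct.h might look like the following. The include guard name MYSTRUCT_H is an assumption added for illustration; it is not part of the original example. With the header pulling in its own dependency, the order in which original.c lists its includes no longer matters.

```c
/* MyStruct.h, made self-contained (sketch). */
#ifndef MYSTRUCT_H /* hypothetical guard name, chosen for this example */
#define MYSTRUCT_H

#include <stdint.h> /* uint8_t is used below, so the header includes it itself */

struct MyStruct {
    uint8_t value;
};

#endif /* MYSTRUCT_H */
```

A source file can then include "MyStruct.h" first, last, or anywhere in between, and clang-format is free to sort the list.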

10
votes

It can certainly change how your code works. The reason is that a C program can observe some properties of its own source code. What I have in mind is the __LINE__ macro, though I'm not sure there are no other ways.

Consider 1.c:

#include <stdio.h>
int main(){printf("%d\n", __LINE__);}

Then:

> clang 1.c -o 1.exe & 1.exe
2

Now do some clang-format:

> clang-format -style=Chromium 1.c >2.c

And 2.c is:

#include <stdio.h>
int main() {
  printf("%d\n", __LINE__);
}

And, of course, the output has changed:

> clang 2.c -o 2.exe & 2.exe
3

4
votes

Since clang-format mostly changes only whitespace (include sorting, as another answer shows, is an exception), you can check that the files before and after formatting are identical up to whitespace. On Linux/BSD/OS X you can use diff and tr for that:

$ diff --ignore-all-space <(tr '\n' ' ' < 2.c ) <(tr '\n' ' ' < 1.c)

1.c:

#include <stdio.h>
int main() {printf("Hello, world!\n"); return 0;}

2.c:

#include <stdio.h>
int main() {
    printf("Hello, world!\n");
    return 0;
}

The output of the diff command is empty, meaning that files 1.c and 2.c are identical up to whitespace.

As Karoly mentioned in his comment, note that strictly speaking you still have to check whitespace that matters, e.g. inside string literals. But in the real world I believe this test is more than enough.

3
votes

clang-format reformatted ASM code in a project because we effectively did this:

#define ASM _asm

ASM {

  ...

}

1
votes

Yes.

It will not break the workflow.

The extension has the config switch "C_Cpp.clang_format_sortIncludes": false, but it does not work for me; I don't know what is wrong.

My version is ms-vscode.cpptools-0.13.1.

This is my solution:

For a stable workflow, use the following syntax:

// clang-format off

...here is your code

// clang-format on
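
As a rough sketch, assuming the goal is to keep a block of includes in its hand-written order, the markers could wrap just that block. As far as I know, include sorting respects these markers as well, but that is worth verifying with your clang-format version. The function my_struct_value is a hypothetical helper that exists only so the snippet has something to call.

```c
// clang-format off
#include <stdint.h>   /* keep first: the declarations below rely on uint8_t */
#include <stddef.h>
// clang-format on

struct MyStruct {
    uint8_t value;
};

/* Hypothetical helper, only here to give the example some code to format. */
static int my_struct_value(const struct MyStruct *s) {
    return s->value;
}
```

Everything outside the off/on pair is still formatted normally, so the protected region can be kept as small as the ordering dependency requires.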

0
votes

In addition to the scenarios described in other answers, -sort-includes may break projects that use precompiled headers. As indicated in the GCC documentation:

A precompiled header cannot be used once the first C token is seen.

Same applies for Visual Studio.

In practice this usually means the precompiled header has to be the first included header file. If you let clang-format sort the includes for you, there is a very high chance that your compilation will break once the precompiled header is no longer the first included file.


You can deal with it by naming your PCH file something like 0000000000000000000000_pch.h, but I think I've made my point about the issue with such a ridiculous example 😉.

-5
votes

I imagine it would not, given that it is built on clang's static analysis and therefore has knowledge of the structure of the code itself, rather than being just a dumb source code formatter that operates on the text alone (one of the boons of being able to use a compiler library). Given that the formatter uses the same parser and lexer as the compiler itself, I'd feel safe enough that it wouldn't have any issue spitting out code that behaves the same as what you feed it.

You can see the source code for the C++ formatter here: http://clang.llvm.org/doxygen/Format_8cpp_source.html