26
votes

This is a difficult and open-ended question I know, but I thought I'd throw it to the floor and see if anyone had any interesting suggestions.

I have developed a code generator that takes our Python interface to our C++ code (generated via SWIG) and generates the code needed to expose this as web services. When I developed this code I did it using TDD, but I've found my tests to be brittle as hell. Because each test essentially wanted to verify that for a given bit of input code (which happens to be a C++ header) I'd get a given bit of outputted code, I wrote a small engine that reads test definitions from XML input files and generates test cases from these expectations.

The problem is that I dread going in to modify the code at all; that, and the fact that the unit tests themselves are (a) complex and (b) brittle.

So I'm trying to think of alternative approaches to this problem, and it strikes me that I'm perhaps tackling it the wrong way. Maybe I need to focus more on the outcome, i.e. does the code I generate actually run and do what I want it to, rather than on whether the code looks the way I want it to.

Has anyone got any experiences of something similar to this they would care to share?

8
I'm actually facing this same problem, and none of the answers below is really satisfactory. Granted, you can unit test the pieces of a code generator. The problem is: how do you know the generated code is correct, i.e. that there are no regressions or anything like that, and therefore how do you write automated tests for generated code (whether they're called unit or integration tests)? – James Kingsbery
@James: there's no easy answer, is there... I've just re-read this question and the responses, and all the issues I had at the time come flooding back. I may give this another shot in the coming weeks because I'm ending up with various regressions from time to time and it's getting more and more critical to detect these. – jkp
It's a big massive string comparison. Might be easier using an AST. – Nikos

8 Answers

12
votes

I started writing up a summary of my experience with my own code generator, then went back and re-read your question and found you had already touched upon the same issue yourself: focus on the execution results instead of the code layout/look.

The problem is that this is hard to test: the generated code might not be suited to actually run in the environment of the unit-test system, and how do you encode the expected results?

I've found that you need to break down the code generator into smaller pieces and unit test those. Unit testing a full code generator is more like integration testing than unit testing if you ask me.
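For instance, here is a minimal sketch of what one of those smaller pieces and its test might look like in Python (the function name, the signature shape, and the output format are all hypothetical, not taken from the asker's generator):

import unittest

def render_wrapper(func_name, arg_names):
    """Render the web-service wrapper for one SWIG-exposed function.

    A hypothetical, isolated piece of the generator: given a parsed
    signature, produce the wrapper source as a string.
    """
    args = ", ".join(arg_names)
    return (
        f"def {func_name}_ws({args}):\n"
        f"    return wrapped_module.{func_name}({args})\n"
    )

class RenderWrapperTest(unittest.TestCase):
    def test_renders_call_through(self):
        source = render_wrapper("add", ["a", "b"])
        # Targeted assertions on the piece's output, not a whole-file diff.
        self.assertIn("def add_ws(a, b):", source)
        self.assertIn("wrapped_module.add(a, b)", source)

if __name__ == "__main__":
    unittest.main()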

5
votes

Recall that "unit testing" is only one kind of testing. You should be able to unit test the internal pieces of your code generator. What you're really looking at here is system level testing (a.k.a. regression testing). It's not just semantics... there are different mindsets, approaches, expectations, etc. It's certainly more work, but you probably need to bite the bullet and set up an end-to-end regression test suite: fixed C++ files -> SWIG interfaces -> python modules -> known output. You really want to check the known input (fixed C++ code) against expected output (what comes out of the final Python program). Checking the code generator results directly would be like diffing object files...

0
votes

Yes, results are the ONLY thing that matters. The real chore is writing a framework that allows your generated code to run independently... spend your time there.

0
votes

If you are running on *nix you might consider dumping the unittest framework in favor of a bash script or makefile. On Windows you might consider building a shell app/function that runs the generator and then uses the code (as another process), and unit test that.

A third option would be to generate the code and then build an app from it that includes nothing but a unit test. Again, you would need a shell script or whatnot to run this for each input. As to how to encode the expected behavior, it occurs to me that it could be done in much the same way as you would for the C++ code, just using the generated interface rather than the C++ one.
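A minimal driver along those lines, sketched in Python rather than a bash script (every path and command here is a placeholder; adapt it to however your generator is actually invoked):

import subprocess
import sys
from pathlib import Path

def run_case(header: Path) -> bool:
    """Generate code for one input header, then run that case's tests
    in a separate process so a crash can't take the driver down."""
    out_dir = Path("build") / header.stem
    subprocess.check_call(
        [sys.executable, "generate_ws.py", str(header), "--out", str(out_dir)]
    )
    # Each fixture header is assumed to ship a matching test script.
    result = subprocess.run(
        [sys.executable, str(Path("fixtures") / f"test_{header.stem}.py"), str(out_dir)]
    )
    return result.returncode == 0

if __name__ == "__main__":
    headers = sorted(Path("fixtures").glob("*.h"))
    failed = [h.name for h in headers if not run_case(h)]
    print("failed:", ", ".join(failed) if failed else "none")
    sys.exit(1 if failed else 0)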

0
votes

Just wanted to point out that you can still achieve fine-grained testing while verifying the results: you can test individual chunks of code by nesting them inside some setup and verification code:

int x = 0;
GENERATED_CODE
assert(x == 100);

Provided you have your generated code assembled from smaller chunks, and the chunks do not change frequently, you can exercise more conditions and test a little better, and hopefully avoid having all your tests break when you change specifics of one chunk.
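In Python the same nesting idea might look roughly like this (purely illustrative: the chunk and the helper are invented for the example, and exec is just one way to run a generated snippet inside a controlled namespace):

def check_chunk(chunk_source, setup, expected):
    """Run one generated chunk between fixed setup and verification code."""
    namespace = dict(setup)          # e.g. {"x": 0}
    exec(chunk_source, namespace)    # the chunk runs against that namespace
    for name, value in expected.items():
        assert namespace[name] == value, (name, namespace[name], value)

# Hypothetical chunk: in practice this comes out of the generator.
chunk = "x = x + 100"
check_chunk(chunk, setup={"x": 0}, expected={"x": 100})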

0
votes

Unit testing is just that: testing a specific unit. So if you are writing a specification for class A, it is ideal if class A does not depend on the real, concrete versions of classes B and C.

OK, I noticed afterwards that the tags for this question include C++/Python, but the principles are the same:

    public class A : InterfaceA
    {
        private readonly InterfaceB _b;
        private readonly InterfaceC _c;

        public A(InterfaceB b, InterfaceC c)
        {
            this._b = b;
            this._c = c;
        }

        public string SomeOperation(string input)
        {
            return this._b.SomeOtherOperation(input)
                 + this._c.EvenAnotherOperation(input);
        }
    }

Because the above system A has its dependencies on B and C injected as interfaces, you can unit test just system A, without any real functionality being executed by any other system. This is unit testing.

Here is one way of approaching a system from creation to completion, with a different When specification for each piece of behaviour:

public class When_system_A_has_some_operation_called_with_valid_input : SystemASpecification
{
    private string _actualString;

    private string _expectedString;

    private string _input;

    private string _returnB;

    private string _returnC;

    [It]
    public void Should_return_the_expected_string()
    {
        _actualString.Should().Be.EqualTo(this._expectedString);
    }

    public override void GivenThat()
    {
        var randomGenerator = new RandomGenerator();
        this._input = randomGenerator.Generate<string>();
        this._returnB = randomGenerator.Generate<string>();
        this._returnC = randomGenerator.Generate<string>();

        Dep<InterfaceB>().Stub(b => b.SomeOtherOperation(_input))
                         .Return(this._returnB);
        Dep<InterfaceC>().Stub(c => c.EvenAnotherOperation(_input))
                         .Return(this._returnC);

        this._expectedString = this._returnB + this._returnC;
    }

    public override void WhenIRun()
    {
        this._actualString = Sut.SomeOperation(this._input);
    }
}

So in conclusion, a single unit / specification can have multiple behaviours, and the specification grows as you develop the unit / system; and if your system under test depends on other concrete systems within it, watch out.

0
votes

My recommendation would be to figure out a set of known input-output results, such as some simpler cases that you already have in place, and unit test the code that is produced. It's entirely possible that as you change the generator, the exact string that is produced may be slightly different... but what you really care about is whether it is interpreted in the same way. Thus, if you test the results as you would test that code if it were your feature, you will find out whether it succeeds in the ways you want.

Basically, what you really want to know is whether your generator will produce what you expect without physically testing every possible combination (also: impossible). By ensuring that your generator is consistent in the ways you expect, you can feel better that the generator will succeed in ever-more-complex situations.

In this way, you can also build up a suite of regression tests (unit tests that need to keep working correctly). This will help you make sure that changes to your generator aren't breaking other forms of code. When you encounter a bug that your unit tests didn't catch, you may want to include it to prevent similar breakage.
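One way such a set of known input-output cases might be expressed, sketched with pytest (the helper import and the fixture headers are hypothetical, just to show the shape of the table):

import pytest

# Hypothetical helper that runs the generator on a fixture header and
# imports the resulting module; build_generated_module is not a real API.
from regression_helpers import build_generated_module

# Known input -> known output pairs; append a case whenever a bug slips through.
CASES = [
    ("fixtures/math.h",    "add",   (2, 3),   5),
    ("fixtures/strings.h", "upper", ("abc",), "ABC"),
]

@pytest.mark.parametrize("header,func,args,expected", CASES)
def test_generated_code_behaviour(header, func, args, expected):
    module = build_generated_module(header)
    assert getattr(module, func)(*args) == expected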

0
votes

I find that you need to test what you're generating more than how you generate it.

In my case, the program generates many types of code (C#, HTML, SCSS, JS, etc.) that compile into a web application. The best way I've found to reduce regression bugs overall is to test the web application itself, not by testing the generator.

Don't get me wrong, there are still unit tests checking out some of the generator code, but our biggest bang for our buck has been UI tests on the generated app itself.

Since we're generating it, we also generate a nice abstraction in JS that we can use to programmatically test the app. We followed some ideas outlined here: http://code.tutsplus.com/articles/maintainable-automated-ui-tests--net-35089

The great part is that it really tests your system end-to-end, from code generation out to what you're actually generating. Once a test fails, it's easy to track it back to where the generator broke.

It's pretty sweet.

Good luck!