Not all Tests are Passing … is that so Bad?

For the Whiley compiler, I currently have over 500 end-end tests and in excess of 15,000 unit tests (most of which were auto-generated and target the type system).  Each end-end test is a short Whiley program that is categorised as either valid or invalid.  Valid tests also include sample output and are expected to compile, execute and produce matching output.  The invalid tests are expected to generate a compile-time error.  To make this more precise, I categorise the kinds of error that the compiler can produce as: internal failure, syntax error or verification error.  Every invalid test is specified to raise either a syntax error or a verification error, and the test will not pass if the compiler does anything else.  Thus, an invalid test which should produce a syntax error will not pass if the compiler barfs out a NullPointerException.
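
To make the pass criteria concrete, here is a minimal Java sketch of the rule just described. The names (`EndToEndCheck`, `Outcome`, `validTestPasses`, `invalidTestPasses`) are hypothetical and are not taken from the actual Whiley test harness; the sketch also assumes that something else has already compiled the program and summarised what happened.

```java
class EndToEndCheck {
    // The three kinds of error the compiler can produce, as listed above.
    enum ErrorKind { INTERNAL_FAILURE, SYNTAX_ERROR, VERIFICATION_ERROR }

    // Hypothetical summary of one compiler run (an assumption, not the real Whiley API).
    static final class Outcome {
        final ErrorKind error; // null if the program compiled and ran
        final String output;   // null if compilation failed

        private Outcome(ErrorKind error, String output) {
            this.error = error;
            this.output = output;
        }

        static Outcome failed(ErrorKind kind) { return new Outcome(kind, null); }
        static Outcome ran(String output) { return new Outcome(null, output); }
    }

    // A valid test passes only if there was no error and the output matches.
    static boolean validTestPasses(String expectedOutput, Outcome actual) {
        return actual.error == null && expectedOutput.equals(actual.output);
    }

    // An invalid test passes only if the compiler reports exactly the
    // expected kind of error; anything else counts as a failure.
    static boolean invalidTestPasses(ErrorKind expected, Outcome actual) {
        if (expected == ErrorKind.INTERNAL_FAILURE) {
            throw new IllegalArgumentException("an internal failure is never an expected result");
        }
        return actual.error == expected;
    }
}
```

Under this rule, a test that expects a syntax error does not pass when the compiler crashes, provided the harness records the crash as `INTERNAL_FAILURE`.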

Now, the thing is: not all my end-end tests are currently passing.  In fact, I don’t think it’s ever been the case that all of the tests have passed.  That may seem like a bad thing, but I think there are some mitigating circumstances:

  1. My test suite is constantly growing.  As soon as I spot an error, or think of a possible problem, I add a test.  Adding tests makes me feel good, and I love it!  That’s because a test is a piece of knowledge locked-in.  Once the test is written and checked in, I can’t forget.
  2. Some failures represent issues that I’ve given very low priority to.  Eventually, I will get to them … but not yet.  I do sometimes make use of the @Ignore annotation for these kinds of tests.
  3. Some failures represent significant design flaws in the system.  They should be high-priority, but represent weeks or months of refactoring and redesigning to fix.  I don’t like to @Ignore these ones … I prefer to feel the pain.

In a way, my test suite is like a queue.  My tests never all pass because by the time I’ve fixed all the ones that are failing … I’ve added some more!

This is similar to test-driven development (TDD), I suppose.  What I don’t like about TDD, though, is that you’re supposed to write a test and then immediately make it pass.  That doesn’t work for me since, in a few seconds, I can write a test that might take months of careful refactoring to solve.  I’m never going to fix those ones immediately … they need serious hammock time!

8 comments to Not all Tests are Passing … is that so Bad?

  • Daniel Yokomizo (@dyokomizo)

    The primary problem with this setup is ensuring the tests known to be failing are the ones failing in any given test run. Also, sometimes the tests stop failing due to unrelated reasons (which is also bad). It’s easy to add some annotation to a testing framework meaning “run this test and pass if it fails due to X but fail otherwise”; in JUnit we can use the expected attribute of @Test.

  • John M

    It’s much easier to see a new failure when there are no current failures – having regular failures can easily mask it when you break something. For ones not being fixed, disable the tests, record them against the action list/bug report and re-enable them when you get around to that area.

  • Hi John,

    I definitely agree with that. But, I don’t like disabling tests because I forget that I disabled them. I like to keep reminding myself that I do need to address this issue. So, instead of disabling them, I keep a count of where I’m at: e.g. currently, 20 failing tests is my zero count. If I get any more than 20, I know something changed …

  • None

    Try partitioning. You have tests that you don’t expect to pass. Keep those tests separate from those tests that you do expect to pass. Then you get a clean regression library while still defining your expectations for the future.

  • Nigel Charman

    Have you considered having an executable language spec, containing the end-end tests as examples?

    Concordion would let you add some documentation around the tests, and supports setting the status of each spec to @Unimplemented, @ExpectedToFail or @ExpectedToPass. If marked as @Unimplemented or @ExpectedToFail, the JUnit result will be green, but the resultant spec will be highlighted in Grey on the index page.

    As a sample spec, see ExtensionConfiguration.html. Here’s the input specification, and the fixture code.

    This example doesn’t show the index page feature. If the idea appeals to you, give me a shout and I can point you in the right direction.

  • Hey Nigel,

    I think using a more serious testing framework would definitely be a step in the right direction. JUnit is seriously limited … I need something that keeps my entire test history for ever, lets me partition tests, identify what has changed between different test runs, etc.

    Yeah, I need to look at this kind of thing more seriously. Maybe when I see you again, we can talk about this in more detail …


  • Nigel Charman

    Actually, this particular fixture code is a bit scary, since it’s testing the effects of Concordion extension code on Concordion HTML (ie. using Concordion to test Concordion).

    It should be a bit simpler if you’re testing text output or compile-time errors 🙂

  • Yeah, text output for the valid tests, and compile-time errors for the invalid ones.

    It did look a bit scary, btw 😉
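
The “zero count” idea from the thread above can be made a little stricter by recording which tests are known to fail, not just how many, so that one new failure cannot hide behind one unrelated fix. The following is only a sketch under that assumption: `FailureBaseline` and its methods are hypothetical names, and the sets of test names would have to come from whatever test runner is in use.

```java
import java.util.Set;
import java.util.TreeSet;

class FailureBaseline {
    // Tests failing now that are not in the recorded baseline: likely regressions.
    static Set<String> newFailures(Set<String> baseline, Set<String> failing) {
        Set<String> result = new TreeSet<>(failing);
        result.removeAll(baseline);
        return result;
    }

    // Baseline tests that no longer fail: either fixed on purpose,
    // or passing for an unrelated reason that deserves a look.
    static Set<String> noLongerFailing(Set<String> baseline, Set<String> failing) {
        Set<String> result = new TreeSet<>(baseline);
        result.removeAll(failing);
        return result;
    }

    // The run is "at zero" only when exactly the known tests are failing.
    static boolean matchesBaseline(Set<String> baseline, Set<String> failing) {
        return failing.equals(baseline);
    }
}
```

Comparing names rather than counts means a run with 20 failures is only treated as unchanged when they are the same 20 tests.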
