Gated checkin?

August 23, 2018 at 10:28 pm

Yesterday at the Eastside Agile “Lean Coffee” lunch, we were talking about how to choose what tests to run before a gated checkin, and I expressed the opinion that you shouldn’t run any tests.

That got an interesting response, and then an offline request, and since I’m not smart enough to fit my opinioninto Twitter’s character limit, I decided a blog post made sense…

The original question was, IIRC, “how do you decide what tests should be run as part of your gated checkin system”.

Many of you know how fond I am of heresy, because it’s the heretical answer that can push you in new and interesting directions. Heresy is how we got the #Nos… #NoBugs, #NoTDD, #NoEstimates, etc. And you also know that I’m allergic to process, especially big process. And you know Eric is going to talk about incentives because he cannot *shut up* about that topic…


So, this is what I’m thinking…

The problem with gated checkin systems is that they set up the wrong incentives. Let’s consider how it affects two developers, Perfect Paula and Sloppy Sammy.

Perfect Paula always runs appropriate tests before she checks in, so the gated checkin system never finds any issues with her work. That means it’s a waste of time for her to be waiting for the checkin to finish.

Sloppy Sammy does not have good habits; he forgets to run tests on his own box, forgets to integrate changes back into his branch, etc. He rarely gets through on the first try, so he has to make changes and re-submit.

There are two kinds of waste in this system; there is the queuing and context switch cost that Paula is paying – and that cost ironically reduces her productivity elsewhere. And there is the rework cost that Sammy is paying on an ongoing basis, which in my experience everybody in the team just ignores.

My assertion is that neither of those wastes are visible, and therefore there are no incentives driving towards better behavior. Paula just learns to deal with the tax she is paying, and Sammy stays sloppy.

The other issue is related to the original question; how do you decide what should be in the tests? The typical approach is to add in new tests whenever a failure wasn’t caught automatically by the existing set of tests, which can be fine with true unit tests, but can quickly become untenable with any other test types. That’s what gets you to two-hour waits for checking something in. None of these added tests really help Paula, but you *have to* have them because Sammy makes poor decisions in that area.

What is the alternative? Well, the alternative is to check in right away and then kick off the build and tests. If the build and tests fail, then there is a nice public email – because it’s the public build that failed – and you revert the checkin, and the offending party gets to try again. Or maybe you give them 10 minutes to fix their error (missing file?) before the revert happens.

There are downsides to this approach. The codebase will sometimes be in a bad state for a short period of time, and that means if somebody checks in crap, I sync to it, and then check in my stuff, my stuff is going to fail as well. And that will cause me more work. But my assertion is that that pain is *very* visible to everybody in the team and there are therefore amenable to process change. And I *want* developers thinking carefully about what they are doing before they checkin, because there are countless ways their code can be wrong that still passes the existing set of tests.


The typical pushback against doing this is that teams that try to do this end up with build and test breaks all the time. And they do. The reason they do is that the incentives in the organization are inherently pushing the developers towards the Sloppy Sammy persona. The #1 driver towards that is schedule pressure. And if you think gated checkin is the answer in that situation, what you are saying is, “the solution to our developers feeling like the don’t have enough time to do quality work is to put in a system that slows everybody down (especially our best developers) and reduces the downsides of doing crappy work”.

My experience with the teams that take this approach is that they generally manage to produce a mostly-healthy built after some effort, their overall pace is glacial, and their resulting products are buggy as hell.


Caveats: Pairing and mobbing might change how I feel. Deploying straight to production might change how I feel.