Gated checkin?

August 23, 2018 at 10:28 pm

Yesterday at the Eastside Agile “Lean Coffee” lunch, we were talking about how to choose what tests to run before a gated checkin, and I expressed the opinion that you shouldn’t run any tests.

That got an interesting response, and then an offline request, and since I’m not smart enough to fit my opinion into Twitter’s character limit, I decided a blog post made sense…

The original question was, IIRC, “how do you decide what tests should be run as part of your gated checkin system”.

Many of you know how fond I am of heresy, because it’s the heretical answer that can push you in new and interesting directions. Heresy is how we got the #Nos… #NoBugs, #NoTDD, #NoEstimates, etc. And you also know that I’m allergic to process, especially big process. And you know Eric is going to talk about incentives because he cannot *shut up* about that topic…

 

So, this is what I’m thinking…

The problem with gated checkin systems is that they set up the wrong incentives. Let’s consider how it affects two developers, Perfect Paula and Sloppy Sammy.

Perfect Paula always runs appropriate tests before she checks in, so the gated checkin system never finds any issues with her work. That means it’s a waste of time for her to be waiting for the checkin to finish.

Sloppy Sammy does not have good habits; he forgets to run tests on his own box, forgets to integrate changes back into his branch, etc. He rarely gets through on the first try, so he has to make changes and re-submit.

There are two kinds of waste in this system; there is the queuing and context switch cost that Paula is paying – and that cost ironically reduces her productivity elsewhere. And there is the rework cost that Sammy is paying on an ongoing basis, which in my experience everybody in the team just ignores.

My assertion is that neither of those wastes is visible, and therefore there are no incentives driving towards better behavior. Paula just learns to deal with the tax she is paying, and Sammy stays sloppy.

The other issue is related to the original question; how do you decide what should be in the tests? The typical approach is to add in new tests whenever a failure wasn’t caught automatically by the existing set of tests, which can be fine with true unit tests, but can quickly become untenable with any other test types. That’s what gets you to two-hour waits for checking something in. None of these added tests really help Paula, but you *have to* have them because Sammy makes poor decisions in that area.

What is the alternative? Well, the alternative is to check in right away and then kick off the build and tests. If the build and tests fail, then there is a nice public email – because it’s the public build that failed – and you revert the checkin, and the offending party gets to try again. Or maybe you give them 10 minutes to fix their error (missing file?) before the revert happens.
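Here’s a minimal sketch of what that flow could look like, assuming a git repository, a build.sh/test.sh pair of scripts, and a notifyTeam() placeholder standing in for the public email; none of those names come from a real system, they just show the shape of the loop:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Sketch of "check in first, verify after": watch for new checkins, build and test them,
// make failures loudly public, and revert if no fix shows up within the grace period.
public class PostCheckinWatcher {

    public static void main(String[] args) throws Exception {
        String lastVerified = output("git", "rev-parse", "HEAD");
        while (true) {
            Thread.sleep(Duration.ofSeconds(30).toMillis());    // poll for new checkins
            String head = output("git", "rev-parse", "HEAD");
            if (head.equals(lastVerified)) {
                continue;                                       // nothing new to verify
            }
            boolean healthy = exitCode("./build.sh") == 0 && exitCode("./test.sh") == 0;
            if (healthy) {
                lastVerified = head;                            // quiet success: nothing to announce
                continue;
            }
            notifyTeam("Build or tests failed at " + head);     // the public, visible failure
            Thread.sleep(Duration.ofMinutes(10).toMillis());    // grace period for a quick fix
            if (output("git", "rev-parse", "HEAD").equals(head)) {
                exitCode("git", "revert", "--no-edit", head);   // no fix arrived: revert the checkin
                notifyTeam("Reverted " + head + "; please resubmit when it is healthy");
            }
            lastVerified = output("git", "rev-parse", "HEAD");
        }
    }

    private static int exitCode(String... command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    private static String output(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).start();
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8).trim();
        p.waitFor();
        return out;
    }

    private static void notifyTeam(String message) {
        System.out.println("[team email] " + message);          // stand-in for email or a flashing light
    }
}
```

The details matter much less than the incentives: the failure is public, the fix-or-revert decision is quick, and nobody is sitting in a queue waiting to check in.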

There are downsides to this approach. The codebase will sometimes be in a bad state for a short period of time, and that means that if somebody checks in crap, I sync to it, and then check in my stuff, my stuff is going to fail as well. And that will cause me more work. But my assertion is that that pain is *very* visible to everybody on the team, and it is therefore amenable to process change. And I *want* developers thinking carefully about what they are doing before they check in, because there are countless ways their code can be wrong and still pass the existing set of tests.

 

The typical pushback is that teams that try this end up with build and test breaks all the time. And they do. The reason they do is that the incentives in the organization are inherently pushing the developers towards the Sloppy Sammy persona. The #1 driver towards that is schedule pressure. And if you think gated checkin is the answer in that situation, what you are saying is, “the solution to our developers feeling like they don’t have enough time to do quality work is to put in a system that slows everybody down (especially our best developers) and reduces the downsides of doing crappy work”.

My experience with the teams that take this approach is that they generally manage to produce a mostly-healthy build after some effort, their overall pace is glacial, and their resulting products are buggy as hell.

 

Caveats: Pairing and mobbing might change how I feel. Deploying straight to production might change how I feel.

Pay Yourself First

November 7, 2017 at 3:31 pm

If you’ve ever spent time learning about saving for retirement, you may have come across a concept named “Pay Yourself First”.

Pay Yourself First is a reaction to a common financial problem; people base their spending on the amount of money that they have coming in each month, and therefore don’t save money for the future – to deal with emergencies that might come up, or to invest for retirement.

If you pay yourself first – remove the money you want to save from your world before you start looking at how much you can spend – you can establish a savings/investment habit.  It will cost you a little on an ongoing basis, but it won’t be long before your investments start spinning off free cash that can either be re-invested or used for other purposes. The $1000 that you saved will – over the long term – make you much more than that.

Why am I writing about investing?

Well, I’ve realized that this is exactly the same situation that many software teams get in.

Each month, they look at their income (developer time), and then they look at their expenses (bugs to fix, system maintenance, features they want to build), and allocate their income.

That’s the software equivalent of living paycheck-to-paycheck. And it pretty much ensures that you are going to be stuck with your current level of bugs, your current level of maintenance, and your current feature speed – at best. Realistically, these things are likely to degrade a bit as time passes.

If instead, you “pay yourself first”, your investment can help you reduce the amount of time you spend fixing bugs and make you faster at creating features. Do this on a consistent basis, and your investments will start spinning off free time that you can then reinvest to further improve your speed, or deploy to create more features than you could have before.
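To make the compounding idea concrete, here’s a toy calculation. All of the numbers – team capacity, overhead, the return on an invested hour – are invented for illustration; the point is the shape of the curve, not the specific values.

```java
// A toy "pay yourself first" model for a development team. Invented numbers throughout.
public class PayYourselfFirst {
    public static void main(String[] args) {
        double capacity = 200;     // team hours available each week
        double overhead = 60;      // hours per week lost to bugs, rework, and slow tooling
        double investment = 20;    // hours per week set aside for improvements, paid first
        double returnRate = 0.03;  // assumed: each invested hour permanently removes 0.03 h/week of overhead

        for (int week = 1; week <= 52; week++) {
            overhead = Math.max(0, overhead - investment * returnRate);
            double featureHours = capacity - overhead - investment;
            if (week % 13 == 0) {
                System.out.printf("week %2d: overhead %5.1f h, feature work %5.1f h%n",
                        week, overhead, featureHours);
            }
        }
        // Baseline with no investment: 200 - 60 = 140 feature hours per week, forever.
        // With the investment, feature work starts lower (120 hours) but passes the
        // baseline within the year, and the gap keeps widening - compound interest.
    }
}
```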

It’s not easy to set aside even $50 a week when you are starting, nor is it easy to set aside a developer-day on an ongoing basis. But once you’ve started doing this consistently, the returns will make it worthwhile.

What about your team? Are you paying yourself first?

Welcome

November 5, 2017 at 4:17 pm

Welcome to Eric’s Code Shack.

New posts will be coming as soon as I find some free time to write.

For the time being, I recommend viewing some classic posts from the past…

#NoTDD

June 22, 2017 at 3:31 pm

Did that title get your attention? Good.

Like the other #no<x> assertions – NoEstimates, NoBugs – I’m really not saying that you shouldn’t do TDD. Well, maybe I am…

I was an early TDD advocate, as I really liked the way it helped me organize my thoughts, and I respected some of the other people who were early advocates – people like Ron Jeffries.

But looking back on the 15 years since I started with TDD, I have to say that it really did not live up to my expectations.

What I’ve seen is a consistent pattern of TDD working in a laboratory setting – developers are quick to pick up the workflow and can create working code and tests during classes/exercises/katas – and then failing in the real world.

My hypothesis for this is very simple. When you look at the TDD evangelists, all of them share something: they are all very good – probably even great – at design and refactoring. They see issues in existing code and they know how to transform the code so it doesn’t have those issues, and specifically, they know how to separate concerns and reduce coupling.

If you have those skills, TDD works great; you see what the issues are as they arise and fix them incrementally, and your simple tests prove that your design has low coupling. And you have the tests to lean on in the future.

A hypothetical

Let’s take the TDD workflow and remove the third step – the refactoring step. What would we expect to happen?

Well, we would expect to end up with classes that have multiple concerns in them – because we didn’t split them apart – and they would be inconvenient to test. We would need to write a lot of test code, and we would need a lot of supporting code to do it, either many hand-written mocks or a mocking library.

Which I submit is precisely the result that most developers get using TDD; the resulting code looks exactly like what we would expect if developers are skipping the third step. Which isn’t really very surprising, given that the tests in most non-TDD code look the same way, and we know that most developers do not have great design/refactoring skills.

At this point we should ask ourselves, “are these developers getting a net positive benefit from using TDD?” Let’s list the pros/cons:

Pros

  • We end up with tests that verify the behavior of the code and help prevent regressions

Cons

  • It takes us longer to write code using TDD
  • The tests get in the way. Because my design does not have low coupling, I end up with tests that also do not have low coupling. This means that if I change the behavior of how class <x> works, I often have to fix tests for other classes.
  • Because I don’t have low coupling, I need to use mocks or other test doubles often. Tests are good to the extent that they use the code in precisely the same way the real system uses the code. As soon as I introduce mocks, I have a test that only works as long as each mock faithfully matches the behavior of the real system. If I have lots of mocks – and since I don’t have low coupling, I need lots of mocks – then I’m going to have cases where the behavior does not match. This will show up either as a broken test or as a missed regression.
  • Design on the fly is a learned skill. If you don’t have the refactoring skills to drive it, it is possible that the design you reach through TDD is going to be worse than if you had spent 15 minutes doing up-front design.

    I think that’s a good summary of why many people have said, “TDD doesn’t work”; if you don’t have a design with low coupling, you will end up with a bunch of tests that are expensive to maintain and may not guard against regressions very well.
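To make the coupling point concrete, here’s a contrived sketch. The classes and the scenario are invented, and I’m using hand-rolled fakes rather than a real mocking library; the first test goes through the composite class and needs a fake for every collaborator, while the second tests the extracted formatting unit directly, with no fakes at all.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;
import org.junit.jupiter.api.Test;

// Invented example: lookup, formatting, and sending are composed in one class,
// so even checking the message text requires a test double for every collaborator.
interface CustomerStore { List<String> overdueCustomers(); }
interface Mailer { void send(String to, String body); }

class InvoiceSender {
    private final CustomerStore store;
    private final Mailer mailer;
    InvoiceSender(CustomerStore store, Mailer mailer) { this.store = store; this.mailer = mailer; }
    void sendReminders() {
        for (String customer : store.overdueCustomers()) {
            mailer.send(customer, ReminderText.reminderFor(customer));
        }
    }
}

// The piece worth extracting: a pure formatting unit with no collaborators at all.
class ReminderText {
    static String reminderFor(String customer) {
        return "Dear " + customer + ", your invoice is overdue.";
    }
}

class ReminderTests {
    @Test
    void coupledTestNeedsAFakeForEverything() {
        StringBuilder sent = new StringBuilder();
        InvoiceSender sender = new InvoiceSender(
                () -> List.of("alice"),                                   // fake store
                (to, body) -> sent.append(to).append(": ").append(body)); // fake mailer
        sender.sendReminders();
        assertEquals("alice: Dear alice, your invoice is overdue.", sent.toString());
    }

    @Test
    void extractedUnitNeedsNoFakes() {
        assertEquals("Dear bob, your invoice is overdue.", ReminderText.reminderFor("bob"));
    }
}
```

The more of the behavior that lives in units like ReminderText, the fewer of your tests look like the first one.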

    And to be fair, the level of design and refactoring skills required is relative to the state of the codebase you’re working in. There are a lot of legacy codebases with truly monumental amounts of coupling in them and figuring out how to do anything safely is very difficult and complex.

    TDD--, Refactoring++

    Instead of spending time teaching people TDD, we should instead be spending time teaching them more about design and especially more about refactoring, because that is the important core skill. The ability to write and refactor code to a state with low coupling, well-separated concerns,  and great names will be applicable in all the codebases that we work in.

    That is not to say that I don’t dust off NCrunch now and then to either create a new set of functionality with great tests or to create tests around a concern as I do a refactoring, because I still do that often. I just don’t think it’s the right place to start from an educational or evangelical perspective.

    Agile and the Theory of Constraints: Part 4–The Inner Loop

    May 11, 2017 at 3:32 pm

    In this post, I’m going to talk about what I’m calling the inner loop, what some people call “Ring Zero”; it is basically the simple act of writing code and validating it, over and over. It is focused on a single developer.

    Code/Test/Code/Test/Code/Test/Code/Test

    Before I dig into things, I have a bit of pre-work for you. I would like you to spend 5 minutes listing all the things that you (or your team) do/does as part of the inner loop, and then create a value stream map from that information. Put numbers on all of the steps, but don’t worry if they aren’t particularly accurate. You can also use a range like 5-10 minutes.

    Do you have a diagram that describes your current process? Great.

    Here’s the first diagram I came up with:

    [diagram: the basic code/test loop]

    The times are going to vary considerably for a given team and a given developer. Let’s expand a little bit on the diagram to add our first handoff. This time, because we’re only talking about a single developer, it’s a different kind of handoff.

    [diagram: the loop with the build handoff added]

    When we start a build, we are handing off the code to another process –the compiler – and we will need to wait until that process is done. And no, that 5-60 minutes is not a misprint; I’ve worked on a number of teams where a full build was at least an hour.

    If you are thinking, “self, that looks like a queue/handoff and a context switch, and that is bad”, give yourself a gold star and a hearty “Well done!”. Let’s flesh this out a bit more….

     

    [diagram: the loop with the deploy step added]

    Before we can test or debug our code, we need to deploy it to a place where we can test or debug it. If we are running unit tests, we don’t have a deploy step, which is why it sometimes takes 0 minutes.

    Finally, sometimes we are successful, and sometimes we are not. That needs to be represented in the diagram as well…

    [diagram: the loop with success and failure paths]

     

    If our build failed or our test/debug failed, we need to go back and tweak the code. If it worked, then we need to figure out the next thing to do.

    Time to search for bottlenecks. Where do we look?

    The usual places; we look for handoffs and we look for queues. Waiting for the build and waiting for the deployment are excellent candidates. Before we dig in there, I want to go on a bit of a tangent…

    In earlier posts, I talked in depth about developers switching to do something else and the costs of that. I’d like to quantify that using Eric’s Theory of Developer Wait Times.

    It sounds more grandiose than it really is…

     

| Wait time    | Impact                                 |
|--------------|----------------------------------------|
| None         | No impact                              |
| <15 seconds  | Slight impact on flow                  |
| 60 seconds   | Some context lost. Annoying.           |
| 2 minutes    | Most context lost; hard to keep focus  |
| > 2 minutes  | I’m doing something else               |

     

    This is basically just a repurposing/reexpression of the research around how users react to delays in programs.

    Once we get above 2 minutes, it’s time to read email, take a nature break, get some water, read the Facebooks, etc. Because it’s a hard interruption, it’s really easy for 3 minutes to turn into 15 minutes.

    Since developers spend a lot of time waiting for builds, the majority of the teams out there put a high priority on having fast builds; they give developers capable machines on which to do builds, they architect their projects so that components are small and builds can therefore be fast, and they focus on fast-executing unit tests rather than slower integration tests.

    Ha Ha. I make joke.

    Remember the discussion about how local optimization can de-optimize other parts of the system from the last post? Here’s another example.

    The amount of time I waste waiting for a build is determined by how long the build takes and how many times I need to wait for it. If my build takes 10 minutes, I will look to batch my work so that I minimize how much time I spend waiting for builds. That means bigger checkins, with all the downsides that come with them; mistakes are more common, design feedback is less likely to be acted upon, etc.

    Is this obvious, or even noticeable? If you ask most dev leads about the impact of build speed on their team’s productivity, they will say something like, “Our build is slower than we would like, but it’s not a large drain on our productivity”. They say this whether their average build takes 30 seconds or their average build takes 15 minutes. That is based on an estimate of the “time spent waiting for builds” / “total time” metric. Developers will rearrange how they do things (ie “optimize”) so that the metric doesn’t get too far out of whack in most cases.

    What they are missing is the opportunity cost of having slow builds.

    • Slow builds mean no TDD. I am fairly stubborn and I’ve tried to modify TDD to make it work when I have slow builds, but it isn’t really effective, and there is no way I can convince anybody else to try it.
    • Slow builds mean I (probably) can’t leverage any of the continuous testing tools such as NCrunch or dotCover in the C# world or Infinitest in the Java world. NCrunch is why there is a “none” entry in the wait time table; you just write your product code and your test code and everything happens in the background, with no wait for compile or deploy.
    • Slow builds mean my team will write fewer tests.
    • Slow builds mean my team will do fewer refactorings.

    In lean terms, slow builds are a constraint, and the developer optimization is an attempt to subordinate the rest of the work to that constraint. Which is fine, as long as you also exploit and elevate the constraint to try to remove it as a constraint.

    If you can only do one thing, work on getting your developer builds faster.

    Having a short deployment cycle is also important, but if your builds are quick and you write unit tests, you will be spending a lot more time in that cycle and less time in the deployment cycle. For deployment I think that simple and automated is more important than pure speed, because mistakes you make in deployment kill your overall cycle time; just forgetting to deploy one changed binary can cost you anywhere from 15 minutes to a couple of hours. On the other hand, if you can make deployment very fast as well, nobody will be tempted to build a faster version of it, which has a lot of other benefits.

    Figure out what code to write

    We will now move on to fill in the left side of the diagram, the part where we figure out what code to write.

    [diagram: the inner loop with the “figure out what code to write” steps added]

    There are three ways we figure out what code to write:

    1. Sometimes we just think about it, or we already know pretty much what we want to do.
    2. Sometimes we have to do some research to figure out what we need to do and/or how to do it.
    3. Sometimes we ask somebody.

    Where are the bottlenecks here?

    One of them is obvious – it’s the time that we spend finding somebody to ask and/or the time we spend waiting for an email response to a question. Unless it’s a person-to-person interaction, that part generally takes long enough that we “go do something else”.

    So, we should just try to limit that path, right?

    Well, nobody likes to interrupt other people to find out something they could have found out themselves, and in many (most?) teams there are social pressures that keep people from choosing that branch first, so they choose the upper research branch instead.

    The flow in that branch typically looks something like:

    1. Need to do <x>
    2. Do a web search on <x>
    3. Read a few articles
    4. Go write some code
    5. Can’t figure out one of the samples.
    6. Go back and read some more
    7. Modify the code.
    8. Build/deploy/test the code
    9. Mostly works, but not sure
    10. Go back and read some more
    11. Try something else
    12. Code works, go do something else.

    Is that the most efficient way to do things? Or, to put it another way, if we wanted to make this faster, what would we do?

    Well, the obvious observation is that our build/test/deploy time has a big effect on the cycle time here; that was one of the reasons I talked about that first. If you can experiment quickly, you can be a bit less sure of what you are doing, spending less time researching and more time doing.

    But there’s a larger thing at play here.

    Our goal is obviously to get to the exit arrow as quickly as possible (assuming good testing, good quality code, etc. ). To do that, we have to write the code that is correct.

    So, to state it another way, our bottleneck is figuring out what the right code is. If we can do that quickly, then the coding and testing part is generally pretty straightforward.

    How do we become more efficient at that? How do we get rid of the waste?

    The answer is simple. The biggest waste is when we go down the wrong path; we don’t understand the code correctly, we don’t understand what we are trying to do, or we use a more cumbersome way of solving a problem. How do we solve that? Well, we need to apply more minds to the problem.

    Some groups have an “ask for help” rule, which says if you have been stuck on a problem for more than 15/30/60 minutes, you are required to ask for help. Which is a good idea, but you’ve already wasted a lot of time, nobody likes to ask for help, and you have to interrupt somebody else to ask, slowing them down.

    What we really need is a way to dedicate extra brainpower to a task from the start. By doing so, we will reduce false starts, research flailing, waiting too long to ask, etc. – and we will write the right code faster.

    And, once again, we’ve invented pairing.

    Pairing is more efficient because it directly targets the waste in the system:

    • Time spent researching how the system works
    • Time spent researching options
    • Time spent modifying code when it doesn’t work
    • Time spent doing more research when we could have just asked somebody.

    Why don’t teams try to become more efficient at writing good code?

    Well, that ties back to one of the weird parts of the profession.

    Writing software is a very complex task, especially if you are doing it in a codebase that has a lot of technical debt. Because of that, it is very hard to estimate how long a given task should take, which – transitively – means that it is often hard to determine whether a task is taking longer than it should. We don’t want to ask developers why a task is taking so long because that encourages the wrong behavior and we know that sometimes things just take a long time to do.

    Pairing gives us a way to hop inside that complex task and get rid of the waste without having to ask the wrong question or mess up the group dynamic. That is one of the reasons it is so powerful.

    That is all for this post, and that takes me to the end of where I envisioned the series. I do think there might be a short summary post to wrap things up.

     

     

    Agile and the Theory of Constraints – Part 3: The Development Team (3)

    April 18, 2017 at 3:08 pm

    Finally, we make our way to the heart of the development team and the design & code phase.

    [diagram: the development team value stream map]

    That is the top part of this diagram.

    The design/review/finish and code/review/finish chunks are very similar. The developer does some work, submits it for review, perhaps does rework based on the review, and then finally finishes and moves onto the next step.

    As is common with an overall map, we’re missing some important details:

    [diagram: detailed view of the design and code review queues]

    In this detailed view, we see that the “Submit Review” step leads to a queue in front of three other developers, which then leads back to a queue for the original developer.

    Let’s explore a different perspective, switching from a value stream view to a sequence diagram. This diagram shows what each person is doing during a typical code review cycle:

    [diagram: sequence diagram of a typical code review cycle]

    I tried to pick what feels like a reasonable example; of course there are some reviews that are simpler and some that are much more complicated.

    What we see here looks very nice from a utilization perspective; everyone is busy all the time, and the time from submission to finish for Miss Blue looks pretty good.

    This diagram is missing a bit of detail, so let’s add that in:

    [diagram: the same sequence with context-switch time added]

    What we were missing is what the manufacturing world calls “setup time”, which is the time it takes to switch a machine from doing one task to doing another. Obviously, this time is time that the machine is not doing useful work, and therefore reducing the time spent on setup is a major factor in optimizing throughput (article).

    In our world, we call this period a context switch. This isn’t strict downtime, but it is a time of reduced efficiency, where developers are both less productive and more likely to make mistakes. How long it lasts depends upon the complexity of the mental model required; it might be only a couple of minutes for a simple task, or it might be thirty minutes for a complex debugging scenario. Every time we switch from working on one thing to another, there is a loss of productivity and quality.

    Note that this is the diagram for one review; the actual sequence diagram for a developer can be much worse; there may be multiple reviews going on at the same time, and there are other interruptions as well; meetings, breaks, and the all important lunch.

    Looking at the diagram, the queues show up as lag time between when Miss Blue submits her code/design for review and when the other developers start to review it. If we can reduce that lag time – reduce the time spent in the queue – then we will improve the throughput of the system. So we set up a policy that doing code reviews is our highest priority.

    [image: setting a process to high priority]

    If you’ve made the choice in the picture, you know what happens; the process that you set to high priority runs faster, and you slow down the rest of the system. If you are lucky, you might even hang the system.

    The same thing happens in our process; when a code review shows up, it gets handled quickly, which would seem to be what we want. But it also shifts the required context switch from “when I have some free time” to “right now”, and that change makes the interrupt more costly. The teams I’ve been on that tried this absolutely hated it.

    Is there another solution? Well, we know from our earlier examples that getting rid of a queue is far better than just reducing its time – especially if there is a handoff, as we have here – so how can we get rid of a queue?

    Well, perhaps we can limit the number of developers who are required for a given code review, so that there is only one review queue per task. It would look something like this:

    [diagram: the review sequence with a single reviewer]

    That is better from an interruption perspective, but we probably lose some quality. Mr. Red has more time and feels more responsibility to do a good code review, but he doesn’t have any more context than he did before, and he’s still going to feel that the time he spends on code review is taken away from his “real work”.

    How can we improve that? What if he worked with Miss Blue during the design and coding, so that he understands *deeply* what is being done and already has the context? That would change the review process to be something like this:

    [diagram: the review sequence when pairing]

    And yes, we have invented pairing. One of the benefits of pairing is obvious if you look at the sequence diagram and the long periods of work devoted to a single task; that is clearly going to have less waste than the previous diagram.

    The groups that I have been on that paired heavily ended up with a slightly different workflow; when the code is done, a code review is always sent out – so that others can see what is going on if they wish – and the pair has the option to ask for review from somebody else if they think they need it.

    This has worked very well in the groups that I’ve worked in.

     <aside – the hidden cost of context switches>

    If you are currently in a world like the one I described – one with lots of context switches – you might be saying to yourself, “self, I can see how it would be nice to have fewer interruptions, but I’m pretty good at context switching and still being productive.”

    Such a feeling is nearly universal, and almost always, it is wrong. When we are in a world that requires context switching – especially one where those context switches are higher priority – it’s hard not to be in a reactive, firefighting mode. What I sometimes call tactical mode.

    If you are in tactical mode, it’s very hard to engage your thoughts around bigger strategic questions, such as whether the approach that you are taking is an efficient one. Instead of “being productive” meaning “making good choices about how things are done so as to maximize output”, it becomes “responding quickly to interrupts and doing a decent job of juggling all the things on my plate”.

    Or, to put it another way, in a results-driven, context-switching environment, the chance that you are spending any time at all thinking about your efficiency is pretty low.

    To go back to my processor/operating system analogy, everybody knows what happens to background tasks when your CPU is at 100%, and strategic level thinking is a background task.

    I honestly think that if you can get rid of code review interruptions, that alone saves enough time to make pairing equal to single-developer work, without going to any of the other advantages.

    </aside>

    Checkin Queue

    Finally, we get to the checkin queue, where you submit code into a system that builds it, runs all the tests, and then checks the code in if everything passes. This is often known as “gated checkin”.

    Let’s add in some of the missing detail.

    [diagram: the gated checkin queue in detail]

    When we submit our code, we jump into a queue where a separate machine will get the changes, make sure that everything builds, run the tests, and then either check the code in if it works or send an email to us telling us what the problems are.

    Let’s analyze the situation here…

    First, I see that there is a queue for the developer to wait for this process to happen. That will require them to figure out something else to do during this time period. The wait is going to depend on how big the queue before “build” is, and how long the build and test run is going to take.

    What effects does this approach have on the organization?

    1. Developers have to figure out something else to do while they wait, which involves a context switch and/or “looking busy”.
    2. Because of the time lag, there is a possibility that somebody on your team will have their checkin finish before yours, and if their changes are incompatible with yours, your submission will fail and you will have to try again.
    3. Because the wait feels like wasted time to the developers (it is…), they will try to optimize by making their submissions bigger, which makes their code reviews bigger and harder to understand. None of this is good.
    4. If the infrastructure breaks, we are dead in the water until it is fixed.
    5. The “run tests” step is a magnet for more tests; if bugs get through the box, it’s very tempting to add tests. That is *great* if they are unit tests, and horrible if they are integration tests, as the test run will get much slower.
    6. The attitude is “the system is supposed to find the errors”, which makes it easy to be sloppy and just submit things.
    7. Failures are generally private, and not considered to be that important.
    8. If there is a big failure that makes it through even though it shouldn’t have, you need to stop the whole system and block everybody until you get it fixed.

    The incentives typically push these systems into becoming bottlenecks, and they also lock the behavior of the team around the system; the team cannot experiment in this area because the system prevents it. My experience is that this can work okay with small codebases, but it rarely turns out well in larger codebases.

    Is there an alternative that gets rid of the bottleneck? Consider the following:

    [diagram: checkin with a separate build/test machine watching the repository]

    When the developer is finished, they check in their changes. There is a separate build/test machine that notices the changes, does the build, and runs the tests. If things are okay, it doesn’t do anything. If there is a failure, it emails the team (and in some teams, turns on a flashing red light in the team room).

    What are the differences?

    1. We have gotten rid of the queue, which saves time
    2. Developers can work in smaller chunks. There is great synergy with pairing and working in small increments.
    3. Failures are obvious, which puts social pressure on developers to be more careful.
    4. The system rewards the behavior you want to incentivize; it encourages developers to be a little more careful, it encourages them to run tests on their computers (and therefore encourages them to keep those tests simple and fast).
    5. It treats developers like adults who can make rational choices around what needs to be done before a checkin is made.

    The one downside of this system is that the build is sometimes broken. The teams I’ve worked on have adopted a “5 minute rule”; if you can fix the breakage in 5 minutes – say you just forgot to include a new file when you checked in – then you are allowed to do it. If it’s going to take longer than that, you revert your checkin so that nobody else is blocked.

    Which finally brings us to the end of the developer team section. Up next… the individual developer…

    Next: Part 4 – The Inner Loop

    Agile and the Theory of Constraints – Part 3: The Development Team (2)

    March 30, 2017 at 3:13 pm

    In the last post, I asked you to put some numbers on the red (aka “rework”) arrows on the diagram. There are two ways to express these numbers; either the percentage of the time a specific path is taken, or the number of times the path is taken on average. I’ve chosen the latter because it is easier to understand. As I noted in the last post, it doesn’t help us a lot to put numbers on the blue boxes because their times vary so drastically.

    [diagram: the development flow with rework counts on the red arrows]

    In value stream maps, we want to measure the overall time through the system. We don’t have good times for the task boxes themselves, but it’s really worse than that; pretty much every system has ways to jump queues or even skip whole sections of the process, and the numbers are different if it’s the first time through or a rework.

    Does that mean this value stream map is useless?

    After all the time I’ve spent writing this, I am fervently hoping the answer is “no”. Let’s see what we can do with what we have…

    Searching for the bottleneck

    We normally start optimization by searching for the bottleneck. Here are a few characteristics of bottlenecks:

    1. They are always busy.
    2. Work piles up in front of them.
    3. Downstream resources are regularly idle.

    Let’s walk through these, and see how we might apply them.

    1. This one works well with machines, but people are another matter. Not only can we easily look busy, we might be busy with low-value work.
    2. Work piling up in front of them is a pretty good indicator. This is another way of saying, “examine the queues and find the ones that are the longest”.
    3. A machine that is after a bottleneck will look idle. As I noted in #1, humans are great at looking busy.

    It looks like we will be looking at the queues.

    [diagram: the owner acceptance portion of the map]

    Absent any real data on where our bottleneck is, I’m going to start with the “owner acceptance” part of the process. What has drawn me here are the rework numbers, which mean that we generally find one design issue and one bug when we hit owner acceptance, which will require us to rewind back into the process. That means the owner acceptance process time for a given feature is really the sum of:

    1. The time spent in the incoming queue.
    2. The time to perform the acceptance test.
    3. The time spent in the “redesign” and “fix bugs” queues (queues that are absent in the current map).
    4. The time for the redesigned and fixed feature to wind its way through the process again.
    5. The time spent in the incoming queue for the second time
    6. The time to perform the acceptance test.
    7. The time to repeat steps 3-6 if the feature still isn’t right.

    Not only are the rework issues slowing this feature down significantly, they are slowing all the other features down because we are using the same set of resources. And, they are significantly increasing the amount of work required in the owner acceptance step – we have to do most items twice – which is certainly going to make the acceptance testing itself slower.

    Another way of looking at this is that rework causes a multiplier effect based on how often it happens and how far back in the map it sends you. Acceptance is at the end and it can send you all the way back to the beginning, so it can have a very big impact.

    The solution here isn’t very surprising; we should work to a) reduce the size of the acceptance queue, so features don’t stall there, and b) reduce the chance that we need to do rework to address design issues and bugs.

    There are a few agile practices that we might use:

    • Embedding a product owner in the team
    • Having defined and regular interaction between the product owner and the developers
    • Doing feature design with the product owner and the whole team to make sure everybody is on the same page
    • Writing explicit acceptance criteria for features ahead of time.
    • Using ATDD with tools like Cucumber/Gherkin/Specflow.
      The best solution is going to depend on your organization and how you structure development. I recommend focusing on a shared understanding of scenarios and acceptance criteria up front, as that will both prevent issues and streamline acceptance testing by the team.

      One more thought on this area; the situation here is an example of a handoff between teams. Handoffs are a great place to look for waste, because:

      • They aren’t under one team’s control, which means it would take cross-team effort – which most companies are poor at – to improve them
      • The incentives in place are usually on a per-team basis, and therefore they reinforce an “us vs. them” perspective.
      • Politics often show up.
      • There are often fixed contracts/SLAs

      In other words, it’s a lot of effort to try to fix them from within a team and the success rate isn’t great, so they persist. In lean terms, the teams on both sides will consider the handoff to be a constraint, and will subordinate (ie de-optimize) the rest of their process to try to deal with it as best they can.

      It is therefore very important to ask, “what would happen if this handoff did not exist?”

      Moving upstream

      [diagram: the test portion of the map]

      Moving upstream, we next come to the test box. It has a lot in common with the owner acceptance scenario, but in thinking about it I realized that I’m missing some important detail in the drawing. For sake of discussion, I’ve collapsed the whole upper part of the diagram into a simple “Design & Code” box.

      [diagram: test with its triage and bug backlog queues]

      Not only is there a queue before test, there are queues on the output of test. The first queue sits before the triage process, where we spend time talking about how bad each bug is and whether we want to expend time fixing it. This is typically done by managers and other senior (read as “expensive”, both in monetary terms and opportunity cost terms) people. I put “1-30 days” in front of that as a guess, but in my experience triage happens every week or every few weeks.

      After that, bugs move into another queue, which is typically called the “bug backlog”. The reason that the label says “1-N” is that the bug backlog is where many bugs go to die; unless you are fixing bugs faster than you are creating them, this list is going to continue to grow and you therefore will not fix all of the bugs.

      I missed another feature of the diagram:

      [diagram: the retriage loop added to the bug backlog]

      Because bugs come in faster than you fix them, the bug backlog will grow to be too big, so there is often a “retriage” step, where you look at the bugs you already triaged and get rid of the lower-priority bugs. This is an example of rework of the rework.

      What can we say about the effect of this on the time it takes a feature to flow through the system? If you agree that a bug is an indication that a feature has not met acceptance, then if you have bugs against that feature you are not done, and therefore most of your features are never done; you may have shipped them, but you are still engaged in rework on them after you ship them.

      How did we end up here? Well, this seems like a good time for a story that I call “FeatureLocks and the Three Groups”…

      FeatureLocks desperately wanted to go out into the world, but he was held up because he had bugs. He went and talked to the developers, and they said, “FeatureLocks, it’s not our fault, it’s the test team’s job to find bugs, and they are too slow at it”. Then he went and talked to the testers, and they said, “FeatureLocks, we are sorry there are bugs, but we are always busy testing and the developers are just writing too many bugs”. Then he went to the triage group, and they said, “FeatureLocks, we know that you want to go out into the world, but you have 10 priority 1 bugs logged against you and you’ll just have to wait until they are fixed”.

      FeatureLocks was so sad. His goal in life was to deliver value, but he couldn’t do that until he worked well and could visit customers.

      They did not live happily ever after.

      At this point, I feel compelled to talk a bit about charter.

      If we want to improve our development process, we need a set of goals. Something like:

      • Decreasing the time it takes features to move through the system
      • Decreasing the time it takes to fix bugs
      • Decreasing the number of bugs
      • Identifying more bugs earlier in the process
      • Reducing the average bug age

      Who owns the charter to make things better?

      Upper management certainly owns the overall charter, but they aren’t really close enough to the problem, so they delegate it to the individual team managers. That doesn’t work because each of the groups only owns a part of the problem, and the handoffs/interactions between groups aren’t really owned by anybody.

      Which means we really have two high-level problems; the first is the one we identified in the diagram – we have queues and lots of rework – and the second is that nobody owns the overall problem.

      You have probably figured out that this is a long discussion that will lead us towards what is sometimes called “combined engineering”. We take the dev and test teams and we combine them into a larger “engineering” team. What do we get out of this?

      • Since we don’t have a handoff between teams, our queue between coding and testing will be smaller. More importantly, the way that process works is owned by one team, which means they can improve it.
      • It is in the new team’s best interest to make bugs easier to find and fix.

      My experience is that combined engineering is a significant improvement over individual teams.

      There is a natural extension to combining dev and test; the “devops” movement includes operations functions such as deployment into the same team, and has the same sort of benefits.

      The meta point – which I stole directly from the Theory of Constraints – is to always look at the interfaces between teams, because those are places where inefficiencies naturally accumulate.

      That pretty much covers the bottom part of the diagram; in my next post I’ll be tackling the design / code part.

      Part 3: The Development Team (3)

     

    Agile and the Theory of Constraints – Part 3: The Development Team (1)

    February 23, 2017 at 3:31 pm

    (Note: The first version of this was a very random draft rather than the first part that I wrote. I blame computer elves. This should be a bit more coherent.)

    This episode of the series will focus on the development team – how a feature idea becomes a shippable feature.

    A few notes before we start:

    • I’m using “feature” because it’s more generic than terms like “story”, “MVP”, “MBI”, or “EBCDIC”
    • I picked an organizational structure that is fairly common, but it won’t be exactly like the one that you are using. I encourage you to draw your own value stream maps to understand how your world is different than the one that I show.

    In this episode, we will be looking at the overall development team. I’m going to start by looking at a typical development flow:

    [diagram: development flow for a single developer]

    Green lines are forward progress, red lines show that we have to loop back for rework.

    That’s the way it works for a single developer, and across a team it looks something like this:

    [diagram: development flow across the team]

    I’ve chosen to draw the case where features are assigned out by managers, but there are obviously other common choices. Hmm… there are already a ton of boxes in the diagram, and this is just the starting point, so I’m going to switch back to the single-developer view for now.

    What are we missing?

    Adding the Queues

    [diagram: the single-developer flow with queues added]

    There are several queues in the process:

    1. The input queue, which I’m going to ignore for now.
    2. Design Review: After I have finished a design, I send it out to the rest of the team for review.
    3. Code Review: After I have finished the implementation, I send out the code to the team for review.
    4. Code Submission: I submit my code to an automated system that will run all the tests and check in if they all pass.
    5. Test: The feature moves to the test phase. This might be done by the development team, or there might be a separate test team.
    6. Acceptance: Somebody – typically a product owner – looks at the feature and determines if it is acceptable

    Now, let’s put some times on the time spent in each queue. The numbers I’m listing are from my experience for a decent traditional team, and they are typical numbers.

    1. Design Review: 3 hours to 1 day.
    2. Code Review: 3 hours to 1 day.
    3. Code Submission: 30 minutes to 1 day.
    4. Test: 1 day to 10 days
    5. Acceptance: 1 day to 10 days.

    Here’s an updated diagram with the numbers on it:

    [diagram: the flow with queue times added]

    At this point, we would typically try to put numbers on all of the blue boxes, but because our feature sizes vary so much, the numbers were all over the place and weren’t very useful.

    We can, however, try to put some numbers on the red rework lines. I’d like you to think about what the numbers are in your organization, and we’ll pick it up from there.

    Part 3: The Development Team (2)

    Trip Report: Agile Open Northwest 2017

    February 14, 2017 at 4:46 pm

    Agile Open Northwest uses a different approach for running a conference. It is obviously centered around agile, and there is a theme – this year’s was “Why?” – but there is no defined agenda and no speakers lined up ahead of time. The attendees – about 350 this year – all show up, propose talks, and then put them on a schedule. This is what most of Thursday’s schedule looked like; there are 3 more meeting areas off to the right on another wall.

    [photo: Thursday’s session schedule board]

    I absolutely love this approach; the sessions lean heavily towards discussion rather than lecture and those discussions are universally great. And if you don’t like a session, you are encouraged/required to stand up and go find something better to do with your time.

    There are too many sessions and side conversations that go on for me to summarize them all, but I’ve chosen to talk about four of them, two of mine, and two others. Any or all of these may become full blog posts in the future.

    TDD and refactoring: Did we choose the wrong practice?

    Hosted by Arlo.

    The title of this talk made me very happy, because something nearly identical lived on my topic sheet, and I thought that Arlo would probably do a better job than I would.

    The basic premise is simple. TDD is about writing unit tests, and having unit tests is a great way to detect bugs after you have created them, but it makes more sense to focus on the factors that cause creation of bugs, because once bugs are created, it’s too late. And – since you can’t write good unit tests in code that is unreadable – you need to be able to do the refactoring before you can do the unit testing/TDD anyway.

    Arlo’s taxonomy of bug sources:

    • Unreadable code.
    • Context dependent code – code that does not have fixed behavior but depends on things elsewhere in the system
    • Communication issues between humans. A big one here is the lack of a single ubiquitous language from customers to code; he cited several examples where the name of a feature in the code and the customer-visible name are different, along with a number of other issues.

    I think that the basic problem with TDD is that you need advanced refactoring and design skills to deal with a lot of legacy code to make it testable – I like to call this code “aggressively untestable” – and unless you have those skills, TDD just doesn’t work. I also think that you need these skills to make TDD work well even with new code – since most people doing TDD don’t refactor much – but it’s just not evident because you still get code that works out of the process.

    Arlo and I talked about the overall topic a bit more offline, and I’m pleased to be in alignment with him on the importance of refactoring over TDD.

    Continuous Improvement: Why should I care?

    I hosted and facilitated this session.

    I’m interested in how teams get from whatever their current state is to their goal state – which I would loosely define as “very low bug rate, quick cycle time, few infrastructure problems”. I’ve noticed that, in many teams, there are a few people who are working to get to a better state – working on things that aren’t feature work – but it isn’t a widespread thing for the group, and I wanted to have a discussion about what is going on from the people side of things.

    The discussion started by talking about some of the factors behind why people didn’t care:

    • We’ve always done it this way
    • I’m busy
    • That won’t get me promoted
    • That’s not my job

    There was a long list that were in this vein, and it was a bit depressing. We talked for a while about techniques for getting around the issue, and there was some good stuff; doing experiments, making things safe for team members, that sort of thing.

    Then the group realized that the majority of the items in our list blamed the issue on the developers – they assumed that, if only there weren’t something wrong with them, they would be doing “the right thing”.

    Then somebody – and of course it was Arlo – gave us a new perspective. His suggestion was to ask the developers, “When you have tried to make improvements in the past, how has the system failed you, and what makes you think it will fail you in the future?”

    The reality is that the majority of developers see the inefficiencies in the system and the accumulated technical debt and they want to make things better, but they don’t. So, instead of blaming the developers and trying to fix them, we should figure out what the systemic issues are and deal with those.

    Demo your improvement

    Hosted by Arlo.

    Arlo’s sessions are always well attended because he always comes up with something interesting. This session was a great follow-on to the continuous improvement session that I hosted.

    Arlo’s basic thesis for this talk is that improvements don’t get done because they are not part of the same process as features and are not visibly valued as features.

    For many groups, the improvements that come out of retros are either stuck in retro notes or they show up on the side of a kanban board. They don’t play in the “what do I pick up next” discussion, and therefore nothing gets done, and then people stop coming up with ideas because it seems pointless. His recommendation is to establish a second section (aka “rail”) on your kanban board, and budget a specific amount of capacity to that rail. Based on discussions with many managers, he suggested 30% as a reasonable budget, with up to 50% if there is lots of technical and/or process debt on the team.

    But having a separate section on the kanban board is not sufficient to get people doing the improvements, because they are still viewed as second-class citizens compared to features. The fix for that is to demo the improvements the same way that features are demo’d; this puts them on an equal footing from an organizational visibility perspective, and makes their value evident to the team and to the stakeholders.

    This is really a great idea.

    Meta-Refactoring (aka “Code Movements”)

    Hosted by me.

    In watching a lot of developers use refactoring tools, I see a lot of usage of rename and extract method, and much less usage of the others. I have been spending some time challenging myself to do as much refactoring as I can automatically with Resharper – and by that, I mean that I don’t type any code into the editing window – and I’ve developed a few of what I’m thinking of as “meta-refactorings” – a series of individual refactorings that are chained together to achieve a specific purpose.

    After I described my session to friend and ex-MSFT Jay Bazuzi, he said that they were calling those “Code Movements”, presumably by analogy to musical movements, so I’m using both terms.

    I showed a few of the movements that I had been using. I can state quite firmly that a flipchart is really the worst way to do this sort of talk; if I do it again I’ll do it in code, but we managed to make it work, though I’m quite sure the notes were not intelligible.

    We worked through moving code into and out of a method (done with extract method and inlining, with a little renaming thrown in for flavor). And then we did a longer example, which was about pulling a bit of system code out of a class and putting it in an abstraction to make the class testable. That takes about 8 different refactorings (there’s a rough before/after sketch following the list), which go something like this:

    1. Extract system code into a separate method
    2. Make that method static
    3. Move that method to a new class
    4. Change the signature of the method to take a parameter of its own type.
    5. Make the method non-static
    6. Select the constructor for the new class in the caller, and make it a parameter
    7. Introduce an interface in the new class
    8. Modify the new class parameter to use the base type (interface in this case).
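Here’s the rough before/after sketch I promised; the class names are invented, and the point is the shape of the transformation rather than the specific code:

```java
// Before: the system call is buried inside the class, so its behavior depends on the real clock.
class GreeterBefore {
    String greeting() {
        int hour = java.time.LocalTime.now().getHour();   // system code mixed into the logic
        return hour < 12 ? "Good morning" : "Good afternoon";
    }
}

// After steps 1-8: the system call lives behind an interface that the caller passes in.
interface ISystemTime {                       // step 7: introduce an interface
    int currentHour();
}

class SystemTime implements ISystemTime {     // steps 1-5: extracted, moved to a new class, made an instance method
    public int currentHour() {
        return java.time.LocalTime.now().getHour();
    }
}

class Greeter {
    private final ISystemTime time;
    Greeter(ISystemTime time) {               // steps 6 and 8: constructor parameter, typed as the interface
        this.time = time;
    }
    String greeting() {
        return time.currentHour() < 12 ? "Good morning" : "Good afternoon";
    }
}
```

A test can now hand Greeter a fake ISystemTime and pin the hour without ever touching the real clock.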

    Resharper is consistently correct in doing all of these, which means that they are refactorings in the true sense of the word – they preserve the behavior of the system – and are therefore safe to do even if you don’t have unit tests.

    They are also *way* faster than trying to do that by hand; if you are used to this movement, you can do the whole thing in a couple of minutes.

    I asked around and didn’t find anybody who knew of a catalog for these, so my plan is to start one and do a few videos that show the movements in action. I’m sure there are others who have these, and I would very much like to leverage what they have done.

     

    Stop writing bad tests. Write only the tests that you can do great.

    October 9, 2016 at 8:25 pm

    I’ve been working on a talk on ways to make unit testing easier. It has not been going well; I’d come up with an approach I liked, do most of the slides for it, come back to it, and be unhappy with what I had written.

    This happened – and I am not exaggerating – 4 times in a row.

    On the 5th try, as I was working through the techniques I was going to talk about, I realized something. But let me back up a bit first…

    Pretty much every introduction for unit testing starts with a very simple scenario using a very simple class; the flow is something like:

    1. Figure out what a method does
    2. Write a test for it
    3. Repeat

    Or, if you are doing TDD, you swap the order and write the test before the method.

    With a small class and small problem space, this works well; it’s simple for developers to understand, and you can therefore present it to a group and they walk out thinking that they understand unit testing.

    There is one *tiny* problem, however.

    It doesn’t work on real classes – and by that, I mean the classes that are in most real systems. We all know that the testability of existing codebases is low, and we also know that most developers do not have the design skills or experience to take that sort of code and write good unit tests for it.

    But that is what developers try to do, because THAT IS WHAT OUR INTRODUCTION TOLD THEM UNIT TESTING IS ABOUT.

    So, they take their existing code, add in a bunch of interfaces so they can inject dependencies, pull in their favorite mocking library, shake it around a bit, and end up with a unit test.

    Hmm…

    • There is a lot of test code that takes time to write
    • Because of the high coupling, changes in many areas of the system will require changes in the test
    • The test is very hard to understand, and it’s often not clear whether the test is actually testing what it says it is testing
    • Developers do not feel like they are being successful in writing unit tests.

    And – AND – there is very little chance of the test driving improvements, which is one of the main reasons we are advocating for a unit-testing approach.

    We have been going about this in the wrong way.

    We should focus on teaching developers how to look at code and figure out what different units are lurking in a single class, and also teaching them how to extract those units out so that simple tests can be written for them.

    And… we should lighten up on the “you should write tests for everything”, because these expensive complex tests aren’t doing anybody any good.
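As a tiny illustration of what finding the unit lurking in a class can look like (an invented example, not from any real codebase): the interesting decision is usually buried in a method that also does I/O; pull the decision out into its own function and the test becomes trivial, with no interfaces or mocking library required.

```java
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

// Invented example: the shipping decision is the unit lurking inside a class that
// otherwise deals with carts, databases, and payment gateways.
class ShippingPolicy {
    // Extracted, pure decision logic; easy to test without any of the I/O around it.
    static boolean qualifiesForFreeShipping(double orderTotal, boolean isMember) {
        return isMember || orderTotal >= 50.0;
    }
}

class ShippingPolicyTest {
    @Test
    void bigOrdersShipFree() {
        assertTrue(ShippingPolicy.qualifiesForFreeShipping(75.0, false));
    }

    @Test
    void smallOrdersFromNonMembersDoNot() {
        assertFalse(ShippingPolicy.qualifiesForFreeShipping(10.0, false));
    }
}
```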