June 8, 2019 at 2:16 am

Back in my early days at Microsoft – the late 1990s – groups ran on the PUM (Product Unit Manager) model. I was on the Visual C++ team, and our PUM was in charge of everything related to that team; the heads of the three disciplines of the time reported to him: a dev manager, a test manager, and a group program manager (at some point it was decided that “program manager manager” or “program manager squared” weren’t great job titles).

Becoming a PUM was a high bar; you had to understand all three disciplines in detail, and you had to work well at the “product vision level”.

Technically, there were only PUMs if your team shipped a product; in other cases you had a GM (General Manager IIRC) with the same responsibilities.

This worked quite well; as a member of a team you knew who had the final authority and that person had a single thing to focus on. The buck stopped with that one person.

But there was a “problem”. The fact that there were limited PUM positions and the skills bar was high meant that many dev/test/pm managers weren’t able to move up to PUM positions.

That was a *good* thing; having interacted with a lot of discipline managers, most of them didn’t have the chops to be PUMs in my opinion.

But what it meant was that people were leaving to go to other companies. Some of that might have been Microsoft’s dogged insistence on paying only at the 50th percentile, but it was decided – in Office, if my information is correct – that something needed to be done.

So, the triad was invented. Essentially, the disciplines were extended higher up the management hierarchy, and they only met when you got to the VP level. That was 2 or 3 levels up, depending on how you look at things.

This is obviously a really bad idea from an effectiveness perspective.

With the PUM model, everybody on a team – whatever the discipline – was working towards a coherent charter on what was important – a vision defined by the PUM.

Get rid of the PUM, and now there are three sets of incentive structures at play. The dev manager is trying to do whatever the dev manager above him thinks is important – which is generally something like “ship features”. The GPM is trying to do what the PM manager above wants – something like “come up with flashy features that make us look good”. And the test manager is generally just treading water, trying to keep the product from sucking at epic levels.

In the PUM model, those three had to cooperate to achieve what the PUM wanted, but that incentive went away, and what you typically saw was the managers just dividing the overall responsibilities and not really working on the bigger whole. That was the *best case*; the cases where the managers actively did not like each other were far less functional.

On one of the triad teams I was on, the dev manager had the most political capital, so all the devs got separate offices at the good end of the building. The QA (sorry, “quality” at this point, since that group didn’t write tests) and PM orgs were at the other end of the building in open space. On a different floor.

And if you want to change something about how the disciplines worked together – like maybe put a dev team and the associated quality team near each other physically?

Isn’t going to happen. The dev manager wants to keep his space because it’s a power thing, and the only person that can mediate this disagreement is at least 2 levels up. None of the managers in between are going to support you even bringing it up with the VP because it would make them look bad, not that your VP is going to care at all about what is going on at that level; the feedback will be “just go figure it out”.

The separation existed at the budget level as well. So, devs get better machines than the quality team because their hierarchy had more power at budget time. You can’t even do morale events together because you have different morale budgets and the higher level managers want to allocate them differently.

As I said, it is a very bad idea from the perspective of effectiveness. If you looked at the skill levels of the higher-level managers, it was pretty pathetic. I went to an all-discipline meeting once, around 2007, where the high-level manager was talking about a new thing he had discovered that was going to revolutionize management.

Dashboards. It was dashboards. I figured that nobody in software management could get to 2007 without knowing about dashboards, but apparently I was mistaken.

So, why was it done and why did it continue so long?

Triads were great at one thing, and that was getting managers promoted up to partner levels. It’s great; there are 3x as many upper management positions, you need fewer skills, and – most importantly – there is *shared responsibility* in every area, so nothing is ever your fault.

Managers are not playing the same game as ICs. They are playing the “get promoted, get powerful, make partner, make lots of money” game.

Gated checkin?

August 23, 2018 at 10:28 pm

Yesterday at the Eastside Agile “Lean Coffee” lunch, we were talking about how to choose what tests to run before a gated checkin, and I expressed the opinion that you shouldn’t run any tests.

That got an interesting response, and then an offline request, and since I’m not smart enough to fit my opinion into Twitter’s character limit, I decided a blog post made sense…

The original question was, IIRC, “how do you decide what tests should be run as part of your gated checkin system”.

Many of you know how fond I am of heresy, because it’s the heretical answer that can push you in new and interesting directions. Heresy is how we got the #Nos… #NoBugs, #NoTDD, #NoEstimates, etc. And you also know that I’m allergic to process, especially big process. And you know Eric is going to talk about incentives because he cannot *shut up* about that topic…


So, this is what I’m thinking…

The problem with gated checkin systems is that they set up the wrong incentives. Let’s consider how it affects two developers, Perfect Paula and Sloppy Sammy.

Perfect Paula always runs appropriate tests before she checks in, so the gated checkin system never finds any issues with her work. That means it’s a waste of time for her to be waiting for the checkin to finish.

Sloppy Sammy does not have good habits; he forgets to run tests on his own box, forgets to integrate changes back into his branch, etc. He rarely gets through on the first try, so he has to make changes and re-submit.

There are two kinds of waste in this system; there is the queuing and context switch cost that Paula is paying – and that cost ironically reduces her productivity elsewhere. And there is the rework cost that Sammy is paying on an ongoing basis, which in my experience everybody in the team just ignores.

My assertion is that neither of those wastes is visible, and therefore there are no incentives driving towards better behavior. Paula just learns to deal with the tax she is paying, and Sammy stays sloppy.

The other issue is related to the original question; how do you decide what should be in the tests? The typical approach is to add in new tests whenever a failure wasn’t caught automatically by the existing set of tests, which can be fine with true unit tests, but can quickly become untenable with any other test types. That’s what gets you to two-hour waits for checking something in. None of these added tests really help Paula, but you *have to* have them because Sammy makes poor decisions in that area.

What is the alternative? Well, the alternative is to check in right away and then kick off the build and tests. If the build and tests fail, then there is a nice public email – because it’s the public build that failed – and you revert the checkin, and the offending party gets to try again. Or maybe you give them 10 minutes to fix their error (missing file?) before the revert happens.
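To make the "check in first, revert later" policy concrete, here's a minimal Python sketch of the decision a build system might make once a checkin has already landed. Everything in it (`CheckinResult`, `decide`, `failure_notice`, the `GRACE_MINUTES` value) is hypothetical; I'm assuming the build runs right after the checkin and reports whether the build and tests passed.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical grace period; the post floats "maybe 10 minutes".
GRACE_MINUTES = 10.0


class Action(Enum):
    KEEP = "keep"            # build and tests passed; the checkin stays
    WAIT_FOR_FIX = "wait"    # broken, but the author still has time to fix it
    REVERT = "revert"        # broken and the grace period is over


@dataclass
class CheckinResult:
    author: str
    build_passed: bool
    tests_passed: bool
    minutes_since_failure: float = 0.0  # only meaningful when something failed


def decide(result: CheckinResult, grace_minutes: float = GRACE_MINUTES) -> Action:
    """Decide what happens to a checkin that has ALREADY landed on the shared branch."""
    if result.build_passed and result.tests_passed:
        return Action.KEEP
    if result.minutes_since_failure < grace_minutes:
        return Action.WAIT_FOR_FIX
    return Action.REVERT


def failure_notice(result: CheckinResult) -> str:
    """The public message that makes the cost of a broken build visible to everyone."""
    return f"Shared build broken by {result.author}'s checkin; it will be reverted unless fixed."
```

Note what's missing: nothing here blocks Paula's checkin while the tests run. The only cost lands on the person who broke the build, and `failure_notice` makes that cost public instead of hiding it in a queue.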

There are downsides to this approach. The codebase will sometimes be in a bad state for a short period of time, and that means if somebody checks in crap, I sync to it, and then check in my stuff, my stuff is going to fail as well. And that will cause me more work. But my assertion is that that pain is *very* visible to everybody on the team and is therefore amenable to process change. And I *want* developers thinking carefully about what they are doing before they check in, because there are countless ways their code can be wrong and still pass the existing set of tests.


The typical pushback against doing this is that teams that try to do this end up with build and test breaks all the time. And they do. The reason they do is that the incentives in the organization are inherently pushing the developers towards the Sloppy Sammy persona. The #1 driver towards that is schedule pressure. And if you think gated checkin is the answer in that situation, what you are saying is, “the solution to our developers feeling like they don’t have enough time to do quality work is to put in a system that slows everybody down (especially our best developers) and reduces the downsides of doing crappy work”.

My experience with the teams that take this approach is that they generally manage to produce a mostly healthy build after some effort, but their overall pace is glacial and their resulting products are buggy as hell.


Caveats: Pairing and mobbing might change how I feel. Deploying straight to production might change how I feel.

Pay Yourself First

November 7, 2017 at 3:31 pm

If you’ve ever spent time learning about saving for retirement, you may have come across a concept named “Pay Yourself First”.

Pay Yourself First is a reaction to a common financial problem; people base their spending on the amount of money that they have coming in each month, and therefore don’t save money for the future – to deal with emergencies that might come up, or to invest for retirement.

If you pay yourself first – remove the money you want to save from your world before you start looking at how much you can spend – you can establish a savings/investment habit. It will cost you a little on an ongoing basis, but it won’t be long before your investments start spinning off free cash that can either be re-invested or used for other purposes. The $1000 that you saved will – over the long term – make you much more than that.

Why am I writing about investing?

Well, I’ve realized that this is exactly the same situation that many software teams get in.

Each month, they look at their income (developer time), and then they look at their expenses (bugs to fix, system maintenance, features they want to build), and allocate their income.

That’s the software equivalent of living paycheck-to-paycheck. And it pretty much ensures that you are going to be stuck with your current level of bugs, your current level of maintenance, and your current feature speed – at best. Realistically, these things are likely to degrade a bit as time passes.

If instead, you “pay yourself first”, your investment can help you reduce the amount of time you spend fixing bugs and make you faster at creating features. Do this on a consistent basis, and your investments will start spinning off free time that you can then reinvest to further improve your speed, or deploy to create more features than you could have before.
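Here's a back-of-the-envelope Python model of that idea. All of the numbers and the `feature_days_per_month` function are made up for illustration; the key assumption is that every dev-day invested in improvements saves a fixed fraction of a dev-day in every month that follows, and that the saved time goes straight into features rather than being reinvested.

```python
def feature_days_per_month(capacity: float, invest: float,
                           return_rate: float, months: int) -> list[float]:
    """Dev-days left over for feature work in each month.

    capacity:    dev-days the team has each month (hypothetical)
    invest:      dev-days set aside up front each month ("pay yourself first")
    return_rate: dev-days saved in every later month per dev-day invested (assumed)
    """
    result = []
    total_invested = 0.0
    for _ in range(months):
        saved = total_invested * return_rate      # payoff from all earlier investment
        result.append(capacity - invest + saved)  # saved time goes to features
        total_invested += invest
    return result


# A team with 40 dev-days a month that sets aside 4 of them, assuming each
# invested day saves a tenth of a day every month afterwards.
plan = feature_days_per_month(capacity=40, invest=4, return_rate=0.1, months=24)
print(f"month 1: {plan[0]:.1f}, month 24: {plan[-1]:.1f}, total: {sum(plan):.1f}")
```

With those invented numbers, the team gives up 4 feature-days a month at the start, is back to 40 feature-days a month in month 11, and has caught up cumulatively at month 21; everything after that is pure gain. Change `return_rate` and the shape stays the same; only the break-even point moves.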

It’s not easy to set aside even $50 a week when you are starting, nor is it easy to set aside a developer-day on an ongoing basis. But once you’ve started doing this consistently, the returns will make it worthwhile.

What about your team? Are you paying yourself first?


June 22, 2017 at 3:31 pm

Did that title get your attention? Good.

Like the other #no<x> assertions – NoEstimates, NoBugs – I’m really not saying that you shouldn’t do TDD. Well, maybe I am…

I was an early TDD advocate, as I really liked the way it helped me organize my thoughts, and I respected some of the other people who were early advocates – people like Ron Jeffries.

But looking back on the 15 years since I started with TDD, I have to say that it really did not live up to my expectations.

What I’ve seen is a consistent pattern of TDD working in a laboratory setting – developers are quick to pick up the workflow and can create working code and tests during classes/exercises/katas – and then failing in the real world.

My hypothesis for this is very simple. When you look at the TDD evangelists, all of them share something: they are all very good – probably even great – at design and refactoring. They see issues in existing code and they know how to transform the code so it doesn’t have those issues, and specifically, they know how to separate concerns and reduce coupling.

If you have those skills, TDD works great; you see what the issues are as they arise and fix them incrementally, and your simple tests prove that your design has low coupling. And, you have the tests to lean on in the future.
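As a small illustration of what "separating concerns" buys you in tests, here's a hypothetical Python sketch. `Order`, `discounted_total`, `CoupledPricer`, `Pricer`, and the discount rule are all invented for the example; `repository` stands in for whatever data access the real system uses.

```python
from dataclasses import dataclass


@dataclass
class Order:
    subtotal: float
    loyalty_years: int


# Coupled version, for contrast: the pricing rule is tangled up with data
# access, so a test has to fake `repository` before it can check any arithmetic.
class CoupledPricer:
    def __init__(self, repository):
        self.repository = repository

    def total_for(self, order_id: int) -> float:
        order = self.repository.load(order_id)
        discount = min(order.loyalty_years * 0.02, 0.10)
        return round(order.subtotal * (1 - discount), 2)


# Separated version: the pricing concern is a pure function...
def discounted_total(order: Order) -> float:
    """2% off per loyalty year, capped at 10% (an illustrative rule)."""
    discount = min(order.loyalty_years * 0.02, 0.10)
    return round(order.subtotal * (1 - discount), 2)


# ...and the data-access concern is thin enough that it barely needs testing.
class Pricer:
    def __init__(self, repository):
        self.repository = repository

    def total_for(self, order_id: int) -> float:
        return discounted_total(self.repository.load(order_id))
```

The tests for `discounted_total` are just values in and values out; no mocks are involved, which is the "simple tests prove low coupling" signal described above.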

A hypothetical

Let’s take the TDD workflow and remove the third step – the refactoring step. What would we expect to happen?

Well, we would expect to end up with classes that have multiple concerns in them – because we didn’t split them apart – and they would be inconvenient to test. We would need to write a lot of test code, and we would need a lot of help to write it, either by creating many hand-written mocks or by using mock libraries.

Which I submit is precisely the result that most developers get using TDD; the resulting code looks exactly like what we would expect if developers are skipping the third step. Which isn’t really very surprising, given that the tests in most non-TDD code look the same way, and we know that most developers do not have great design/refactoring skills.

At this point we should ask ourselves, “are these developers getting a net positive benefit from using TDD?” Let’s list the pros/cons:


  • We end up with tests that verify the behavior of the code and help prevent regressions



  • It takes us longer to write code using TDD
  • The tests get in the way. Because my design does not have low coupling, I end up with tests that also do not have low coupling. This means that if I change the behavior of how class <x> works, I often have to fix tests for other classes.
  • Because I don’t have low coupling, I need to use mocks or other test doubles often. Tests are good to the extent that the tests use the code in precisely the same way the real system uses the code. As soon as I introduce mocks, I now have a test that only works as long as that mock faithfully matches the behavior of the real system. If I have lots of mocks – and since I don’t have low coupling, I need lots of mocks – then I’m going to have cases where the behavior does not match. This will either show up as a broken test, or a missed regression.
  • Design on the fly is a learned skill. If you don’t have the refactoring skills to drive it, it is possible that the design you reach through TDD is going to be worse than if you spent 15 minutes doing up-front design.
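Here's a tiny Python sketch of the "missed regression" case from the mocks bullet. The classes and the units change are hypothetical: I'm assuming the real rate service was changed to return basis points, while a hand-written fake, written before the change, still returns a plain multiplier.

```python
class ExchangeRates:
    """Real implementation, after a (hypothetical) change to return basis points."""
    def rate(self, currency: str) -> int:
        return {"EUR": 9200, "GBP": 7900}[currency]   # 0.92 is now 9200 bp


class FakeExchangeRates:
    """Hand-written test double written BEFORE that change; it still returns a multiplier."""
    def rate(self, currency: str) -> float:
        return {"EUR": 0.92, "GBP": 0.79}[currency]


def convert(amount_usd: float, currency: str, rates) -> float:
    # Written against the old contract: expects rates.rate() to be a multiplier.
    return round(amount_usd * rates.rate(currency), 2)


# The unit test only ever sees the fake, so it keeps passing...
print(convert(100, "EUR", FakeExchangeRates()))
# ...while the real system is now off by a factor of 10,000.
print(convert(100, "EUR", ExchangeRates()))
```

The test stays green because it is checking the fake, not the system; multiply that by every mock in a tightly coupled codebase and you get the maintenance and regression problems described above.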

    I think that’s a good summary of why many people have said, “TDD doesn’t work”; if you don’t have a design with low-coupling, you will end up with a bunch of tests that are expensive to maintain and may not guard against regression very well.

    And to be fair, the level of design and refactoring skills required is relative to the state of the codebase you’re working in. There are a lot of legacy codebases with truly monumental amounts of coupling in them and figuring out how to do anything safely is very difficult and complex.

    TDD—, Refactoring++

    Instead of spending time teaching people TDD, we should instead be spending time teaching them more about design and especially more about refactoring, because that is the important core skill. The ability to write and refactor code to a state with low coupling, well-separated concerns, and great names will be applicable in all the codebases that we work in.

    That is not to say that I don’t dust off NCrunch now and then, either to create a new set of functionality with great tests or to create tests around a concern as I do a refactoring; I still do that often. I just don’t think it’s the right place to start from an educational or evangelical perspective.

    Agile and the Theory of Constraints – Part 3: The Development Team (2)

    March 30, 2017 at 3:13 pm

    In the last post, I asked you to put some numbers on the red (aka “rework”) arrows on the diagram. There are two ways to express these numbers; either the percentage of the time a specific path is taken, or the number of times the path is taken on average. I’ve chosen the latter because it is easier to understand. As I noted in the last post, it doesn’t help us a lot to put numbers on the blue boxes because their times vary so drastically.


    In value stream maps, we want to measure the overall time through the system. We don’t have good times for the task boxes themselves, but it’s really worse than that; pretty much every system has ways to jump queues or even skip whole sections of the process, and the numbers are different if it’s the first time through or a rework.

    Does that mean this value stream map is useless?

    After all the time I’ve spent writing this, I am fervently hoping the answer is “no”. Let’s see what we can do with what we have…

    Searching for the bottleneck

    We normally start optimization by searching for the bottleneck. Here are a few characteristics of bottlenecks:

    1. They are always busy.
    2. Work piles up in front of them.
    3. Downstream resources are regularly idle.

    Let’s walk through these, and see how we might apply them.

    1. This one works well with machines, but people are another matter. Not only can we easily look busy, we might be busy with low-value work.
    2. Work piling up in front of them is a pretty good indicator. This is another way of saying, “examine the queues and find the ones that are the longest”.
    3. A machine that is after a bottleneck will look idle. As I noted in #1, humans are great at looking busy.

    It looks like we will be looking at the queues.
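"Examine the queues and find the longest" is easy to sketch in Python. The queue names and counts below are invented snapshots, and `average_queue_lengths` / `bottleneck_candidates` are hypothetical helpers; averaging a few snapshots is my assumption to smooth out day-to-day noise, since a single look at a queue can mislead.

```python
from statistics import mean

# Hypothetical snapshots of how many items were waiting in front of each step,
# taken on different days.
SNAPSHOTS = [
    {"design": 2, "code": 3, "test": 6, "owner acceptance": 9},
    {"design": 1, "code": 4, "test": 5, "owner acceptance": 11},
    {"design": 3, "code": 2, "test": 7, "owner acceptance": 10},
]


def average_queue_lengths(snapshots: list[dict[str, int]]) -> dict[str, float]:
    """Average number of waiting items per queue (names taken from the first snapshot)."""
    names = snapshots[0].keys()
    return {name: mean([s.get(name, 0) for s in snapshots]) for name in names}


def bottleneck_candidates(snapshots: list[dict[str, int]], top: int = 2) -> list[tuple[str, float]]:
    """Queues with the most work piled up in front of them, longest first."""
    averages = average_queue_lengths(snapshots)
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:top]


print(bottleneck_candidates(SNAPSHOTS))
```

This deliberately ignores "who looks busy", since, as noted above, that signal doesn't work well for people; it only tells you where to start looking.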


    Absent any real data on where our bottleneck is, I’m going to start with the “owner acceptance” part of the process. What has drawn me here are the rework numbers, which mean that we generally find one design issue and one bug when we hit owner acceptance, which will require us to rewind back into the process. That means the owner acceptance process time for a given feature is really the sum of:

    1. The time spent in the incoming queue.
    2. The time to perform the acceptance test.
    3. The time spent in the “redesign” and “fix bugs” queues (queues that are absent in the current map).
    4. The time for the redesigned and fixed feature to wind its way through the process again.
    5. The time spent in the incoming queue for the second time.
    6. The time to perform the acceptance test.
    7. The time to repeat steps 3-6 if the feature still isn’t right.
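The seven steps above add up like this in a small Python sketch. The function name and the durations are hypothetical (I don't have real numbers either); `rounds` is how many times acceptance sends the feature back, and each round repeats steps 3 through 6.

```python
def acceptance_time(queue_wait: float, accept_test: float, rework_queue_wait: float,
                    rerun_process: float, rounds: int) -> float:
    """Days a feature effectively spends in 'owner acceptance', following steps 1-7.

    rounds = number of times acceptance sends the feature back (0 = passed first time).
    """
    total = queue_wait + accept_test              # steps 1-2: first trip through acceptance
    for _ in range(rounds):                       # step 7: repeat steps 3-6 as needed
        total += rework_queue_wait                # step 3: waiting in redesign / fix-bugs queues
        total += rerun_process                    # step 4: back through the rest of the process
        total += queue_wait + accept_test         # steps 5-6: acceptance again
    return total


# Made-up durations in days: 3 waiting, 1 testing, 5 in rework queues, 10 to flow back.
for rounds in range(3):
    print(rounds, acceptance_time(3, 1, 5, 10, rounds))
```

With these invented numbers, a feature that passes first time spends 4 days in acceptance; one round of rework turns that into 23 days, and two rounds into 42.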

    Not only are the rework issues slowing this feature down significantly, they are slowing all the other features down because we are using the same set of resources. And, they are significantly increasing the amount of work required in the owner acceptance step – we have to do most items twice – which is certainly going to make the acceptance testing itself slower.

    Another way of looking at this is that rework causes a multiplier effect based on how often it happens and how far back in the map it sends you. Acceptance is at the end and it can send you all the way back to the beginning, so it can have a very big impact.
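The "how often and how far back" multiplier can be sketched in Python as well. This is a simplified model under my own assumptions: the map is a straight line of steps with hypothetical durations, each rework loop re-traverses the steps between where it starts and where it lands, and `avg_times` is the average number of times the loop is taken (the same convention I used for the red arrows). It ignores loops triggering other loops.

```python
def expected_flow_time(steps: list[float], rework: list[tuple[int, int, float]]) -> float:
    """Expected time through a linear value stream with rework loops.

    steps:  duration of each step, in process order (hypothetical numbers)
    rework: (from_step, to_step, avg_times); the loop re-runs steps
            to_step..from_step, avg_times times on average
    """
    total = sum(steps)
    for from_step, to_step, avg_times in rework:
        total += avg_times * sum(steps[to_step:from_step + 1])
    return total


# design, code, code review, test, owner acceptance (made-up days)
STEPS = [3, 5, 1, 4, 2]
print(expected_flow_time(STEPS, []))              # no rework
print(expected_flow_time(STEPS, [(2, 1, 1.0)]))   # review sends it back to code once
print(expected_flow_time(STEPS, [(4, 0, 1.0)]))   # acceptance sends it back to design once
```

The same single loop costs 6 extra days when it starts at code review and 15 when it starts at acceptance, because the later a loop starts and the earlier it lands, the more of the map you pay for again.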

    The solution here isn’t very surprising; we should work to a) reduce the size of the acceptance queue, so features don’t stall there, and b) reduce the chance that we need to do rework to address design issues and bugs.

    There are a few agile practices that we might use:

    • Embedding a product owner in the team
    • Having defined and regular interaction between the product owner and the developers
    • Doing feature design with the product owner and the whole team to make sure everybody is on the same page
    • Writing explicit acceptance criteria for features ahead of time.
    • Using ATDD with tools like Cucumber/Gherkin/SpecFlow.

    The best solution is going to depend on your organization and how you structure development. I recommend focusing on a shared understanding of scenarios and acceptance criteria up front, as that will both prevent issues and streamline acceptance testing by the team.

      One more thought on this area: the situation here is an example of a handoff between teams. Handoffs are a great place to look for waste, because:

      • They aren’t under one team’s control, which means it would take cross-team effort – which most companies are poor at – to improve them
      • The incentives in place are usually on a per-team basis, and therefore they reinforce an “us vs. them” perspective.
      • Politics often show up.
      • There are often fixed contracts/SLAs

      In other words, it’s a lot of effort to try to fix them from within a team and the success rate isn’t great, so they persist. In lean terms, the teams on both sides will consider the handoff to be a constraint, and will subordinate (i.e., de-optimize) the rest of their process to deal with it as best they can.

      It is therefore very important to ask, “what would happen if this handoff did not exist?”

      Moving upstream


      Moving upstream, we next come to the test box. It has a lot in common with the owner acceptance scenario, but in thinking about it I realized that I was missing some important detail in the drawing. For the sake of discussion, I’ve collapsed the whole upper part of the diagram into a simple “Design & Code” box.


      Not only is there a queue before test, there are queues on the output of test. The first queue sits before the triage process, where we spend time talking about how bad each bug is and whether we want to expend time fixing it. This is typically done by managers and other senior (read as “expensive”, both in monetary terms and opportunity cost terms) people. I put “1-30 days” in front of that as a guess, but in my experience triage happens every week or every few weeks.

      After that, bugs move into another queue, which is typically called the “bug backlog”. The reason that the label says “1-N” is that the bug backlog is where many bugs go to die; unless you are fixing bugs faster than you are creating them, this list is going to continue to grow and you therefore will not fix all of the bugs.
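The "bugs go to die" arithmetic is simple enough to show in a few lines of Python. The rates are hypothetical and held constant, which real teams never manage, but the direction is the point: if bugs are found faster than they are fixed, the backlog grows every single week.

```python
def backlog_over_time(start: int, found_per_week: int, fixed_per_week: int,
                      weeks: int) -> list[int]:
    """Size of the bug backlog at the end of each week, assuming constant rates."""
    backlog = start
    history = []
    for _ in range(weeks):
        backlog = max(0, backlog + found_per_week - fixed_per_week)
        history.append(backlog)
    return history


print(backlog_over_time(start=50, found_per_week=12, fixed_per_week=10, weeks=4))
print(backlog_over_time(start=50, found_per_week=10, fixed_per_week=12, weeks=4))
```

A two-bug-a-week gap in either direction is the difference between a backlog that only grows and one that eventually drains.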

      I missed another feature of the diagram:


      Because bugs come in faster than you fix them, the bug backlog will grow to be too big, so there is often a “retriage” step, where you look at the bugs you already triaged and get rid of the lower-priority bugs. This is an example of rework of the rework.
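A retriage pass boils down to something like this hypothetical Python sketch: keep the most important bugs up to some cap and close the rest. The `Bug` record, the cap, and the "lower number means more important" priority convention are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    id: int
    priority: int  # 1 = most important; larger numbers are less important


def retriage(backlog: list[Bug], cap: int) -> tuple[list[Bug], list[Bug]]:
    """Keep the `cap` most important bugs; return (kept, closed)."""
    ranked = sorted(backlog, key=lambda b: (b.priority, b.id))
    return ranked[:cap], ranked[cap:]


kept, closed = retriage([Bug(1, 3), Bug(2, 1), Bug(3, 2), Bug(4, 3)], cap=2)
print([b.id for b in kept], [b.id for b in closed])
```

The code is trivial; the cost isn't. Every bug in `closed` was found, logged, and triaged at least once before somebody spent time deciding never to fix it, which is exactly the rework of the rework.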

      What can we say about the effect of this on the time it takes a feature to flow through the system? If you agree that a bug is an indication that a feature has not met acceptance, then if you have bugs against that feature you are not done, and therefore most of your features are never done; you may have shipped them, but you are still engaged in rework on them after you ship them.

      How did we end up here? Well, this seems like a good time for a story that I call “FeatureLocks and the Three Groups”…

      FeatureLocks desperately wanted to go out into the world, but he was held up because he had bugs. He went and talked to the developers, and they said, “FeatureLocks, it’s not our fault, it’s the test team’s job to find bugs, and they are too slow at it”. Then he went and talked to the testers, and they said, “FeatureLocks, we are sorry there are bugs, but we are always busy testing and the developers are just writing too many bugs”. Then he went to the triage group, and they said, “FeatureLocks, we know that you want to go out into the world, but you have 10 priority 1 bugs logged against you and you’ll just have to wait until they are fixed”.

      FeatureLocks was so sad. His goal in life was to deliver value, but he couldn’t do that until he worked well and could visit customers.

      They did not live happily ever after.

      At this point, I feel compelled to talk a bit about charter.

      If we want to improve our development process, we need a set of goals. Something like:

      • Decreasing the time it takes features to move through the system
      • Decreasing the time it takes to fix bugs
      • Decreasing the number of bugs
      • Identifying more bugs earlier in the process
      • Reducing the average bug age
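Several of these goals are easy to measure if you have opened/closed dates for bugs. Here's a minimal Python sketch; the `BugRecord` shape and the `bug_metrics` function are hypothetical, and it covers only the bug-count, time-to-fix, and bug-age goals.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BugRecord:
    opened: date
    closed: Optional[date] = None  # None while the bug is still open


def bug_metrics(bugs: list[BugRecord], as_of: date) -> dict[str, float]:
    """Open bug count, average age of open bugs, and average days to fix closed bugs."""
    open_bugs = [b for b in bugs if b.closed is None]
    fixed = [b for b in bugs if b.closed is not None]
    avg_age = (sum((as_of - b.opened).days for b in open_bugs) / len(open_bugs)
               if open_bugs else 0.0)
    avg_fix = (sum((b.closed - b.opened).days for b in fixed) / len(fixed)
               if fixed else 0.0)
    return {"open_bugs": len(open_bugs),
            "average_open_age_days": avg_age,
            "average_fix_days": avg_fix}


bugs = [BugRecord(date(2017, 3, 1)),
        BugRecord(date(2017, 3, 21)),
        BugRecord(date(2017, 3, 10), date(2017, 3, 14))]
print(bug_metrics(bugs, as_of=date(2017, 3, 31)))
```

Tracking these over time matters more than any single value; the interesting question is which way they move after you change the process.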

      Who owns the charter to make things better?

      Upper management certainly owns the overall charter, but they aren’t really close enough to the problem, so they delegate it to the individual team managers. That doesn’t work because each of the groups only owns a part of the problem, and the handoffs/interactions between groups aren’t really owned by anybody.

      Which means we really have two high-level problems; the first is the one we identified in the diagram – we have queues and lots of rework – and the second is that nobody owns the overall problem.

      You have probably figured out that this is a long discussion that will lead us towards what is sometimes called “combined engineering”. We take the dev and test teams and combine them into a larger “engineering” team. What do we get out of this?

      • Since we don’t have a handoff between teams, our queue between coding and testing will be smaller. More importantly, the way that process works is owned by one team, which means they can improve it.
      • It is in the new team’s best interest to make bugs easier to find and fix.

      My experience is that combined engineering is a significant improvement over individual teams.

      There is a natural extension to combining dev and test; the “devops” movement includes operations functions such as deployment into the same team, and has the same sort of benefits.

      The meta point – which I stole directly from the Theory of Constraints – is to always look at the interfaces between teams, because those are places where inefficiencies naturally accumulate.

      That pretty much covers the bottom part of the diagram; in my next post I’ll be tackling the design/code part.

      Part 3: The Development Team (3)