Trip Report: Agile Open Northwest 2017

February 14, 2017 at 4:46 pm

Agile Open Northwest uses a different approach for running a conference. It is obviously around agile, and there is a theme – this year’s was “Why?” – but there is no defined agenda and no speakers lined up ahead of time. The attendees – about 350 this year – all show up, propose talks, and then put them on a schedule. This is what most of Thursday’s schedule looked like; there are 3 more meeting areas off to the right on another wall.

imag0068

I absolutely love this approach; the sessions lean heavily towards discussion rather than lecture and those discussions are universally great. And if you don’t like a session, you are encouraged/required to stand up and go find something better to do with your time.

There are too many sessions and side conversations that go on for me to summarize them all, but I’ve chosen to talk about four of them, two of mine, and two others. Any or all of these may become full blog posts in the future.

TDD and refactoring: Did we choose the wrong practice?

Hosted by Arlo.

The title of this talk made me very happy, because something nearly identical lived on my topic sheet, and I thought that Arlo would probably do a better job than I would.

The basic premise is simple. TDD is about writing unit tests, and having unit tests is a great way to detect bugs after you have created them, but it makes more sense to focus on the factors that cause creation of bugs, because once bugs are created, it’s too late. And – since you can’t write good unit tests in code that is unreadable – you need to be able to do the refactoring before you can do the unit testing/TDD anyway.

Arlo’s taxonomy of bug sources:

  • Unreadable code.
  • Context dependent code – code that does not have fixed behavior but depends on things elsewhere in the system
  • Communication issues between humans. A big one here is the lack of a ubiquitous single language between customers to code; he cited several examples where the name of a feature in the code and the customer visible name are different, along with a number of other issues.

I think that the basic problem with TDD is that you need advanced refactoring and design skills to deal with a lot of legacy code to make it testable – I like to call this code “aggressively untestable” – and unless you have those skills, TDD just doesn’t work. I also think that you need these skills to make TDD work well even with new code – since most people doing TDD don’t refactor much – but it’s just not evident because you still get code that works out of the process.

Arlo and I talked about the overall topic a bit more offline, and I’m pleased to be in alignment with him on the importance of refactoring over TDD.

Continuous Improvement: Why should I care?

I hosted and facilitated this session.

I’m interested in how teams get from whatever their current state to their goal state – which I would loosely define as “very low bug rate, quick cycle time, few infrastructure problems”. I’ve noticed that, in many teams, there are a few people who are working to get to a better state – working on things that aren’t feature work – but it isn’t a widespread thing for the group, and I wanted to have a discussion about what is going on from the people side of things.

The discussion started by talking about some of the factors behind why people didn’t care:

  • We’ve always done it this way
  • I’m busy
  • That won’t get me promoted
  • That’s not my job

There was a long list that were in this vein, and it was a bit depressing. We talked for a while about techniques for getting around the issue, and there was some good stuff; doing experiments, making things safe for team members, that sort of thing.

Then, the group realized that the majority of the items in our list were about blaming the issue on the developers – it assumed that, if only there wasn’t something wrong with them, they would be doing “the right thing”.

Then somebody – and of course it was Arlo – gave us a new perspective. His suggestion was to ask the developers, “When you have tried to make improvements in the past, how has the system failed you, and what makes you think it will fail you in the future?”

The reality is that the majority of developers see the inefficiencies in the system and the accumulated technical debt and they want to make things better, but they don’t. So, instead of blaming the developers and trying to fix them, we should figure out what the systemic issues are and deal with those.

Demo your improvement

Hosted by Arlo.

Arlo’s sessions are always well attended because he always comes up with something interesting. This session was a great follow-on to the continuous improvement session that I hosted.

Arlo’s basic thesis for this talk is that improvements don’t get done because they are not part of the same process as features and are not visibly valued as features.

For many groups, the improvements that come out of retros are either stuck in retro notes or they show up on the side of a kanban board. They don’t play in the “what do I pick up next” discussion, and therefore nothing gets done, and then people stop coming up with ideas it seems pointless. His recommendation is to establish second section (aka “rail”) on your kanban board, and budget a specific amount to that rail. Based on discussions with many managers, he suggested 30% as a reasonable budget, with up to 50% if there is lots of technical and/or process debt on the team.

But having a separate section on the kanban is not sufficient to get people doing the improvements, because they are still viewed as second-class citizens when compared to features. The fix for that is to demo the improvements the same way that features are demo’d; this puts them on an equal footing from a organizational visibility perspective, and makes their value evident to the team and to the stakeholders.

This is really a great idea.

Meta-Refactoring (aka “Code Movements”)

Hosted by me.

In watching a lot of developers, use refactoring tools, I see a lot of usage of rename and extract method, and the much less usage of the others. I’ve have been spending some time challenging myself to do as much refactoring automatically with Resharper – and by that, I mean that I don’t type any code into the editing window – and I’ve developed a few of what I’m thinking of as “meta-refactorings” – a series of individual refactorings that are chained together to achieve a specific purpose.

After I described my session, to friend and ex-MSFT Jay Bazuzi, he said that they were calling those “Code Movements”, presumably by analogy to musical movements, so I’m using both terms.

I showed a few of the movements that I had been using. I can state quite firmly that flipchart is really the worst way to do this sort of talk; if I do it again I’ll do it in code, but we managed to make it work, though I’m quite sure the notes were not intelligible.

We worked through moving code into and out of a method (done with extract method and inlining, with a little renaming thrown in for flavor). And then we did a longer example, which was about pulling a bit of system code out of a class and putting it in an abstraction to make a class testable. That takes about 8 different refactorings, which go something like this:

  1. Extract system code into a separate method
  2. Make that method static
  3. Move that method to a new class
  4. Change the signature of the method to take a parameter of it’s own type.
  5. Make the method non-static
  6. Select the constructor for the new class in the caller, and make it a parameter
  7. Introduce an interface in the new class
  8. Modify the new class parameter to use the base type (interface in this case).

Resharper is consistently correct in doing all of these, which means that they are refactorings in the true sense of the word – they preserve the behavior of the system – and are therefore safe to do even if you don’t have unit tests.

They are also *way* faster than trying to do that by hand; if you are used to this movement, you can do the whole thing in a couple of minutes.

I asked around and didn’t find anybody who knew of a catalog for these, so my plan is to start one and do a few videos that show the movements in action. I’m sure there are others who have these, and I would very much like to leverage what they have done.

 

Stop writing bad tests. Write only the tests that you can do great.

October 9, 2016 at 8:25 pm

I’ve been working on a talk on ways to make unit testing easier. I has not been going well; I’d come up with an approach I liked, do most of the slides for it, come back to it, and be unhappy with what I had written.

This happened – and I am not exaggerating – 4 times in a row.

In the 5th try, as I was working through the techniques I was going to talk about, I realized something. But let me back up a bit first…

Pretty much every introduction for unit testing starts with a very simple scenario using a very simple class; the flow is something like:

  1. Figure out what a method does
  2. Write a test for it
  3. Repeat

Or, if you are doing TDD, you swap the order and write the test before the method.

With a small class and small problem space, this works well; it’s simple for developers to understand, and you can therefore present it to a group and they walk out thinking that they understand unit testing.

There is one *tiny* problem, however.

It doesn’t work on real classes – and by that, I mean the classes that are in most real system. We all know that the testability of existing codebases is low, and we also know that most developers do not have the design skills or experience to take that sort of code and write good unit tests for it.

But that is what developers try to do, because THAT IS WHAT OUR INTRODUCTION TOLD THEM UNIT TESTING IS ABOUT.

So, they take their existing code, add in a bunch of interfaces so they can inject dependencies, pull in their favorite mocking library, shake it around a bit, and end up with a unit test.

Hmm…

  • There is a lot of test code that takes time to write
  • Because of the high coupling, changes in many areas of the system will require changes in the test
  • The test is very hard to understand, and it’s often not clear whether the test is actually testing what it says it is testing
  • Developers do not feel like they are being successful in writing unit tests.

And – AND – there is very little chance of the test driving improvements, which is one of the main reasons we are advocating for a unit-testing approach.

We have been going about this in the wrong way.

We should focus on teaching developers how to look at code and figure out what different units are lurking in a single class, and also teaching them how to extract those units out so that simple tests can be written for them.

And… we should lighten up on the “you should write tests for everything”, because these expensive complex tests aren’t doing anybody any good.

You Suck at TDD #8 – Doing fewer things

July 5, 2016 at 3:05 pm

Welcome back to You Suck at TDD. Today’s code will show up in the Improvements-Phase-3 branch if you would like to follow along.

In our last episode, we concentrated mostly on the Employee fetching and filtering. Things are better, but we still have a problem…

Well, actually, we have a number of problems, but we’ll start with the first one that I see, which is that Yucky.GetEmployees() is trying to do too many things – it both fetches data and filters it.

This is one of the most common things I see when I look at code – methods that try to do too much, and they are generally quite challenging to test because they do too much.

EmployeeSource class

So, we’ll start by working on that. I pulled the fetching code out into a new EmployeeSource class and then I pulled the creation of the EmployeeSource instance out of GetEmployees().

At this point, GetEmployees() only has a couple of lines in it:

public static EmployeeCollection GetEmployees(EmployeeFilter employeeFilter, EmployeeSource employeeSource)
{
    var employeeCollection = employeeSource.FetchEmployees();

    return employeeCollection.Filter(employeeFilter.Matches);
}

We can easily simplify it with an inline of employeeCollection:

return employeeSource.FetchEmployees().Filter(employeeFilter.Matches);

At this point, there is no reason to have a separate method, so I inlined FetchEmployees() and did some cleanup in Main(), and deleted class Yucky.

This does make Main() more complex than I would like it, but I did the inline with a purpose; I find that my thoughts are often constrained by the abstractions that are present, and the act of getting rid of a bad abstraction sometimes makes it easier to find the right abstraction.

Spending a little time with Main(), it isn’t testable code, and I’d like to get some tests in there…

If we abstract out writing the output to the console, we get something like this:

var collection = employeeSource.FetchEmployees().Filter(employeeFilter.Matches);

WriteToConsole(collection);

That’s decent code; since it doesn’t have any conditionals, it’s likely code that either works all the time or never works at all. Not my first target when I’m looking to make things more testable, but if we wanted to cover it, how can we do it?

Well, the most obvious thing to do is to extract the IEmployeeSource interface, create a method that calls that, create a mock (or a simulator using P/A/S), and then write a test or two. That adds some complexity; you have another interface hanging around, and you need to create and maintain the mock and simulator. It will definitely work, but it is a fairly heavyweight approach.

I’m instead going to go with a different option. One of the things I’ve noticed with developers is that while they may pick up on the opportunity to create a data abstraction – or not, given the title of this series – they rarely pick up on the opportunity to create an algorithmic abstraction.

So, I’m going to go in that direction.

From an abstract perspective, this code does the following:

  • Creates a chunk of data
  • Performs an operation on that chunk of data to create a second chunk of data
  • Consumes that chunk of data

Our current implementation implements this through procedural statements, but we could consider doing this differently. Consider the following code:

public class Pipeline
{
    public static void Process<T1, T2>(Func<T1> source, Func<T1, T2> processor, Action<T2> sink )
    {
        sink(processor(source()));
    }
}

This is an abstraction of the pattern that we were using; it’s general-purpose code that takes a source, processor, and sink, and knows how to wire them up to pass data between them. The version of Process() that I wrote handles this specific case, but you can obviously write versions that have more than one processor.

Doing that with our code results in the following:

Pipeline.Process(
    employeeSource.FetchEmployees,
    (employees) => employees.Filter(employeeFilter.Matches),
    WriteToConsole);

I like the first and third arguments, as they are just method names. I would like the second line also to be a method name.

We can recast this by adding a Filter() method to EmployeeFilter, which then yields:

Pipeline.Process(
    employeeSource.FetchEmployees,
    employeeFilter.Filter,
    WriteToConsole);

That makes me fairly happy; Pipeline.Process() is tested, and the code to use pipeline is almost declarative.

Take a few minutes and look through the current implementation, and write down what you like and what you don’t like.

Looking at the code, it is improved, but there are still a number of things that I don’t really like:

  • I don’t have a way to test my EmployeeSource class, so there is probably a trip to P/A/S coming up.
  • I’m not really in love with Pipeline; I have an alternate approach in mind that I think will be better.
  • The implementation of EmployeeSource has that ugly retry logic in it; that is something algorithmic that I hope can be teased out of the fetching.

Those will come up next time.

 

Agile and the Theory of Constraints – Part 2: The development cycle

May 17, 2016 at 3:18 pm

In the last post, I talked about some of the analysis used in lean. Now, let’s see how we can apply the same principles to the software world.

I will start by trying to create a value-stream map. But where to start? There are three levels at which we can do the mapping:

  1. The entire organization, starting at the idea stage and going through the stage where we ship bits to customers.
  2. The overall implementation of a feature/MBI/whatever by the development team.
  3. The inner developer loop for a single story/feature/whatever by a single developer or pair.

I decided to start at #3, spent a lot of time writing, and then realized that the inner loop discussion depends a lot on the other ones, so I put that aside, and decided to start at the outside, with level #1. First issue solved…

I also need to decide what process I am going to map. I’ve worked in a number of different teams with very different processes, many of which I don’t remember perfectly. So, instead of choosing a real team, I’ve decided to invent a composite team that is a bit more on the old-school side of things. Perhaps “old-school interested in building a new school” is a decent description.

The actual process of creating one of the is done through an interactive discussion with a number of people and is very customized to the way a particular organization works. Some of what I create here will not apply to you, so I’ve come up with two ways that might make it more useful for you.

First, I encourage you to draw your own diagram as you read through what I am saying – or perhaps as you read through it a second time. And second, I’ve added a few Question sections where you might want to think a bit about what your answer is before you read mine.

That said, let’s dive in. Our organization builds software, and follows the following steps:

  1. Enter Idea
  2. Triage by priority
  3. Feature estimation
  4. Release planning
  5. Create
  6. Ship

Let’s put those into our first diagram:

dev1

Here’s a little bit more information about the steps:

Enter idea

Somebody comes up with an idea, and we need to track it, so it gets entered into a system. This takes about 15 minutes per item.

Traige by priority

Once we have the ideas, we need to order them by priority so we know what the most important ones are. This is typically done by a small group of people who meet for a couple of hours every month.

Feature estimation

To be able to determine which features fit into the release, we need to know how big they are. We will ask the engineering team to do the estimation for us. Since we ask them to give us good estimates, it takes anywhere from 1 to 8 hours of work for them to do each estimate.

Release planning

Before we can start working on a new release, we need to figure out what features we are going to put into it. This is a very important function as it determines what we will invest in for the next release and there are lots of competing priorities, so we set up a release planning group and they spend a lot of time figuring out a draft plan, getting it reviewed, modifying it, rolling it out, etc. It takes about 100 hours for the group to do this, spread over a period of a couple of months.

Create

The engineering team takes the list of features, works through them, and creates the new version of the software. We are using a 2 year ship cycle, so this phase is that two years minus the overhead of planning and shipping, or about 20 months.

Ship

There are things that we need to finish at the end of the cycle; we do acceptance testing and fix the leftover important bugs and go through all the mechanics of putting together the full release. This takes about 2 months.

Let’s add those times to the chart:

dev2

Something you may have noticed is that there are different kinds of times here; “Enter idea” and “Feature estimation” are per-item times, “Triage by priority” is a periodic short meeting, and the rest are just time boxes.

What’s next? Well, those steps just don’t feed directly from one to the next; we have inventory. And wow, we have a lot of it; so much that we give each set of inventory a name so we can tell them apart. We’ll add the inventory with names and counts to the diagram:

dev3

These are numbers that I’ve seen for a small product group; one with 20 or so developers on it. They are conservative; I’ve seen examples where the number of prioritized features is well about 500.

There may be a way to express the timeboxed sections properly using the value-stream mapping nomenclature, but I don’t know of one, so I’ll stick to this approach.

Since we will be trying to determine how long it takes a item to move through our system, periodic processing poses a bit of a challenge; one item might get lucky and arrive on the day of the meeting, and another might get unlucky and arrive right after a meeting and have to wait for the next one. We’ll therefore choose an average that is half of the cycle time.

Here’s a little table of our average end-to-end time:

Step Inventory count Elapsed time
Enter Idea 0 15 minutes
Triage by Priority 200 2 weeks
Feature Estimation 200 5 weeks
Release Planning 100 2 months
Create 50 20 months
Ship 50 2 months
Total 26 months

A little over two years for an idea to come in and make it through our system, which is about what we would expect for a 2 year cycle.

Where’s the bottleneck?

Question: In our current system, where do you think the bottleneck lies?

Answer: Our analysis is a little complicated by the fact that many of our operations aren’t full time, so let’s focus on the Create and Ship steps. Let’s express them as rates:

Create: 2.5 features/month

Ship: 25 features/month

So, creation is our bottleneck, which means I’ve spent a lot of words coming to the result everybody was expecting. Let’s continue building the rest of our model to see if we get some more insight into what is going on.

Because of the long lead time – 26 months – this approach cannot be very responsive to changes in business climate or to needs that are discovered as part of the development process. There are different ways to address this; for this part of the discussion, we are going to choose one traditional way of getting there.

dev4

We’ve added a change request process, whereby we identify a small number of very-high-priority features and feed them into the middle of the create process, thereby getting better responsiveness. This feels like an agile and responsive thing to do, but in reality, it really doesn’t help us a ton, for a couple of reasons:

  • The later a change request is injected in the cycle, the closer it is to shipping and therefore the shorter the time period is before it will show up as value to our customer, so we want to get the change requests in late.
  • The later a change request is injected in the cycle, the more disruptive it is to the development process, and the higher the priority that the ship date will slip, the change request will not be well-implemented, and the overall product will suffer more quality issues, so we want to get them early.

Or, to put it in ToC terms, adding additional destabilizing work to the bottleneck does not yield great results, for obvious reasons.

Pretty much everybody hates change requests, but they are viewed as a necessary evil under this model.

Quality

Our group creates a lot of bugs. Let’s add them into our design:

dev5

This world of bugs is a complex one. There are weekly triages, bug bars, ship rooms, customer support interactions, service-level agreements, quick fix engineering, hotfixes, service packs, etc. Lots of process.

I spent a while diagramming the process with all of the detail, and quickly realized that if I wanted to include it in this diagram, I had to simplify it considerably, so this is a bare-bones representation. I am not including verification time for bugs, and I’m also not including loop-back for retriaging existing bugs or bugs that were marked as fixed but aren’t, nor am I including regressions. All of which increase the time significantly.

<aside>

If you have a lot of bugs, I highly suggest trying to diagram what happens with bugs including the customer and all the teams that deal with bugs and all the steps you go through. I think it will be enlightening on how much those bugs are costing you.

</aside>

So, anyway, we have two sources of bugs: bugs from the customer that come in from the left and bugs out of the current create step that come in from above. We triage them, and then they are fixed by our development team – the same group that is trying to do the create work. Then we either ship them as fixes to our previous product, or we feed the fixes back into the product under development. Or both.

The time the development team spends fixing bugs is time that they aren’t spending on product work, or – in ToC terms – we have added another stream of work to our bottleneck. How does it affect the total time for an item to go through the system?

Well, if our team fixes 1000 bugs during this long cycle – and that’s not an outlandish number – and they each take 4 hours to fix – also not an outlandish number, then we are spending 500 days of time on bug-fixing. A 20 month project for 10 developers is on the order of 4000 days, so it looks like we are spending around 12% of our time on bugs. Which really doesn’t seem that bad, so we can probably ignore it.

However, if we have a lot of bugs external to the create step, there are also a lot of bugs internal to that step, which are costly. I’ll defer that to the next post.

That completes the creation of our initial diagram. It’s a simplification of the real world, but should server our purposes.

Talkin’ ’bout Optimization

Now that we have identified the constraint, the first thing that we need to do is to exploit it. Which is one of the topics for the next post.

The problem that we have at this level is that we have a significant and very obvious problem – the problem of bugs – but we *have not* attempted to deal with it in the proper way, from within the development team.

Question for the class: Viewed through the lens of ToC, how would you describe the bug triage and fix system? What step does it belong to?

Answer: The whole bug flow is about both subordinating and elevating the constraint.

We have de-optized the process for our non-developers by introducing a new stream of work, which is handled by bug-fix orchestration technicians. They are in charge of looking at bugs, deciding which ones are important enough to fix, deciding when we should fix them, deciding who should fix them, tracking our progress on fixing them, predicting when all of them will be fixed. etc. etc. etc.

And, as this process is obviously costly, we introduce bug-fix orchestration *optimization* technicians, who are responsible for making the process faster. And bug-fix customer advocates, who liaison with the orchestration technicians to make sure that the customer impact of bugs is properly considered. And, in the ship step, we have bug fix coordination meetings, where we get together to make sure we are fixing the right bugs.

And much of this process is handled by our management, which means we are using our high-value assets towards this goal, instead of deploying them towards more strategic concerns.

And I’m still not done…

In my diagram, there’s the section of bugfixing where we are fixing bugs in products that have already been shipped. This takes time away from the developers working on the current product, so we can elevate our constraint by setting up a separate process to handle bugfixing for shipped products.

We create a separate group that is only responsible for bugfixes in our current shipping product, staff it up, and that makes our main process faster. We call this group “sustained engineering”. They are responsible for creating and managing service pack releases to fix the worst of our bugs.

Question: How effective are these approaches at reducing the effect of our constraint?

Answer: Not very. In fact, it can be argued that they make the problem worse.

To draw a manufacturing analogy…

We are building cars, and we have been noticing that when they come out of the paint shop, the doors consistently have some glops of paint on them. We also know that our customers are complaining about the door paint flaking off.

The paint shop is busy creating new doors and also has a lot of doors that need to be redone piled up outside, so we set up a group that looks at all the doors that failed our QC check, takes the ones that we think aren’t bad enough to really bother the customer, and puts them back on the assembly line. Only the very bad doors go back to the paint shop. We look through this whole pile every few months, or at least most of it.

We have customers complaining about gloppy and flaking paint. We tell them that we don’t normally fix those sorts of issues, but if they buy the special extended warranty package, they can get special access to our door fix technicians. We have a separate group who is in charge of doing that – once again, so we don’t slow down the paint shop.

 Question: Taking the ToC perspective, what are we missing?

Answer: We missed exploiting the constraint.

It is pretty clear that we have a massive rework problem; we are spending vast amounts of effort fixing issues that are come out of the create process, and it’s so bad that we are ignoring a number of issues that our customers care about. But instead of exploiting the constraint, we are trying to mitigate it from the outside.

I will expand on the bug discussion in a future post, but I think there is an interesting parallel in the manufacturing world.

In the 1950s and 1960s, the US auto industry was quite successful at building products that didn’t have great quality. Then, a bunch of imported cars showed up – cars built by companies that had adopted a very different view on quality, a view that, ironically, came mostly from a number of people from the US.

You probably know how that turned out; the US auto industry learned a painful lesson about trying to compete with companies that are using a better system.

The parallel to software is pretty obvious.

Enough about bugs for now, time to switch topics…

Risk

Risk is something that those of us in development teams really don’t think too much about; that is mostly a topic for the business people. But I’d like to spend a little time on it here.

The two big risks that we need to worry about are:

  • Schedule risk – the risk that we will not be able to finish the project on the estimated schedule
  • Market risk – the risk that the features that we developed are not longer relevant to our market.

We have a problem with our 2-year cycle; that is quite a long period of time and it is well known that software estimates aren’t very good, so we run a pretty good chance of either being late or having to cut features. Or both. And we have the list of bugs we need to fix, and regressions to fix, and we know that the time to fix bugs isn’t terribly predictable.

There is also a large market risk; we know from our diagram that it takes 26 months for most features to make it through the pipeline, and though we can use change requests to be faster, it will generally only cut our cycle in half, and that’s still fairly long.

This long cycle also requires a fairly large investment of money; a development team of 10 can easily cost $200,000 per month, so over the two years, we’ve put 24 * $200,000, or $4.8 million into the release.

In the theory of constraints, the bottleneck in the system isn’t always something physical; sometimes it’s a policy or choice made by the business. The 24-month cycle is an arbitrary one, and it is something that we could change.

If we pick a shorter approach, we can cut our schedule risk, cut our time to market, and cut our investment. Which seems like a total win to do.

How can we make this change the easiest and safest way? Well, we know how to plan in 2 months, and we know how to ship in 2 months. How about if we shrink the “create” part of the process down to two months, which will give us a 6-month overall cycle (2 months planning, 2 months create, 2 months shipping). Once we do a couple of those and get used to them, we will naturally find ways to cut down the planning part, and we can probably make the shipping part a bit faster as well.

This is a great idea, but it turns out that pretty much nobody does this, and here’s the reason.

In our old system, we were devoting 20 months out of 24 to creating features (well, and bug fixes, but I’m ignoring that for now). If we go to a 6 month cycle, we only get two months worth of “create” in each cycle, for a total of 8 months every two years.

8 months of feature creation is much less than 20, so there is no chance we are going to adopt that approach, despite the fact that it gives us far faster time to market, much less risk, and is something that our team can demonstrably do successfully.

What teams generally do is go for a linear reduction; our 6 month cycle becomes 2 weeks planning, 20 weeks coding, and 2 weeks shipping. Planning works fine – it’s easier, in fact – and development starts the way it always done, and then – as the ship portion approaches – the team realizes the following:

  • They aren’t quite done with their features.
  • They have a lot of bugs to fix
  • All their processes were designed to work in a two-month-long ship step.
  • They have absolutely no idea how to ship in two weeks.

Their options are:

  1. Delay shipping.
  2. Cut corners

The team generally picks a combination of these. And then, they turn right around and start the next cycle.

There’s another bit of organizational psychology going on here. In a lot of US companies, being bold, aggressive, and making big bets is encouraged and rewarded. That is why we see re-org after re-org; we are doing something significant.

This makes little sense to me. From the agile and lean perspective, we want continuous improvement, because that is the path where the risk is minimized. There is very likely a path that takes the team from a 2-year cycle to a great 6-month, 3-month, or even 1-month cycle, but it’s very clear that you cannot get there just by decree.

That’s all for this post. In the next one, we are going to crack open the create process and see what’s inside.

Part 3 – The Development Team (1)

Agile and the Theory of Constraints – Part 1

April 25, 2016 at 4:52 am

I’ve been spending some time over the past few months exploring the lean side of the house and looking for things I can adapt into the agile side of the house. The most interesting thing I found was the theory of constraints.

After spending some time writing this, I realized that I need to split this into two separate posts; one where I talk about the theory of constraints in general, and the second where I talk about how I think it applies to software.

For this section, I’m going to talk about manufacturing, partly because that’s where the theory was originally applied, and partly because it’s more approachable. Trust me that what I write here will apply to software development.

The classic work on this is “The Goal” by Goldratt, which I highly recommend.

Let’s make it better

From: Plant Manager
To: Component assembly;welding;painting;packaging
Subject: Improvement

It’s time to kick off our 2016 improvement process; I would like each of you to get together with your teams and figure out what your improvement targets are going to be for next year. 

Signed,

Plant Manager

If a business does improvement – and many do not – this is a pretty typical approach. And, if the word, “poorly” pops into your head, you have already figured out how well this sort of approach works. If you think we are better in the software business, you are mistaken. In general, we are quite a bit worse.

Why do these programs fail? It’s very simple…

To make a process faster, you must first determine why it is slow.

I’m hoping somebody is saying, “I know why my process is slow”. And you may be right, but I am also very convinced that you are also quite wrong. And that gives me a chance to introduce my first thought:

Thought 1: To improve a system, you must first understand the whole system. 

If you do performance optimization of programs, you may know the first law of optimization, which is, “The part of the program that is making things slow is never the part that you think it is”. If you go around optimizing the parts of the program that you think are slow, it doesn’t really get much faster.

Hmm. Isn’t that exactly what is happening with groups trying to get faster?

To make a program run faster, you use a tool to analyze the behavior of the whole system – a profiler. And, to make a manufacturing plant run faster,you need a similar tool.

That is what the theory of constraints can give us – a way to look at the whole system.

Our goal

When doing optimization, we need some sort of metric, or goal. In performance optimization, it’s execution time.

What should our goal be? I’ve already asserted that “improve the output of each section” is an ineffective way to look at things, and I’ve said that I want to look at the overall system, so how about “improve the output of the whole plant” as a goal?

That seems good. We know how to measure it (units shipped per month), and everybody can work together to make it better. And I agree that shipping more units per month would be a good thing, but as a goal, it is an utter failure, because of a couple of simple problems.

It gives us absolutely no guidance on how to actually improve the current state; it does not tell us why the current system is slow.

It’s also a bad goal for another reason; if we all pull together, we can ship a lot of really crappy product in a short period of time.

So, we need a better measure, and luckily, there is a very good one; we track the elapsed time it takes us to produce a product, from order to ship. Let’s walk through how we are going to track it.

Define our process

Here is the process for our plant:

man1

We are trying to figure out how long it takes from when we start manufacturing to when it goes out the door, so, we go out and do some measurements of how long each process takes, and add them to our diagram.

man2

And now we know that it takes 60+45+60+30 = 195 minutes to make one item, and we can go off and start optimizing. It probably makes sense to start with component assembly and painting, since they take the longest.

Wrong.

In this scenario, the current end-to-end time for a specific item is on the order of 10 days.

Wait, what? How can it be 12 days if the process takes only a little over 3 hours to complete?

I’d like you to ruminate on the situation. There is something missing in the diagram that I drew, and it’s something that I could easily have measured when I went out and measured the time each individual step. What is missing?

(spoiler space)

 

 

 

 

 

 

 

 

 

 

 

An improved picture

man3

What I was missing was the concept of inventory. Whenever there is a handoff between two steps, there may be an accumulation of inventory. That is where the extra time is; we have 100 items waiting to be welded, so each item will have to wait for the 100 items in front of it to be processed first. That will take 4500 minutes, or about 75 hours. There are two items waiting for painting, so that is 1 hour of time there, and the 5 items waiting to be packed add 2.5 hours of time.

So the total time is 75 + 1 + 2.5 = 78.5 hours of lag + 3 hours of processing = 81.5 hours, or a little over 10 days.

<aside>

Why is inventory bad?

Inventory is bad, let me count the ways:

  1. It ties up a lot of capital; we have invested money in the raw materials and the labor to create the intermediate state. If the inventory was lower, we could deploy that capital elsewhere or we could improve our return on investment.
  2. We don’t know how good it is. If our component assembly starts producing poor-quality items, it will be nearly 10 days until we find out, and we will have to throw away/rework a lot of expensive items.
  3. The items in inventory represent things that we think we need, but our plans may change during that time, leaving us with intermediate items that are of little use.
  4. We have to pay to store it, track it, move it around, etc.

</aside>

The diagram I’ve created is a very simple version of a value stream map. And the times I used are pretty conservative; it isn’t uncommon for the end-to-end time of a single item to be measured in months.

We’re going to put aside the amount of inventory for a moment, and focus on the steps.

Now, can we make it better?

Do we know where we are slow?

The answer is “yes”. Looking at the diagram, we can tell that we have an issue with the welding process. You can do it mathematically; component assembly is capable of creating 3 items per hour, but welding can only do 1.3 items per house. Or, you can just look for the places where inventory piles up in the factory.

Saying welding is slow is really a misstatement; there may be absolutely nothing wrong with the welding process; what we have discovered is that the welding process is a bottleneck in our system. Because it is the slowest step, it is constraining the output of the system to be – at best – one item every 45 minutes.

That concept is why the theory of constraints is named what it is – we have a constraint, and it controls the output of the whole system.

Let’s now cast our minds back to my earlier assertion that “everybody get better” programs don’t work, and see whether this diagram can shed any light on the situation. What happens to the system if we improve the speed of component assembly, painting, or packaging?

That’s right, pretty much nothing. Which leads to:

Thought #2: If you aren’t addressing the bottleneck, you won’t improve the overall system performance

Given that few groups know what their bottlenecks are, it’s not surprising that their attempts at optimization don’t improve their system performance.

Now that we have a target, it’s time to talk about ways to address it. There are a few options:

  1. Increase capacity (buy more equipment). This is the go-to option in most cases, because most groups don’t know how to optimize. It’s also the priciest.
  2. Optimize the bottleneck. Look at the process of the bottleneck in detail, and see if there is a way to optimize it. This might mean creating a separate value stream map for the bottleneck. Can we get better utilization out of the machine?
  3. Subordinate the other parts of system to the constraint.

Let’s talk about the third one, because it’s the least obvious and therefore most interesting one. Looking at the picture again, we have a lot of excess capacity in component assembly. We do a little investigation, and discover that about 20 minutes of the welding work isn’t welding work, it is “getting ready to weld” work. Let’s modify our process again.

man4

We pulled 20 minutes of work out of welding and added it to assembly. What is our total end-to-end time?

Well, our welding queue is now 100*25 = 2500 minutes, or 42 hours, but the painting queue has gotten bigger, and now has 20 hours. Total is 42 + 20 + 2.5 = 64.5 hours, a significant improvement. Note that the queue sizes I chose were for purpose of illustration.

And how did we do that? We did it by moving work to the component assembly step; it is now a full 20 minutes slower than it was before, but it being slower had no impact on the overall system, because it still produces items faster than the welding step can consume them.

Thought 3: Sometimes the best way to improve a bottleneck is to de-optimize the steps around it. 

This is another “wait, what?” moment; one step got slower, and the overall system got faster. It’s a little easier to see if I add in a little table:

Step items/hour before items/hour after
Component assembly 3 2.5
Welding 1.3 2.4
Painting 2 2
Packaging 2 2

It’s pretty obvious why we had a big queue in front of welding in the first case. It’s still slower than component assembly, but it is now faster than painting, so we are seeing a queue show up there. And we pushed our overall system performance up to 2.4 items per hour.

Which leads to another question. Is welding still the bottleneck?

The answer is obviously “no”; painting and packaging are the bottlenecks now. So, that is where we would work next.

Rework

I made a simplifying assumption for the earlier diagrams; I assumed that all of our processes were perfect. But, in reality, they aren’t, so it would be good to add that to our diagram.

man5

This is one way of expressing rework. It says that 10% of the time, we spend an extra 30 minutes on welding to fix issues from the previous step, so that bumps the average welding time up to 28 minutes. There can also be inventory before or after the rework part; as you might expect, this can bump up the end-to-end time significantly.

What to do with inventory

After improving our throughput, we still had one big queue and we were starting to accumulate another. In a perfect world – where every step was matched in capacity – our queues would be a fixed size, but that never really happens; there is always a bottleneck one place or another. And, as we notice, our end-to-end time is heavily dominated by the time due to the queues.

If we let the system parts free-run, that will lead to the accumulation of a ton of inventory over time, which is very bad. The most common approach is to switch from a push system – where the previous step just runs flat-out – to a pull system – where the previous step runs when the next step needs more items. This is commonly known as a “just in time” approach, and in it’s simplest incarnation, you set an inventory trigger point (say, of a few hours inventory) that lets the previous step know it should start running again.

If we take our queues down to half a day, we would only end up with 1.5 days worth of inventory in the system, taking the overall end-to-end time down to around 2 days. That cuts our inventory cost down to about 20% of what we had before, and our agility is up a similar amount.

But… this change isn’t free. Since we are carrying less inventory, we need to watch the system a little more closely to make sure that we don’t run out.

Because our component assembly step has excess capacity, we will need to run it slightly slower so inventory doesn’t build up. This approach flies in the face of traditional management philosophy; we are specifically telling a group to either slow down or do something else during the down time.

Setup time

It’s pretty common for a single machine/team to do multiple things. In that case, there is a setup time overhead to switching from doing one thing to another. Implementing pull systems will tend to drive the batch size down, and it’s important to remember that there is overhead to consider.

Summary

After writing this, I went out and looked at the defined steps of the theory of constraints again, to see how I did. Here are the steps:

  1. IDENTIFY the system’s constraint.Okay, I covered that.
  2. EXPLOIT the constraint”Exploit” here means “do whatever you can to optimize within the constraint”. I forgot this one initially but went back and added it.
  3. SUBORDINATE everything else to the constraintThis is the part about being willing to de-optimize another area to speed up the constrained part.
  4. ELEVATE the constraint.If it’s still a constraint, this is when you throw money at the situation; buy new machines, that sort of thing.
  5. PREVENT INERTIA from becoming the constraint.If you addressed one constraint, your new constraint moved somewhere else. You’ll need to think of that next.

 

That’s all for the introduction to the theory of constraints. In my next post, I’ll take the techniques that I talked about and apply them to the software world.

Part 2 – The Development Cycle

Doesn’t pairing cost twice as much?

April 5, 2016 at 11:02 pm

(I was recently involved in a discussion about pairing, and I think what I wrote will be of more general interest.)

Many teams evaluate pairing from a simple mathematical perspective.

Total work done = # of team members * amount of time spent working

If you want to increase the amount of total work done, then you either increase the number of team members or you increase the amount of time they spend working. Those are your levers.

Under that approach, when you add pairing, you end up with:

Total work done = (# of team members / 2) * amount of time spent working

Or, you only get half as much work accomplished.

As I’m writing this, I fear that you think that I’m create a caricature of teams, and that nobody really operates that way. Unfortunately, the sad truth is that the majority of the teams that I’ve worked on or worked with spend virtually all of their time working and almost none of their time figuring out how they might work faster. And, if a team works in such a fixed mindset, it’s not surprising that they think pairing is a bad idea.

There is a saying in the performance tuning world which says, “You can make small improvements by speeding up parts of your code, but the best way to make big improvements is by not doing things at all”.

This applies much more to process. So, let’s take a look at some of the things going on during the development process, and see what effect pairing has on them:

  1. Bugs are very expensive; you pay to find/document/track/triage/workaround/investigate/fix/verify them. I do not have any data at hand, but I wouldn’t be surprised if most bugs cost the organization a person-day, even if they are found by the development team. That is a lot of waste. Pairs are much better at finding bugs than singles.
  2. You can replace your code review process with something better. It is a long-held tenet that effective code reviews are a cornerstone of creating high-quality code, and to talk about changing that is heresy in a lot of places. Regardless of the fact that many groups that have an extensive code review process have crappy quality, and that tightening the review process generally does not improve that quality.

    There is a technique known as code inspection where the code is inspected in detail by a group, and there is some research that shows that it works, but it is a hugely expensive technique. In reality, code review has a poor cost/benefit payback; reviews are not effective at detecting important issues and usually deal with superficial things like syntax. The basic problem is that it is too hard to get inside the thinking process of the person who wrote the code and figure out why they did what they did, and – even if you can do that – it’s expensive for them to go back and rework things.Perhaps one could come up with a way to do continuous code review. That would let you catch the issues up front, and since you could discuss implementation with the other people involved, you would not only catch more bugs but you would end up with better implementations overall that was understood more widely.  And then you could reduce the amount of time code is waiting for review and the number of interruptions for people.

    And yes, I just described pairing and mobbing. I don’t advocate throwing out your whole code review process; I think the pair working on the code can make a good call if they need other input before checking in. I guarantee that pairing + no additional code review will generate fewer bugs than individual work and deep code review.

    If you are able to do this in your team, it is a game-changer. In many cases, you can pay for the extra cost of pairing through this change alone. Oh, and I missed one more benefit; this allows your developers to work on one thing rather than switching between them, which is a very wasteful and error-prone process.

  3. Developers hate being stupid and love finding things out. That means that we are much more likely to spend time researching something online and/or trying to figure out what a problem is than asking somebody. Hours of time. Put two people together, and the pair with both figure out things on their own more quickly and be quicker to ask for help when they can’t.

These are all important benefits, and I think they are enough to “pay for” pairing on their own. But, I actually don’t think they are the most significant benefits. The big benefits are at the meta level…

When I was a team manager in my early years, I had a number of hard problems:

  • Suzy is the only one who knows the rendering engine well, but she’s on vacation next week and I suspect she is getting bored with working on that and may leave.
  • Rick is a very good web developer, but right now the set of features that we are working on is heavily biased towards the server side, and I don’t know what to assign to Rick. Next month, I think I’ll probably have the opposite problem.
  • Todd just joined the team and I’ve pointed him to our team onboarding and overview documents, but it’s out-of-date, and everybody is focused on their current task so he’s floundering a bit.
  • Jill is my senior dev and has great design and TDD skills and knows a lot of ways to work fast. We’ve tried brown-bags for her to share this with the rest of the team, but they aren’t working very well.
  • There’s a new project in a couple of months that the team will need to jump on, but nobody currently on the team has the relevant experience.
  • etc.

Nearly all of these just melt away if my team is pairing often. Knowledge transfer happens quickly and easily during pairing, and everybody gets more versatile and better at their jobs.

You Suck at TDD #7 – Improvements Phase 2

March 27, 2016 at 6:20 pm

All changes from this post are in the Improvements Phase 2 repo

Commits that start with “R:” are done through Reshaper. I’ll cluster them together this time.

R: Start of Phase 2

Last time, I created the EmployeeFilter abstraction and populated it with code from the GetEmployees() method. That was a pretty typical refactoring to deal with primitive obsession. But, I don’t think I’m quite done with that particular thought, as there’s another bit of primitive obsession hanging around in the code. This kind is arguably more common than the first type, especially in the Linq/IEnumerable world…

List<Employee> result = new List<Employee>();

Employee is a domain type, but List<T> is one of those computer-sciency types that I don’t like. We end up with little bits of code that do things to the collection/enumerable scattered all over the codebase, and it makes our testing and cohesion a lot harder.

If your code has helper methods that take a List<T> or IEnumerable<T>, you probably have this. If you have extension methods that take these types, then you very likely have this. Or, if you have code that does foreach or Linq operations on the list in our code, you also have this. Fixing this can have a big impact on testability and maintainability.

So, let’s see if we can morph this particular instance into something that’s a little better.

I’ll start by extracting the creation of the list into a separate method named CreateEmployeeCollection(), and extract the new method into a new EmployeeCollection class. This new class will be used instead of the raw list.

Now, I’m going to convert this method to an instance method. This takes two separate operations; first I need to add a parameter that will become the “this” parameter, and then I can convert it to a non-static method. When I add the parameter, I can specify that I want it to create a new instance of the class. Then I just use the “make it non-static” refactoring.

R: Pull result creation into a separate method
R: Extract class EmployeeCollection
R: Changed CreateEmployeeCollection signature to take an EmployeeCollection parameter
R: Make it non-static

I should probably note that it took me a couple of times to get this next series to work the way that I wanted it to work, so there are a couple of reverts that I did before starting forward again.

The first thing I need to do is to take the local variable and make it a field. When I do this, I’ll initialize it in the constructor.

R: Introduce field to hold the collection

 And now I’m going to encapsulate it into a field. When I do this, I choose to update all of the usages to use the property. This choice is important.

R: Encapsulate the list of employees into a property

I’m almost done with this set of transformations. I now inline the CreateEmployeeCollection() method, which gives us a more normal usage in the caller:

var result = new EmployeeCollection().Items;

R: Inline create method back into the caller…

Still a bit weird, but there’s a way to fix that. I’ll start by extracting out the collection to a variable,which gives me this:

EmployeeCollection employeeCollection = new EmployeeCollection();
var result = employeeCollection.Items

and then I can inline result. That leaves us using the employee collection instead of the result.

R: Introduce a variable for the new type
R: Inline the result variable

All that I have left of this transformation is changing the return type of the method so that it returns the collection rather than the items. I used to just do this by hand, but I figured out a better way to do it. It’s another variant of the “extract method / inline method” approach. Basically, I’m going to sequester the “.Items” part of the return statement and shove it back to the caller.

I’ll extract the collection from the return statement into a new variable. Then I’ll extract a new method that does everything except the last part of the return type, inline the old method, and rename the method to get back to where we started. Look at the commits if you want the details:

R: Create a variable for the new return type
R: Extract method except for the last return statement
R: Inline the original method
R: Rename GetEmployees2() to GetEmployees
R: One inline to clean things up

At this point, the EmployeeCollection is fully-formed. You may not be impressed by the simplification in this sample code – and that’s a fair criticism – but this is a very simple bit of code. In most cases there are a lot of places where callers are using this data that I can fold into the EmployeeCollection class.

And… I don’t think I’m quite done with the class yet. But that will have to wait, as it’s time to write some tests. There really isn’t a lot going on in the collection right now, so I only added a single test.

Onto the meat of the problem…

Here’s a bit of code that I’ve been wrestling with since I wrote the original code. The problem is that the filtering code is intertwined into the database code. That’s the thing I’m going to address next:

while (reader.Read())

                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    if (employeeFilter.Matches(name, age, isSalaried))
                    {
                        employeeCollection.Items.Add(new Employee
                        {
                            Name = name,
                            Id = id,
                            Age = age,
                            IsSalaried = isSalaried
                        });
                    }
                }

I started by pulling the creation of the employee instance out into a separate variable. And then, I decided that was the wrong thing, so I reverted it. Instead, I target the “match and add” code, and did the usual set of transformations to move it to the collection class:

R: Extract filter and add to a new method
R: Move the method to the EmployeeCollection class
R: Convert to instance method

This is looking better, but the signature for the function is like this:

public void AddEmployeeIfMatch(EmployeeFilter employeeFilter, string name, int age, bool isSalaried, int id)

It doesn’t make a lot of sense to pass in the individual items to the filter; we should pass in an employee. So *now* I want to pull the creation of the employee out of the method.

R: Extract the Employee creation out of the method.

And extract it out into a variable…

R: Extract employee creation into a variable.

That leaves me with the following code;

        public void AddEmployeeIfMatch(EmployeeFilter employeeFilter, string name, int age, bool isSalaried, int id, Employee employee)
        {
            if (employeeFilter.Matches(name, age, isSalaried))
            {
                Items.Add(employee);
            }
        }

Now, I’d like to get rid of those individual variables and replace them with the employee instance. I want to do that at the filter level, and I’m going to use a little trick to get most of the way there. Basically, I’m going to create another class that is like that is the existing Employee class, and then merge them together.

On EmployeeFilter.Match, I do the “create class from parameters” refactoring, and create a nested Employee class. This looks pretty much like the employee class I’ve already created, except it’s missing one of the fields.

R: Extract new Employee class

I’m left with this line:

            if (employeeFilter.Matches(new EmployeeFilter.Employee(name, age, isSalaried)))

and I’m just going to hand-edit it to say what I want:

            if (employeeFilter.Matches(employee))

and then delete the new extracted Employee class, and clean up the tests.

I’m not really happy with the mechanics of this refactoring, but I’m not smart enough to figure out something better.

Merge together usages of both Employee classes

Now I can delete the parameters I don’t need any more

R: Delete no-longer-used parameters

Now that I’ve written some code, I want to backfill with tests. I pushed code into EmployeeCollection, so I’ll start there. Here’s an example of one of the tests that I wrote:

        [TestMethod]
        public void When_I_create_an_EmployeeCollection_and_add_an_employee_that_does_match__its_in_the_items_collection()
        {
            EmployeeCollection employeeCollection = new EmployeeCollection();

            EmployeeFilter employeeFilter = new EmployeeFilter(EmployeeFilterType.ByName, "X");

            employeeCollection.AddEmployeeIfMatch(employeeFilter, new Employee() { Name = "Xavier" });

            Assert.AreEqual(1, employeeCollection.Items.Count);
            Assert.AreEqual("Xavier", employeeCollection.Items.First().Name);
        }

Since these posts are supposedly about TDD, let’s look at the feedback that the tests are giving us.

My preference is for tests that are very simple. This test isn’t bad, but I don’t like the fact that we require an EmployeeFilter instance to test whether the EmployeeCollection class works correctly. They really shouldn’t be coupled together.

My traditional way of dealing with this is to define an interface – let’s call it IEmployeeFilter – and then define a few test versions. Something like EmployeeFilterTrue and EmployeeFilterFalse. That’s not a bad choice, but I had a different thought.

What if I went more functional? Instead of describing the matching function as implementing a specific interface, what if I described it as having a specific shape?

public void AddEmployeeIfMatch(Func<Employee, bool> employeeFilter, Employee employee)

That decouples the two classes, and now the tests do not need to create an EmployeeFilter instance. This design takes a little bit of getting used to, but the benefits of coupling through something other than a class or interface are significant.

Let me say that again because I think it’s really important:

Taking a more abstract/functional approach to how classes coordinate with each other can have significant impact on how testable and maintainable your code is because it eliminates dependencies rather than just abstracting them. 

And, we also gained a new functionality; you can write a custom filter inline without having to modify the EmployeeFilter class, since you can just write an inline lambda filter.

Use Func<Employee, bool> as parameter

The code is getting better, but I think the filtering part is still a little bit weird. You can only filter as part of adding an item to a collection, which means that filtering must be coupled to fetching. This has been bothering me since I first started on the code. I’ve been thinking about how to do it with automatic refactoring, but haven’t figured out how, so I’m just going to do it by hand.

I do it by introducing a new collection, so the fetching into the collection and the filtering act as two separate operations.

Here’s the new code:

                while (reader.Read())
                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    var employee = new Employee
                    {
                        Name = name,
                        Id = id,
                        Age = age,
                        IsSalaried = isSalaried
                    };
                    employeeCollection.Items.Add(employee);
                }
            }

            EmployeeCollection filteredEmployees = new EmployeeCollection();
            foreach (Employee employee in employeeCollection.Items)
            {
                filteredEmployees.AddEmployeeIfMatch(employeeFilter.Matches, employee);
            }

            return filteredEmployees;

Now we have a two separate chunks of code. Let’s see what we can do with it. First, let’s move the Items.Add() into the collection:

R: Extract AddEmployee method
R: Move to EmployeeCollection class
R: Make it an instance method
R: Rename to Add()

Hmm. There’s no reason that the GetEmployees() method needs to iterate over the collection and call the filter itself. Let’s encapsulate and move it.

R: Extract to Filter method
R: Move to EmployeeCollection class
R: Make it an instance method
R: inline filtered collection

That is nicer, though I need to hoist the function out.

R: Inline add method
R: Make filter a parameter
R: Remove unused parameter

At this point, our approach is very Linq-like. We do need to do some cleanup on the tests. I changed them to use the new Filter() method and removed the old method.

Refactor tests, remove AddWithFilter()

That was a lot of individual steps, and there’s probably about 30 minutes of work there real-time. I’m reasonably happy with the result, though there are some further opportunities for simplification in the collection classes.

 

 

You Suck at TDD #6 – Improvements Phase 1

March 15, 2016 at 3:13 pm

Welcome to the first post on improving the yucky code. I have a few points I’d like to cover briefly, and then we’ll dive into the code.

First off, I’m going to end up refactoring to a specific endpoint, but that is not necessarily the only endpoint or even the best endpoint. Not only are there different opinions on whether one design is better or not, I have found that my own opinions have evolved over time.

Second, I’d like to talk about a meta question. Unless we are working with clean code, there are generally a number of things “wrong” with a given piece of code, which means a decision needs to be made of which issue to address first. I’ve noticed that many developers try to address the biggest issue first. If you go down this road, you will often find that what looked like a simple change balloons into a change involving multiple changes in 30 different files. You are fighting against the complexity, and the big problems often have solutions that are harder to find. This increases the  cost of the whole refactoring – it takes lots of time, it’s disruptive to other people working in the code – and this can give refactoring a bad name on your team.

Instead, start with the smallest problem that you find. This should be easy and quick to fix, which means you will make good progress. Do this a few times, and sometimes the big problem becomes more tractable.

I also am a firm believer that refactoring has to be done in small, incremental steps. To make it easier to follow, I have checked in after every atomic change, and in reality I will probably bunch a few changes together and commit chunks that are a bit bigger, but I still prefer to work in the small, and I find that my commit size has gotten smaller over the years. Some engineering environments do not permit small checkins, and if you’re in one of those, I suggest that you still try to work in very small tests.

Pairing is a wonderful way to do refactoring; pairs attack more issues and solve them better than individuals. You may also be able to use the continuous code review part of pairing to meet your code review requirement so that you can check in often.

My final bit of advice is that if you are lucky to work in languages with good refactoring tools, learn the refactorings that your tool offers, including the keyboard shortcuts. This makes a significant difference in the speed at which you can operate.

I hope you’ve spent a bit of time studying the code here. I’ve created a separate branch, and I’m committing along the way if you’d like to follow on. If the change in a commit was done automatically – in this case with Resharper – it is prefixed with a “R:”.

I’d like to start making this code better. To refresh your memory, here’s the method that we are dealing with. Take a quick look at it:

        public IEnumerable<Employee> GetEmployees(EmployeeFilterType employeeFilterType, string filter, FakeSqlConnection connection)
        {
            if (employeeFilterType == EmployeeFilterType.ByName && filter == null)
            {
                throw new ArgumentNullException("filter");
            }

            string query = "select * from employee, employee_role inner join employee.Id == employee_role.EmployeeId";

            List<Employee> result = new List<Employee>();
            using (FakeSqlCommand sqlCommand = new FakeSqlCommand(query, connection))
            {
                FakeSqlDataReader reader;
                int retryCount = 5;

                while (true)
                {
                    try
                    {
                        reader = sqlCommand.ExecuteReader();
                        break;
                    }
                    catch (Exception)
                    {
                        if (retryCount-- == 0) throw;
                    }
                }

                while (reader.Read())
                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    switch (employeeFilterType)
                    {
                        case EmployeeFilterType.ByName:
                            if (!name.StartsWith(filter)) continue;
                            break;
                        case EmployeeFilterType.ExemptOnly:
                            if (age < 40 || !isSalaried) continue;
                            break;
                    }

                    result.Add(new Employee {Name = name, Id = id, Age = age, IsSalaried = isSalaried});
                }
            }

            return result;
        }

The simple thing that I’m going to attack is something that I see all over the place in code, a little code smell that is called “Primitive Obsession”.

Simply speaking, primitive obsession is using a computer-sciency construct – such as a integer, a string, or a generic List – instead of a domain construct. When we do this, we generally end up with code specific to the construct scattered all over the codebase because there is no place for such code to cluster.

Are there any examples of primitive obsession in our code? Anyone? Bueller?

The employee filtering approach jumped out at me; we have both validity checking and the filter implementation in our method. After each description, I’ll list the commit label so you can see what changed. The first thing I need to do is to create an abstraction for filter. Resharper has a cool refactoring that will take a set of parameters, and wrap them in a class, so that’s what I used.

R: Extract filter class from parameters

The next thing I want to do is start moving the code related to the filter class from the yucky class to the new EmployeeFilter class. I’ll start with the validation code; first I’ll extract it into a method, and then I’ll move the method into the EmployeeFilter class.

R: Extract Validate Method
R: Move Validate() to filter class

I’ll do the same thing with the filtering code in the middle of the routine

R: Extract DoFilter() method
R: Move DoFilter() to EmployeeFilter

Now that I’ve done that, it’s time to push the new class into a separate file.

R: Move type to another file

The two methods that I used are both static methods which take the EmployeeFilter class as a parameter. I’ll make them instance methods.

Incidentally, the pattern of pulling out primitive types and creating domain types is known as Whole Value.

R: Make methods non-static

Looking at the code, I have the following statement:

                    if (employeeFilter.DoFilter(name, age, isSalaried)) continue;

That’s a little weird; in that the true return means the filter didn’t match. I couldn’t find a way to do this with a refactoring, so I did it by hand.

Invert filter boolean

Let’s look a bit more at the EmployeeFilter class. I note that it has properties for the Filter and EmployeeFilterType, but with the refactoring neither of those are used outside of the class any more. Resharper will let me inline them.

R: Inline Filter and EmployeeFilterType properties

The class is looking fairly good now, but there’s still one thing that is bothering me – that call that I make to Validate() seems extraneous. Why should the GetEmployees() be responsible for that? Why can’t EmployeeFilter guarantee that it is valid?

I’ll start by extracting out the part of the method that I want to keep:

R: Extract the part of GetEmployees() that doesn’t have the validate.

Now, I can inline the original method and rename the new one back to GetEmployees().

R: Inline method and rename

That has pulled the code out of the GetEmployees() class, but I want it in the EmployeeFilter class. I’ll extract it into a factory method, and then move that to the EmployeeFilter class.

R: Extract creation into factor method
R: Move factory to EmployeeFilter class
R: Extract parameters out of the factory

At this point, what I would really like is a “Convert factory to constructor” refactoring, but it does not exist. If anybody has an idea how to do this automatically, let me know.

Move validate method to constructor

And now I can inline the factory to get rid of it…

R: Inline factory method

I don’t really need a separate validation method, so I’ll inline that as well.

R: Inline validate method

At this point, there’s a brief little side issue I’d like to discuss… When I moved the validation to the EmployeeFilter class, I lost the null check in the GetEmployees() method. I’m not a huge fan of adding null checks just to guard against developer errors if the code isn’t a library being used outside the team; I don’t think the cost is worth the benefit.

But if you want that functionality, feel free to add the check in.

Now that we have a nice type, it’s missing something. Tests. So, I went off and wrote a few. They are the next series of commits:

Validation test
Added test for failed name filter
Added test for passing named filter
Added exempt test for too young
Added test for old enough but not salaried
Added passing test for exempt

So, what did I accomplish? Well, the filter code now has a separate abstraction, and the GetEmployees() method does look a little bit better. The refactoring took me about 30 minutes, though the writing took a fair bit more. The vast majority of the work was done with automatic refactorings.

There is still a lot to be done, but that is all for this time.

You Suck at TDD #5 – Homework

March 10, 2016 at 4:17 pm

It is now time for me to turn the tables on you, to present you with some homework. If you take a trip to my Github account, you will find the following lovingly crafted repo.

In it, you will find some yucky code. It’s not terrible code – in fact, it’s pretty average code from my experience – but it does have some issues.

Your task is the following:

  1. Look at the code and figure out what issues there are. I suggest writing them down.
  2. Refactor the code into something that is testable, which is to be proven in the obvious way.

While you do this, I recommend that you try to a) do as much of it using refactoring support (either built into VS or using Resharper), and b) do as much of it with keyboard shortcuts as possible.

In each of the next few posts, I’ll pick a specific issue that I see and take the code through a transformation to make it better.

“The Goal” and “The Phoenix Project”

March 4, 2016 at 9:05 pm

I have been spending some time looking at how to apply some of the lessons of lean to software development team, and part of that has been catching up on some reading.

One of the classics is “The Goal” by Eliyahu Goldratt. In this book, he introduces his theory of constraints and bottlenecks, which is all about looking at process from a high level, figuring out where the main problems are, and addressing those. There is a ton of useful information there and I think it applies well to the software world.

The book is very approachable because it’s written as a novel; the main character works at a manufacturing plant. It’s the “can’t-put-it-down-business-hit-of-the-week”.

If you want something that is a bit more computer related, you could choose to read “The Phoenix Project”, which is like The Goal, but the story is about IT instead of manufacturing. It is also a novel. I think it has most of the goodness of “The Goal”, but it is missing the more formal definition of how to deal with constraints. Perhaps start with the Phoenix Project and read “The Goal” if you want more.

Both are recommended.

What will you learn from these? Well, one of the themes of both books is that the process only works if you do the discovery yourself, so I’m not sure I should tell you the whole story, but here’s the basic overview.

The big factor that controls how much you can do is how quickly you can move work items through the system. Only by identifying the bottlenecks in the system and working to increase their throughput will you increase the overall throughput of the system; if you optimize the steps that are not bottlenecks, the output will not improve. Interestingly, you may need to de-optimize the non-bottlenecks to improve the overall throughput and/or latency of the system.