You Suck at TDD #7 – Improvements Phase 2

March 27, 2016 at 6:20 pm

All changes from this post are in the Improvements Phase 2 repo

Commits that start with “R:” are done through Reshaper. I’ll cluster them together this time.

R: Start of Phase 2

Last time, I created the EmployeeFilter abstraction and populated it with code from the GetEmployees() method. That was a pretty typical refactoring to deal with primitive obsession. But, I don’t think I’m quite done with that particular thought, as there’s another bit of primitive obsession hanging around in the code. This kind is arguably more common than the first type, especially in the Linq/IEnumerable world…

List<Employee> result = new List<Employee>();

Employee is a domain type, but List<T> is one of those computer-sciency types that I don’t like. We end up with little bits of code that do things to the collection/enumerable scattered all over the codebase, and it makes our testing and cohesion a lot harder.

If your code has helper methods that take a List<T> or IEnumerable<T>, you probably have this. If you have extension methods that take these types, then you very likely have this. Or, if you have code that does foreach or Linq operations on the list in our code, you also have this. Fixing this can have a big impact on testability and maintainability.

So, let’s see if we can morph this particular instance into something that’s a little better.

I’ll start by extracting the creation of the list into a separate method named CreateEmployeeCollection(), and extract the new method into a new EmployeeCollection class. This new class will be used instead of the raw list.

Now, I’m going to convert this method to an instance method. This takes two separate operations; first I need to add a parameter that will become the “this” parameter, and then I can convert it to a non-static method. When I add the parameter, I can specify that I want it to create a new instance of the class. Then I just use the “make it non-static” refactoring.

R: Pull result creation into a separate method
R: Extract class EmployeeCollection
R: Changed CreateEmployeeCollection signature to take an EmployeeCollection parameter
R: Make it non-static

I should probably note that it took me a couple of times to get this next series to work the way that I wanted it to work, so there are a couple of reverts that I did before starting forward again.

The first thing I need to do is to take the local variable and make it a field. When I do this, I’ll initialize it in the constructor.

R: Introduce field to hold the collection

 And now I’m going to encapsulate it into a field. When I do this, I choose to update all of the usages to use the property. This choice is important.

R: Encapsulate the list of employees into a property

I’m almost done with this set of transformations. I now inline the CreateEmployeeCollection() method, which gives us a more normal usage in the caller:

var result = new EmployeeCollection().Items;

R: Inline create method back into the caller…

Still a bit weird, but there’s a way to fix that. I’ll start by extracting out the collection to a variable,which gives me this:

EmployeeCollection employeeCollection = new EmployeeCollection();
var result = employeeCollection.Items

and then I can inline result. That leaves us using the employee collection instead of the result.

R: Introduce a variable for the new type
R: Inline the result variable

All that I have left of this transformation is changing the return type of the method so that it returns the collection rather than the items. I used to just do this by hand, but I figured out a better way to do it. It’s another variant of the “extract method / inline method” approach. Basically, I’m going to sequester the “.Items” part of the return statement and shove it back to the caller.

I’ll extract the collection from the return statement into a new variable. Then I’ll extract a new method that does everything except the last part of the return type, inline the old method, and rename the method to get back to where we started. Look at the commits if you want the details:

R: Create a variable for the new return type
R: Extract method except for the last return statement
R: Inline the original method
R: Rename GetEmployees2() to GetEmployees
R: One inline to clean things up

At this point, the EmployeeCollection is fully-formed. You may not be impressed by the simplification in this sample code – and that’s a fair criticism – but this is a very simple bit of code. In most cases there are a lot of places where callers are using this data that I can fold into the EmployeeCollection class.

And… I don’t think I’m quite done with the class yet. But that will have to wait, as it’s time to write some tests. There really isn’t a lot going on in the collection right now, so I only added a single test.

Onto the meat of the problem…

Here’s a bit of code that I’ve been wrestling with since I wrote the original code. The problem is that the filtering code is intertwined into the database code. That’s the thing I’m going to address next:

while (reader.Read())

                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    if (employeeFilter.Matches(name, age, isSalaried))
                    {
                        employeeCollection.Items.Add(new Employee
                        {
                            Name = name,
                            Id = id,
                            Age = age,
                            IsSalaried = isSalaried
                        });
                    }
                }

I started by pulling the creation of the employee instance out into a separate variable. And then, I decided that was the wrong thing, so I reverted it. Instead, I target the “match and add” code, and did the usual set of transformations to move it to the collection class:

R: Extract filter and add to a new method
R: Move the method to the EmployeeCollection class
R: Convert to instance method

This is looking better, but the signature for the function is like this:

public void AddEmployeeIfMatch(EmployeeFilter employeeFilter, string name, int age, bool isSalaried, int id)

It doesn’t make a lot of sense to pass in the individual items to the filter; we should pass in an employee. So *now* I want to pull the creation of the employee out of the method.

R: Extract the Employee creation out of the method.

And extract it out into a variable…

R: Extract employee creation into a variable.

That leaves me with the following code;

        public void AddEmployeeIfMatch(EmployeeFilter employeeFilter, string name, int age, bool isSalaried, int id, Employee employee)
        {
            if (employeeFilter.Matches(name, age, isSalaried))
            {
                Items.Add(employee);
            }
        }

Now, I’d like to get rid of those individual variables and replace them with the employee instance. I want to do that at the filter level, and I’m going to use a little trick to get most of the way there. Basically, I’m going to create another class that is like that is the existing Employee class, and then merge them together.

On EmployeeFilter.Match, I do the “create class from parameters” refactoring, and create a nested Employee class. This looks pretty much like the employee class I’ve already created, except it’s missing one of the fields.

R: Extract new Employee class

I’m left with this line:

            if (employeeFilter.Matches(new EmployeeFilter.Employee(name, age, isSalaried)))

and I’m just going to hand-edit it to say what I want:

            if (employeeFilter.Matches(employee))

and then delete the new extracted Employee class, and clean up the tests.

I’m not really happy with the mechanics of this refactoring, but I’m not smart enough to figure out something better.

Merge together usages of both Employee classes

Now I can delete the parameters I don’t need any more

R: Delete no-longer-used parameters

Now that I’ve written some code, I want to backfill with tests. I pushed code into EmployeeCollection, so I’ll start there. Here’s an example of one of the tests that I wrote:

        [TestMethod]
        public void When_I_create_an_EmployeeCollection_and_add_an_employee_that_does_match__its_in_the_items_collection()
        {
            EmployeeCollection employeeCollection = new EmployeeCollection();

            EmployeeFilter employeeFilter = new EmployeeFilter(EmployeeFilterType.ByName, "X");

            employeeCollection.AddEmployeeIfMatch(employeeFilter, new Employee() { Name = "Xavier" });

            Assert.AreEqual(1, employeeCollection.Items.Count);
            Assert.AreEqual("Xavier", employeeCollection.Items.First().Name);
        }

Since these posts are supposedly about TDD, let’s look at the feedback that the tests are giving us.

My preference is for tests that are very simple. This test isn’t bad, but I don’t like the fact that we require an EmployeeFilter instance to test whether the EmployeeCollection class works correctly. They really shouldn’t be coupled together.

My traditional way of dealing with this is to define an interface – let’s call it IEmployeeFilter – and then define a few test versions. Something like EmployeeFilterTrue and EmployeeFilterFalse. That’s not a bad choice, but I had a different thought.

What if I went more functional? Instead of describing the matching function as implementing a specific interface, what if I described it as having a specific shape?

public void AddEmployeeIfMatch(Func<Employee, bool> employeeFilter, Employee employee)

That decouples the two classes, and now the tests do not need to create an EmployeeFilter instance. This design takes a little bit of getting used to, but the benefits of coupling through something other than a class or interface are significant.

Let me say that again because I think it’s really important:

Taking a more abstract/functional approach to how classes coordinate with each other can have significant impact on how testable and maintainable your code is because it eliminates dependencies rather than just abstracting them. 

And, we also gained a new functionality; you can write a custom filter inline without having to modify the EmployeeFilter class, since you can just write an inline lambda filter.

Use Func<Employee, bool> as parameter

The code is getting better, but I think the filtering part is still a little bit weird. You can only filter as part of adding an item to a collection, which means that filtering must be coupled to fetching. This has been bothering me since I first started on the code. I’ve been thinking about how to do it with automatic refactoring, but haven’t figured out how, so I’m just going to do it by hand.

I do it by introducing a new collection, so the fetching into the collection and the filtering act as two separate operations.

Here’s the new code:

                while (reader.Read())
                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    var employee = new Employee
                    {
                        Name = name,
                        Id = id,
                        Age = age,
                        IsSalaried = isSalaried
                    };
                    employeeCollection.Items.Add(employee);
                }
            }

            EmployeeCollection filteredEmployees = new EmployeeCollection();
            foreach (Employee employee in employeeCollection.Items)
            {
                filteredEmployees.AddEmployeeIfMatch(employeeFilter.Matches, employee);
            }

            return filteredEmployees;

Now we have a two separate chunks of code. Let’s see what we can do with it. First, let’s move the Items.Add() into the collection:

R: Extract AddEmployee method
R: Move to EmployeeCollection class
R: Make it an instance method
R: Rename to Add()

Hmm. There’s no reason that the GetEmployees() method needs to iterate over the collection and call the filter itself. Let’s encapsulate and move it.

R: Extract to Filter method
R: Move to EmployeeCollection class
R: Make it an instance method
R: inline filtered collection

That is nicer, though I need to hoist the function out.

R: Inline add method
R: Make filter a parameter
R: Remove unused parameter

At this point, our approach is very Linq-like. We do need to do some cleanup on the tests. I changed them to use the new Filter() method and removed the old method.

Refactor tests, remove AddWithFilter()

That was a lot of individual steps, and there’s probably about 30 minutes of work there real-time. I’m reasonably happy with the result, though there are some further opportunities for simplification in the collection classes.

 

 

You Suck at TDD #6 – Improvements Phase 1

March 15, 2016 at 3:13 pm

Welcome to the first post on improving the yucky code. I have a few points I’d like to cover briefly, and then we’ll dive into the code.

First off, I’m going to end up refactoring to a specific endpoint, but that is not necessarily the only endpoint or even the best endpoint. Not only are there different opinions on whether one design is better or not, I have found that my own opinions have evolved over time.

Second, I’d like to talk about a meta question. Unless we are working with clean code, there are generally a number of things “wrong” with a given piece of code, which means a decision needs to be made of which issue to address first. I’ve noticed that many developers try to address the biggest issue first. If you go down this road, you will often find that what looked like a simple change balloons into a change involving multiple changes in 30 different files. You are fighting against the complexity, and the big problems often have solutions that are harder to find. This increases the  cost of the whole refactoring – it takes lots of time, it’s disruptive to other people working in the code – and this can give refactoring a bad name on your team.

Instead, start with the smallest problem that you find. This should be easy and quick to fix, which means you will make good progress. Do this a few times, and sometimes the big problem becomes more tractable.

I also am a firm believer that refactoring has to be done in small, incremental steps. To make it easier to follow, I have checked in after every atomic change, and in reality I will probably bunch a few changes together and commit chunks that are a bit bigger, but I still prefer to work in the small, and I find that my commit size has gotten smaller over the years. Some engineering environments do not permit small checkins, and if you’re in one of those, I suggest that you still try to work in very small tests.

Pairing is a wonderful way to do refactoring; pairs attack more issues and solve them better than individuals. You may also be able to use the continuous code review part of pairing to meet your code review requirement so that you can check in often.

My final bit of advice is that if you are lucky to work in languages with good refactoring tools, learn the refactorings that your tool offers, including the keyboard shortcuts. This makes a significant difference in the speed at which you can operate.

I hope you’ve spent a bit of time studying the code here. I’ve created a separate branch, and I’m committing along the way if you’d like to follow on. If the change in a commit was done automatically – in this case with Resharper – it is prefixed with a “R:”.

I’d like to start making this code better. To refresh your memory, here’s the method that we are dealing with. Take a quick look at it:

        public IEnumerable<Employee> GetEmployees(EmployeeFilterType employeeFilterType, string filter, FakeSqlConnection connection)
        {
            if (employeeFilterType == EmployeeFilterType.ByName && filter == null)
            {
                throw new ArgumentNullException("filter");
            }

            string query = "select * from employee, employee_role inner join employee.Id == employee_role.EmployeeId";

            List<Employee> result = new List<Employee>();
            using (FakeSqlCommand sqlCommand = new FakeSqlCommand(query, connection))
            {
                FakeSqlDataReader reader;
                int retryCount = 5;

                while (true)
                {
                    try
                    {
                        reader = sqlCommand.ExecuteReader();
                        break;
                    }
                    catch (Exception)
                    {
                        if (retryCount-- == 0) throw;
                    }
                }

                while (reader.Read())
                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    switch (employeeFilterType)
                    {
                        case EmployeeFilterType.ByName:
                            if (!name.StartsWith(filter)) continue;
                            break;
                        case EmployeeFilterType.ExemptOnly:
                            if (age < 40 || !isSalaried) continue;
                            break;
                    }

                    result.Add(new Employee {Name = name, Id = id, Age = age, IsSalaried = isSalaried});
                }
            }

            return result;
        }

The simple thing that I’m going to attack is something that I see all over the place in code, a little code smell that is called “Primitive Obsession”.

Simply speaking, primitive obsession is using a computer-sciency construct – such as a integer, a string, or a generic List – instead of a domain construct. When we do this, we generally end up with code specific to the construct scattered all over the codebase because there is no place for such code to cluster.

Are there any examples of primitive obsession in our code? Anyone? Bueller?

The employee filtering approach jumped out at me; we have both validity checking and the filter implementation in our method. After each description, I’ll list the commit label so you can see what changed. The first thing I need to do is to create an abstraction for filter. Resharper has a cool refactoring that will take a set of parameters, and wrap them in a class, so that’s what I used.

R: Extract filter class from parameters

The next thing I want to do is start moving the code related to the filter class from the yucky class to the new EmployeeFilter class. I’ll start with the validation code; first I’ll extract it into a method, and then I’ll move the method into the EmployeeFilter class.

R: Extract Validate Method
R: Move Validate() to filter class

I’ll do the same thing with the filtering code in the middle of the routine

R: Extract DoFilter() method
R: Move DoFilter() to EmployeeFilter

Now that I’ve done that, it’s time to push the new class into a separate file.

R: Move type to another file

The two methods that I used are both static methods which take the EmployeeFilter class as a parameter. I’ll make them instance methods.

Incidentally, the pattern of pulling out primitive types and creating domain types is known as Whole Value.

R: Make methods non-static

Looking at the code, I have the following statement:

                    if (employeeFilter.DoFilter(name, age, isSalaried)) continue;

That’s a little weird; in that the true return means the filter didn’t match. I couldn’t find a way to do this with a refactoring, so I did it by hand.

Invert filter boolean

Let’s look a bit more at the EmployeeFilter class. I note that it has properties for the Filter and EmployeeFilterType, but with the refactoring neither of those are used outside of the class any more. Resharper will let me inline them.

R: Inline Filter and EmployeeFilterType properties

The class is looking fairly good now, but there’s still one thing that is bothering me – that call that I make to Validate() seems extraneous. Why should the GetEmployees() be responsible for that? Why can’t EmployeeFilter guarantee that it is valid?

I’ll start by extracting out the part of the method that I want to keep:

R: Extract the part of GetEmployees() that doesn’t have the validate.

Now, I can inline the original method and rename the new one back to GetEmployees().

R: Inline method and rename

That has pulled the code out of the GetEmployees() class, but I want it in the EmployeeFilter class. I’ll extract it into a factory method, and then move that to the EmployeeFilter class.

R: Extract creation into factor method
R: Move factory to EmployeeFilter class
R: Extract parameters out of the factory

At this point, what I would really like is a “Convert factory to constructor” refactoring, but it does not exist. If anybody has an idea how to do this automatically, let me know.

Move validate method to constructor

And now I can inline the factory to get rid of it…

R: Inline factory method

I don’t really need a separate validation method, so I’ll inline that as well.

R: Inline validate method

At this point, there’s a brief little side issue I’d like to discuss… When I moved the validation to the EmployeeFilter class, I lost the null check in the GetEmployees() method. I’m not a huge fan of adding null checks just to guard against developer errors if the code isn’t a library being used outside the team; I don’t think the cost is worth the benefit.

But if you want that functionality, feel free to add the check in.

Now that we have a nice type, it’s missing something. Tests. So, I went off and wrote a few. They are the next series of commits:

Validation test
Added test for failed name filter
Added test for passing named filter
Added exempt test for too young
Added test for old enough but not salaried
Added passing test for exempt

So, what did I accomplish? Well, the filter code now has a separate abstraction, and the GetEmployees() method does look a little bit better. The refactoring took me about 30 minutes, though the writing took a fair bit more. The vast majority of the work was done with automatic refactorings.

There is still a lot to be done, but that is all for this time.

You Suck at TDD #5 – Homework

March 10, 2016 at 4:17 pm

It is now time for me to turn the tables on you, to present you with some homework. If you take a trip to my Github account, you will find the following lovingly crafted repo.

In it, you will find some yucky code. It’s not terrible code – in fact, it’s pretty average code from my experience – but it does have some issues.

Your task is the following:

  1. Look at the code and figure out what issues there are. I suggest writing them down.
  2. Refactor the code into something that is testable, which is to be proven in the obvious way.

While you do this, I recommend that you try to a) do as much of it using refactoring support (either built into VS or using Resharper), and b) do as much of it with keyboard shortcuts as possible.

In each of the next few posts, I’ll pick a specific issue that I see and take the code through a transformation to make it better.

“The Goal” and “The Phoenix Project”

March 4, 2016 at 9:05 pm

I have been spending some time looking at how to apply some of the lessons of lean to software development team, and part of that has been catching up on some reading.

One of the classics is “The Goal” by Eliyahu Goldratt. In this book, he introduces his theory of constraints and bottlenecks, which is all about looking at process from a high level, figuring out where the main problems are, and addressing those. There is a ton of useful information there and I think it applies well to the software world.

The book is very approachable because it’s written as a novel; the main character works at a manufacturing plant. It’s the “can’t-put-it-down-business-hit-of-the-week”.

If you want something that is a bit more computer related, you could choose to read “The Phoenix Project”, which is like The Goal, but the story is about IT instead of manufacturing. It is also a novel. I think it has most of the goodness of “The Goal”, but it is missing the more formal definition of how to deal with constraints. Perhaps start with the Phoenix Project and read “The Goal” if you want more.

Both are recommended.

What will you learn from these? Well, one of the themes of both books is that the process only works if you do the discovery yourself, so I’m not sure I should tell you the whole story, but here’s the basic overview.

The big factor that controls how much you can do is how quickly you can move work items through the system. Only by identifying the bottlenecks in the system and working to increase their throughput will you increase the overall throughput of the system; if you optimize the steps that are not bottlenecks, the output will not improve. Interestingly, you may need to de-optimize the non-bottlenecks to improve the overall throughput and/or latency of the system.

Port/Adapter/Simulator: read-only and write-only dependencies…

February 24, 2016 at 8:11 am

When dealing with many external dependencies, the Port/Adapter/Simulator pattern works great.

But what about dependencies that are read-only – such as consuming an information feed from another system – or write-only – such as sending a notification to a user? Can we use P/A/S in those scenarios?

The answer is a qualified "yes" – we can get many of the advantages of P/A/S, though we will probably have to give up something.

Stock prices

Let’s consider an application where we deal with stock prices. If we were in a read/write scenario, we would write something like this:

interface IStockPriceStore
{
    Decimal Load(StockSymbol stockSymbol);
    void Save(StockSymbol stockSymbol, Decimal price);
}

And the test that we would write would look something like this:

IStockPriceStore stockPriceStore = new StockPriceStoreMemory();
stockPriceStore.Save("MSFT", 55m);

StockProcessor stockProcessor = new StockProcessor(stockPriceStore);
stockProcessor.Process();
Assert(…);

Okay, that’s not perhaps as minimal as I would like for a test, but it’s not terrible.

But what if our stock feed is read-only? That changes our adapter to be something like:

interface IStockPriceSource
{
    Decimal Load(StockSymbol stockSymbol);
}

To make our tests run the same way, we need someplace to put a Save() method. My approach is to put it directly on the simulator (StockPriceSourceMemory) class. There are a few ways I’ve played around with expressing this:

  1. Put it directly on the class and name it Save()
  2. Put it directly on the class and name it SimulateSave()
  3. Define a separate interface (IStockPriceSink?), put it there, an have the simulator implement it.

 

I don’t really like the third option, as we have an interface that will only have one implementation. I haven’t decided between the first two; sometimes I like the division that option #2 gives, sometimes I think it’s too wordy.

This approach lets us write the product tests exactly the same way we would with a read/write store, but we lose a big advantage of P/A/S – the ability to test for similarity between our real and in-memory implementations of the adapter.

In many cases, I don’t think this matters a lot, as read-only stores tend to be fairly straightforward. In some cases, it does matter, and there’s another technique we can use. We write our adapter tests so that they pass against the real adapter, and then, when we run those same tests against the simulator, we pre-populate it by fetching specific data from the real adapter and storing it into the simulator. The data that we move across is enough to verify that the simulator works correctly.

This will result in two different sets of tests; there will be the pure-simulator unit tests, and then the contract verification tests that we run against the live data and the simulator.

It’s not a perfect solution, but it will be pretty close.

Email User

Emailing a user is a good example of a write-only operation. We’ll write the port like this:

interface IUserNotifier
{
    void Notify(User user, Message message);
}

I’m really hoping that you weren’t expecting a IEmailUser port.

In this case, our simulator needs to do a couple of things. It needs to verify that Notify was called with appropriate parameters, and it needs to be able to simulate any error situations that we want to raise with the calling application.

How about something like this:

class UserNotifierSimulator: IUserNotifier
{
    void Notify(User user, Message message) {}

    User User {get; private set;}
    Message Message {get; private set;}
}

To use the simulator, I pass it into a class that needs to perform a notification and then look and the User and Message properties to see whether it performed the notification.

As for the port tests, I suspect that I’m going to write a few around the port validation behavior to define what happens when either user or message is improperly formed.

There are other possible errors – the network could be unreachable, for example – but I’m not sure I’m going to write any of them into the port. I would prefer to hide them underneath, to make the real adapter responsible for recovery.

Message passing architectures…

And, so that’s how I’ve done it, and it’s worked pretty well.

But… remember my earlier comment about the test code I wrote for the stock store? The problem there is that I’m saving something into a store just so that the stock processor can fetch it right out. But there’s an alternative:

If I think of this as in message-passing/pipeline terms, I can express things differently. Something like:

class StockPriceSource
{
    public event EventHandler<StockPrice> StockPriceReady;

    public void FetchStockPrice(StockSymbol stockSymbol);
}

I can then hook this up to a modified version of the StockProcessor class by just hooking a method to an event. This approach means that I don’t really need my simulator any more; I have something that can produce one (or a set of) stock prices, and then things that know how to consume them.

I think this probably makes my world simpler. I can write a few quick contract tests for StockPriceSource to verify that it is working, and I can test all the processing just by passing in data.

You suck at TDD #3 – On Design sensitivity and improvement

February 11, 2016 at 8:31 am

Through an above-average display of ineptness, I managed to overwrite my first version of this post with a response I wrote to the comments. Since I’m too lazy to try to reconstruct it, I’m going to go a different direction. In a recent talk about design, I told the attendees that I was going to insult them during the talk, and it worked pretty well, so I’m going to try the same thing here:

You don’t know how to do design. You would not know a good design if it came up and bit you.

To explain this, I feel it necessary to talk a bit about ignorance. I will loop back to the topic at hand sometime in the future.

In a really cool article written back in 2000, Phillip Armour wrote about 5 orders of ignorance, which I’ll summarize here:

  • Oth Order Ignorance (0OI) – Lack of ignorance, or in other words, knowledge. I have 0OI about skiing.
  • 1st Order Ignorance (1OI) – Known lack of knowledge. I have 1OI about how to knit, but I can think of a number of ways to convert this to 0OI.
  • 2nd order Ignorance (2OI) – Lack of awareness. I have 2OI when I don’t know that I don’t know something. I cannot – and this is the point – give examples here.
  • 3rd order Ignorance (3OI) – Lack of process. I have 3OI when I don’t know a suitably efficient way to find out that I don’t know that I don’t know something.
  • 4th order ignorance (4OI) – Meta ignorance. I don’t know about the 5 orders of ignorance.

 

Like any model, this is a simplification of what is really going on, but it’s a useful model.

Back to the insult, but this time I’ll reword it using the orders of ignorance:

You have 20I about design; the code that you are writing has obvious problems with it but you do not see them because you are 2OI about these problems and therefore cannot see the problems yourself.

It’s a little less catchy, and that’s why this series is named "You suck at TDD", not "Systemic influences and limitations in human knowledge acquisition have limited your effectiveness at TDD in interesting ways".

You can probably think back when you learned something about coding – for example, there was probably a time when you learned it was good not to repeat code. Before you learned that, you had 2OI ignorance about it, and now you no longer do.

We *all* have at least some 2OI about design; it’s just a feature of how knowledge acquisition works. That is not the real problem.

The real problem is that most developers have both 3OI and 4OI when it comes to design. They don’t understand how 2OI works and therefore think that they have good design skills, and – probably more importantly – they don’t have an efficient way of identifying specific instances of 2OI and converting them to 1OI and ultimately 0OI.

Or, to put it more concisely, most developers think they are good at design because they can’t see the problems in their designs, they don’t really understand that this is the case, and they don’t have any plan to improve.

If you couple this with our industry’s typical focus on features, features, features, it is no longer surprising that there is so much bad code out there; it becomes surprising that there is code that works at all.

So, tying this whole thing back to TDD, the reason that TDD does not work for a lot of people is that they don’t have sufficient design skills to see the feedback that TDD is giving to them. They get to the refactor step, shrug, and say, "looks good to me; let’s write the next test". And these 2OI issues add up, and they make it harder to do TDD on the code, and it gets hard to do.

This is not surprising. TDD is about evolutionary design, and if you aren’t very good at design, it’s not going to work very well.

Improving your design skills

The remainder of this series is about improving your design skills. I will probably focus on specific patterns that I find especially useful in the TDD world, but I’d like to provide some general advice here. These are things that will help you move from 2OI to 1OI in specific areas of design. And please, if you have other ideas, add them in the comments:

  1. Pair with somebody who cares about design. They will be 1OI or 0OI in areas where you are 2OI, and by having discussions you will learn together.
  2. Read and study. There is a *ton* of useful information out there. If you like abstract, read – and then re-read Fowler’s book on Refactoring. If you like more specificity, go read information about a single code smells, and then find it – and fix it – in your code. If you want a bigger example, just search for refactoring, and you’ll find examples like this.
  3. Share and teach. You may think that you know something about a specific area of design, but when you go to teach it to others, that is when you really learn it.
  4. Practice refactoring skills. If you are lucky enough to work in a language with good refactoring tools, train yourself to use them instead of hand-editing. Not only will this speed you up and reduce the errors you make, it will train you to think differently about code.
  5. Practice TDD and experiment with different approaches. What does "tell, don’t ask" do to your code? Can a more functional programming approach have benefits?

Agile Open Northwest 2016 recap

February 9, 2016 at 8:13 pm

Last week, I spent three days at Agile Open Northwest 2016, a small (300-ish people) agile conference held at the Exhibition Hall at Seattle Center.

Well, perhaps ‘conference" is the wrong word; one of the first things that I read about the conference said the following:

Agile Open Northwest runs on Open Space Technology (OST).

I am generally both amused and annoyed by the non-ironic appending of the word "technology" to phrases, but in this case, I’m going to cut them some slack, because Open Space really is different.

I’ve done a lot of conferences in my time, and one of the problems with conferences is that they are never really about the things that you want them to be about. So, you pull out the conference program, try to figure out what the talks are really going to be like, and go from there.

Open space technology is basically a solution to that problem. It’s a self-organizing approach where a minimal bit of structure is put into place, and then the conference is put on by the attendees.

So, somebody like me could host a session on TDD and design principles and #NoMocks, and – in collaboration with a few others (including Arlo) – we could have very fun and constructive session.

And – far more cool in my book – somebody who knew just a little about value stream mapping to could host a session on applying value stream mapping to software projects and have the people who showed up teach me.

The other non-traditional thing with the conference is the law of personal mobility, which says that it’s okay – no, it’s required – that if you aren’t learning or contributing at a session you have chosen, you leave and find a better use of their time. Which means that people will circulate in and out of the sessions

With the exception of one session, I enjoyed and learned something at all of the sessions that I went to.

The one downside of this format is that you need engaged people to make it work; if you take a bunch of disinterested people and ask them to come up with sessions, nobody is going to step up.

I also got to do a couple of Lean Coffee sessions at breakfast Thursday and Friday. These are a great way to get ideas quickly and cover a lot of topics in a short amount of time.

Overall, I had a great time. If you have passion around this area, I highly recommend this conference.

You suck at TDD #4 – External dependencies

January 26, 2016 at 1:48 pm

When I started doing TDD, I thought it was pretty clear what to do with external dependencies. If your code writes to a file system – for example – you just write a file system layer (what would typically be called a façade, though I didn’t know the name of the pattern back then), and then you can mock at that layer, and write your tests.

This is a very common approach, and it mostly works in some scenarios, and because of that I see a lot of groups stick at that level. But it has a significant problem, and that problem is that it is lacking an important abstraction. This lack of abstraction usually shows up in two very specific ways:

  • The leakage of complexity from the dependency into the application code
  • The leakage of implementation-specific details into the application code

Teams usually don’t notice the downside of these, unless a very specific thing happens: they get asked to change the underlying technology. Their system was storing documents in the file system, and it now needs to store them in the cloud. They look at their code, and they realize that the whole structure of their application is coupled to the specific implementation. The team hems and haws, and then comes up with a 3 month estimate to do the conversion. This generally isn’t a big problem for the team because it is accepted pretty widely that changing an underlying technology is going to be a big deal and expensive. You will even find people who say that you can’t avoid it – that it is always expensive to make such a change.

If the team never ends up with this requirement, they typically won’t see the coupling nor will the see the downside of the leakage. In my earlier posts I talked about not being sensitive to certain problems, and this is a great example of that. Their lives will be much harder, but they won’t really notice.

Enter the hexagon

A long time ago in internet time, Alistair Cockburn came up with a different approach that avoids these problems, which he called the Hexagonal Architecture. The basic idea is that you segment your application into two different kinds of code – there is the application code, and then there is the code that deals with the external dependencies.

About this time, some of you are thinking, “this is obvious – everybody knows that you write a database layer when you need to talk to a database”. I’ll ask you to bear with me for a bit and keep in mind the part where if you are not sensitive to a specific problem, you don’t even know the problem exists.

What is different about this approach – what Cockburn’s big insight is – is that the interface between the application and the dependency (what he calls a “port”) should be defined by the application using application-level abstractions. This is sometimes expressed as “write the interface that you wish you had”. If you think of this in the abstract, the ideal would be to write all of the application code using the abstraction, and then go off an implement the concrete implementation that actually talks to the dependency.

What does this give us? Well, it gives us a couple of things. First of all, it typically gives us a significant simplification of the interface between the application and the dependency; if you are storing documents, you typically end up with operations like “store document, load document, and get the list of documents”, and they have very simple parameter lists. That is quite a bit simpler than a file system, and an order of magnitude simpler than most databases. This makes writing the application-level code simpler, with all of the benefits that come with simpler code.

Second, it decouples the application code from the implementation; because we defined the interface at the application level, if we did it right there are no implementation-specific details at the app layer (okay, there is probably a factory somewhere with some details – root directory, connection string, that sort of thing). That gives us the things we like from a componentization perspective, and incidentally makes it straightforward to write a different implementation of the interface in some other technology.

At this point there is somebody holding up their hand and saying, “but how are you going to test the implementation of port to make sure it works?” BTW, Cockburn calls the implementation of a port an “adapter” because it adapts the application view to the underlying dependency view, and the overall pattern is therefore known as “port/adapter”.

This is a real concern. Cockburn came up with the pattern before TDD was really big so we didn’t think about testing in the same way, and he was happy with the tradeoff of a well-defined adapter that didn’t change very often and therefore didn’t need a lot of ongoing testing because the benefits of putting the “yucky dependency code” (my term, not his) in a separate place was so significant. But it is fair to point to that adapter code and say, “how do you know that the adapter code works?”

In the TDD world, we would like to do better. My first attempt did what I thought was the logical thing to do. I had an adapter that sat on top of the file system, so I put a façade on the file system, and wrote a bunch of adapter tests with a mocked-out file system, and verified that the adapter behaved as I expected it to. Which worked because the file system was practical to mock, but would not have worked with a database system because of the problem with mocking.

Then I read something that Arlo wrote about simulators, and it all made sense.

After I have created a port abstraction, I need some way of testing code that uses a specific port, which means some sort of test double. Instead of using a mocking library – which you already know that I don’t like – I can write a special kind of test double known as a simulator. A simulator is simply an in-memory implementation of the port, and it’s generally fairly quick to create because it doesn’t do a ton of things. Since I’m using TDD to write it, I will end up with both the simulator and a set of tests that verify that the simulator behaves properly. But these tests aren’t really simulator tests, they are port contract tests.

So, I can point them at other implementations of the port (ie the ones that use the real file system or the real database), and verify that the other adapters behave exactly the way the simulator does. And that removes the requirement to test the other adapters in the traditional unit-tested way; all I care about is that all the adapters behave the same way. And it actually gives me a stronger sense of correctness, because when I used the façade I had no assurance that the file system façade behaved the same way the real file system did.

In other words, the combination of the simulator + tests has given me a easy & quick way to write application tests, and it has given me a way to test the yucky adapter code. And it’s all unicorns and rainbows from then on. Because the simulator is a real adapter, it supports other uses; you can build a headless test version of the application that doesn’t need the real dependency to work. Or you can make some small changes to the simulator and use it as an in-memory cache that sits on to of the real adapter.

Using Port/Adapter/Simulator

If you want to use this pattern – and I highly recommend it – I have a few thoughts on how to make it work well.

The most common problem people run into is in the port definition; they end up with a port that is more complex than it needs to be or they expose implementation-specific details through the port.

The simplest way to get around this is to write from the inside out. Write the application code and the simulator & tests first, and then only go and write the other adapters when that is done. This makes it much easier to define an implementation-free port, and that will make your life easier far easier.

If you are refactoring into P/A/S, then the best approach gets a little more complex. You probably have application code that has implementation-specific details. I recommend that you approach it in small chunks, with a flow like this:

  1. Create an empty IDocumentStore port, an empty DocumentStoreSimulator class, and an empty DocumentStoreFileSystem class.
  2. Find an abstraction that would be useful to the application – something like “load a document”.
  3. Refactor the application code so that there is a static method that knows how to drive the current dependency to load a document.
  4. Move the static method into the file system adapter.
  5. Refactor it to an instance method.
  6. Add the method to IDocumentStore.
  7. Refactor the method so that the implementation-dependent details are hidden in the adapter.
  8. Write a simulator test for the method.
  9. Implement the method in the simulator.
  10. Repeat steps 2-9.

Practice

I wrote a few blog posts that talk about port/adapter/simulator and a practice kata. I highly recommend doing the kata to practice the pattern before you try it with live code; it is far easier to wrap your head around it in a constrained situation than in your actual product code.

Lean, Toyota, and how it relates to agile.

January 4, 2016 at 5:22 pm

I like to read non-software-development books from time to time, and during my holiday vacation, I read “The Toyota Way to Continuous Improvement: Linking Strategy and Operational Excellence to Achieve Superior Performance”.

What? Doesn’t everybody relax by reading books about different organizational approaches and business transformation during their holidays?

I highly recommend the book if you are interested in agile processes in the abstract; there’s an interesting perspective that I haven’t been seeing in the part of the agile world I’ve been paying attention to. 

I will note that there are a number of people who have explored and written about the overlap between lean and agile, so I don’t think I’m breaking ground here, but there are a few things that I think are worth sharing.

Value stream mapping

Part of my definition of “being agile” involves change; if you aren’t evolving your process on an ongoing basis, you aren’t agile. I’ve spent a lot of time looking at processes and can now look at a team’s process and have a pretty good idea where they are wasting time and what a better world might look like.

Unfortunately, that isn’t especially useful, because “Do what Eric says you should do” is not a particularly good approach. I don’t scale well, and I have been wrong on occasion.

Shocking, I know.

It also removes the tuning to a specific team’s needs, which is pretty important.

I do know how to teach the “determine an experiment to try during retrospective” approach, and have had decent luck with that, but teams tend to go for low-hanging fruit and tend to ignore big opportunities. Just to pick an example, you can experiment your way into a better process for a lot of things, but the project build that takes 20 minutes and the code review process that takes 8 hours is now dominating your inner loop, and those are the things that you should think about.

I don’t currently have a great way to make these things obvious to the team, a way to teach them how to see the things that I’m seeing. I’m also missing a good way to think about the process holistically, so that the big issues will at least be obvious.

Enter value stream mapping, which is a process diagram for whatever the team is doing. It includes the inputs to the team, the outputs from the team, and all of the individual steps that are taken to produce the output. It also typically includes the amount of time each operation takes, how long items sit in a queue between steps, whether there are rework steps, etc. Here’s a simple diagram from Net Objectives:

The times in the boxes are the average times for each step, and I think the times underneath the boxes are the worst cases. The times between are the average queue times. We also show some of the rework that we are spending time (wasting time) on.

Given all of this data, we can walk all the boxes, and figure out that our average time to implement (ignoring rework) is about 266 hours, or over 6 weeks. Worse, our queue time is just killing us; the average queue time is 1280 hours, or a full 8 *months*. So, on average, we can expect that a new request takes over 9 months to be deployed. We can then look at what the world would be like if we combined steps, reduced queue sizes, or reduced rework. It gives us a framework in which we can discuss process.

This is a simple example; I’m sure that the real-world flow also has a “bugfix” path, and there is likely a “high-priority request” section that is also at work.

I’m also interested in the details inside the boxes. We could decompose the Code box into separate steps:

  1. Get current source and build
  2. Write code
  3. Test code
  4. Submit code for code review
  5. Handle code review comments
  6. Submit code to gated checkin system

Each of these steps has queue times between them, there are likely rework loops for some of them, and the “test” step likely varies significantly based on what you are testing.

Coming up with a value stream mapping is typically the first thing you do with the Toyota approach. I like the fact that it’s a holistic process; you cover all the inputs and the outputs rather than focusing on the things you know about.

I have not tried doing this for a software team yet, but I find the approach very promising and hope to try it soon. I’m especially hoping that it will highlight the impact that big organizational systems have on team agility.

Implementing an improvement

The continuous improvement approach used by Toyota is know as PDCA (either plan-do-check-act or plan-do-check-adjust). Here’s a more detailed explanation of “plan”:

  1. Define the problem or objective
  2. Establish targets
  3. Understand the physics (use 5 whys to figure out what is really going on).
  4. Brainstorm alternatives
  5. Analyze and rank alternatives
  6. Evaluate impact on all areas (this is a “what could go wrong?” step)
  7. Create a visual, detailed, double-ended schedule

I like that it’s organized into steps and overall focuses on the “let’s think about this a bit”. That is good, and I especially like #3 and #6.

On the other hand, agile teams who aren’t used to making changes can easily get stuck in analysis paralysis and can’t actually agree on something to try, and a detailed set of steps could easily make that worse. I’m more concerned that they try *something* at the beginning rather than it be the absolute best thing to try.

So, I’m not sure about this one yet, but it is interesting.

Organic vs Mechanistic

In the book, the talk about two ways of implementing lean. The organic approach is close to the Toyota approach; you send a very experienced coach (sensei) into the group, and they teach the group how to do continuous improvement and stick around for a while. This gives great results, but requires a lot of work to teach the team members and reinforcement to make sure the whole team understands that continuous improvement is their job. It is also sensitive to the environment the group is embedded inside; some groups had made the transition but it didn’t take because management didn’t understand the environment it took to foster the improvement in the first place.

I’m sure that seems familiar to many of you trying to do agile.

The mechanistic approach comes from the Six Sigma folks. It focuses more on top-down implementation; you train up a centralized group of people and they go out across the company to hold Kaizen (improvement) events. That gives you breadth and consistency across the organization – which are good things – but the results aren’t as significant and – more importantly – teams do not keep improving on their own.

As you might have figured out, I’m a big fan of the organic approach, as I see it as the only way to get the ongoing continuous improvement that will take you someplace really great – the only way that you will get a radically more productive team. And I’ve seen a lot of “scrum from above” implementations, and – at best – the have not been significant successes. So, I’m biased.

Interestingly, the book has a case study of a company that owned two shipyards. One took an organic approach and the other took the mechanistic approach. The organic approach worked great where it was tried, but it was difficult to spread across that shipyard without support and it was very easy for the group to lose the environment that they needed to do continuous improvement.

The mechanistic shipyard had not seen the sort of improvements that the organic one saw, but because they had an establish program with executive sponsorship, the improvements were spread more broadly in the enterprise and stuck around a bit better.

The consultants said that after 5 years is was not clear which shipyard had benefitted more. Which I find to be very interesting in how you can do something organic but it’s really dependent on the individuals, and to make something lasting you need the support of the larger organization.

The role of employees

In the Toyota world, everybody works on continuous improvement, and there is an expectation that every employee should be able to provide an answer around what the current issues are in their group and how that employee is helping make things better.

That is something that is really missing in the software world, and I’m curious what sort of improvements you would see if everybody knew it was their job to make things better on an ongoing basis.

The role of management

One of the interesting questions about agile transitions is how the role of management evolves, and there are a lot of different approaches taken. We see everything from “business as usual” to people managers with lots of reports (say, 20-50) to approaches that don’t have management in the traditional sense.

I’m a big believer in collaborative self-organizing approaches, and the best that I’m usually hoping for is what I’d label was “benign neglect”. Since that is rare, I hadn’t spent much time thinking about what optimal management might be.

I think I may now have a partial answer to this. Toyota lives and breathes continuous improvement, and one of the most important skills in management is the ability to teach their system at that level. They have employees whose only role is to help groups with continuous improvement. I think the agile analog is roughly, “what would it be like if your management was made up of skilled agile coaches who mostly focused on helping you be better?”

Sounds like a very interesting world to be in – though I’m not sure it’s practical for most software companies.  I do think that having management focusing on the continuous improvement part – if done well – could be a significant value-add for a team.

Response to comments : You suck at TDD #3–Design sensitivity and improvement

December 18, 2015 at 8:58 am

I got some great comments on the post, and I answered a few in comments but one started to get very long-winded so I decided to convert my response into a post.

Integration tests before refactoring

The first question is around whether it would be a good idea to write an integration test around code before refactoring.

I hate integration tests. It may be the kinds of teams that I’ve been on, but in the majority of cases, they:

  1. Were very expensive to write and maintain
  2. Took a long time to run
  3. Broke often for hard-to-determine reasons (sometimes randomly)
  4. Didn’t provide useful coverage for the underlying features.

 

Typically, the number of issues they found was not worth the amount of time we spent waiting for them to run, much less the cost of creating and maintaining them.

There are a few cases where I think integration tests are justified:

  1. If you are doing something like ATDD or BDD, you are probably writing integration tests. I generally like those, though it’s possible they could get out of hand as well.
  2. You have a need to maintain conformance to a specification or to previous behavior. You probably need integration tests for this, and you’re just going to have to pay the tax to create and maintain them.
  3. You are working in code that is scary.

 

"Scary" means something very specific to me. It’s not about the risk of breaking something during modification, it’s about the risk of breaking something in a way that isn’t immediately obvious.

There are times when the risk is significant and I do write some sort of pinning tests, but in most cases the risk does not justify the investment. I am willing to put up with a few hiccups along the way if it avoids a whole lot of time spent writing tests.

I’ll also note that to get the benefit out of these tests, I have to cover all the test cases that are important. The kinds of things that I might break during refactoring are the same kind of things I might forget to test. Doing well at this makes a test even more expensive.

In the case in the post, the code is pretty simple and it seemed unlikely that we could break it in non-obvious way, so I didn’t invest the time in an integration test, which in this case would have been really expensive to write.  And the majority of the changes were done automatically using Resharper refactorings that I trust to be correct.

Preserving the interface while making it testable

This is a very interesting question. Is it important to preserve the class interface when making a class testable, or should you feel free to change it? In this case, the question is whether I should pull the creation of the LegacyService instance out of the method and pass it in through the constructor, or instead use another technique that would allow me to create either a production or test instance as necessary.

Let me relate a story…

A few years ago, I led a team that was responsible for taking an authoring tool and extending it. The initial part had been done fairly quickly and wasn’t very well designed, and it had only a handful of tests.

One day, I was looking at a class, trying to figure out how it worked, because the constructor parameters didn’t seem sufficient to do what it needed to do. So, I started digging and exploring, and I found that it was using a global reference to an uber-singleton that give it access to 5 other global singletons, and it was using these singletons to get its work done. Think of it as hand-coded DI.

I felt betrayed and outraged. The constructor had *lied* to me, it wasn’t honest about its dependencies.

And that started a time-consuming refactoring where I pulled out all the references to the singletons and converted them to parameters. Once I got there, I could now see how the classes really worked and figure out how to simplify them.

I prefer my code to be honest. In fact, I *need* it to be honest. Remember that my premise is that in TDD, the difficulty of writing tests exerts design pressure and that in response to that design pressure, I will refactor the code to be easier to test and that aligns well with "better overall". So I am hugely in preference of code that makes dependencies explicit, both because it is more honest (and therefore easier to reason about), and because it’s messy and ugly and that means I’m more likely to convert it to something that is less messy and ugly.

Or, to put it another way, preserving interfaces is a non-goal for me. I prefer honest messiness over fake tidiness.