You Suck at TDD #7 – Improvements Phase 2

March 27, 2016 at 6:20 pm

All changes from this post are in the Improvements Phase 2 repo

Commits that start with “R:” are done through Reshaper. I’ll cluster them together this time.

R: Start of Phase 2

Last time, I created the EmployeeFilter abstraction and populated it with code from the GetEmployees() method. That was a pretty typical refactoring to deal with primitive obsession. But, I don’t think I’m quite done with that particular thought, as there’s another bit of primitive obsession hanging around in the code. This kind is arguably more common than the first type, especially in the Linq/IEnumerable world…

List<Employee> result = new List<Employee>();

Employee is a domain type, but List<T> is one of those computer-sciency types that I don’t like. We end up with little bits of code that do things to the collection/enumerable scattered all over the codebase, and it makes our testing and cohesion a lot harder.

If your code has helper methods that take a List<T> or IEnumerable<T>, you probably have this. If you have extension methods that take these types, then you very likely have this. Or, if you have code that does foreach or Linq operations on the list in our code, you also have this. Fixing this can have a big impact on testability and maintainability.

So, let’s see if we can morph this particular instance into something that’s a little better.

I’ll start by extracting the creation of the list into a separate method named CreateEmployeeCollection(), and extract the new method into a new EmployeeCollection class. This new class will be used instead of the raw list.

Now, I’m going to convert this method to an instance method. This takes two separate operations; first I need to add a parameter that will become the “this” parameter, and then I can convert it to a non-static method. When I add the parameter, I can specify that I want it to create a new instance of the class. Then I just use the “make it non-static” refactoring.

R: Pull result creation into a separate method
R: Extract class EmployeeCollection
R: Changed CreateEmployeeCollection signature to take an EmployeeCollection parameter
R: Make it non-static

I should probably note that it took me a couple of times to get this next series to work the way that I wanted it to work, so there are a couple of reverts that I did before starting forward again.

The first thing I need to do is to take the local variable and make it a field. When I do this, I’ll initialize it in the constructor.

R: Introduce field to hold the collection

 And now I’m going to encapsulate it into a field. When I do this, I choose to update all of the usages to use the property. This choice is important.

R: Encapsulate the list of employees into a property

I’m almost done with this set of transformations. I now inline the CreateEmployeeCollection() method, which gives us a more normal usage in the caller:

var result = new EmployeeCollection().Items;

R: Inline create method back into the caller…

Still a bit weird, but there’s a way to fix that. I’ll start by extracting out the collection to a variable,which gives me this:

EmployeeCollection employeeCollection = new EmployeeCollection();
var result = employeeCollection.Items

and then I can inline result. That leaves us using the employee collection instead of the result.

R: Introduce a variable for the new type
R: Inline the result variable

All that I have left of this transformation is changing the return type of the method so that it returns the collection rather than the items. I used to just do this by hand, but I figured out a better way to do it. It’s another variant of the “extract method / inline method” approach. Basically, I’m going to sequester the “.Items” part of the return statement and shove it back to the caller.

I’ll extract the collection from the return statement into a new variable. Then I’ll extract a new method that does everything except the last part of the return type, inline the old method, and rename the method to get back to where we started. Look at the commits if you want the details:

R: Create a variable for the new return type
R: Extract method except for the last return statement
R: Inline the original method
R: Rename GetEmployees2() to GetEmployees
R: One inline to clean things up

At this point, the EmployeeCollection is fully-formed. You may not be impressed by the simplification in this sample code – and that’s a fair criticism – but this is a very simple bit of code. In most cases there are a lot of places where callers are using this data that I can fold into the EmployeeCollection class.

And… I don’t think I’m quite done with the class yet. But that will have to wait, as it’s time to write some tests. There really isn’t a lot going on in the collection right now, so I only added a single test.

Onto the meat of the problem…

Here’s a bit of code that I’ve been wrestling with since I wrote the original code. The problem is that the filtering code is intertwined into the database code. That’s the thing I’m going to address next:

while (reader.Read())

                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    if (employeeFilter.Matches(name, age, isSalaried))
                    {
                        employeeCollection.Items.Add(new Employee
                        {
                            Name = name,
                            Id = id,
                            Age = age,
                            IsSalaried = isSalaried
                        });
                    }
                }

I started by pulling the creation of the employee instance out into a separate variable. And then, I decided that was the wrong thing, so I reverted it. Instead, I target the “match and add” code, and did the usual set of transformations to move it to the collection class:

R: Extract filter and add to a new method
R: Move the method to the EmployeeCollection class
R: Convert to instance method

This is looking better, but the signature for the function is like this:

public void AddEmployeeIfMatch(EmployeeFilter employeeFilter, string name, int age, bool isSalaried, int id)

It doesn’t make a lot of sense to pass in the individual items to the filter; we should pass in an employee. So *now* I want to pull the creation of the employee out of the method.

R: Extract the Employee creation out of the method.

And extract it out into a variable…

R: Extract employee creation into a variable.

That leaves me with the following code;

        public void AddEmployeeIfMatch(EmployeeFilter employeeFilter, string name, int age, bool isSalaried, int id, Employee employee)
        {
            if (employeeFilter.Matches(name, age, isSalaried))
            {
                Items.Add(employee);
            }
        }

Now, I’d like to get rid of those individual variables and replace them with the employee instance. I want to do that at the filter level, and I’m going to use a little trick to get most of the way there. Basically, I’m going to create another class that is like that is the existing Employee class, and then merge them together.

On EmployeeFilter.Match, I do the “create class from parameters” refactoring, and create a nested Employee class. This looks pretty much like the employee class I’ve already created, except it’s missing one of the fields.

R: Extract new Employee class

I’m left with this line:

            if (employeeFilter.Matches(new EmployeeFilter.Employee(name, age, isSalaried)))

and I’m just going to hand-edit it to say what I want:

            if (employeeFilter.Matches(employee))

and then delete the new extracted Employee class, and clean up the tests.

I’m not really happy with the mechanics of this refactoring, but I’m not smart enough to figure out something better.

Merge together usages of both Employee classes

Now I can delete the parameters I don’t need any more

R: Delete no-longer-used parameters

Now that I’ve written some code, I want to backfill with tests. I pushed code into EmployeeCollection, so I’ll start there. Here’s an example of one of the tests that I wrote:

        [TestMethod]
        public void When_I_create_an_EmployeeCollection_and_add_an_employee_that_does_match__its_in_the_items_collection()
        {
            EmployeeCollection employeeCollection = new EmployeeCollection();

            EmployeeFilter employeeFilter = new EmployeeFilter(EmployeeFilterType.ByName, "X");

            employeeCollection.AddEmployeeIfMatch(employeeFilter, new Employee() { Name = "Xavier" });

            Assert.AreEqual(1, employeeCollection.Items.Count);
            Assert.AreEqual("Xavier", employeeCollection.Items.First().Name);
        }

Since these posts are supposedly about TDD, let’s look at the feedback that the tests are giving us.

My preference is for tests that are very simple. This test isn’t bad, but I don’t like the fact that we require an EmployeeFilter instance to test whether the EmployeeCollection class works correctly. They really shouldn’t be coupled together.

My traditional way of dealing with this is to define an interface – let’s call it IEmployeeFilter – and then define a few test versions. Something like EmployeeFilterTrue and EmployeeFilterFalse. That’s not a bad choice, but I had a different thought.

What if I went more functional? Instead of describing the matching function as implementing a specific interface, what if I described it as having a specific shape?

public void AddEmployeeIfMatch(Func<Employee, bool> employeeFilter, Employee employee)

That decouples the two classes, and now the tests do not need to create an EmployeeFilter instance. This design takes a little bit of getting used to, but the benefits of coupling through something other than a class or interface are significant.

Let me say that again because I think it’s really important:

Taking a more abstract/functional approach to how classes coordinate with each other can have significant impact on how testable and maintainable your code is because it eliminates dependencies rather than just abstracting them. 

And, we also gained a new functionality; you can write a custom filter inline without having to modify the EmployeeFilter class, since you can just write an inline lambda filter.

Use Func<Employee, bool> as parameter

The code is getting better, but I think the filtering part is still a little bit weird. You can only filter as part of adding an item to a collection, which means that filtering must be coupled to fetching. This has been bothering me since I first started on the code. I’ve been thinking about how to do it with automatic refactoring, but haven’t figured out how, so I’m just going to do it by hand.

I do it by introducing a new collection, so the fetching into the collection and the filtering act as two separate operations.

Here’s the new code:

                while (reader.Read())
                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    var employee = new Employee
                    {
                        Name = name,
                        Id = id,
                        Age = age,
                        IsSalaried = isSalaried
                    };
                    employeeCollection.Items.Add(employee);
                }
            }

            EmployeeCollection filteredEmployees = new EmployeeCollection();
            foreach (Employee employee in employeeCollection.Items)
            {
                filteredEmployees.AddEmployeeIfMatch(employeeFilter.Matches, employee);
            }

            return filteredEmployees;

Now we have a two separate chunks of code. Let’s see what we can do with it. First, let’s move the Items.Add() into the collection:

R: Extract AddEmployee method
R: Move to EmployeeCollection class
R: Make it an instance method
R: Rename to Add()

Hmm. There’s no reason that the GetEmployees() method needs to iterate over the collection and call the filter itself. Let’s encapsulate and move it.

R: Extract to Filter method
R: Move to EmployeeCollection class
R: Make it an instance method
R: inline filtered collection

That is nicer, though I need to hoist the function out.

R: Inline add method
R: Make filter a parameter
R: Remove unused parameter

At this point, our approach is very Linq-like. We do need to do some cleanup on the tests. I changed them to use the new Filter() method and removed the old method.

Refactor tests, remove AddWithFilter()

That was a lot of individual steps, and there’s probably about 30 minutes of work there real-time. I’m reasonably happy with the result, though there are some further opportunities for simplification in the collection classes.

 

 

You Suck at TDD #6 – Improvements Phase 1

March 15, 2016 at 3:13 pm

Welcome to the first post on improving the yucky code. I have a few points I’d like to cover briefly, and then we’ll dive into the code.

First off, I’m going to end up refactoring to a specific endpoint, but that is not necessarily the only endpoint or even the best endpoint. Not only are there different opinions on whether one design is better or not, I have found that my own opinions have evolved over time.

Second, I’d like to talk about a meta question. Unless we are working with clean code, there are generally a number of things “wrong” with a given piece of code, which means a decision needs to be made of which issue to address first. I’ve noticed that many developers try to address the biggest issue first. If you go down this road, you will often find that what looked like a simple change balloons into a change involving multiple changes in 30 different files. You are fighting against the complexity, and the big problems often have solutions that are harder to find. This increases the  cost of the whole refactoring – it takes lots of time, it’s disruptive to other people working in the code – and this can give refactoring a bad name on your team.

Instead, start with the smallest problem that you find. This should be easy and quick to fix, which means you will make good progress. Do this a few times, and sometimes the big problem becomes more tractable.

I also am a firm believer that refactoring has to be done in small, incremental steps. To make it easier to follow, I have checked in after every atomic change, and in reality I will probably bunch a few changes together and commit chunks that are a bit bigger, but I still prefer to work in the small, and I find that my commit size has gotten smaller over the years. Some engineering environments do not permit small checkins, and if you’re in one of those, I suggest that you still try to work in very small tests.

Pairing is a wonderful way to do refactoring; pairs attack more issues and solve them better than individuals. You may also be able to use the continuous code review part of pairing to meet your code review requirement so that you can check in often.

My final bit of advice is that if you are lucky to work in languages with good refactoring tools, learn the refactorings that your tool offers, including the keyboard shortcuts. This makes a significant difference in the speed at which you can operate.

I hope you’ve spent a bit of time studying the code here. I’ve created a separate branch, and I’m committing along the way if you’d like to follow on. If the change in a commit was done automatically – in this case with Resharper – it is prefixed with a “R:”.

I’d like to start making this code better. To refresh your memory, here’s the method that we are dealing with. Take a quick look at it:

        public IEnumerable<Employee> GetEmployees(EmployeeFilterType employeeFilterType, string filter, FakeSqlConnection connection)
        {
            if (employeeFilterType == EmployeeFilterType.ByName && filter == null)
            {
                throw new ArgumentNullException("filter");
            }

            string query = "select * from employee, employee_role inner join employee.Id == employee_role.EmployeeId";

            List<Employee> result = new List<Employee>();
            using (FakeSqlCommand sqlCommand = new FakeSqlCommand(query, connection))
            {
                FakeSqlDataReader reader;
                int retryCount = 5;

                while (true)
                {
                    try
                    {
                        reader = sqlCommand.ExecuteReader();
                        break;
                    }
                    catch (Exception)
                    {
                        if (retryCount-- == 0) throw;
                    }
                }

                while (reader.Read())
                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    switch (employeeFilterType)
                    {
                        case EmployeeFilterType.ByName:
                            if (!name.StartsWith(filter)) continue;
                            break;
                        case EmployeeFilterType.ExemptOnly:
                            if (age < 40 || !isSalaried) continue;
                            break;
                    }

                    result.Add(new Employee {Name = name, Id = id, Age = age, IsSalaried = isSalaried});
                }
            }

            return result;
        }

The simple thing that I’m going to attack is something that I see all over the place in code, a little code smell that is called “Primitive Obsession”.

Simply speaking, primitive obsession is using a computer-sciency construct – such as a integer, a string, or a generic List – instead of a domain construct. When we do this, we generally end up with code specific to the construct scattered all over the codebase because there is no place for such code to cluster.

Are there any examples of primitive obsession in our code? Anyone? Bueller?

The employee filtering approach jumped out at me; we have both validity checking and the filter implementation in our method. After each description, I’ll list the commit label so you can see what changed. The first thing I need to do is to create an abstraction for filter. Resharper has a cool refactoring that will take a set of parameters, and wrap them in a class, so that’s what I used.

R: Extract filter class from parameters

The next thing I want to do is start moving the code related to the filter class from the yucky class to the new EmployeeFilter class. I’ll start with the validation code; first I’ll extract it into a method, and then I’ll move the method into the EmployeeFilter class.

R: Extract Validate Method
R: Move Validate() to filter class

I’ll do the same thing with the filtering code in the middle of the routine

R: Extract DoFilter() method
R: Move DoFilter() to EmployeeFilter

Now that I’ve done that, it’s time to push the new class into a separate file.

R: Move type to another file

The two methods that I used are both static methods which take the EmployeeFilter class as a parameter. I’ll make them instance methods.

Incidentally, the pattern of pulling out primitive types and creating domain types is known as Whole Value.

R: Make methods non-static

Looking at the code, I have the following statement:

                    if (employeeFilter.DoFilter(name, age, isSalaried)) continue;

That’s a little weird; in that the true return means the filter didn’t match. I couldn’t find a way to do this with a refactoring, so I did it by hand.

Invert filter boolean

Let’s look a bit more at the EmployeeFilter class. I note that it has properties for the Filter and EmployeeFilterType, but with the refactoring neither of those are used outside of the class any more. Resharper will let me inline them.

R: Inline Filter and EmployeeFilterType properties

The class is looking fairly good now, but there’s still one thing that is bothering me – that call that I make to Validate() seems extraneous. Why should the GetEmployees() be responsible for that? Why can’t EmployeeFilter guarantee that it is valid?

I’ll start by extracting out the part of the method that I want to keep:

R: Extract the part of GetEmployees() that doesn’t have the validate.

Now, I can inline the original method and rename the new one back to GetEmployees().

R: Inline method and rename

That has pulled the code out of the GetEmployees() class, but I want it in the EmployeeFilter class. I’ll extract it into a factory method, and then move that to the EmployeeFilter class.

R: Extract creation into factor method
R: Move factory to EmployeeFilter class
R: Extract parameters out of the factory

At this point, what I would really like is a “Convert factory to constructor” refactoring, but it does not exist. If anybody has an idea how to do this automatically, let me know.

Move validate method to constructor

And now I can inline the factory to get rid of it…

R: Inline factory method

I don’t really need a separate validation method, so I’ll inline that as well.

R: Inline validate method

At this point, there’s a brief little side issue I’d like to discuss… When I moved the validation to the EmployeeFilter class, I lost the null check in the GetEmployees() method. I’m not a huge fan of adding null checks just to guard against developer errors if the code isn’t a library being used outside the team; I don’t think the cost is worth the benefit.

But if you want that functionality, feel free to add the check in.

Now that we have a nice type, it’s missing something. Tests. So, I went off and wrote a few. They are the next series of commits:

Validation test
Added test for failed name filter
Added test for passing named filter
Added exempt test for too young
Added test for old enough but not salaried
Added passing test for exempt

So, what did I accomplish? Well, the filter code now has a separate abstraction, and the GetEmployees() method does look a little bit better. The refactoring took me about 30 minutes, though the writing took a fair bit more. The vast majority of the work was done with automatic refactorings.

There is still a lot to be done, but that is all for this time.

You Suck at TDD #5 – Homework

March 10, 2016 at 4:17 pm

It is now time for me to turn the tables on you, to present you with some homework. If you take a trip to my Github account, you will find the following lovingly crafted repo.

In it, you will find some yucky code. It’s not terrible code – in fact, it’s pretty average code from my experience – but it does have some issues.

Your task is the following:

  1. Look at the code and figure out what issues there are. I suggest writing them down.
  2. Refactor the code into something that is testable, which is to be proven in the obvious way.

While you do this, I recommend that you try to a) do as much of it using refactoring support (either built into VS or using Resharper), and b) do as much of it with keyboard shortcuts as possible.

In each of the next few posts, I’ll pick a specific issue that I see and take the code through a transformation to make it better.

“The Goal” and “The Phoenix Project”

March 4, 2016 at 9:05 pm

I have been spending some time looking at how to apply some of the lessons of lean to software development team, and part of that has been catching up on some reading.

One of the classics is “The Goal” by Eliyahu Goldratt. In this book, he introduces his theory of constraints and bottlenecks, which is all about looking at process from a high level, figuring out where the main problems are, and addressing those. There is a ton of useful information there and I think it applies well to the software world.

The book is very approachable because it’s written as a novel; the main character works at a manufacturing plant. It’s the “can’t-put-it-down-business-hit-of-the-week”.

If you want something that is a bit more computer related, you could choose to read “The Phoenix Project”, which is like The Goal, but the story is about IT instead of manufacturing. It is also a novel. I think it has most of the goodness of “The Goal”, but it is missing the more formal definition of how to deal with constraints. Perhaps start with the Phoenix Project and read “The Goal” if you want more.

Both are recommended.

What will you learn from these? Well, one of the themes of both books is that the process only works if you do the discovery yourself, so I’m not sure I should tell you the whole story, but here’s the basic overview.

The big factor that controls how much you can do is how quickly you can move work items through the system. Only by identifying the bottlenecks in the system and working to increase their throughput will you increase the overall throughput of the system; if you optimize the steps that are not bottlenecks, the output will not improve. Interestingly, you may need to de-optimize the non-bottlenecks to improve the overall throughput and/or latency of the system.