You Suck at TDD #8 – Doing fewer things

July 5, 2016 at 3:05 pm

Welcome back to You Suck at TDD. Today’s code will show up in the Improvements-Phase-3 branch if you would like to follow along.

In our last episode, we concentrated mostly on the Employee fetching and filtering. Things are better, but we still have a problem…

Well, actually, we have a number of problems, but we’ll start with the first one that I see, which is that Yucky.GetEmployees() is trying to do too many things – it both fetches data and filters it.

This is one of the most common things I see when I look at code – methods that try to do too much, and they are generally quite challenging to test because they do too much.

EmployeeSource class

So, we’ll start by working on that. I pulled the fetching code out into a new EmployeeSource class and then I pulled the creation of the EmployeeSource instance out of GetEmployees().

At this point, GetEmployees() only has a couple of lines in it:

public static EmployeeCollection GetEmployees(EmployeeFilter employeeFilter, EmployeeSource employeeSource)
{
    var employeeCollection = employeeSource.FetchEmployees();

    return employeeCollection.Filter(employeeFilter.Matches);
}

We can easily simplify it with an inline of employeeCollection:

return employeeSource.FetchEmployees().Filter(employeeFilter.Matches);

At this point, there is no reason to have a separate method, so I inlined FetchEmployees() and did some cleanup in Main(), and deleted class Yucky.

This does make Main() more complex than I would like it, but I did the inline with a purpose; I find that my thoughts are often constrained by the abstractions that are present, and the act of getting rid of a bad abstraction sometimes makes it easier to find the right abstraction.

Spending a little time with Main(), it isn’t testable code, and I’d like to get some tests in there…

If we abstract out writing the output to the console, we get something like this:

var collection = employeeSource.FetchEmployees().Filter(employeeFilter.Matches);

WriteToConsole(collection);

That’s decent code; since it doesn’t have any conditionals, it’s likely code that either works all the time or never works at all. Not my first target when I’m looking to make things more testable, but if we wanted to cover it, how can we do it?

Well, the most obvious thing to do is to extract the IEmployeeSource interface, create a method that calls that, create a mock (or a simulator using P/A/S), and then write a test or two. That adds some complexity; you have another interface hanging around, and you need to create and maintain the mock and simulator. It will definitely work, but it is a fairly heavyweight approach.

I’m instead going to go with a different option. One of the things I’ve noticed with developers is that while they may pick up on the opportunity to create a data abstraction – or not, given the title of this series – they rarely pick up on the opportunity to create an algorithmic abstraction.

So, I’m going to go in that direction.

From an abstract perspective, this code does the following:

  • Creates a chunk of data
  • Performs an operation on that chunk of data to create a second chunk of data
  • Consumes that chunk of data

Our current implementation implements this through procedural statements, but we could consider doing this differently. Consider the following code:

public class Pipeline
{
    public static void Process<T1, T2>(Func<T1> source, Func<T1, T2> processor, Action<T2> sink )
    {
        sink(processor(source()));
    }
}

This is an abstraction of the pattern that we were using; it’s general-purpose code that takes a source, processor, and sink, and knows how to wire them up to pass data between them. The version of Process() that I wrote handles this specific case, but you can obviously write versions that have more than one processor.

Doing that with our code results in the following:

Pipeline.Process(
    employeeSource.FetchEmployees,
    (employees) => employees.Filter(employeeFilter.Matches),
    WriteToConsole);

I like the first and third arguments, as they are just method names. I would like the second line also to be a method name.

We can recast this by adding a Filter() method to EmployeeFilter, which then yields:

Pipeline.Process(
    employeeSource.FetchEmployees,
    employeeFilter.Filter,
    WriteToConsole);

That makes me fairly happy; Pipeline.Process() is tested, and the code to use pipeline is almost declarative.

Take a few minutes and look through the current implementation, and write down what you like and what you don’t like.

Looking at the code, it is improved, but there are still a number of things that I don’t really like:

  • I don’t have a way to test my EmployeeSource class, so there is probably a trip to P/A/S coming up.
  • I’m not really in love with Pipeline; I have an alternate approach in mind that I think will be better.
  • The implementation of EmployeeSource has that ugly retry logic in it; that is something algorithmic that I hope can be teased out of the fetching.

Those will come up next time.

 

You Suck at TDD #6 – Improvements Phase 1

March 15, 2016 at 3:13 pm

Welcome to the first post on improving the yucky code. I have a few points I’d like to cover briefly, and then we’ll dive into the code.

First off, I’m going to end up refactoring to a specific endpoint, but that is not necessarily the only endpoint or even the best endpoint. Not only are there different opinions on whether one design is better or not, I have found that my own opinions have evolved over time.

Second, I’d like to talk about a meta question. Unless we are working with clean code, there are generally a number of things “wrong” with a given piece of code, which means a decision needs to be made of which issue to address first. I’ve noticed that many developers try to address the biggest issue first. If you go down this road, you will often find that what looked like a simple change balloons into a change involving multiple changes in 30 different files. You are fighting against the complexity, and the big problems often have solutions that are harder to find. This increases the  cost of the whole refactoring – it takes lots of time, it’s disruptive to other people working in the code – and this can give refactoring a bad name on your team.

Instead, start with the smallest problem that you find. This should be easy and quick to fix, which means you will make good progress. Do this a few times, and sometimes the big problem becomes more tractable.

I also am a firm believer that refactoring has to be done in small, incremental steps. To make it easier to follow, I have checked in after every atomic change, and in reality I will probably bunch a few changes together and commit chunks that are a bit bigger, but I still prefer to work in the small, and I find that my commit size has gotten smaller over the years. Some engineering environments do not permit small checkins, and if you’re in one of those, I suggest that you still try to work in very small tests.

Pairing is a wonderful way to do refactoring; pairs attack more issues and solve them better than individuals. You may also be able to use the continuous code review part of pairing to meet your code review requirement so that you can check in often.

My final bit of advice is that if you are lucky to work in languages with good refactoring tools, learn the refactorings that your tool offers, including the keyboard shortcuts. This makes a significant difference in the speed at which you can operate.

I hope you’ve spent a bit of time studying the code here. I’ve created a separate branch, and I’m committing along the way if you’d like to follow on. If the change in a commit was done automatically – in this case with Resharper – it is prefixed with a “R:”.

I’d like to start making this code better. To refresh your memory, here’s the method that we are dealing with. Take a quick look at it:

        public IEnumerable<Employee> GetEmployees(EmployeeFilterType employeeFilterType, string filter, FakeSqlConnection connection)
        {
            if (employeeFilterType == EmployeeFilterType.ByName && filter == null)
            {
                throw new ArgumentNullException("filter");
            }

            string query = "select * from employee, employee_role inner join employee.Id == employee_role.EmployeeId";

            List<Employee> result = new List<Employee>();
            using (FakeSqlCommand sqlCommand = new FakeSqlCommand(query, connection))
            {
                FakeSqlDataReader reader;
                int retryCount = 5;

                while (true)
                {
                    try
                    {
                        reader = sqlCommand.ExecuteReader();
                        break;
                    }
                    catch (Exception)
                    {
                        if (retryCount-- == 0) throw;
                    }
                }

                while (reader.Read())
                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    switch (employeeFilterType)
                    {
                        case EmployeeFilterType.ByName:
                            if (!name.StartsWith(filter)) continue;
                            break;
                        case EmployeeFilterType.ExemptOnly:
                            if (age < 40 || !isSalaried) continue;
                            break;
                    }

                    result.Add(new Employee {Name = name, Id = id, Age = age, IsSalaried = isSalaried});
                }
            }

            return result;
        }

The simple thing that I’m going to attack is something that I see all over the place in code, a little code smell that is called “Primitive Obsession”.

Simply speaking, primitive obsession is using a computer-sciency construct – such as a integer, a string, or a generic List – instead of a domain construct. When we do this, we generally end up with code specific to the construct scattered all over the codebase because there is no place for such code to cluster.

Are there any examples of primitive obsession in our code? Anyone? Bueller?

The employee filtering approach jumped out at me; we have both validity checking and the filter implementation in our method. After each description, I’ll list the commit label so you can see what changed. The first thing I need to do is to create an abstraction for filter. Resharper has a cool refactoring that will take a set of parameters, and wrap them in a class, so that’s what I used.

R: Extract filter class from parameters

The next thing I want to do is start moving the code related to the filter class from the yucky class to the new EmployeeFilter class. I’ll start with the validation code; first I’ll extract it into a method, and then I’ll move the method into the EmployeeFilter class.

R: Extract Validate Method
R: Move Validate() to filter class

I’ll do the same thing with the filtering code in the middle of the routine

R: Extract DoFilter() method
R: Move DoFilter() to EmployeeFilter

Now that I’ve done that, it’s time to push the new class into a separate file.

R: Move type to another file

The two methods that I used are both static methods which take the EmployeeFilter class as a parameter. I’ll make them instance methods.

Incidentally, the pattern of pulling out primitive types and creating domain types is known as Whole Value.

R: Make methods non-static

Looking at the code, I have the following statement:

                    if (employeeFilter.DoFilter(name, age, isSalaried)) continue;

That’s a little weird; in that the true return means the filter didn’t match. I couldn’t find a way to do this with a refactoring, so I did it by hand.

Invert filter boolean

Let’s look a bit more at the EmployeeFilter class. I note that it has properties for the Filter and EmployeeFilterType, but with the refactoring neither of those are used outside of the class any more. Resharper will let me inline them.

R: Inline Filter and EmployeeFilterType properties

The class is looking fairly good now, but there’s still one thing that is bothering me – that call that I make to Validate() seems extraneous. Why should the GetEmployees() be responsible for that? Why can’t EmployeeFilter guarantee that it is valid?

I’ll start by extracting out the part of the method that I want to keep:

R: Extract the part of GetEmployees() that doesn’t have the validate.

Now, I can inline the original method and rename the new one back to GetEmployees().

R: Inline method and rename

That has pulled the code out of the GetEmployees() class, but I want it in the EmployeeFilter class. I’ll extract it into a factory method, and then move that to the EmployeeFilter class.

R: Extract creation into factor method
R: Move factory to EmployeeFilter class
R: Extract parameters out of the factory

At this point, what I would really like is a “Convert factory to constructor” refactoring, but it does not exist. If anybody has an idea how to do this automatically, let me know.

Move validate method to constructor

And now I can inline the factory to get rid of it…

R: Inline factory method

I don’t really need a separate validation method, so I’ll inline that as well.

R: Inline validate method

At this point, there’s a brief little side issue I’d like to discuss… When I moved the validation to the EmployeeFilter class, I lost the null check in the GetEmployees() method. I’m not a huge fan of adding null checks just to guard against developer errors if the code isn’t a library being used outside the team; I don’t think the cost is worth the benefit.

But if you want that functionality, feel free to add the check in.

Now that we have a nice type, it’s missing something. Tests. So, I went off and wrote a few. They are the next series of commits:

Validation test
Added test for failed name filter
Added test for passing named filter
Added exempt test for too young
Added test for old enough but not salaried
Added passing test for exempt

So, what did I accomplish? Well, the filter code now has a separate abstraction, and the GetEmployees() method does look a little bit better. The refactoring took me about 30 minutes, though the writing took a fair bit more. The vast majority of the work was done with automatic refactorings.

There is still a lot to be done, but that is all for this time.

You Suck at TDD #5 – Homework

March 10, 2016 at 4:17 pm

It is now time for me to turn the tables on you, to present you with some homework. If you take a trip to my Github account, you will find the following lovingly crafted repo.

In it, you will find some yucky code. It’s not terrible code – in fact, it’s pretty average code from my experience – but it does have some issues.

Your task is the following:

  1. Look at the code and figure out what issues there are. I suggest writing them down.
  2. Refactor the code into something that is testable, which is to be proven in the obvious way.

While you do this, I recommend that you try to a) do as much of it using refactoring support (either built into VS or using Resharper), and b) do as much of it with keyboard shortcuts as possible.

In each of the next few posts, I’ll pick a specific issue that I see and take the code through a transformation to make it better.

You suck at TDD #3 – On Design sensitivity and improvement

February 11, 2016 at 8:31 am

Through an above-average display of ineptness, I managed to overwrite my first version of this post with a response I wrote to the comments. Since I’m too lazy to try to reconstruct it, I’m going to go a different direction. In a recent talk about design, I told the attendees that I was going to insult them during the talk, and it worked pretty well, so I’m going to try the same thing here:

You don’t know how to do design. You would not know a good design if it came up and bit you.

To explain this, I feel it necessary to talk a bit about ignorance. I will loop back to the topic at hand sometime in the future.

In a really cool article written back in 2000, Phillip Armour wrote about 5 orders of ignorance, which I’ll summarize here:

  • Oth Order Ignorance (0OI) – Lack of ignorance, or in other words, knowledge. I have 0OI about skiing.
  • 1st Order Ignorance (1OI) – Known lack of knowledge. I have 1OI about how to knit, but I can think of a number of ways to convert this to 0OI.
  • 2nd order Ignorance (2OI) – Lack of awareness. I have 2OI when I don’t know that I don’t know something. I cannot – and this is the point – give examples here.
  • 3rd order Ignorance (3OI) – Lack of process. I have 3OI when I don’t know a suitably efficient way to find out that I don’t know that I don’t know something.
  • 4th order ignorance (4OI) – Meta ignorance. I don’t know about the 5 orders of ignorance.

 

Like any model, this is a simplification of what is really going on, but it’s a useful model.

Back to the insult, but this time I’ll reword it using the orders of ignorance:

You have 20I about design; the code that you are writing has obvious problems with it but you do not see them because you are 2OI about these problems and therefore cannot see the problems yourself.

It’s a little less catchy, and that’s why this series is named "You suck at TDD", not "Systemic influences and limitations in human knowledge acquisition have limited your effectiveness at TDD in interesting ways".

You can probably think back when you learned something about coding – for example, there was probably a time when you learned it was good not to repeat code. Before you learned that, you had 2OI ignorance about it, and now you no longer do.

We *all* have at least some 2OI about design; it’s just a feature of how knowledge acquisition works. That is not the real problem.

The real problem is that most developers have both 3OI and 4OI when it comes to design. They don’t understand how 2OI works and therefore think that they have good design skills, and – probably more importantly – they don’t have an efficient way of identifying specific instances of 2OI and converting them to 1OI and ultimately 0OI.

Or, to put it more concisely, most developers think they are good at design because they can’t see the problems in their designs, they don’t really understand that this is the case, and they don’t have any plan to improve.

If you couple this with our industry’s typical focus on features, features, features, it is no longer surprising that there is so much bad code out there; it becomes surprising that there is code that works at all.

So, tying this whole thing back to TDD, the reason that TDD does not work for a lot of people is that they don’t have sufficient design skills to see the feedback that TDD is giving to them. They get to the refactor step, shrug, and say, "looks good to me; let’s write the next test". And these 2OI issues add up, and they make it harder to do TDD on the code, and it gets hard to do.

This is not surprising. TDD is about evolutionary design, and if you aren’t very good at design, it’s not going to work very well.

Improving your design skills

The remainder of this series is about improving your design skills. I will probably focus on specific patterns that I find especially useful in the TDD world, but I’d like to provide some general advice here. These are things that will help you move from 2OI to 1OI in specific areas of design. And please, if you have other ideas, add them in the comments:

  1. Pair with somebody who cares about design. They will be 1OI or 0OI in areas where you are 2OI, and by having discussions you will learn together.
  2. Read and study. There is a *ton* of useful information out there. If you like abstract, read – and then re-read Fowler’s book on Refactoring. If you like more specificity, go read information about a single code smells, and then find it – and fix it – in your code. If you want a bigger example, just search for refactoring, and you’ll find examples like this.
  3. Share and teach. You may think that you know something about a specific area of design, but when you go to teach it to others, that is when you really learn it.
  4. Practice refactoring skills. If you are lucky enough to work in a language with good refactoring tools, train yourself to use them instead of hand-editing. Not only will this speed you up and reduce the errors you make, it will train you to think differently about code.
  5. Practice TDD and experiment with different approaches. What does "tell, don’t ask" do to your code? Can a more functional programming approach have benefits?