You suck at TDD #4 – External dependencies

January 26, 2016 at 1:48 pm

When I started doing TDD, I thought it was pretty clear what to do with external dependencies. If your code writes to a file system – for example – you just write a file system layer (what would typically be called a façade, though I didn’t know the name of the pattern back then), and then you can mock at that layer, and write your tests.

This is a very common approach, and it works well enough in many scenarios, and because of that I see a lot of groups stop at that level. But it has a significant problem: it is missing an important abstraction. That missing abstraction usually shows up in two very specific ways:

  • The leakage of complexity from the dependency into the application code
  • The leakage of implementation-specific details into the application code

Teams usually don’t notice the downside of these, unless a very specific thing happens: they get asked to change the underlying technology. Their system was storing documents in the file system, and it now needs to store them in the cloud. They look at their code, and they realize that the whole structure of their application is coupled to the specific implementation. The team hems and haws, and then comes up with a 3 month estimate to do the conversion. This generally isn’t a big problem for the team because it is accepted pretty widely that changing an underlying technology is going to be a big deal and expensive. You will even find people who say that you can’t avoid it – that it is always expensive to make such a change.

If the team never ends up with this requirement, they typically won’t see the coupling, nor will they see the downside of the leakage. In my earlier posts I talked about not being sensitive to certain problems, and this is a great example of that. Their lives will be much harder, but they won’t really notice.

Enter the hexagon

A long time ago in internet time, Alistair Cockburn came up with a different approach that avoids these problems, which he called the Hexagonal Architecture. The basic idea is that you segment your application into two different kinds of code – there is the application code, and then there is the code that deals with the external dependencies.

About this time, some of you are thinking, “this is obvious – everybody knows that you write a database layer when you need to talk to a database”. I’ll ask you to bear with me for a bit and keep in mind the part where if you are not sensitive to a specific problem, you don’t even know the problem exists.

What is different about this approach – Cockburn’s big insight – is that the interface between the application and the dependency (what he calls a “port”) should be defined by the application using application-level abstractions. This is sometimes expressed as “write the interface that you wish you had”. If you think of this in the abstract, the ideal would be to write all of the application code using the abstraction, and then go off and implement the concrete implementation that actually talks to the dependency.

What does this give us? Well, it gives us a couple of things. First of all, it typically gives us a significant simplification of the interface between the application and the dependency; if you are storing documents, you typically end up with operations like “store document, load document, and get the list of documents”, and they have very simple parameter lists. That is quite a bit simpler than a file system, and an order of magnitude simpler than most databases. This makes writing the application-level code simpler, with all of the benefits that come with simpler code.
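To make that concrete, here is a minimal sketch of what a document-storage port might look like. The IDocumentStore name matches the one used later in this post; the Document type and the exact operations are illustrative assumptions, not a prescription.

```csharp
using System.Collections.Generic;

// A hypothetical port, defined in application terms rather than in
// file-system or database terms.
public interface IDocumentStore
{
    Document LoadDocument(string name);
    void StoreDocument(string name, Document document);
    IEnumerable<string> GetDocumentNames();
}

// The application-level data the port traffics in; note that there are no
// paths, connection strings, or other implementation details here.
public class Document
{
    public string Name { get; set; }
    public string Contents { get; set; }
}
```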

Second, it decouples the application code from the implementation; because we defined the interface at the application level, if we did it right there are no implementation-specific details at the app layer (okay, there is probably a factory somewhere with some details – root directory, connection string, that sort of thing). That gives us the things we like from a componentization perspective, and incidentally makes it straightforward to write a different implementation of the interface in some other technology.

At this point there is somebody holding up their hand and saying, “but how are you going to test the implementation of the port to make sure it works?” BTW, Cockburn calls the implementation of a port an “adapter” because it adapts the application view to the underlying dependency view, and the overall pattern is therefore known as “port/adapter”.

This is a real concern. Cockburn came up with the pattern before TDD was really big, so we didn’t think about testing in the same way, and he was happy with the tradeoff of a well-defined adapter that didn’t change very often and therefore didn’t need a lot of ongoing testing, because the benefits of putting the “yucky dependency code” (my term, not his) in a separate place were so significant. But it is fair to point to that adapter code and say, “how do you know that the adapter code works?”

In the TDD world, we would like to do better. My first attempt did what I thought was the logical thing to do: I had an adapter that sat on top of the file system, so I put a façade on the file system, wrote a bunch of adapter tests with a mocked-out file system, and verified that the adapter behaved as I expected it to. That worked because the file system was practical to mock, but it would not have worked with a database because of the problems with mocking one.

Then I read something that Arlo wrote about simulators, and it all made sense.

After I have created a port abstraction, I need some way of testing code that uses a specific port, which means some sort of test double. Instead of using a mocking library – which you already know that I don’t like – I can write a special kind of test double known as a simulator. A simulator is simply an in-memory implementation of the port, and it’s generally fairly quick to create because it doesn’t do a ton of things. Since I’m using TDD to write it, I will end up with both the simulator and a set of tests that verify that the simulator behaves properly. But these tests aren’t really simulator tests, they are port contract tests.
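As a sketch, an in-memory simulator for the hypothetical IDocumentStore port above can be little more than a dictionary:

```csharp
using System.Collections.Generic;

// An in-memory implementation of the (hypothetical) IDocumentStore port.
// It is a real adapter; it just keeps its documents in memory.
public class DocumentStoreSimulator : IDocumentStore
{
    private readonly Dictionary<string, Document> _documents =
        new Dictionary<string, Document>();

    public Document LoadDocument(string name)
    {
        // Assumed contract: asking for a missing document throws.
        if (!_documents.ContainsKey(name))
            throw new KeyNotFoundException(name);
        return _documents[name];
    }

    public void StoreDocument(string name, Document document)
    {
        _documents[name] = document;
    }

    public IEnumerable<string> GetDocumentNames()
    {
        return _documents.Keys;
    }
}
```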

So, I can point those contract tests at other implementations of the port (i.e. the ones that use the real file system or the real database), and verify that the other adapters behave exactly the way the simulator does. And that removes the requirement to unit test the other adapters in the traditional way; all I care about is that all the adapters behave the same way. It actually gives me a stronger sense of correctness, because when I used the façade I had no assurance that the file system façade behaved the same way the real file system did.
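One way to structure those port contract tests is a shared base fixture that owns the tests, with one derived fixture per adapter that supplies its implementation through a factory method. Here is a sketch using NUnit-style attributes; the DocumentStoreFileSystem constructor taking a root directory is an assumption on my part, in the spirit of the factory-with-details idea above.

```csharp
using NUnit.Framework;

// Port contract tests: every adapter must pass exactly the same tests.
public abstract class DocumentStoreContract
{
    protected abstract IDocumentStore CreateStore();

    [Test]
    public void StoredDocumentCanBeLoaded()
    {
        IDocumentStore store = CreateStore();
        store.StoreDocument("readme", new Document { Name = "readme", Contents = "hello" });

        Assert.AreEqual("hello", store.LoadDocument("readme").Contents);
    }

    [Test]
    public void StoredDocumentShowsUpInDocumentList()
    {
        IDocumentStore store = CreateStore();
        store.StoreDocument("readme", new Document { Name = "readme", Contents = "hello" });

        CollectionAssert.Contains(store.GetDocumentNames(), "readme");
    }
}

[TestFixture]
public class DocumentStoreSimulatorContractTests : DocumentStoreContract
{
    protected override IDocumentStore CreateStore() => new DocumentStoreSimulator();
}

[TestFixture]
public class DocumentStoreFileSystemContractTests : DocumentStoreContract
{
    protected override IDocumentStore CreateStore()
    {
        // Point the real adapter at a fresh scratch directory for each test.
        string scratch = System.IO.Path.Combine(
            System.IO.Path.GetTempPath(), System.IO.Path.GetRandomFileName());
        System.IO.Directory.CreateDirectory(scratch);
        return new DocumentStoreFileSystem(scratch);
    }
}
```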

In other words, the combination of the simulator + tests has given me an easy & quick way to write application tests, and it has given me a way to test the yucky adapter code. And it’s all unicorns and rainbows from then on. Because the simulator is a real adapter, it supports other uses; you can build a headless test version of the application that doesn’t need the real dependency to work. Or you can make some small changes to the simulator and use it as an in-memory cache that sits on top of the real adapter.
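For instance, the in-memory cache variant is essentially a decorating adapter; here is a sketch, again assuming the hypothetical IDocumentStore port from above:

```csharp
using System.Collections.Generic;

// An in-memory cache that sits on top of another IDocumentStore
// (e.g. the file-system or cloud adapter).
public class CachingDocumentStore : IDocumentStore
{
    private readonly IDocumentStore _inner;
    private readonly Dictionary<string, Document> _cache =
        new Dictionary<string, Document>();

    public CachingDocumentStore(IDocumentStore inner)
    {
        _inner = inner;
    }

    public Document LoadDocument(string name)
    {
        Document document;
        if (!_cache.TryGetValue(name, out document))
        {
            document = _inner.LoadDocument(name);
            _cache[name] = document;
        }
        return document;
    }

    public void StoreDocument(string name, Document document)
    {
        _inner.StoreDocument(name, document);
        _cache[name] = document;
    }

    public IEnumerable<string> GetDocumentNames()
    {
        // Document names are not cached; always ask the underlying store.
        return _inner.GetDocumentNames();
    }
}
```

Because it implements the same port, it drops in wherever the application expects an IDocumentStore, and it can be run against the same contract tests.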

Using Port/Adapter/Simulator

If you want to use this pattern – and I highly recommend it – I have a few thoughts on how to make it work well.

The most common problem people run into is in the port definition; they end up with a port that is more complex than it needs to be or they expose implementation-specific details through the port.

The simplest way to get around this is to write from the inside out. Write the application code and the simulator & tests first, and only go and write the other adapters when that is done. This makes it much easier to define an implementation-free port, and that will make your life far easier.

If you are refactoring into P/A/S, the best approach gets a little more complex. You probably have application code that has implementation-specific details. I recommend that you approach it in small chunks, with a flow like this (there is a code sketch of steps 3-7 after the list):

  1. Create an empty IDocumentStore port, an empty DocumentStoreSimulator class, and an empty DocumentStoreFileSystem class.
  2. Find an abstraction that would be useful to the application – something like “load a document”.
  3. Refactor the application code so that there is a static method that knows how to drive the current dependency to load a document.
  4. Move the static method into the file system adapter.
  5. Refactor it to an instance method.
  6. Add the method to IDocumentStore.
  7. Refactor the method so that the implementation-dependent details are hidden in the adapter.
  8. Write a simulator test for the method.
  9. Implement the method in the simulator.
  10. Repeat steps 2-9.
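To make steps 3-7 concrete, here is a hedged sketch using the hypothetical names above; the file layout (one .txt file per document under a root directory) is purely illustrative.

```csharp
using System.Collections.Generic;

// Step 3: the application's file-system knowledge gets pulled into a static
// helper that knows how to drive the current dependency.
public static class DocumentLoading
{
    public static Document LoadDocument(string rootDirectory, string name)
    {
        string path = System.IO.Path.Combine(rootDirectory, name + ".txt");
        return new Document { Name = name, Contents = System.IO.File.ReadAllText(path) };
    }
}

// Steps 4-7: the helper moves into the file-system adapter, becomes an
// instance method on IDocumentStore, and the implementation details (root
// directory, file extension) end up hidden behind the constructor.
public class DocumentStoreFileSystem : IDocumentStore
{
    private readonly string _rootDirectory;

    public DocumentStoreFileSystem(string rootDirectory)
    {
        _rootDirectory = rootDirectory;
    }

    public Document LoadDocument(string name)
    {
        string path = System.IO.Path.Combine(_rootDirectory, name + ".txt");
        return new Document { Name = name, Contents = System.IO.File.ReadAllText(path) };
    }

    public void StoreDocument(string name, Document document)
    {
        string path = System.IO.Path.Combine(_rootDirectory, name + ".txt");
        System.IO.File.WriteAllText(path, document.Contents);
    }

    public IEnumerable<string> GetDocumentNames()
    {
        foreach (string path in System.IO.Directory.GetFiles(_rootDirectory, "*.txt"))
            yield return System.IO.Path.GetFileNameWithoutExtension(path);
    }
}
```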

Practice

I wrote a few blog posts that talk about port/adapter/simulator and a practice kata. I highly recommend doing the kata to practice the pattern before you try it with live code; it is far easier to wrap your head around it in a constrained situation than in your actual product code.

Lean, Toyota, and how it relates to agile.

January 4, 2016 at 5:22 pm

I like to read non-software-development books from time to time, and during my holiday vacation, I read “The Toyota Way to Continuous Improvement: Linking Strategy and Operational Excellence to Achieve Superior Performance”.

What? Doesn’t everybody relax by reading books about different organizational approaches and business transformation during their holidays?

I highly recommend the book if you are interested in agile processes in the abstract; there’s an interesting perspective that I haven’t been seeing in the part of the agile world I’ve been paying attention to. 

I will note that there are a number of people who have explored and written about the overlap between lean and agile, so I don’t think I’m breaking ground here, but there are a few things that I think are worth sharing.

Value stream mapping

Part of my definition of “being agile” involves change; if you aren’t evolving your process on an ongoing basis, you aren’t agile. I’ve spent a lot of time looking at processes and can now look at a team’s process and have a pretty good idea where they are wasting time and what a better world might look like.

Unfortunately, that isn’t especially useful, because “Do what Eric says you should do” is not a particularly good approach. I don’t scale well, and I have been wrong on occasion.

Shocking, I know.

It also removes the tuning to a specific team’s needs, which is pretty important.

I do know how to teach the “determine an experiment to try during retrospective” approach, and have had decent luck with that, but teams tend to go for low-hanging fruit and tend to ignore big opportunities. Just to pick an example, you can experiment your way into a better process for a lot of things, but the project build that takes 20 minutes and the code review process that takes 8 hours are now dominating your inner loop, and those are the things that you should think about.

I don’t currently have a great way to make these things obvious to the team, a way to teach them how to see the things that I’m seeing. I’m also missing a good way to think about the process holistically, so that the big issues will at least be obvious.

Enter value stream mapping, which is a process diagram for whatever the team is doing. It includes the inputs to the team, the outputs from the team, and all of the individual steps that are taken to produce the output. It also typically includes the amount of time each operation takes, how long items sit in a queue between steps, whether there are rework steps, etc. Here’s a simple diagram from Net Objectives:

The times in the boxes are the average times for each step, and I think the times underneath the boxes are the worst cases. The times between the boxes are the average queue times. We also show some of the rework that we are spending time (wasting time) on.

Given all of this data, we can walk all the boxes, and figure out that our average time to implement (ignoring rework) is about 266 hours, or over 6 weeks. Worse, our queue time is just killing us; the average queue time is 1280 hours, or a full 8 *months*. So, on average, we can expect that a new request takes over 9 months to be deployed. We can then look at what the world would be like if we combined steps, reduced queue sizes, or reduced rework. It gives us a framework in which we can discuss process.
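Spelling out the arithmetic (assuming 40-hour work weeks and roughly 4-week months, which are my assumptions, not numbers from the diagram):

```latex
\begin{align*}
\text{work time}  &\approx 266 \text{ h} \div 40 \text{ h/week} \approx 6.7 \text{ weeks} \\
\text{queue time} &\approx 1280 \text{ h} \div 40 \text{ h/week} = 32 \text{ weeks} \approx 8 \text{ months} \\
\text{lead time}  &\approx 266 + 1280 = 1546 \text{ h} \approx 38.7 \text{ weeks}, \text{ i.e. over 9 months}
\end{align*}
```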

This is a simple example; I’m sure that the real-world flow also has a “bugfix” path, and there is likely a “high-priority request” section that is also at work.

I’m also interested in the details inside the boxes. We could decompose the Code box into separate steps:

  1. Get current source and build
  2. Write code
  3. Test code
  4. Submit code for code review
  5. Handle code review comments
  6. Submit code to gated checkin system

Each of these steps has queue times between them, there are likely rework loops for some of them, and the “test” step likely varies significantly based on what you are testing.

Coming up with a value stream mapping is typically the first thing you do with the Toyota approach. I like the fact that it’s a holistic process; you cover all the inputs and the outputs rather than focusing on the things you know about.

I have not tried doing this for a software team yet, but I find the approach very promising and hope to try it soon. I’m especially hoping that it will highlight the impact that big organizational systems have on team agility.

Implementing an improvement

The continuous improvement approach used by Toyota is known as PDCA (either plan-do-check-act or plan-do-check-adjust). Here’s a more detailed explanation of “plan”:

  1. Define the problem or objective
  2. Establish targets
  3. Understand the physics (use 5 whys to figure out what is really going on).
  4. Brainstorm alternatives
  5. Analyze and rank alternatives
  6. Evaluate impact on all areas (this is a “what could go wrong?” step)
  7. Create a visual, detailed, double-ended schedule

I like that it’s organized into steps and that the overall focus is on “let’s think about this a bit”. That is good, and I especially like #3 and #6.

On the other hand, agile teams who aren’t used to making changes can easily get stuck in analysis paralysis and can’t actually agree on something to try, and a detailed set of steps could easily make that worse. I’m more concerned that they try *something* at the beginning rather than it be the absolute best thing to try.

So, I’m not sure about this one yet, but it is interesting.

Organic vs Mechanistic

In the book, they talk about two ways of implementing lean. The organic approach is close to the Toyota approach; you send a very experienced coach (sensei) into the group, and they teach the group how to do continuous improvement and stick around for a while. This gives great results, but it requires a lot of work to teach the team members and ongoing reinforcement to make sure the whole team understands that continuous improvement is their job. It is also sensitive to the environment the group is embedded in; some groups had made the transition, but it didn’t take because management didn’t understand the environment it took to foster the improvement in the first place.

I’m sure that seems familiar to many of you trying to do agile.

The mechanistic approach comes from the Six Sigma folks. It focuses more on top-down implementation; you train up a centralized group of people and they go out across the company to hold Kaizen (improvement) events. That gives you breadth and consistency across the organization – which are good things – but the results aren’t as significant and – more importantly – teams do not keep improving on their own.

As you might have figured out, I’m a big fan of the organic approach, as I see it as the only way to get the ongoing continuous improvement that will take you someplace really great – the only way that you will get a radically more productive team. And I’ve seen a lot of “scrum from above” implementations, and – at best – they have not been significant successes. So, I’m biased.

Interestingly, the book has a case study of a company that owned two shipyards. One took an organic approach and the other took the mechanistic approach. The organic approach worked great where it was tried, but it was difficult to spread across that shipyard without support and it was very easy for the group to lose the environment that they needed to do continuous improvement.

The mechanistic shipyard had not seen the sort of improvements that the organic one saw, but because they had an established program with executive sponsorship, the improvements were spread more broadly in the enterprise and stuck around a bit better.

The consultants said that after 5 years it was not clear which shipyard had benefitted more. I find that very interesting: you can do something organic, but it’s really dependent on the individuals, and to make something last you need the support of the larger organization.

The role of employees

In the Toyota world, everybody works on continuous improvement, and there is an expectation that every employee should be able to provide an answer around what the current issues are in their group and how that employee is helping make things better.

That is something that is really missing in the software world, and I’m curious what sort of improvements you would see if everybody knew it was their job to make things better on an ongoing basis.

The role of management

One of the interesting questions about agile transitions is how the role of management evolves, and there are a lot of different approaches taken. We see everything from “business as usual” to people managers with lots of reports (say, 20-50) to approaches that don’t have management in the traditional sense.

I’m a big believer in collaborative self-organizing approaches, and the best that I’m usually hoping for is what I’d label as “benign neglect”. Since that is rare, I hadn’t spent much time thinking about what optimal management might be.

I think I may now have a partial answer to this. Toyota lives and breathes continuous improvement, and one of the most important skills in management is the ability to teach their system at that level. They have employees whose only role is to help groups with continuous improvement. I think the agile analog is roughly, “what would it be like if your management was made up of skilled agile coaches who mostly focused on helping you be better?”

Sounds like a very interesting world to be in – though I’m not sure it’s practical for most software companies.  I do think that having management focusing on the continuous improvement part – if done well – could be a significant value-add for a team.