Should your tests focus on how your system works or what it delivers? Testing isn't just about writing code to catch bugs.
Before we get stuck in, let's make sure we're all on the same page with a few definitions.
The difference is so subtle you might miss it if you sneeze. But it's there, and it matters.
The difference is that with behavior focused testing, we're testing the edges or boundaries of a system. What happens inside those boundaries is not our concern. There's a behavioral promise in place between our system and the users of that system. Our tests should ensure that promise is being fulfilled.
Geez, get meta much? Ok, moving on.
Some people tend to argue (or politely suggest) that all we're really talking about here is unit vs integration tests. Well, that's not really true and here's why:
from unittest.mock import MagicMock


def delete_all_items(items):
    items.clear()


def test_delete_all_items_implementation():
    # Checks HOW the work is done: was clear() called?
    items = MagicMock()
    delete_all_items(items)
    items.clear.assert_called_once()


def test_delete_all_items_behavior():
    # Checks WHAT the work achieves: is the list empty afterwards?
    items = [1, 2, 3, 4]
    delete_all_items(items)
    assert len(items) == 0
Clearly these are not integration tests, yet we can still define one as being implementation focused while the other is behavior focused.
The first test is brittle and won't survive refactoring. The moment we change how delete_all_items works, the test breaks regardless of whether or not the function still accomplishes the same task.
The behavior focused test on the other hand is focused on the end result. What is delete_all_items actually trying to accomplish? Well, in the end we want items to be empty and we don't really care how that happens.
Now we're getting to the heart of the matter: we want our code to perform a certain task, and all we really care about is that the task is completed successfully, regardless of how it happens.
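To make that brittleness concrete, here's a minimal sketch. It assumes we refactor delete_all_items to empty the list with slice deletion instead of items.clear() (my choice of refactor, purely for illustration). The helper names implementation_check_passes and behavior_check_passes are made up; they wrap the same assertions as the two tests above so the outcomes can be compared side by side.

```python
from unittest.mock import MagicMock


def delete_all_items(items):
    # Hypothetical refactor: empty the list in place with slice deletion
    # instead of calling items.clear(). The observable result is the same.
    del items[:]


def implementation_check_passes() -> bool:
    # Same assertion as the implementation focused test: was clear() called?
    items = MagicMock()
    delete_all_items(items)
    try:
        items.clear.assert_called_once()
    except AssertionError:
        return False
    return True


def behavior_check_passes() -> bool:
    # Same assertion as the behavior focused test: is the list empty?
    items = [1, 2, 3, 4]
    delete_all_items(items)
    return len(items) == 0


if __name__ == "__main__":
    print("implementation focused check passes:", implementation_check_passes())
    print("behavior focused check passes:", behavior_check_passes())
```

The function still empties the list, so only the mock-based check fails. Nothing is broken, yet one of the two tests now needs fixing.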
One of the main problems with implementation focused tests is that they're brittle and seldom survive code refactoring.
So I already see a few hands going up to say: "But that's a good thing! I want my tests to fail if any part of my code changes. How else will I know if something is broken?"
To answer that question, let's jump into a slightly more complex example. Here we're coding away in our job at a AAA game studio, developing the space ship registry system for our latest game.
from dataclasses import dataclass
from typing import List, Optional
from enum import Enum


class ShipClass(Enum):
    FIGHTER = "fighter"
    CRUISER = "cruiser"
    BATTLESHIP = "battleship"


@dataclass
class SpaceShip:
    registry_id: str
    ship_class: ShipClass


class InMemoryShipRegistry:
    """
    Manages ship registration for the United Space Federation.
    Ships must be registered before they can dock at space stations
    or engage in legal trade routes.
    """

    def __init__(self):
        self._registered_ships: List[SpaceShip] = []

    def register_ship(self, ship: SpaceShip) -> None:
        """Register a new ship in the federation database."""
        self._registered_ships.append(ship)

    def get_ship(self, registry_id: str) -> Optional[SpaceShip]:
        """Lookup a ship by its unique registry identifier."""
        return next(
            (ship for ship in self._registered_ships if ship.registry_id == registry_id),
            None
        )

    def get_ships_by_class(self, ship_class: ShipClass) -> List[SpaceShip]:
        """Get all ships of a specific class."""
        return [
            ship for ship in self._registered_ships
            if ship.ship_class == ship_class
        ]
Ok so we've got our ship registry. Let's write a couple of implementation focused tests:
def test_registered_ship_exists_in_registry():
    # Arrange
    ship = SpaceShip(registry_id="USS-FALCON-9", ship_class=ShipClass.FIGHTER)
    registry = InMemoryShipRegistry()
    # Act
    registry.register_ship(ship)
    # Assert (reaches into the private list)
    assert ship in registry._registered_ships
    assert len(registry._registered_ships) == 1


def test_get_ships_by_class():
    # Arrange (writes to the private list instead of calling register_ship)
    registry = InMemoryShipRegistry()
    fighter = SpaceShip("USS-FALCON-9", ShipClass.FIGHTER)
    cruiser = SpaceShip("USS-VICTORY", ShipClass.CRUISER)
    registry._registered_ships = [fighter, cruiser]
    # Act
    fighters = registry.get_ships_by_class(ShipClass.FIGHTER)
    # Assert
    assert len(fighters) == 1
    assert fighters[0] is fighter
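Wow, those tests kind of hurt the eyes, but let's go with it. As you can see, in both cases we're coupling our test code tightly to the implementation of the system under test: the first test reads the private _registered_ships list to check its result, and the second writes to that list directly to set up its data.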
If anything changes in the underlying mechanics of the InMemoryShipRegistry, it's likely our tests are going to break even if the functionality remains unchanged. But more on that later.
Now let's write a couple of behavior focused tests:
def test_registered_ship_exists_in_registry():
    # Arrange
    ship = SpaceShip("USS-FALCON-9", ShipClass.FIGHTER)
    registry = InMemoryShipRegistry()
    # Act
    registry.register_ship(ship)
    found_ship = registry.get_ship(ship.registry_id)
    # Assert
    assert found_ship == ship


def test_get_ships_by_class():
    # Arrange
    registry = InMemoryShipRegistry()
    fighter1 = SpaceShip("USS-FALCON-9", ShipClass.FIGHTER)
    fighter2 = SpaceShip("USS-XWING-RED5", ShipClass.FIGHTER)
    cruiser = SpaceShip("USS-VICTORY", ShipClass.CRUISER)
    # Act
    registry.register_ship(fighter1)
    registry.register_ship(fighter2)
    registry.register_ship(cruiser)
    fighters = registry.get_ships_by_class(ShipClass.FIGHTER)
    # Assert
    assert len(fighters) == 2
    assert all(ship.ship_class == ShipClass.FIGHTER for ship in fighters)
Ok so now we're testing the boundaries of the system under test. We're testing the behavior.
First we make sure that a ship can be registered by using the APIs that the InMemoryShipRegistry gives us. We don't care how a ship is registered; we just want to know that it gets done.
Next we make sure that we can get all ships by class, again using the APIs that the InMemoryShipRegistry provides.
But maybe you're not sold yet. Maybe we need to go a step further and do some refactoring of our InMemoryShipRegistry to see how that affects our tests.
In our new registry, we're going to use a dictionary for registered ships. We're also going to introduce a dictionary to manage ships by class. Here we go:
class InMemoryShipRegistry:
    def __init__(self):
        self._ships_by_registry = {}
        self._ships_by_class = {
            ship_class: [] for ship_class in ShipClass
        }

    def register_ship(self, ship: SpaceShip) -> None:
        self._ships_by_registry[ship.registry_id] = ship
        self._ships_by_class[ship.ship_class].append(ship)

    def get_ship(self, registry_id: str) -> Optional[SpaceShip]:
        return self._ships_by_registry.get(registry_id, None)

    def get_ships_by_class(self, ship_class: ShipClass) -> List[SpaceShip]:
        # Return a copy so callers can't mutate the registry's internal list
        return self._ships_by_class[ship_class].copy()
All good so far. We now have a shiny new underlying implementation of our InMemoryShipRegistry and have ensured that our public facing APIs continue to work.
Here's where the rubber meets the road. Let's run our existing implementation focused tests:
Found 2 test(s).
FF
Ran 2 tests in 0.01s
FAILED (failures=2)
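Darn. Both tests broke because they relied heavily on the internal implementation: the first reads registry._registered_ships, which no longer exists, and the second writes to it, which the new implementation never looks at. Looks like we're going to have to spend time fixing tests even though nothing is actually wrong with the registry.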
Next let's see what our existing behavior focused tests have to say:
Found 2 test(s).
..
Ran 2 tests in 0.01s
PASSED
Awesome, all green. Our expected behavior, the promise that exists between our system under test and the user, is still intact. Let's move on with our day.
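One way to picture that promise is as a single check that any registry implementation has to satisfy. The sketch below is only an illustration: ListShipRegistry and DictShipRegistry are names I've made up so the original and refactored versions can live side by side, and check_registry_contract is a hypothetical helper, not something from a test framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class ShipClass(Enum):
    FIGHTER = "fighter"
    CRUISER = "cruiser"
    BATTLESHIP = "battleship"


@dataclass
class SpaceShip:
    registry_id: str
    ship_class: ShipClass


class ListShipRegistry:
    """The original list-backed registry (renamed for this sketch)."""

    def __init__(self) -> None:
        self._registered_ships: List[SpaceShip] = []

    def register_ship(self, ship: SpaceShip) -> None:
        self._registered_ships.append(ship)

    def get_ship(self, registry_id: str) -> Optional[SpaceShip]:
        return next(
            (s for s in self._registered_ships if s.registry_id == registry_id),
            None,
        )

    def get_ships_by_class(self, ship_class: ShipClass) -> List[SpaceShip]:
        return [s for s in self._registered_ships if s.ship_class == ship_class]


class DictShipRegistry:
    """The refactored dictionary-backed registry (renamed for this sketch)."""

    def __init__(self) -> None:
        self._ships_by_registry: Dict[str, SpaceShip] = {}
        self._ships_by_class: Dict[ShipClass, List[SpaceShip]] = {
            ship_class: [] for ship_class in ShipClass
        }

    def register_ship(self, ship: SpaceShip) -> None:
        self._ships_by_registry[ship.registry_id] = ship
        self._ships_by_class[ship.ship_class].append(ship)

    def get_ship(self, registry_id: str) -> Optional[SpaceShip]:
        return self._ships_by_registry.get(registry_id)

    def get_ships_by_class(self, ship_class: ShipClass) -> List[SpaceShip]:
        return self._ships_by_class[ship_class].copy()


def check_registry_contract(make_registry: Callable[[], object]) -> None:
    # Only the public API is used, so the check is blind to the internals.
    registry = make_registry()
    fighter = SpaceShip("USS-FALCON-9", ShipClass.FIGHTER)
    cruiser = SpaceShip("USS-VICTORY", ShipClass.CRUISER)
    registry.register_ship(fighter)
    registry.register_ship(cruiser)

    assert registry.get_ship("USS-FALCON-9") == fighter
    assert registry.get_ships_by_class(ShipClass.FIGHTER) == [fighter]
    assert registry.get_ships_by_class(ShipClass.BATTLESHIP) == []


if __name__ == "__main__":
    # The same behavioral check runs unchanged against both implementations.
    for factory in (ListShipRegistry, DictShipRegistry):
        check_registry_contract(factory)
        print(factory.__name__, "keeps the promise")
```

If you use pytest, the same idea is commonly written with pytest.mark.parametrize over the two classes; the plain loop here just keeps the sketch dependency free.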
Let's rapid fire some of the benefits of a more behavior focused approach to testing:
In a nutshell, we spend less time babysitting tests.
Our tests will survive through implementation changes and performance optimizations. We can go from having a space station's docking system which uses a simple list to a complex queuing system without breaking tests.
Documenting an active, ever evolving code base isn't always easy. In spite of best efforts, docstrings, wikis and the like aren't always 100% in sync with the code they're documenting.
Behavior focused tests on the other hand provide documentation automatically.
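As a rough illustration of tests doubling as documentation, here are a few more behavior focused tests whose names read like a spec for the registry. The behaviors chosen are my reading of the registry code shown earlier, not requirements stated anywhere, and the test names are made up. The registry is repeated in its refactored form so the snippet runs on its own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class ShipClass(Enum):
    FIGHTER = "fighter"
    CRUISER = "cruiser"
    BATTLESHIP = "battleship"


@dataclass
class SpaceShip:
    registry_id: str
    ship_class: ShipClass


class InMemoryShipRegistry:
    def __init__(self) -> None:
        self._ships_by_registry: Dict[str, SpaceShip] = {}
        self._ships_by_class: Dict[ShipClass, List[SpaceShip]] = {
            ship_class: [] for ship_class in ShipClass
        }

    def register_ship(self, ship: SpaceShip) -> None:
        self._ships_by_registry[ship.registry_id] = ship
        self._ships_by_class[ship.ship_class].append(ship)

    def get_ship(self, registry_id: str) -> Optional[SpaceShip]:
        return self._ships_by_registry.get(registry_id)

    def get_ships_by_class(self, ship_class: ShipClass) -> List[SpaceShip]:
        return self._ships_by_class[ship_class].copy()


def test_looking_up_an_unregistered_ship_returns_none():
    registry = InMemoryShipRegistry()
    assert registry.get_ship("USS-GHOST") is None


def test_a_class_with_no_registered_ships_returns_an_empty_list():
    registry = InMemoryShipRegistry()
    registry.register_ship(SpaceShip("USS-FALCON-9", ShipClass.FIGHTER))
    assert registry.get_ships_by_class(ShipClass.BATTLESHIP) == []


def test_changing_a_returned_list_does_not_change_the_registry():
    registry = InMemoryShipRegistry()
    fighter = SpaceShip("USS-FALCON-9", ShipClass.FIGHTER)
    registry.register_ship(fighter)

    # Mutate the list we were handed back...
    registry.get_ships_by_class(ShipClass.FIGHTER).clear()

    # ...and confirm the registry still knows about the fighter.
    assert registry.get_ships_by_class(ShipClass.FIGHTER) == [fighter]
```

Someone new to the code base can skim just the test names and learn what the registry promises, without reading a line of its implementation.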
Here's looking at you, 100% coverage gang. Let's focus on quality over quantity.
I think these speak for themselves.
Based on everything I've presented so far, you might think I'm advocating for completely eliminating implementation focused tests. But in reality, there is room (and a need) for both styles.
A few scenarios that come to mind where implementation focused tests are appropriate could be:
Back to our space game example, we might have an algorithm which determines weapon damage. Our algorithm takes into account armor penetration physics, critical hit probabilities, damage falloff over distance, etc. These calculations should likely have fine grained, implementation focused tests.
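Here's a rough sketch of what that could look like. Everything in it is invented for illustration: the formulas, the function names (falloff_multiplier, armor_multiplier, weapon_damage) and the numbers are assumptions, not the game's real damage model. The critical hit is passed in as a flag so the random roll stays outside the calculation and the tests stay deterministic.

```python
def falloff_multiplier(distance: float, optimal_range: float, max_range: float) -> float:
    # Assumed model: full damage up to optimal_range, then a linear drop
    # to zero at max_range.
    if distance <= optimal_range:
        return 1.0
    if distance >= max_range:
        return 0.0
    return (max_range - distance) / (max_range - optimal_range)


def armor_multiplier(armor: float, penetration: float) -> float:
    # Assumed model: penetration cancels armor point for point, and the
    # remaining armor reduces damage with diminishing returns.
    effective_armor = max(armor - penetration, 0.0)
    return 100.0 / (100.0 + effective_armor)


def weapon_damage(
    base_damage: float,
    distance: float,
    optimal_range: float,
    max_range: float,
    armor: float,
    penetration: float,
    critical: bool = False,
) -> float:
    critical_multiplier = 2.0 if critical else 1.0  # assumed crit bonus
    return (
        base_damage
        * falloff_multiplier(distance, optimal_range, max_range)
        * armor_multiplier(armor, penetration)
        * critical_multiplier
    )


# Fine grained tests that pin down each intermediate step of the calculation.
def test_falloff_is_linear_between_optimal_and_max_range():
    assert falloff_multiplier(500, optimal_range=100, max_range=900) == 0.5


def test_falloff_is_zero_beyond_max_range():
    assert falloff_multiplier(1000, optimal_range=100, max_range=900) == 0.0


def test_penetration_cancels_armor_before_reduction():
    assert armor_multiplier(armor=150, penetration=50) == 0.5


def test_penetration_cannot_push_armor_below_zero():
    assert armor_multiplier(armor=20, penetration=50) == 1.0


def test_critical_hit_doubles_final_damage():
    normal = weapon_damage(80, 500, 100, 900, armor=150, penetration=50)
    crit = weapon_damage(80, 500, 100, 900, armor=150, penetration=50, critical=True)
    assert normal == 20.0
    assert crit == 40.0
```

These tests are deliberately tied to the individual steps. If someone changes the falloff curve, we want a test to fail and force a conversation, because here the exact numbers are the feature.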
I find that focusing completely on one or the other style of test in all situations ends up in pain, one way or the other.
My preference is to default to behavior focused testing the vast majority of the time. When exceptions to that rule arise, it's clear that I'm dealing with a small, tightly constrained piece of code where the implementation really matters and should therefore be tested.