Premature Abstraction: When Clean Code Goes Wrong

What are abstractions in software development? #

Let's kick things off with a formal definition and then move swiftly along to something more practical and less academic.

From Wikipedia:

In software engineering and computer science, abstraction is the process of generalizing concrete details, such as attributes, away from the study of objects and systems to focus attention on details of greater importance.

Hmmm, ok. Let me take a stab at explaining that in plain English.

Think of a car. When you drive, you don't need to know how the engine works - you just use the steering wheel, pedals, and gear shift. That's abstraction in action. The complicated inner workings are "abstracted away" behind a simple interface.

In software, abstraction works the same way. Instead of dealing with complex details, we create simpler ways to work with our code. For example:

Instead of writing code to handle every detail of saving data to a database, we might just call saveUser(name, email)
Rather than managing memory locations directly, we use variables like customerName
Instead of writing network code, we might just call fetchData()

Abstraction lets us hide complexity behind simple interfaces, making our code easier to write, understand, and maintain.
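To make the first example above concrete, here is a minimal sketch of what a function like saveUser might hide. It's written in Python, so the name becomes save_user. The use of SQLite, the users table layout, and the extra conn parameter are all assumptions made for illustration.

```python
import sqlite3


def save_user(conn, name, email):
    """Hypothetical helper: callers never see the SQL below."""
    # Assumed table layout, created on first use for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (name TEXT, email TEXT UNIQUE)"
    )
    # Parameterized query, so values are never pasted into the SQL string.
    conn.execute(
        "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
    )
    conn.commit()


# The caller only deals with a name and an email.
conn = sqlite3.connect(":memory:")
save_user(conn, "Ada", "ada@example.com")
rows = conn.execute("SELECT name, email FROM users").fetchall()
```

The calling code stays one line long no matter how the storage details change behind it.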

That covers what abstractions are and how useful they can be. Before we get to premature abstraction and why it's something to be wary of, let's look at a practical code example.

Practical Example of Abstraction #

Here we've got a SpaceShuttle class which allows us to calculate fuel usage over a distance:

class SpaceShuttle:
    def __init__(self, name, fuel_capacity):
        self.name = name
        self.fuel_capacity = fuel_capacity
        self.current_fuel = fuel_capacity

    def calculate_fuel_usage(self, distance):
        return distance * 0.5  # 0.5 units of fuel per light-year

As we continue developing our space game, we find ourselves needing to implement different types of spacecraft, each with its own fuel usage calculation.

Here's where abstraction can come in handy:

from abc import ABC, abstractmethod

class SpacecraftBase(ABC):
    def __init__(self, name, fuel_capacity):
        self.name = name
        self.fuel_capacity = fuel_capacity
        self.current_fuel = fuel_capacity

    @abstractmethod
    def calculate_fuel_usage(self, distance):
        """Abstract method to be implemented by specific spacecraft types"""
        pass

    def can_complete_mission(self, distance):
        """Shared behavior for all spacecraft types"""
        return self.current_fuel >= self.calculate_fuel_usage(distance)

class SpaceShuttle(SpacecraftBase):
    def calculate_fuel_usage(self, distance):
        return distance * 0.5

class InterstellarCruiser(SpacecraftBase):
    def calculate_fuel_usage(self, distance):
        return distance * 0.3

Now we have a single clean way to create multiple types of spacecraft. Notice how some methods are shared across all spacecraft while others are abstract and must be implemented by each type.

Need an InterstellarCruiser? Bam! We've got one in 3 lines of code. Need a SpaceShuttle? Same deal. Bam! 3 lines of code.

This is far better than the alternative, which would have been to duplicate the original class and modify part of it every time we need a new type of spacecraft.
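As a rough sketch of how the base class pays off, here is a third type being added and used. The base class is repeated so the snippet runs on its own. CargoFreighter and its 0.8 fuel rate are invented for this example.

```python
from abc import ABC, abstractmethod


class SpacecraftBase(ABC):
    def __init__(self, name, fuel_capacity):
        self.name = name
        self.fuel_capacity = fuel_capacity
        self.current_fuel = fuel_capacity

    @abstractmethod
    def calculate_fuel_usage(self, distance):
        """Each spacecraft type supplies its own calculation."""

    def can_complete_mission(self, distance):
        """Shared behavior, inherited by every subclass."""
        return self.current_fuel >= self.calculate_fuel_usage(distance)


class CargoFreighter(SpacecraftBase):  # hypothetical third type
    def calculate_fuel_usage(self, distance):
        return distance * 0.8  # assumed rate, for illustration only


freighter = CargoFreighter("Hauler", fuel_capacity=100)
short_trip_ok = freighter.can_complete_mission(100)  # needs 80 of 100 fuel
long_trip_ok = freighter.can_complete_mission(200)   # needs 160 of 100 fuel

# The abstract base cannot be instantiated directly.
try:
    SpacecraftBase("Generic", 100)
    base_is_instantiable = True
except TypeError:
    base_is_instantiable = False
```

The new type only had to describe what makes it different. The mission check came along for free.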

Premature Abstraction - Almost Never a Good Idea #

OK, so now we understand what abstraction is in software engineering, and we've seen an example where using it makes sense.

While we can agree abstraction is often a good idea, it's almost never a good idea to start looking for abstraction opportunities too early. What's too early, you might ask? We'll get there later on. For now, I'd like to dig into a few issues that are likely to arise with premature abstraction.

Planning for a Future Which May Never Happen #

I want to start with this one because in my personal opinion it's the most important.

When we're working on a new project, or even a new feature within an existing project, it's natural to see opportunities for abstraction all over the place.

Say we're building out a navigation system for our spacecraft and can't help but think we might someday need a navigation system for the autonomous robots in the game, so we start abstracting the spacecraft navigation system for that possible future requirement.

Will we ever need a navigation system for the autonomous robots in the game? Who knows. Maybe. Maybe not. But what we do know is that we don't need one now. So we should avoid adding complexity, indirection and reduced flexibility into our code base for something which might never be needed.
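Here is a hedged sketch of how that navigation scenario might look in code. Every name in it (Navigator, SpacecraftNavigator, spacecraft_distance) is hypothetical, and straight-line distance stands in for whatever the real navigation logic would be.

```python
import math
from abc import ABC, abstractmethod


# Speculative version: an interface designed for robots that don't exist yet.
class Navigator(ABC):
    @abstractmethod
    def distance_to(self, origin, destination):
        """Return the travel distance between two points."""


class SpacecraftNavigator(Navigator):
    def distance_to(self, origin, destination):
        return math.dist(origin, destination)


# Simpler version: solve only the problem we have today.
def spacecraft_distance(origin, destination):
    """Straight-line distance between two points in space."""
    return math.dist(origin, destination)


via_abstraction = SpacecraftNavigator().distance_to((0, 0, 0), (3, 4, 0))
via_function = spacecraft_distance((0, 0, 0), (3, 4, 0))
```

Both versions give the same answer today. The first one just costs an extra class and an interface that only a single implementation uses.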

Complexity in Disguise #

Abstraction adds layers of indirection, which means additional cognitive load for developers trying to navigate a code base.

It's harder to build a mental model of all the moving parts in a code base that relies heavily on abstractions than in one that tolerates some duplication. Duplication is not always the enemy (see DRY Code Best Practices for more on this).

Reduced Flexibility #

By implementing abstractions early on, you're locking yourself into those abstractions without fully understanding the problem domain.

Those early abstractions impose artificial constraints, and future changes that don't fit them become harder to make.
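A small sketch of what that lock-in can look like, under invented requirements: the early base class assumes every craft burns fuel at a constant rate per light-year, and a later WarpShip (hypothetical) with a fixed cost per jump has to work around that assumption.

```python
class Spacecraft:
    """Early abstraction: assumes fuel usage is always distance * rate."""

    def __init__(self, fuel_rate):
        self.fuel_rate = fuel_rate

    def calculate_fuel_usage(self, distance):
        return distance * self.fuel_rate


class WarpShip(Spacecraft):
    """Hypothetical later requirement: a flat cost per jump."""

    def __init__(self, jump_cost):
        # Forced to pass a meaningless rate just to satisfy the base class.
        super().__init__(fuel_rate=0)
        self.jump_cost = jump_cost

    def calculate_fuel_usage(self, distance):
        # Ignores the inherited model entirely.
        return self.jump_cost


shuttle_usage = Spacecraft(0.5).calculate_fuel_usage(10)
warp_usage = WarpShip(50).calculate_fuel_usage(10)
```

The fuel_rate attribute on WarpShip is now dead weight, and any code that reads it will get a misleading value.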

Testing Complexity #

An abstraction has to be tested against every implementation and use case it claims to support, so test scenarios become more elaborate. Early on, that extra effort is excessive and slows productivity.

The Specific vs. The General #

Early on you're solving specific problems and implementing specific solutions. This is not the time to be thinking about implementing general solutions for potential abstract use cases in the future.

This ties in with the concept discussed above on "Planning for a Future Which May Never Happen".

Guidelines for Responsible Abstraction #

How do we know when it's time to implement an abstraction? When is an abstraction not premature?

The Rule of Three #

The first time you implement a solution, ignore that part of your brain screaming "This code would be perfect to use again in so many different ways. Let me create an abstraction for it". Just implement the solution. Done. Move on.

The second time you implement something similar, just let it be. Yes, there's some duplication. Yes, you can see a pattern forming. Just implement the solution (again). Done. Move on.

If a third time rolls around, OK, now it might be time to look into implementing an abstraction.
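The progression might look roughly like this. The function names and the "probe" entry are made up for the example, and the rates reuse the ones from the spacecraft classes earlier.

```python
# First and second occurrences: tolerate the duplication.
def shuttle_fuel_usage(distance):
    return distance * 0.5


def cruiser_fuel_usage(distance):
    return distance * 0.3


# Third occurrence: the pattern (distance * rate) is now clear,
# so extracting it is based on evidence rather than a guess.
def fuel_usage(distance, rate):
    return distance * rate


# "probe" is the hypothetical third case that triggered the refactor.
FUEL_RATES = {"shuttle": 0.5, "cruiser": 0.3, "probe": 0.1}

usage = {name: fuel_usage(10, rate) for name, rate in FUEL_RATES.items()}
```

By the third case, the shape of the shared code is something we have seen, not something we predicted.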

Keep Things Simple and Procedural at First #

As software engineers, we're always tempted to create elegant solutions and beautiful code, but it's often best to simply stick to procedural code at first.

Write the code where you need the code. Sure, the code might be verbose and complex, but at least it's all in one place and can be understood by reading through it linearly. You haven't introduced multiple levels of indirection that force readers to jump between files to grok the code.

As your understanding of the problem domain grows and as general cases naturally surface, you can begin refactoring and implementing abstractions.
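As an illustration of that procedural starting point, here is a sketch with a made-up plan_mission function. The parameters and the messages are assumptions. The point is only that the whole decision can be read top to bottom in one place.

```python
def plan_mission(current_fuel, distance, fuel_rate):
    """Hypothetical example: one function, read linearly, no indirection."""
    fuel_needed = distance * fuel_rate

    if fuel_needed > current_fuel:
        return f"Abort: need {fuel_needed} fuel, have {current_fuel}"

    remaining = current_fuel - fuel_needed
    return f"Go: {remaining} fuel left on arrival"


go_message = plan_mission(current_fuel=100, distance=100, fuel_rate=0.5)
abort_message = plan_mission(current_fuel=100, distance=300, fuel_rate=0.5)
```

If a second or third kind of mission planning shows up later, this function is easy to split apart, because nothing else depends on its internals yet.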

Solving Real Problems #

Wait until you get to the point where an abstraction solves a real problem, not a hypothetical one.

When your code base, solution and understanding of the problem domain have naturally surfaced repeated problems which will benefit from abstractions, you'll know it's time. At this point you're solving a real problem.

Focus on Data Structures Before Abstractions #

In the early stages of implementing a solution, it's important to focus on understanding the problem domain and the data structures that can be used to model that domain.

From Rich Hickey, the creator of Clojure:

Focus on constructing the core data structure first. Get that right, and the code that manipulates it will fall into place.
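One way to read that advice, sketched with Python dataclasses: model the data first, then let plain functions operate on it. The fields, the fuel_rate attribute, and the sample fleet are all assumptions for this example.

```python
from dataclasses import dataclass


@dataclass
class Spacecraft:
    name: str
    fuel_capacity: float
    current_fuel: float
    fuel_rate: float  # fuel units per light-year (assumed field)


def fuel_usage(craft, distance):
    return distance * craft.fuel_rate


def can_complete_mission(craft, distance):
    return craft.current_fuel >= fuel_usage(craft, distance)


# Sample data, invented for illustration.
fleet = [
    Spacecraft("Shuttle", fuel_capacity=100, current_fuel=100, fuel_rate=0.5),
    Spacecraft("Cruiser", fuel_capacity=100, current_fuel=40, fuel_rate=0.3),
]

# Which craft can make a 150 light-year trip right now?
ready = [craft.name for craft in fleet if can_complete_mission(craft, 150)]
```

With the data shape settled, a different fuel rate is just a different value, and no class hierarchy is needed until behavior genuinely diverges.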

And that's a wrap. Abstractions in software development are powerful, but please make sure you truly need them first. They add layers of indirection and reduce flexibility in your code base. With great power comes great responsibility.
