Clean, elegant design or an over abstracted, confusing mess? DRY code can be all or none of these things. Like most topics, some nuance in the discussion is probably needed.
Subscribe to receive mission updates and backend development insights.
DRY or Do Not Repeat Yourself in software development is a practice which aims to limit code duplication. At first glance this might sound like a great idea with benefits like code reusability, maintainability and developer efficiency being touted from the roof tops.
DRY allows us to go from this:
import math
def rectangle_area(width, height):
if width <= 0 or height <= 0:
raise ValueError("Width and height must be positive numbers.")
return width * height
def triangle_area(base, height):
if base <= 0 or height <= 0:
raise ValueError("Base and height must be positive numbers.")
return 0.5 * base * height
def circle_area(radius):
if radius <= 0:
raise ValueError("Radius must be a positive number.")
return math.pi * radius * radius
To this:
import math
def validate_positive(*values):
for value in values:
if value <= 0:
raise ValueError("All dimensions must be positive numbers.")
def rectangle_area(width, height):
validate_positive(width, height)
return width * height
def triangle_area(base, height):
validate_positive(base, height)
return 0.5 * base * height
def circle_area(radius):
validate_positive(radius)
return math.pi * radius * radius
Sure the above is a simple, somewhat contrived example but it never the less perfectly illustrates the core concept of DRY. Which is to refactor duplicated code out into a single reusable piece of logic.
But does DRY code really deliver on everything it promises? Or does it result in unmaintainable spaghetti code? Well,...maybe the truth lies somewhere in the middle.
So lets start with some of the benefits and go from there. Imagine we're building a reporting module into our product which spits out a few aggregated stats for products and users:
def generate_user_report(users):
active_users = [user for user in users if user["is_active"]]
inactive_users = [user for user in users if not user["is_active"]]
return {
"total_users": len(users),
"active_users": len(active_users),
"inactive_users": len(inactive_users),
}
def generate_product_report(products):
active_products = [product for product in products if product["is_active"]]
inactive_products = [product for product in products if not product["is_active"]]
return {
"total_products": len(products),
"active_products": len(active_products),
"inactive_products": len(inactive_products),
}
The above code is easy to understand but clearly there's some duplication and we can't have that in the world of DRY code. So let's refactor it (and while we're at i'll make the code a little cleaner and more performant):
def generate_summary_report(items, item_type):
num_items = len(items)
data = {"total_items": num_items}
data["active_items"] = [item for item in items if item["is_active"] is True]
data["inactive_items"] = num_items - len(data["active_items"])
return data
def generate_user_report(users):
return generate_summary_report(users, "users")
def generate_product_report(products):
return generate_summary_report(products, "products")
Nice. Now we have a single reusable implementation for generating a summary report against any collection.
For us software engineers, there are few things better than a single source of truth. Knowing that there's a single function somewhere which defines the one true implementation of something is comforting to say the least.
No need to re-implement any complicated logic. Simply call our single source of truth function, class, component or whatever from anywhere in the code base and move on with our day.
Knowing that we have a single implementation for generating a summary report means if we ever need to change logic, add something new or fix a bug related to those reports we only need to do it it one place.
As you continue to implement reusable DRY functions your code becomes increasingly modular. By this I mean you end up with building blocks which you can plug and play wherever you need them.
This idea of modularity or code reuse becomes even more prevalent if you've refactored your duplicate code out into pure functions.
A pure function is a function which will always produce the same output if given the same input and does not produce any side effects.
When you've refactored common logic out into reusable functions, your development speed increases every time you need to you need to use that function.
Let's say tomorrow your boss asks you for a new summary report for equipment. Instead of spending time figuring out how to implement the report logic, you simply use your existing function:
def generate_equipment_report(equipment):
return generate_summary_report(equipment, "equipment")
BAM, you're done.
So with all the benefits of DRY code, it's clearly the way to go right? Well, let's see about that...
Our boss has informed us that we now need to include aggregation stats for in stock and out of stock products and equipment within our summary report.
Keep in mind those metrics are not applicable to users so we'll need to account for that. So we head off to our single DRY summary report function to make the updates:
def generate_summary_report(items, item_type):
num_items = len(items)
data = {"total_items": num_items}
data["active_items"] = [item for item in items if item["is_active"] is True]
data["inactive_items"] = num_items - len(data["active_items"])
try:
data["in_stock_items"] = [item for item in items if item["in_stock"] is True]
data["out_of_stock_items"] = num_items - len(data["in_stock_items"])
except KeyError:
pass
return data
Sorted. Now if we pass in a collection of items which each have an in_stock attribute, we'll aggregate that as needed. Our boss is going to be thrilled.
So the first drawback to DRY code that I'll touch on is overgeneralization. What ends up happening is what we've got in our new generate_summary_report function above.
We abstracted our report generation logic early, maybe too early. Now we've got new requirements coming in which have forced us to account for edge cases and exceptions. This is a slippery slope which can result in complicated functions containing excessive conditionals.
Overgeneralization often stems from:
When abstracting too early, we find ourselves constantly on the lookout for code that could potentially be reused at some point in the future.
Then with over or prematurely optimizing we fall into a similar trap. We become so scared of duplicate code that part of our brain is always looking for ways to DRY up our code regardless of whether it truly makes sense to do so yet.
We end up with functions filled with conditionals and optional parameters a mile long.
While I outlined the benefit of modular design above, a counter point which can arise is that logic from various places throughout your code base becomes tightly coupled. This seems counter intuitive, so let's take another look at the new generate_summary_report function again.
We're now sitting with 3 entity types (users, products and equipment) all relying on the same summary report generation logic. Changes to that single function has far reaching implications. As our reporting needs grow, so too will our need to always understand all the places which might be impacted by future changes.
We're no longer able to make a change to a single report without taking into account how that might affect all other reports.
Abstracted logic can often be harder to understand and debug.
Highly dried out code can also add unnecessary mental overhead. By this I mean, every time you touch a function which you know is used in several places, you need to load that context into your brain. You're no longer simply changing a function. You're changing a dozen distinct pieces of functionality indirectly.
The mental overhead is especially noticeable when onboarding new engineers who won't have all the necessary context built up over time.
So after all that, where do you stand? Personally I find value in both and try not to be too dogmatic about it.
But to be honest, the longer I've been a software engineer, the more I lean towards simply writing code in the exact place I need it until it eventually becomes crystal clear that abstracting and reusing it finally makes sense.
So with that, here's a few rules of thumb:
That's a wrap. Go forth and write semi DRY code.