The Ultimate Guide to Handling CSV Files in Python

Written by Brendon
19 May 2025

Learn how to confidently read, write, and modify CSV files in Python using the built-in csv module.

CSV files in Python

Introduction: The Humble CSV #

Let’s face it - CSV files aren’t glamorous. They’re not JSON with its lovely nested structure, nor are they Excel spreadsheets with their fancy charts and rainbow-colored cells. But CSVs? They get stuff done.

Comma-Separated Values (CSV) files are the duct tape of data formats — simple, flexible, and probably holding half the internet's data infrastructure together.

CSV files are plain text, which means they’re easy to read, easy to generate, and easy to accidentally mess up in Notepad. They're commonly used to represent tabular data — think spreadsheets, contact lists, survey results, or that one export from your accounting software that makes you question your life choices.


Where Do These Files Come From? #

Great question. If CSV files were people, they’d be everywhere. You’ll find them:

  • As exports from spreadsheet tools like Excel and Google Sheets.
  • Buried inside ZIP files from government open data portals.
  • Coughed up by databases during backups.
  • Delivered with love (and no documentation) via APIs.
  • Hand-crafted by that one teammate who insists on manually editing data in Notepad++.

And yes, you can absolutely create them yourself. If you’re curious, you can crack open any text editor and type:

name,age,occupation
Alice,30,Data Scientist
Bob,35,Wizard

Save that as people.csv, and boom — you’ve manually created a CSV.

Why Care About CSV Files? #

Because Python loves CSVs. It’s like peanut butter and jelly, or sysadmins and dark mode. The csv module in Python’s standard library is a powerful, flexible tool for reading and writing CSV files in a way that makes you feel like a data wizard — with just a few lines of code.

We’ll walk through the practical tools Python gives us to wrangle CSV files. You'll learn how to read them, write them, twist them into shape, and do all that without crying into your keyboard (hopefully). Along the way, we’ll cover things like:

  • The difference between csv.reader and csv.DictReader (aka Rows vs Readable Rows)
  • How to write CSVs like a pro (without clobbering your original data)
  • What to do when your CSV contains surprises: quotes, commas, or even newlines

…and a few pro tips for dealing with giant files and weird edge cases.

Reading CSV Files in Python #

Now that we’ve established CSVs are the unsung heroes of data, let’s talk about reading them in Python. The csv module gives us two main tools to do this: csv.reader and csv.DictReader.

Think of csv.reader as the “raw but fast” option - it gives you lists of values, one list per row. csv.DictReader, on the other hand, is a little more civilized: it hands you each row as a dictionary, where column names become keys.

Method 1: csv.reader #

Let’s start with the basics. Here's how to use csv.reader:

import csv

with open('people.csv', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

If your CSV contains the data from the earlier example, the output will look like this:

['name', 'age', 'occupation']
['Alice', '30', 'Data Scientist']
['Bob', '35', 'Wizard']

As you can see, each row is just a list of strings. That’s great if you like working with indexes (row[0], row[1], etc.), but it can feel a bit like solving a puzzle every time you forget what’s in which column.

Pro Tip: If your CSV has headers (like most do), and you want to skip the first row, you can do it like this:

next(reader)  # Skips the header

Method 2: csv.DictReader #

DictReader takes things up a notch by treating the first row as the header and giving you each subsequent row as a dictionary. Much friendlier.

import csv

with open('people.csv', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row)

Output:

{'name': 'Alice', 'age': '30', 'occupation': 'Data Scientist'}
{'name': 'Bob', 'age': '35', 'occupation': 'Wizard'}

Now you can access row['occupation'] instead of counting on row[2] and hoping for the best. Cleaner, safer, and far less prone to causing random bugs on a Friday afternoon.

Heads Up: All values are still strings. If you need numbers, you’ll need to convert them:

int(row['age'])  # 30, not '30'

Customizing How Python Reads CSVs #

Sometimes your CSVs are… how do we put this politely? Not well-behaved.

Maybe they use semicolons instead of commas. Maybe someone wrapped every field in quotes for reasons known only to them. Fortunately, Python’s csv module can handle these quirks with optional parameters:

reader = csv.reader(file, delimiter=';', quotechar='"', escapechar='\\')

Some commonly used parameters:

  • delimiter=',' – Change it if your data is separated by something weird like ; or |
  • quotechar='"' – Useful when fields are wrapped in quotes (especially if they contain commas)
  • escapechar='\\' – Helps if fields contain literal quote characters or special symbols

Bonus: When You’re Not Sure What You’re Looking At #

If you get handed a CSV file and it looks like it was produced by a caffeinated raccoon, you can let Python guess the format using csv.Sniffer:

with open('mystery.csv', newline='') as file:
    sample = file.read(1024)
    file.seek(0)

    dialect = csv.Sniffer().sniff(sample)
    reader = csv.reader(file, dialect)

    for row in reader:
        print(row)

Python will do its best to guess the delimiter and quoting style, which is both helpful and a little terrifying.

Writing CSV Files in Python #

Reading data is fun, sure - but sometimes, you need to output your own CSV file. Maybe you’ve cleaned up some messy data. Maybe you generated a report. Maybe you’re building a chatbot that rants about Python and now it needs a place to save its thoughts.

Whatever the reason, Python’s csv module has your back with two main tools: csv.writer and csv.DictWriter. One is fast and simple, the other is friendlier and knows what a dictionary is. Let’s explore both.

Method 1: csv.writer #

This is your go-to method when you’re working with lists of data and you don’t need to worry about column names.

import csv

data = [
    ['name', 'age', 'occupation'],
    ['Alice', 30, 'Data Scientist'],
    ['Bob', 35, 'Wizard']
]

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)

Boom. That creates a nice little CSV file that looks like:

name,age,occupation
Alice,30,Data Scientist
Bob,35,Wizard

writerow() vs writerows() #

Know the difference:

  • writerow() writes a single row (a list).
  • writerows() writes multiple rows (a list of lists).

writer.writerow(['name', 'age'])                # One row
writer.writerows([['Alice', 30], ['Bob', 35]])  # Multiple rows

Method 2: csv.DictWriter #

If you’re working with dictionaries, DictWriter is your best friend. It lets you write rows with named columns, and it’s easier to follow than remembering whether data[2] was “age” or “space-time fluctuation risk.”

import csv

data = [
    {'name': 'Alice', 'age': 30, 'occupation': 'Data Scientist'},
    {'name': 'Bob', 'age': 35, 'occupation': 'Wizard'}
]

with open('output.csv', 'w', newline='') as file:
    fieldnames = ['name', 'age', 'occupation']
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    
    writer.writeheader()   # Writes the column names first
    writer.writerows(data)

This gives you:

name,age,occupation
Alice,30,Data Scientist
Bob,35,Wizard

Pro Tip: If your dictionaries are missing a field, DictWriter won’t freak out, but it will quietly leave that cell blank. An extra key that isn’t in fieldnames, on the other hand, raises a ValueError by default.
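
If you want to tune that behavior, DictWriter accepts restval (the value written for missing keys) and extrasaction (what to do with keys that aren't in fieldnames). A quick sketch, reusing the illustrative people data:

import csv

rows = [
    {'name': 'Alice', 'age': 30},  # missing 'occupation'
    {'name': 'Bob', 'age': 35, 'occupation': 'Wizard', 'hobby': 'chess'}  # extra 'hobby'
]

with open('output.csv', 'w', newline='') as file:
    writer = csv.DictWriter(
        file,
        fieldnames=['name', 'age', 'occupation'],
        restval='unknown',       # written for missing keys instead of a blank cell
        extrasaction='ignore'    # drop keys not in fieldnames instead of raising
    )
    writer.writeheader()
    writer.writerows(rows)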

Writing to the Same File vs a New One #

You can write to the same file you read from, but caution! If you open the file in 'w' (write) mode, you’re erasing it. If you open it in 'a' (append) mode, you’re adding to the end.

  • 'w' mode: overwrite the file entirely
  • 'a' mode: add new rows at the bottom

with open('people.csv', 'a', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Charlie', 40, 'Time Traveler'])

Just make sure you’re not duplicating headers or creating a weird CSV with inconsistent formats.

Warning: csv.writer never writes a header for you. So if your file didn’t have a header yet... appending rows still won’t add one.
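
One way to stay safe is to check whether the file already has content before appending, and only write the header when it doesn't. A rough sketch (the filename and columns are the ones from the earlier examples):

import csv
import os

needs_header = not os.path.exists('people.csv') or os.path.getsize('people.csv') == 0

with open('people.csv', 'a', newline='') as file:
    writer = csv.writer(file)
    if needs_header:
        writer.writerow(['name', 'age', 'occupation'])  # write the header exactly once
    writer.writerow(['Charlie', 40, 'Time Traveler'])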

Modifying CSV Data #

Reading? check ✅ Writing? check ✅

Now let’s kick it up a notch and actually change what’s in those CSVs.

Whether you want to remove a column, add a new one, or append new data, Python makes it (mostly) painless.

Removing a Column From a CSV #

Let’s say someone got a little too enthusiastic with their data collection and added a column called "favourite_lunch_quote". You don't need it. It’s going.

Here’s how you surgically remove a column:

import csv

with open('input.csv', newline='') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.DictReader(infile)
    fieldnames = [name for name in reader.fieldnames if name != 'favourite_lunch_quote']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    
    writer.writeheader()
    for row in reader:
        del row['favourite_lunch_quote']
        writer.writerow(row)

Boom. Column gone. No trace. It’s like it never existed (unless someone checks Git history).

Adding a New Column to a CSV #

Need to add a new column called "status" that just says "active" for everyone? Easy peasy.

import csv

with open('input.csv', newline='') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames + ['status']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    
    writer.writeheader()
    for row in reader:
        row['status'] = 'active'
        writer.writerow(row)

You can also compute values dynamically—set "status" to "senior" if age > 30, for instance. Up to you.
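
For example, the loop above could decide the value per row instead of hard-coding it (a small tweak, assuming the age column holds numeric strings):

for row in reader:
    row['status'] = 'senior' if int(row['age']) > 30 else 'active'
    writer.writerow(row)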

Pro Tip: Read-Modify-Write Pattern #

A common pattern you’ll use for modifying CSVs is:

  1. Read the original data (csv.DictReader)
  2. Tweak each row (add/remove/transform)
  3. Write the new data to a new file (csv.DictWriter)
  4. (Optional) Replace the original file if needed

This keeps things clean and avoids accidentally corrupting your only copy of the original data.
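
Step 4 can be done safely with os.replace, which swaps the new file in only after it has been fully written. A minimal sketch of the whole pattern (the per-row tweak is just an example):

import csv
import os

with open('people.csv', newline='') as infile, open('people.tmp', 'w', newline='') as outfile:
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row['name'] = row['name'].strip()   # example tweak: trim stray whitespace
        writer.writerow(row)

os.replace('people.tmp', 'people.csv')      # swap the new file in over the original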

Handling Complex CSV Data: Quotes, Commas, and Escapes #

Time to dive into the messy side of CSVs — where fields contain commas, quotes, or other mischievous characters that break your parser’s spirit unless you know what you’re doing.

CSV files are deceptively simple—until someone decides their favorite city is New York, NY or they write job titles like “Senior Developer, Backend & Infrastructure”. Suddenly your nice, clean parsing job turns into a cryptic mess.

Don’t worry. Python's csv module has a few tricks up its sleeve to keep things civilized.

Problem: Commas Inside Fields #

Let’s say you’re reading a file that looks like this:

name,location
Alice,"New York, NY"
Bob,"Paris, France"

Without quotes, "New York, NY" would be interpreted as two separate columns. Not what you want.

Thankfully, csv.reader and csv.writer handle quoted strings just fine—as long as your CSV is well-formed.

import csv

with open('people.csv', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Output

['name', 'location']
['Alice', 'New York, NY']
['Bob', 'Paris, France']

The quoting Parameter #

Want to take control of quoting behavior when writing CSVs? Use the quoting parameter from the csv module. Here are your options:

  • csv.QUOTE_MINIMAL – Only quote fields when necessary (default)
  • csv.QUOTE_ALL – Quote everything like a Victorian novelist
  • csv.QUOTE_NONNUMERIC – Quote all non-numeric fields
  • csv.QUOTE_NONE – Quote nothing (bring your own escape character)

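For example, writing a couple of rows with QUOTE_ALL (the filename and data here are just illustrative) gives you the output shown below:

import csv

with open('quoted.csv', 'w', newline='') as file:
    writer = csv.writer(file, quoting=csv.QUOTE_ALL)
    writer.writerow(['name', 'age', 'city'])
    writer.writerow(['Alice', 30, 'New York, NY'])
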
Output

"name","age","city"
"Alice","30","New York, NY"

Sometimes a little extra quoting makes life easier when sharing files between systems — or colleagues who live in spreadsheet chaos.

Escaping Characters With escapechar #

If you go rogue and disable quotes with quoting=csv.QUOTE_NONE, you better provide an escapechar. This tells the writer how to escape delimiters (like commas) inside your data.

writer = csv.writer(file, quoting=csv.QUOTE_NONE, escapechar='\\')

Now a row like this:

['Alice', 'New York, NY']

will be written out like this:

Alice,New York\, NY

Is it pretty? No. Does it work? Absolutely.

Special Case: Double Quotes Inside Fields #

If your data has quotes inside a field, the CSV format handles this by doubling them:

data = [['quote', 'She said "hello!"']]

Will become:

quote,"She said ""hello!"""

Python handles this automatically.
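
A quick round trip shows it in action (the filename is illustrative):

import csv

with open('quotes.csv', 'w', newline='') as file:
    csv.writer(file).writerow(['quote', 'She said "hello!"'])

with open('quotes.csv', newline='') as file:
    print(next(csv.reader(file)))  # ['quote', 'She said "hello!"']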

Summary: When to Use What #

Here are some general rules of thumb:

  • Fields may contain commas – use quoting=csv.QUOTE_MINIMAL (the default) or csv.QUOTE_ALL
  • Fields contain quotes – let the writer handle it (it doubles them automatically)
  • You want full control over escaping – set escapechar='\\' and quoting=csv.QUOTE_NONE
  • You're sharing files with Excel users – use quoting=csv.QUOTE_ALL with dialect='excel'

Dialects in CSV #

If you've ever opened a CSV file and seen it almost work — except the delimiter is a semicolon, or the line endings are weird — you've encountered a dialect mismatch.

A CSV dialect is like a regional accent for your data. It's how the file speaks its structure — things like what separates columns, how it handles quotes, line breaks, and other subtle quirks.

Built-in Dialects: Excel and Excel-Tab #

Python comes with a couple of predefined dialects for common spreadsheet styles:

import csv

# Reading a file in Excel dialect (default)
with open('data.csv', newline='') as file:
    reader = csv.reader(file, dialect='excel')

The 'excel' dialect handles:

  • Commas as delimiters
  • Double quotes for quoting
  • CRLF (\r\n) line endings

There’s also 'excel-tab' for tab-separated values (TSV):

reader = csv.reader(file, dialect='excel-tab')

As the name suggests, the delimiter in this case is a tab.

Creating a Custom Dialect #

Sometimes you're handed a CSV that just… does its own thing. Maybe it uses | as a delimiter. Maybe it quotes fields with ~ for some reason. This is where custom dialects shine.

Define your own dialect like so:

csv.register_dialect(
    'pipey',
    delimiter='|',
    quotechar='~',
    quoting=csv.QUOTE_ALL,
    escapechar='\\'
)

Now use it like so:

with open('weird_file.csv', newline='') as file:
    reader = csv.reader(file, dialect='pipey')

This lets you name and reuse a consistent format across multiple reads/writes.

Why Use a Dialect Instead of Just Passing Options? #

You can pass parameters like delimiter and quotechar directly each time:

reader = csv.reader(file, delimiter='|', quotechar='~')

But if you're dealing with multiple files using the same quirky style—or just want cleaner, DRY-er code—a dialect keeps things tidy.

Plus, once registered, dialects work for both reader and writer.
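
For instance, the same registered dialect can drive a writer (a short sketch reusing the 'pipey' dialect from above):

import csv

with open('weird_output.csv', 'w', newline='') as file:
    writer = csv.writer(file, dialect='pipey')
    writer.writerow(['name', 'occupation'])       # written as ~name~|~occupation~
    writer.writerow(['Alice', 'Data Scientist'])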

Listing and Unregistering Dialects #

You can see all registered dialects with:

csv.list_dialects()
# ['excel', 'excel-tab', 'pipey']

To unregister a custom one:

csv.unregister_dialect('pipey')

Goodbye, pipey. You were weird but memorable.

Handling Large CSV Files in Python #

Sometimes you'll come across massive CSV files with millions of rows, hundreds of megabytes, and the looming threat of your machine's fans spinning into orbit.

For cases like this, Python gives you tools to read and write CSVs line-by-line — streaming data of any size instead of trying to load everything into RAM.

Stream the File #

Python’s csv.reader and csv.DictReader already do you a favor: they’re iterators. That means they don’t load the whole file into memory at once.

Here’s how to process a giant CSV one row at a time:

import csv

with open('bigfile.csv', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        process(row)  # Your logic here

Even if your CSV is 2GB, this will happily churn through it row by row without gobbling up memory.

Use Generator Functions for Clean Code #

Want to keep things neat? Wrap your CSV logic in a generator function:

def read_large_csv(path):
    with open(path, newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            yield row

Now you can use it like this:

for row in read_large_csv('bigfile.csv'):
    print(row['username'])

It’s memory-friendly and keeps your code modular.

Writing Large Files in Chunks #

Just like reading, writing large CSVs can be done one row at a time — no need to build up a massive list first.

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['name', 'score'])

    for user in millions_of_users():
        writer.writerow([user.name, user.score])

Bonus: it starts writing immediately, so your script feels faster.

Pandas and 3rd Party Libraries #

You can go really far using Python's built-in csv module. Much further than you might expect.

In those rarer cases where you need more than plain reading and writing of CSVs, such as complex data wrangling once the data is loaded, third-party libraries like pandas are worth investigating.
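
As a taste, loading a CSV into a pandas DataFrame takes one line (this assumes pandas is installed and reuses the people.csv example from earlier):

import pandas as pd

df = pd.read_csv('people.csv')   # parses the header and infers column types
print(df[df['age'] > 30])        # e.g. filter rows without writing a loop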

Wrapping Up #

Whether you're cleaning up messy exports, generating reports, or just automating the world's most boring Excel task — you’re now equipped to wrangle any CSV file that lands on your task board.