3.2.2.1. CSV
CSV stands for Comma-Separated Values and is one of the simplest and most widely used formats for storing tabular data. In a CSV file:
Each row represents a record.
Columns (features) are separated by commas (",").
The first row often contains a header that defines the column names.
CSV files are human-readable and supported by nearly all data tools and spreadsheet software, making them a common choice for data exchange and storage.
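The structure described above can be sketched with Python's standard csv module. The CSV text here is a small made-up sample (two of the records shown later in this section), held in memory rather than read from a real file:

```python
import csv
import io

# Illustrative CSV text: a header row, then one record per line,
# with fields separated by commas.
csv_text = (
    "Name,Age,Department,Salary\n"
    "Alice,30,Engineering,85000\n"
    "Bob,25,Marketing,62000\n"
)

# csv.reader splits each line into a list of string fields.
rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

print(header)
for record in records:
    # Pair each field with its column name from the header row.
    print(dict(zip(header, record)))
```

Note that the csv module returns every field as a string; converting columns such as Age to numbers is left to the caller or to a library like pandas.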
Reading a CSV File into a DataFrame
We can use the pandas.read_csv() function to load a CSV file into a DataFrame:
```python
import pandas as pd

# Read CSV into a DataFrame
df = pd.read_csv("example.csv")
df.head()
```
| | Name | Age | Department | Salary |
|---|---|---|---|---|
| 0 | Alice | 30 | Engineering | 85000 |
| 1 | Bob | 25 | Marketing | 62000 |
| 2 | Charlie | 28 | Sales | 70000 |
| 3 | Diana | 35 | Engineering | 92000 |
| 4 | Ethan | 40 | HR | 78000 |
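read_csv() also accepts optional parameters that control what gets loaded. The following is a minimal sketch, not a complete tour: it assumes the file contents match the table above and uses an in-memory string as a stand-in for example.csv so that it runs on its own. The parameters shown (usecols, dtype, nrows) are standard read_csv() options; the variable names are illustrative.

```python
import io
import pandas as pd

# Stand-in for example.csv; contents assumed from the table above.
csv_text = """Name,Age,Department,Salary
Alice,30,Engineering,85000
Bob,25,Marketing,62000
Charlie,28,Sales,70000
Diana,35,Engineering,92000
Ethan,40,HR,78000
"""

subset = pd.read_csv(
    io.StringIO(csv_text),        # a file path such as "example.csv" works the same way
    usecols=["Name", "Salary"],   # load only these columns
    dtype={"Salary": "float64"},  # override the inferred integer type
    nrows=3,                      # read only the first three records
)

print(subset)
print(subset.dtypes)
```

Restricting columns and rows this way can be useful for a quick look at a large file before loading it in full.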
Writing a DataFrame to a CSV File
You can save a DataFrame back to a CSV file using the to_csv() method:
```python
# Save DataFrame to CSV
df.to_csv("new_file.csv", index=False)

# Read it back in to verify
df2 = pd.read_csv("new_file.csv")
df2.head()
```
| | Name | Age | Department | Salary |
|---|---|---|---|---|
| 0 | Alice | 30 | Engineering | 85000 |
| 1 | Bob | 25 | Marketing | 62000 |
| 2 | Charlie | 28 | Sales | 70000 |
| 3 | Diana | 35 | Engineering | 92000 |
| 4 | Ethan | 40 | HR | 78000 |
Setting index=False ensures the row index is not written as an extra column in the CSV file.
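To see the effect of index=False, the sketch below compares the two outputs on a small made-up DataFrame (df_demo is an illustrative name, not part of the example above). It relies on to_csv() returning the CSV text as a string when no path is given, so nothing is written to disk:

```python
import io
import pandas as pd

# Small illustrative DataFrame
df_demo = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [30, 25]})

# With no path argument, to_csv() returns the CSV text as a string.
with_index = df_demo.to_csv()                # default: index=True
without_index = df_demo.to_csv(index=False)  # row index omitted

print(with_index)     # header line starts with an empty field for the index
print(without_index)  # header line is just the column names

# Reading the indexed version back produces an extra, unnamed column.
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))

# Passing index_col=0 tells pandas to treat that first column as the index.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(restored.columns))
```

If a file was already saved with the index included, index_col=0 is one way to read it back without the extra column.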