Mastering Pandas DataFrames
The ultimate tool for data manipulation. Learn Series, DataFrames, loading data, and advanced selection.
Pandas is the "Excel of Python". It is built on top of NumPy but adds labels, mixed data types, and powerful data manipulation tools.
1. Series vs. DataFrames
- Series: A 1D labeled array (like a single column in Excel).
- DataFrame: A 2D labeled data structure (like a whole Excel sheet).
import pandas as pd
# Series
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['a']) # 10
# DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)
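To make the relationship between the two structures concrete, the sketch below rebuilds the sample DataFrame and checks that each column is itself a Series sharing the same row labels. The variable name `age` is only illustrative.

```python
import pandas as pd

# Same sample data as the DataFrame example above.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

# Each column is itself a Series that shares the DataFrame's row index.
age = df['Age']
print(type(age).__name__)          # Series
print(age.index.equals(df.index))  # True

# Columns can hold different types: integers for Age, strings for the others.
print(df.dtypes)

# Row labels and column labels are stored separately.
print(list(df.columns))  # ['Name', 'Age', 'City']
print(list(df.index))    # [0, 1, 2]
```

Because no index was passed, pandas assigns the default integer labels 0, 1, 2 to the rows.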
2. Loading Data
Pandas ships readers for many common formats, including CSV, JSON, Excel, and SQL.
# CSV
df = pd.read_csv('data.csv')
# JSON
df = pd.read_json('data.json')
# Excel (requires openpyxl)
# df = pd.read_excel('data.xlsx')
# SQL
# df = pd.read_sql(query, connection)
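The calls above need files on disk, so here is a small self-contained sketch of `read_csv` that can be run as-is. The CSV text is made up for illustration, and `io.StringIO` stands in for a real file such as `data.csv`.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would live in a file such as data.csv.
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Paris
Charlie,35,London
"""

# read_csv accepts a path or a file-like object, so StringIO can stand in for a file.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 3)
print(list(df.columns))  # ['Name', 'Age', 'City']
print(df['Age'].sum())   # 90
```

The first line of the text becomes the column labels, and the Age column is parsed as integers, which is why it can be summed directly.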
3. Inspecting Data
Before you analyze, you must look at your data.
- df.head(n): First n rows.
- df.tail(n): Last n rows.
- df.info(): Column data types and non-null counts (which reveal missing values).
- df.describe(): Summary statistics (count, mean, std, min, quartiles, max).
- df.shape: (Rows, Columns).
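As a quick illustration, the inspection tools listed above can be tried on the three-row sample DataFrame from section 1. This is a minimal sketch; real datasets will produce much longer reports.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

print(df.head(2))  # first two rows
print(df.tail(1))  # last row
print(df.shape)    # (3, 3)

# info() prints its report directly and returns None, so no print() is needed.
df.info()

# With mixed column types, describe() summarizes only the numeric columns by default.
summary = df.describe()
print(summary)
print(summary.loc['mean', 'Age'])  # 30.0
```

Note that `shape` is an attribute, not a method, so it is written without parentheses.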
4. Selecting and Filtering
Selecting Columns
# Single column (returns Series)
ages = df['Age']
# Multiple columns (returns DataFrame)
subset = df[['Name', 'City']]
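The return type depends on the brackets, not on how many columns come back. The following sketch, using the sample data from section 1 and illustrative variable names, shows that a list containing a single label still yields a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

as_series = df['Age']   # a single label returns a Series
as_frame = df[['Age']]  # a list with one label returns a one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.shape)           # (3,)
print(as_frame.shape)            # (3, 1)
```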
Filtering Rows (Boolean Indexing)
# People older than 25
adults = df[df['Age'] > 25]
# People in Paris AND older than 25
paris_adults = df[(df['City'] == 'Paris') & (df['Age'] > 25)]
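Boolean masks can also be combined with `|` (OR) and `~` (NOT), and `isin` checks membership in a list of values. The sketch below applies these to the sample data from section 1; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

# OR: people in Paris, or older than 30.
# Each condition needs its own parentheses because | and & bind tighter than == and >.
paris_or_over_30 = df[(df['City'] == 'Paris') | (df['Age'] > 30)]
print(paris_or_over_30['Name'].tolist())  # ['Bob', 'Charlie']

# NOT: everyone who is not in Paris.
not_paris = df[~(df['City'] == 'Paris')]
print(not_paris['Name'].tolist())  # ['Alice', 'Charlie']

# Membership: rows whose City is one of several values.
in_listed_cities = df[df['City'].isin(['Paris', 'London'])]
print(in_listed_cities['Name'].tolist())  # ['Bob', 'Charlie']
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the operators `&`, `|`, and `~` are used instead.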
5. Advanced Indexing: loc vs iloc
This is a common interview question!
- loc: Label-based indexing. (Select by row/column names)
- iloc: Integer-based indexing. (Select by row/column positions)
# loc (Label)
# Row with index label 0, Column 'Name'
print(df.loc[0, 'Name'])
# iloc (Integer Position)
# Row 0, Column 0
print(df.iloc[0, 0])
# Slicing
print(df.iloc[0:2, :]) # First 2 rows, all columns
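With the default integer index, labels and positions happen to coincide, which hides the difference between the two. The sketch below gives the sample data a hypothetical string index ('a', 'b', 'c') so the distinction becomes visible.

```python
import pandas as pd

# Same sample data, but with string row labels chosen only for illustration.
df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Paris', 'London'],
    },
    index=['a', 'b', 'c'],
)

print(df.loc['a', 'Name'])  # Alice (row labeled 'a')
print(df.iloc[0, 0])        # Alice (first row, first column)

# 0 is a position, not a label, so loc cannot find it in this index.
try:
    df.loc[0, 'Name']
    label_found = True
except KeyError:
    label_found = False
print(label_found)  # False

# Label slices include both endpoints; position slices exclude the stop.
print(len(df.loc['a':'b']))  # 2
print(len(df.iloc[0:1]))     # 1
```

The inclusive end of a `loc` slice is a frequent source of off-by-one surprises for people used to Python's half-open ranges.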
Key Takeaways
- DataFrame is the core object for data analysis.
- Filtering allows you to slice data based on conditions.
- loc/iloc give you precise control over selecting data.
What's Next?
Now that we can load and select data, we need to fix it. Real data is messy!
Next Chapter: Data Cleaning & Preprocessing.