Exploratory Data Analysis (EDA)
The detective work of AI. Learn how to summarize, group, and find correlations in your data. This hands-on tutorial focuses on the practical implementation of exploratory data analysis (EDA) concepts.
Before you train a model, you must understand your data. EDA is the process of investigating the dataset to discover patterns, spot anomalies, and check assumptions.
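As a rough illustration of "spotting anomalies", the sketch below runs a first-look check on a small, made-up DataFrame (the column values are hypothetical, not the real Titanic data). It looks at shape, column types, missing values, and a simple interquartile-range rule for unusually large fares. The 1.5 × IQR cutoff is a common convention, assumed here rather than taken from this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, 54.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 512.33],
    "sex": ["male", "female", "female", "female", "male"],
})

# Size and column types: the first things to confirm.
n_rows, n_cols = df.shape
print(n_rows, n_cols)
print(df.dtypes)

# Missing values per column, a common anomaly.
missing = df.isna().sum()
print(missing)

# Crude outlier flag: fares more than 1.5 IQRs above the third quartile.
q1, q3 = df["fare"].quantile([0.25, 0.75])
outliers = df[df["fare"] > q3 + 1.5 * (q3 - q1)]
print(outliers)
```

On this toy data, the check reports one missing age and flags the single very large fare. Whether such a value is an error or a genuine extreme is a question for the analyst, not the code.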
1. Descriptive Statistics
Get a high-level overview of your data.
import pandas as pd
import seaborn as sns
df = sns.load_dataset('titanic')
# Summary of numeric columns
print(df.describe())
# Count unique values in a categorical column
print(df['class'].value_counts())
# Third 491
# First 216
# Second 184
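Raw counts are often easier to read as proportions. The sketch below uses a small, hypothetical column (not the real Titanic data) to show `value_counts(normalize=True)` and what `describe()` reports for a non-numeric column.

```python
import pandas as pd

# Hypothetical toy column, not the real Titanic data.
classes = pd.Series(
    ["Third", "First", "Third", "Second", "Third", "First"], name="class"
)

# Proportions instead of raw counts.
shares = classes.value_counts(normalize=True)
print(shares)

# On a non-numeric column, describe() reports count, unique, top, and freq.
summary = classes.describe()
print(summary)
```

Here `top` is the most frequent value and `freq` is how many times it appears.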
2. GroupBy and Aggregation
Split your data into groups and apply a function (mean, sum, count).
# Average fare by Class
print(df.groupby('class')['fare'].mean())
# First 84.15
# Second 20.66
# Third 13.68
# Survival rate by Sex
print(df.groupby('sex')['survived'].mean())
# female 0.74
# male 0.19
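When you want several statistics per group at once, pandas' named aggregation can produce them in a single call. The sketch below assumes a small, made-up DataFrame. The output column names (`passengers`, `mean_fare`, `survival_rate`) are illustrative choices, not part of the original tutorial.

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic dataset.
df = pd.DataFrame({
    "class": ["First", "First", "Second", "Third", "Third", "Third"],
    "fare": [80.0, 90.0, 20.0, 8.0, 7.0, 9.0],
    "survived": [1, 1, 0, 0, 1, 0],
})

# Named aggregation: new_column=(source_column, function).
summary = df.groupby("class").agg(
    passengers=("survived", "count"),
    mean_fare=("fare", "mean"),
    survival_rate=("survived", "mean"),
)
print(summary)
```

Keeping the group size next to each mean makes it easier to judge how much weight a given rate deserves.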
3. Correlation Analysis
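Correlation measures how strongly two variables move together in a straight-line (linear) relationship. The coefficient ranges from -1 to +1.
- +1: Perfect positive correlation. Real data rarely reaches it; height and weight, which tend to rise together, are positively correlated.
- -1: Perfect negative correlation. Speed and travel time, which move in opposite directions, are negatively correlated.
- 0: No linear correlation.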
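# Calculate correlation matrix (numeric columns only; pandas 2.0+ raises
# an error if text columns are included)
corr = df.corr(numeric_only=True)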
# Visualize with a heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm')
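To make the coefficient less of a black box, the sketch below computes Pearson's r by hand and compares it with pandas' `Series.corr`, which uses Pearson by default. The data and the `pearson` helper are hypothetical, built only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: time to cover a fixed 120-unit distance at each speed.
df = pd.DataFrame({
    "speed": [30.0, 40.0, 50.0, 60.0],
    "travel_time": [4.0, 3.0, 2.4, 2.0],
    "speed_doubled": [60.0, 80.0, 100.0, 120.0],  # exactly linear in speed
})

def pearson(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    xd = x - x.mean()
    yd = y - y.mean()
    return (xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum())

r_manual = pearson(df["speed"], df["travel_time"])
r_pandas = df["speed"].corr(df["travel_time"])
r_linear = df["speed"].corr(df["speed_doubled"])

print(r_manual, r_pandas, r_linear)
```

The exactly linear pair reaches +1 (up to floating-point error). Speed and travel time come out strongly negative but short of -1, because that relationship is curved rather than a straight line.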
4. Pivot Tables
Summarize data across two dimensions (like Excel Pivot Tables).
# Survival rate by Class AND Sex (pivot_table averages the values by default)
pivot = df.pivot_table(index='class', columns='sex', values='survived')
print(pivot)
# sex female male
# class
# First 0.968085 0.368852
# Second 0.921053 0.157407
# Third 0.500000 0.135447
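A pivot table is essentially a two-key GroupBy reshaped into a grid. The sketch below, on made-up data, shows that equivalence and also builds a matching table of counts, since a rate computed from only a handful of rows can be misleading.

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic dataset.
df = pd.DataFrame({
    "class": ["First", "First", "Third", "Third", "Third", "Third"],
    "sex": ["female", "male", "female", "female", "male", "male"],
    "survived": [1, 0, 1, 0, 0, 0],
})

# Mean of 'survived' for every class/sex combination.
pivot = df.pivot_table(
    index="class", columns="sex", values="survived", aggfunc="mean"
)

# The same numbers via groupby on two keys, then unstack into columns.
same = df.groupby(["class", "sex"])["survived"].mean().unstack("sex")

# How many rows sit behind each cell of the pivot.
counts = df.pivot_table(
    index="class", columns="sex", values="survived", aggfunc="count"
)

print(pivot)
print(counts)
```

Reading `pivot` and `counts` side by side shows which cells rest on enough data to take seriously.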
Interactive Challenge: Titanic Analyst
Analyze the Titanic dataset to find out who survived.
Key Takeaways
- EDA is about asking questions of your data.
- GroupBy lets you compare different segments.
- Correlation helps you find relationships between features.
What's Next?
You have mastered the Python tools for AI. Now, you are ready for the real deal.
Next Module: Module 3 - Core Machine Learning.