
NumPy for Machine Learning

Learn how to prepare raw data for AI models using normalization, standardization, and one-hot encoding with NumPy.

By TechCoder Team
Last updated: 2026-06-02
In a Nutshell

A hands-on tutorial focused on implementing common machine learning preprocessing steps in NumPy.

Module 11: NumPy for Machine Learning

Before you feed data into a Machine Learning model, it must be preprocessed. Raw data often comes on different scales, contains missing values, or uses categorical labels that a model cannot consume directly.


Lesson 23: Data Preprocessing

Normalization (Min-Max Scaling)

Min-max scaling rescales each feature to the range 0 to 1. This prevents features with large ranges (like Salary) from overpowering features with small ranges (like Age). Formula: (x - min) / (max - min)
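The formula above can be sketched in NumPy as follows. The age and salary values are made up for illustration, and the sketch assumes no column is constant (a constant column would make max - min zero).

```python
import numpy as np

# Hypothetical feature matrix: column 0 is age, column 1 is salary.
data = np.array([
    [25.0, 40_000.0],
    [35.0, 60_000.0],
    [45.0, 100_000.0],
])

# Column-wise minimum and maximum (axis=0 reduces over the rows).
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Broadcasting applies (x - min) / (max - min) to every column at once.
# Assumption: no column is constant, so (col_max - col_min) is never zero.
normalized = (data - col_min) / (col_max - col_min)

print(normalized)
```

Each column now runs from 0 to 1, so age and salary contribute on the same scale.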

Standardization (Z-score Scaling)

Rescaling data to have a mean of 0 and a standard deviation of 1. Formula: (x - mean) / std
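As a minimal sketch of the z-score formula, again on made-up age and salary values: note that `np.std` uses the population standard deviation (`ddof=0`) by default, which is an assumption of this example rather than something the lesson specifies.

```python
import numpy as np

# Hypothetical feature matrix: column 0 is age, column 1 is salary.
data = np.array([
    [25.0, 40_000.0],
    [35.0, 60_000.0],
    [45.0, 100_000.0],
])

# Column-wise mean and standard deviation.
mean = data.mean(axis=0)
std = data.std(axis=0)  # population std (ddof=0), NumPy's default

# Broadcasting applies (x - mean) / std to every column.
standardized = (data - mean) / std

print(standardized.mean(axis=0))  # approximately 0 for each column
print(standardized.std(axis=0))   # approximately 1 for each column
```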


Lesson 24: Feature Engineering

Handling Missing Values

In ML, we often replace NaN (Not a Number) with the mean or median of the column.

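  • np.isnan(arr): Returns a boolean mask that is True wherever arr is NaN.
  • arr[np.isnan(arr)] = np.nanmean(arr): Fills NaNs with the mean of the non-missing values. (A plain arr.mean() would itself be NaN if any value is missing.)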
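The two steps above might look like this in practice. The arrays are invented for illustration; the second half is one possible way to fill each column of a 2D table with its own mean.

```python
import numpy as np

# 1D case: fill NaNs with the mean of the non-missing values.
arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

mask = np.isnan(arr)          # True where a value is missing
fill_value = np.nanmean(arr)  # mean of 1, 3, 5, ignoring NaN

filled = arr.copy()           # keep the original array untouched
filled[mask] = fill_value
print(filled)                 # [1. 3. 3. 3. 5.]

# 2D case: fill each column with that column's own mean.
table = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])
col_means = np.nanmean(table, axis=0)  # one mean per column
table_filled = np.where(np.isnan(table), col_means, table)
print(table_filled)
```

Swapping `np.nanmean` for `np.nanmedian` gives the median-based variant mentioned above.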

One-Hot Encoding (NumPy Way)

Converting categorical labels (like "Red", "Blue") into binary vectors with a single 1 marking the category.

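  • np.eye(num_categories)[label_indices]: A fast way to generate one-hot vectors. Here num_categories is the number of distinct categories and label_indices is an integer array giving each sample's category index.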
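A small sketch of the np.eye approach, assuming string labels that first need to be mapped to integer indices. Using `np.unique` with `return_inverse=True` for that mapping is a choice made for this example, not something the lesson prescribes.

```python
import numpy as np

# Hypothetical categorical labels.
labels = np.array(["Red", "Blue", "Green", "Blue"])

# np.unique returns the sorted distinct categories and, with
# return_inverse=True, the integer index of each label in that sorted list.
categories, label_indices = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for category i,
# so indexing with label_indices picks one row per sample.
one_hot = np.eye(len(categories))[label_indices]

print(categories)  # ['Blue' 'Green' 'Red']
print(one_hot)
```

The columns follow the sorted category order, so "Red" maps to the last column here. The result has a float dtype; pass `dtype=int` to `np.eye` if integers are preferred.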

Practice: Preprocessing Pipeline

Challenge: Create an array of 20 random integers representing house prices.

  1. Reshape it into a 2D column vector (20 rows, 1 column).
  2. Apply Standardization using NumPy.
  3. Replace any value that is more than 2 standard deviations away from the mean (outliers) with the mean value.
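One possible way to approach the challenge is sketched below. The price range and the random seed are arbitrary assumptions made for this example.

```python
import numpy as np

# Seed and price range are arbitrary choices for a reproducible example.
rng = np.random.default_rng(seed=0)
prices = rng.integers(100_000, 500_000, size=20)

# Step 1: reshape into a column vector of shape (20, 1).
column = prices.reshape(-1, 1).astype(float)

# Step 2: standardize with (x - mean) / std.
standardized = (column - column.mean()) / column.std()

# Step 3: after standardization the mean is about 0 and the std is 1,
# so any |z| > 2 lies more than 2 standard deviations from the mean.
outliers = np.abs(standardized) > 2
cleaned = np.where(outliers, standardized.mean(), standardized)

print(column.shape)
print(int(outliers.sum()), "value(s) replaced")
```

Uniformly drawn values rarely stray more than 2 standard deviations from the mean, so this run may well replace nothing. Appending one extreme price by hand is an easy way to see the replacement step take effect.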

Quiz

Question 1 of 5

What is the result of (data - min) / (max - min)?

Standardization
Normalization (Min-Max Scaling)
Categorization
Vectorization

Key Takeaways

Normalization and Standardization put features on comparable scales, so no feature dominates the model simply because of its units or range.
Use np.nanmean to compute fill values for datasets with missing values.
np.eye is a clever trick for fast one-hot encoding.