Recurrent Neural Networks (RNNs)

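Standard feed-forward neural networks treat each input independently, with no notion of order. But in a sentence, context matters: the word "bank" means something different in "river bank" vs. "bank account". RNNs add a form of "memory": they process data one element at a time and carry information forward from earlier elements.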

1. The Loop 🔁

An RNN processes inputs one by one. It maintains a Hidden State (memory) that gets updated at each step.

$$h_t = \text{Activation}(W x_t + U h_{t-1})$$

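x_t: Current input (e.g., the current word, encoded as a vector).
h_{t-1}: Previous hidden state (context from previous words).
W, U: Weight matrices, shared across all time steps.
Activation: A nonlinearity, typically tanh.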
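
The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not a trained model: the choice of tanh as the activation, the helper names `rnn_step` and `run_rnn`, and the toy weight values are all assumptions made for the example.

```python
import math

def rnn_step(x_t, h_prev, W, U):
    """One recurrence step: h_t = tanh(W x_t + U h_prev)."""
    h_t = []
    for i in range(len(U)):
        total = sum(W[i][j] * x_t[j] for j in range(len(x_t)))
        total += sum(U[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(total))
    return h_t

def run_rnn(inputs, W, U, hidden_size):
    """Feed the inputs one by one, carrying the hidden state forward."""
    h = [0.0] * hidden_size  # empty memory before the first input
    for x_t in inputs:
        h = rnn_step(x_t, h, W, U)
    return h

# Hypothetical toy weights (2 inputs, 2 hidden units), chosen by hand.
W = [[0.5, -0.3], [0.1, 0.8]]
U = [[0.2, 0.0], [0.0, 0.2]]

# The same two inputs in a different order.
seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[0.0, 1.0], [1.0, 0.0]]

h_a = run_rnn(seq_a, W, U, hidden_size=2)
h_b = run_rnn(seq_b, W, U, hidden_size=2)
print(h_a)
print(h_b)
```

The two final hidden states differ even though both sequences contain the same inputs, which is the point of the loop: the result depends on order, not just on content.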

2. The Problem: Vanishing Gradients 📉

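Basic RNNs have a short memory. If a sentence is very long, they tend to lose information from the beginning by the time they reach the end. The cause is the Vanishing Gradient Problem: during backpropagation through time, the gradient is multiplied by a similar factor at every step, so when that factor is smaller than 1 it shrinks exponentially and the earliest inputs receive almost no learning signal.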
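
To make the shrinking concrete, here is a small sketch that tracks the gradient through a one-dimensional RNN. The scalar model h_t = tanh(w * x_t + u * h_{t-1}), the function name `gradient_through_time`, and the values of `u` and `w` are assumptions picked for illustration.

```python
import math

def gradient_through_time(u, w, xs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w*x_t + u*h_{t-1})."""
    h = h0
    grad = 1.0
    for x in xs:
        h = math.tanh(w * x + u * h)
        # Chain rule: d h_t / d h_{t-1} = u * (1 - h_t^2), since tanh'(z) = 1 - tanh(z)^2.
        grad *= u * (1.0 - h * h)
    return grad

u, w = 0.9, 0.5  # hypothetical weights

grads = {}
for steps in (1, 10, 50):
    grads[steps] = gradient_through_time(u, w, [1.0] * steps)
    print(steps, grads[steps])
```

Each step multiplies the gradient by a factor below 1 in magnitude, so the influence of the first hidden state on the last one collapses toward zero as the sequence gets longer.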

3. The Solution: LSTM & GRU 🧠

Long Short-Term Memory (LSTM) networks were designed to address this. They add a separate cell state (the long-term memory) plus "gates" that decide what to keep and what to forget.

Forget Gate: "Should I throw away this old info?"
Input Gate: "Is this new info important?"
Output Gate: "What should I tell the next cell?"
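
The three gates can be sketched as a single LSTM step. This is a scalar toy version of the commonly used LSTM equations, assuming sigmoid gates and a tanh candidate value; the parameter names in `params` and their values are hypothetical and set by hand rather than learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step; p is a dict of hand-picked toy weights."""
    f = sigmoid(p["wf"] * x_t + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x_t + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x_t + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x_t + p["ug"] * h_prev + p["bg"])  # candidate new info
    c_t = f * c_prev + i * g      # keep part of the old memory, add part of the new
    h_t = o * math.tanh(c_t)      # expose part of the memory to the next step
    return h_t, c_t

# Hypothetical setting: forget gate almost fully open (keep old info),
# input gate almost fully closed (ignore new info).
params = {
    "wf": 0.0, "uf": 0.0, "bf": 10.0,
    "wi": 0.0, "ui": 0.0, "bi": -10.0,
    "wo": 0.0, "uo": 0.0, "bo": 0.0,
    "wg": 1.0, "ug": 0.0, "bg": 0.0,
}

h, c = 0.0, 1.0  # start with the value 1.0 stored in the cell state
for _ in range(100):
    h, c = lstm_step(0.3, h, c, params)
print(c)
```

With these gate settings the cell state stays close to its starting value of 1.0 after 100 steps. The update to `c_t` is additive rather than a repeated squashing, which is why the stored value survives over long sequences.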

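GRU (Gated Recurrent Unit) is a simplified variant of the LSTM: it uses two gates (update and reset) and no separate cell state, so it has fewer parameters and is usually faster to train.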
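
A rough parameter count shows where the savings come from. The sketch below assumes the textbook formulation: an LSTM layer has four weight blocks (forget, input, output, candidate) and a GRU layer has three (update, reset, candidate), each with one input matrix, one recurrent matrix, and one bias vector. The helper `gated_rnn_params` and the layer sizes are hypothetical, and real frameworks may store extra bias vectors, so their reported counts can differ slightly.

```python
def gated_rnn_params(input_size, hidden_size, num_blocks):
    """Parameter count for a gated recurrent layer with num_blocks weight blocks."""
    per_block = (
        hidden_size * input_size      # input weights
        + hidden_size * hidden_size   # recurrent weights
        + hidden_size                 # one bias vector
    )
    return num_blocks * per_block

lstm_params = gated_rnn_params(128, 256, num_blocks=4)
gru_params = gated_rnn_params(128, 256, num_blocks=3)
print(lstm_params, gru_params)  # 394240 295680
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size.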

Interactive Challenge: Sequence Prediction

Let's predict the next number in a simple sequence using a "mental" RNN. Sequence: [2, 4, 8, 16, ...]
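
One way to play the "mental" RNN in code is a hand-written loop that reads the sequence one number at a time and carries a small state forward. This is a sketch of the idea only: nothing is learned, and the function name `predict_next` and the rule "multiply by the latest growth factor" are assumptions chosen to fit this particular sequence.

```python
def predict_next(sequence):
    """Read the sequence step by step, then predict the next value."""
    last_value = None  # memory of the previous input
    ratio = None       # memory of the most recent growth factor
    for x in sequence:
        if last_value is not None:
            ratio = x / last_value  # update the state from the new input
        last_value = x
    if ratio is None:
        raise ValueError("need at least two numbers to see a pattern")
    return last_value * ratio

prediction = predict_next([2, 4, 8, 16])
print(prediction)  # 32.0
```

Like the hidden state of an RNN, `last_value` and `ratio` summarize everything seen so far, so the prediction needs only the state and not the full history.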

Quiz

What makes RNNs different from standard Feed-Forward networks?

They are faster

They have a loop/memory to handle sequences

They only work on images

Key Takeaways

✅ RNNs process sequences (Time Series, Text, Audio).
✅ LSTMs and GRUs mitigate the short-memory problem of basic RNNs.
✅ Transformers (next module) have largely replaced RNNs for NLP.

What's Next?

We've covered the "Old Guard" of Deep Learning. Now, let's look at the architecture that changed everything: The Transformer.

Next Module: Module 5 — NLP & LLMs.