Word Embeddings

How do you explain to a computer that "King" is similar to "Queen"? With Word Embeddings, we represent words as dense vectors (lists of numbers) in a high-dimensional space.

1. One-Hot Encoding vs. Embeddings 🆚

One-Hot Encoding:
- Cat = [1, 0, 0, 0]
- Dog = [0, 1, 0, 0]
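- Problem: Every pair of distinct vectors is orthogonal, so no similarity between words is encoded. Each vector is also as long as the vocabulary, so it is extremely sparse and wastes memory.

Embeddings: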
- Cat = [0.2, 0.9, -0.1]
- Dog = [0.2, 0.8, -0.2]
- Result: Similar words have similar vectors!
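
To make the contrast concrete, here is a minimal NumPy sketch. The four-word vocabulary and the "car" and "tree" rows are made up for illustration; only the cat and dog vectors come from the example above. It assumes an embedding is stored as one row of a lookup table, which is the usual arrangement.

```python
import numpy as np

vocab = ["cat", "dog", "car", "tree"]  # hypothetical toy vocabulary
index = {word: i for i, word in enumerate(vocab)}

# One-hot: one dimension per vocabulary word, with a single 1 at the word's index.
one_hot = np.eye(len(vocab))
cat_oh, dog_oh = one_hot[index["cat"]], one_hot[index["dog"]]
print(cat_oh @ dog_oh)  # 0.0: two different words never overlap

# Embedding table: one dense row per word.
E = np.array([
    [0.2, 0.9, -0.1],   # cat  (from the text)
    [0.2, 0.8, -0.2],   # dog  (from the text)
    [-0.7, 0.1, 0.6],   # car  (made up)
    [0.1, -0.5, 0.4],   # tree (made up)
])
cat_emb, dog_emb, car_emb = E[index["cat"]], E[index["dog"]], E[index["car"]]
print(cat_emb @ dog_emb)  # about 0.78: the vectors point the same way
print(cat_emb @ car_emb)  # negative: the vectors point in different directions

# Looking up a row is the same as multiplying the one-hot vector by the table.
assert np.allclose(cat_oh @ E, cat_emb)
```

Note that the one-hot vectors grow with the vocabulary (4 numbers here, hundreds of thousands in practice), while the embedding size stays fixed at whatever dimension you choose.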

2. Word2Vec 🧠

Introduced in 2013 by Tomas Mikolov and colleagues at Google. It learns word associations from a large corpus of text. Key idea, in the words of linguist J. R. Firth: "You shall know a word by the company it keeps."

- CBOW (Continuous Bag of Words): Predict the target word from its surrounding context words.
- Skip-Gram: Predict the surrounding context words from the target word.
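
The difference between the two variants is easiest to see in the training examples they generate. The sketch below only builds those examples; it does not train a network. The helper name `training_pairs`, the sentence, and the window size are all illustrative assumptions.

```python
def training_pairs(tokens, window=2):
    """Build (input, output) training examples for both Word2Vec variants."""
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        # CBOW: all context words together predict the target.
        cbow.append((context, target))
        # Skip-Gram: the target predicts each context word separately.
        skipgram.extend((target, c) for c in context)
    return cbow, skipgram

tokens = "the cat sat on the mat".split()
cbow, skipgram = training_pairs(tokens, window=1)

print(cbow[1])       # (['the', 'sat'], 'cat')
print(skipgram[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```

Skip-Gram produces more (and smaller) examples per sentence than CBOW, which is one reason the two variants behave differently on rare words.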

3. Vector Arithmetic ➕

The most famous example of Word2Vec magic:

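King - Man + Woman ≈ Queen

The result is not exactly the vector for "Queen", but "Queen" is typically the nearest word vector to it once the input words are excluded. This suggests that the model has captured concepts such as "Gender" and "Royalty" as approximate directions in the vector space.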
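
Here is a toy sketch of that arithmetic. The 2-D vectors are hand-made so that one axis loosely means "royalty" and the other "gender"; they are not learned by Word2Vec, and real embeddings are noisier and have far more dimensions. The helper `nearest_word` is hypothetical.

```python
import numpy as np

# Hand-made 2-D toy vectors (NOT learned by Word2Vec).
# Axis 0 loosely stands for "royalty", axis 1 for "gender".
vectors = {
    "king":  np.array([0.9,  0.9]),
    "queen": np.array([0.9, -0.9]),
    "man":   np.array([0.1,  0.9]),
    "woman": np.array([0.1, -0.9]),
    "apple": np.array([-0.8, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_word(query, exclude):
    """Return the vocabulary word whose vector is most similar to `query`."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(query, candidates[w]))

query = vectors["king"] - vectors["man"] + vectors["woman"]
# The input words are excluded from the search, as is common practice.
result = nearest_word(query, exclude={"king", "man", "woman"})
print(result)  # queen
```

Subtracting "man" and adding "woman" flips the gender axis while leaving the royalty axis alone, which is why the query lands on "queen".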

4. Cosine Similarity 📐

To measure how similar two words are, we calculate the cosine of the angle between their vectors.

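1.0: Identical direction (the words are used in nearly identical contexts).
0.0: Orthogonal (no measurable relationship).
-1.0: Opposite direction. This is rare for real word vectors, and it does not mean "antonym": antonyms such as "hot" and "cold" appear in similar contexts, so they usually score high.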
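
For reference, this is the standard definition of cosine similarity, followed by a worked example using the Cat and Dog embeddings from section 1. The symbols a and b are just names for the two word vectors.

```latex
\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
           = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}

% Worked example: a = Cat = (0.2, 0.9, -0.1), b = Dog = (0.2, 0.8, -0.2)
\mathbf{a}\cdot\mathbf{b} = 0.04 + 0.72 + 0.02 = 0.78, \qquad
\lVert\mathbf{a}\rVert = \sqrt{0.86}, \qquad
\lVert\mathbf{b}\rVert = \sqrt{0.72}

\cos\theta = \frac{0.78}{\sqrt{0.86}\,\sqrt{0.72}} \approx 0.99
```

Because the formula divides by both lengths, only the direction of the vectors matters, not their magnitude.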

Interactive Challenge: Vector Similarity

Let's simulate vector similarity with NumPy.
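
One possible solution sketch is below. The function name `cosine_similarity` and the "car" vector are assumptions made for illustration; the cat and dog vectors are the ones from section 1.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = [0.2, 0.9, -0.1]
dog = [0.2, 0.8, -0.2]
car = [-0.7, 0.1, 0.6]  # made-up vector for an unrelated word

print(round(cosine_similarity(cat, dog), 2))  # 0.99
print(cosine_similarity(cat, car))            # negative: pointing in a different direction

# A vector and its negation point in exactly opposite directions.
opposite = [-x for x in cat]
print(cosine_similarity(cat, opposite))       # approximately -1
```

Try changing the numbers in `car` and watch how the score moves between -1 and 1.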

Quiz

What is the main advantage of Embeddings over One-Hot Encoding?

- They are easier to calculate
- They capture semantic meaning and relationships
- They use more memory

Key Takeaways

✅ Embeddings are dense vector representations of words.
✅ Word2Vec learns these from context.
✅ Vector Arithmetic allows us to manipulate concepts mathematically.

What's Next?

Word2Vec is great, but it has a flaw: each word gets a single, fixed vector, so "Bank" has the same vector in "River Bank" and "Bank Account". We need a model that understands context. Enter the Transformer.

Next Chapter: Transformers & Attention.