Random Module & Simulations
Learn to generate random numbers, model probability distributions, and run Monte Carlo simulations in NumPy.
Module 8: Random Module & Simulations
Randomness is vital in data science for sampling, shuffling data, and simulating real-world processes. NumPy's random submodule generates whole arrays in a single vectorized call, which is typically much faster than looping over Python's built-in random module, and it offers a wider range of distributions.
Lesson 17: Random Number Generation
NumPy can generate entire arrays of random numbers at once.
Common Functions:
- np.random.rand(d0, d1, ...): Random floats in the half-open interval [0, 1), drawn from a uniform distribution.
- np.random.randn(d0, d1, ...): Random floats from a standard normal distribution (mean=0, variance=1).
- np.random.randint(low, high, size): Random integers from low (inclusive) to high (exclusive).
- np.random.choice(array, size): Picks random elements from an existing array, with replacement by default.
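As a minimal sketch of the four functions listed above, the snippet below draws a small sample from each. The array shapes and the `colors` array are illustrative choices, not part of the lesson.

```python
import numpy as np

# Uniform floats in [0, 1), arranged as a 2 x 3 array
uniform_floats = np.random.rand(2, 3)

# Standard normal floats (mean 0, variance 1), also 2 x 3
normal_floats = np.random.randn(2, 3)

# Five integers from 1 to 6; `high` is exclusive, so pass 7
dice_rolls = np.random.randint(1, 7, size=5)

# Five picks from an existing array (sampling with replacement by default)
colors = np.array(["red", "green", "blue"])  # illustrative data
picks = np.random.choice(colors, size=5)

print(uniform_floats)
print(normal_floats)
print(dice_rolls)
print(picks)
```

Because no seed is set here, the printed values will differ on every run.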
Reproducibility with seed()
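Computers generate "pseudo-random" numbers: a deterministic algorithm produces a sequence that only looks random. Setting a seed fixes the starting point of that sequence, so you get the same random numbers every time you run your code. This is critical for scientific reproducibility.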
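A short sketch of the idea: seeding twice with the same value reproduces the same draws. The seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)              # fix the starting point of the sequence
first_run = np.random.rand(3)

np.random.seed(42)              # reset to the same seed
second_run = np.random.rand(3)

# Both draws come from the same point in the sequence, so they match exactly
print(np.array_equal(first_run, second_run))  # True
```

Without the second call to np.random.seed(), the generator would simply continue the sequence and the two arrays would differ.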
Lesson 18: Probability Distributions
Data Science often requires data that follows specific statistical rules.
Popular Distributions:
- Normal (Gaussian): np.random.normal(loc, scale, size)
- Binomial: np.random.binomial(n, p, size)
- Uniform: np.random.uniform(low, high, size)
- Poisson: np.random.poisson(lam, size)
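The sketch below draws from each of the four distributions above and checks that the sample means land near their theoretical values. The parameter values and variable names (heights, coin flips, temperatures, arrivals) are made-up examples chosen only for illustration.

```python
import numpy as np

np.random.seed(0)        # arbitrary seed so the run is repeatable
n_samples = 100_000

# Normal: loc is the mean, scale is the standard deviation
heights = np.random.normal(loc=170, scale=10, size=n_samples)

# Binomial: number of successes in n trials, each with probability p
heads = np.random.binomial(n=10, p=0.5, size=n_samples)

# Uniform: floats in [low, high)
temperatures = np.random.uniform(low=15, high=25, size=n_samples)

# Poisson: event counts with an average rate of lam
arrivals = np.random.poisson(lam=3, size=n_samples)

# Theoretical means: 170, n * p = 5, (low + high) / 2 = 20, lam = 3
print(heights.mean(), heads.mean(), temperatures.mean(), arrivals.mean())
```

With this many samples, each printed mean should sit very close to its theoretical value, which is a quick sanity check that the parameters mean what you expect.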
Project: Monte Carlo Simulation
Monte Carlo simulations use repeated random sampling to obtain numerical results.
Challenge: Imagine you want to estimate the probability of rolling a sum of 7 with two six-sided dice.
- Generate two arrays of 10,000 random integers (1 to 6).
- Add them together.
- Use boolean indexing to count how many sums equal 7.
- Divide by 10,000 to get the probability.
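One possible solution to the challenge, following the four steps above. The seed and variable names are arbitrary choices; the exact probability (6 favorable outcomes out of 36) is included only for comparison.

```python
import numpy as np

np.random.seed(7)   # arbitrary seed so the estimate is repeatable
n_rolls = 10_000

# Step 1: two arrays of random integers from 1 to 6 (`high` is exclusive)
die_one = np.random.randint(1, 7, size=n_rolls)
die_two = np.random.randint(1, 7, size=n_rolls)

# Step 2: add them element-wise to get the sum of each roll
sums = die_one + die_two

# Step 3: boolean indexing keeps only the sums equal to 7
sevens = sums[sums == 7]

# Step 4: divide the count by the number of rolls
estimate = sevens.size / n_rolls

exact = 6 / 36      # six of the 36 equally likely outcomes sum to 7
print(f"estimated: {estimate:.4f}, exact: {exact:.4f}")
```

The estimate will not match the exact value perfectly; increasing n_rolls narrows the gap, which is the core trade-off in any Monte Carlo simulation.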
Quiz
Question 1 of 5: Why is it important to set np.random.seed()?
Key Takeaways
✅ Use np.random for fast multi-dimensional random generation.
✅ Always set a seed when sharing your scientific code or tests.
✅ NumPy provides built-in functions for many common statistical distributions, including normal, binomial, uniform, and Poisson.