Sept. 21, 2024

Estimating Pi with Scala: A Simple Introduction to Monte Carlo Simulation

Introduction to Scala and Apache Spark

Scala is a powerful programming language that blends object-oriented and functional programming paradigms. Its name is short for "scalable language," and it runs on the JVM, making it fully interoperable with Java. These traits make Scala a popular choice for big data processing, especially with Apache Spark, a distributed computing system built for processing large datasets in parallel across multiple machines.

Apache Spark, known for its speed and ease of use, is itself written in Scala, making the two an ideal pair for big data tasks. Whether you're crunching numbers on large datasets or simulating real-world events, Spark with Scala offers a powerful toolset for tackling these challenges.

In this blog post, I’ll walk through a simple Scala program I wrote to estimate the value of Pi using a Monte Carlo simulation—a technique widely used in statistics, mathematics, and physics.

Understanding Monte Carlo Simulations

Monte Carlo methods rely on random sampling to obtain numerical results. The idea is simple: if we can simulate random events that mimic the real process we’re studying, we can estimate complex quantities with relatively simple math.

One classic example is estimating Pi using the Monte Carlo method. Imagine a square with a circle inscribed within it. By randomly generating points within the square and checking how many fall inside the circle, we can estimate the ratio of the area of the circle to the area of the square—and from this, Pi can be estimated.
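
To make that ratio precise: the inscribed circle has radius 1 and area π·1² = π, while the square has side 2 and area 4, so

    area of circle / area of square = π / 4

Rearranging gives the estimator the code below computes:

    π ≈ 4 × (points inside circle / total points)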

The Scala Code

Let's break down the Scala program that estimates Pi using the Monte Carlo method.

import scala.util.Random

object PiEstimator {
  def main(args: Array[String]): Unit = {
    // Number of points to simulate
    val totalPoints = 10000000

    // Function to check if a point (x, y) lies inside the circle
    def isInCircle(x: Double, y: Double): Boolean = {
      (x * x + y * y) <= 1
    }

    // Generate random points and count how many are inside the circle
    val pointsInsideCircle = (1 to totalPoints).map { _ =>
      val x = Random.nextDouble() * 2 - 1 // Random x between -1 and 1
      val y = Random.nextDouble() * 2 - 1 // Random y between -1 and 1
      if (isInCircle(x, y)) 1 else 0
    }.sum

    // Estimate Pi using the ratio of points inside the circle
    val piEstimate = (4.0 * pointsInsideCircle) / totalPoints
    println(f"Estimated Pi: $piEstimate%.5f")
  }
}

How It Works

  • Step 1: Defining the Total Points
    The variable totalPoints defines how many random points we’ll generate. For this example, we use 10 million points; more points give a more accurate estimate of Pi, at the cost of more computation.
  • Step 2: Checking if Points Are Inside the Circle
    The function isInCircle checks whether a point (x, y) lies inside a unit circle. This is done by ensuring the distance from the origin to the point is less than or equal to 1. Essentially, if x² + y² ≤ 1, the point is inside the circle.
  • Step 3: Generating Random Points
    We generate random points where the x and y coordinates are between -1 and 1, corresponding to the bounds of the square. We count how many of these points lie inside the circle by applying the isInCircle function.
  • Step 4: Estimating Pi
    The ratio of the number of points inside the circle to the total number of points is approximately equal to π/4. Thus, multiplying this ratio by 4 gives us our estimate for Pi; a quick numeric check appears below.
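
For a quick sanity check on that arithmetic: π/4 ≈ 0.785398, so out of 10,000,000 points we would expect roughly 7,853,982 to land inside the circle, and 4 × 7,853,982 / 10,000,000 ≈ 3.14159.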

Output

When you run this program, you get output similar to:

Estimated Pi: 3.14163

While not exact, this estimate is very close to the true value of Pi (3.14159). Increasing the number of points makes the estimate more accurate, but the improvement is slow: the Monte Carlo error shrinks roughly in proportion to 1/√N, so each extra digit of accuracy costs about 100 times more points.
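
To see this trade-off concretely, here is a minimal sketch (the object name PiConvergence and the helper estimatePi are my own, not part of the program above) that reruns the estimator at increasing sample sizes and prints the absolute error against math.Pi:

import scala.util.Random

object PiConvergence {
  def main(args: Array[String]): Unit = {
    // Re-implements the estimator as a function of the sample size
    def estimatePi(totalPoints: Int): Double = {
      val inside = (1 to totalPoints).count { _ =>
        val x = Random.nextDouble() * 2 - 1
        val y = Random.nextDouble() * 2 - 1
        x * x + y * y <= 1
      }
      4.0 * inside / totalPoints
    }

    // Each 100x jump in points should cut the typical error by about 10x
    for (n <- Seq(1000, 100000, 10000000)) {
      val pi = estimatePi(n)
      println(f"n = $n%8d  estimate = $pi%.5f  error = ${math.abs(pi - math.Pi)}%.5f")
    }
  }
}

Note that count with a predicate expresses the same tally as the map/sum in the main program, without building an intermediate collection.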

Conclusion

This assignment is a great way to dive into Scala’s functional programming capabilities and understand Monte Carlo simulations. By simply generating random points and counting how many fall inside a circle, we can estimate Pi with surprising accuracy.

Moreover, Scala's compatibility with Apache Spark makes it an ideal language for running simulations like this one on massive datasets. In future posts, I’ll explore how we can parallelize this computation using Apache Spark, taking advantage of distributed computing to handle even larger datasets and more complex simulations.
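
As a preview, here is a minimal sketch of what that distributed version might look like. This is an assumption-laden outline, not the final version: the object name SparkPiEstimator and the partition count are my own choices, and it assumes a standard Spark 3.x setup launched via spark-submit.

import org.apache.spark.sql.SparkSession
import scala.util.Random

object SparkPiEstimator {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkPiEstimator").getOrCreate()

    val totalPoints = 10000000
    val partitions = 8 // how many chunks to spread across the cluster

    // Each executor generates its share of random points and tallies the hits
    val pointsInsideCircle = spark.sparkContext
      .parallelize(1 to totalPoints, partitions)
      .map { _ =>
        val x = Random.nextDouble() * 2 - 1
        val y = Random.nextDouble() * 2 - 1
        if (x * x + y * y <= 1) 1 else 0
      }
      .reduce(_ + _)

    println(f"Estimated Pi: ${4.0 * pointsInsideCircle / totalPoints}%.5f")
    spark.stop()
  }
}

The structure mirrors the single-machine version; the only real change is that parallelize turns the range of trials into an RDD, so the map runs across many cores or machines at once.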