Simedius

Sunday, June 19, 2022

Background

One common simulation technique is called Monte Carlo simulation. It is similar to the simulation from the previous example. The main idea is to use random sampling to solve deterministic problems.

Tasks

Research what these simulations are. Give examples. Implement at least one case of a Monte Carlo simulation. You can use the following checklist to help with your research and work:

What is a simulation?
- How is simulation used in science?
- Why is a simulation useful?
How are statistics useful in simulation? How can we simulate unknown, random processes?
What is a Monte Carlo simulation (also known as "Monte Carlo method")?
A common use of Monte Carlo methods is numeric integration
- Define the problem. Propose the solution. Implement it and test with some common functions
- How does this method compare to other methods, e.g. the trapezoidal rule? Compare the performance (accuracy and time to execute) of both methods
Apply Monte Carlo simulation to a real-life system. There are many examples. You can see Wikipedia or some other resource for inspiration.

1. What is a simulation

Simulation study is a general term for any analysis that repeatedly performs the same procedure on data generated randomly in some fashion, and then aggregates the results. Simulations are very useful when we lack real-world data to test a hypothesis.

1.1. How is simulation used in science?

A simulation study is useful when theoretical arguments are insufficient to determine whether a method of interest is valid in a specific real-life application, or whether violations of the assumptions underlying the available theory (such as normally distributed residuals or proportional hazards) affect the validity of the results. In methodological research, simulations play a role similar to that of experiments in basic science. A major advantage of simulation studies is that the true data-generating process is known in advance, so the output and behavior of a statistical method can be checked against the ground truth.

1.2. Why is a simulation useful?

Simulation studies also provide objective, reproducible answers to more general methodological questions about the behaviour of statistical methods. Given some prior knowledge, simulations allow us to approximate the outcome of an unknown function of interest. By comparing the actual results with the expected ones a posteriori, we can update our initial beliefs about the process.

In addition to the evaluation of individual methods, simulations can also be used to determine which one of several candidate methods will perform best for the application at hand.
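As a minimal sketch of such a comparison, the snippet below runs a small simulation study in NumPy that compares two candidate estimators of a location parameter: the sample mean and the sample median. The two data-generating processes (standard normal data, and a hypothetical "contaminated" normal with 10% outliers drawn from N(0, 10²)) and the function names are illustrative assumptions, not taken from the text. Because the true centre is 0 in both cases, the mean squared error (MSE) of each estimator can be computed directly.

```python
import numpy as np

def simulate_mse(sampler, n_reps=2000, n=50, seed=0):
    """Estimate the MSE of the sample mean and the sample median over
    n_reps simulated datasets of size n. Every sampler has true centre 0."""
    rng = np.random.default_rng(seed)
    data = sampler(rng, (n_reps, n))          # one simulated dataset per row
    means = data.mean(axis=1)
    medians = np.median(data, axis=1)
    # The true location is 0, so the MSE is the average squared estimate
    return float(np.mean(means**2)), float(np.mean(medians**2))

def normal(rng, size):
    """Well-behaved data: standard normal."""
    return rng.normal(0.0, 1.0, size)

def contaminated(rng, size):
    """Hypothetical violation of normality: 90% N(0, 1), 10% N(0, 10^2) outliers."""
    outlier = rng.random(size) < 0.1
    return np.where(outlier, rng.normal(0.0, 10.0, size), rng.normal(0.0, 1.0, size))

for name, sampler in [("normal", normal), ("contaminated", contaminated)]:
    mse_mean, mse_median = simulate_mse(sampler)
    print(f"{name:>12}: MSE(mean)={mse_mean:.4f}  MSE(median)={mse_median:.4f}")
```

With normal data the mean should come out ahead, while the outliers should make the median the better choice; the simulation answers "which method is best here" without any closed-form theory.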

The great advantage of statistical simulation is the possibility of including real components of the system in the simulation process, especially those that cannot be described mathematically, for example a person or a group of people who take part in the functioning of the system.

2a. How are statistics useful in simulation?

When exploring a real-world phenomenon, we formulate a hypothesis about its nature and try to invalidate it, following Claude Bernard's method of falsification. Once we have a good understanding of the phenomenon, we can use statistical methods to build a simulation environment that better represents the uncertainty of the process generating our data. We often assume that our observations are realizations of a random variable or, more generally, draws from the joint distribution of many random variables, which accounts for their combined effect. In a simulation setting we apply the same principle by generating random samples from predefined distributions with a given set of parameters, which may or may not satisfy our needs. The key point is that we know the true parameters, and we can even fine-tune them to see how the hypothesis we want to test responds.
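A minimal "parameter recovery" sketch of this idea, assuming a normal model: we fix the true parameters ourselves (`TRUE_MU` and `TRUE_SIGMA` are hypothetical values), simulate data, and check how well the usual estimators recover them as the sample size grows.

```python
import numpy as np

# Hypothetical "true" parameters, chosen by us before simulating
TRUE_MU, TRUE_SIGMA = 5.0, 2.0

def recover_parameters(n, mu=TRUE_MU, sigma=TRUE_SIGMA, seed=0):
    """Draw n observations from N(mu, sigma^2) and estimate mu and sigma back."""
    rng = np.random.default_rng(seed)
    sample = rng.normal(mu, sigma, size=n)
    # Sample mean, and standard deviation with Bessel's correction (ddof=1)
    return sample.mean(), sample.std(ddof=1)

for n in (10, 100, 10_000):
    mu_hat, sigma_hat = recover_parameters(n)
    print(f"n={n:>6}: mu_hat={mu_hat:.3f} (true {TRUE_MU}), "
          f"sigma_hat={sigma_hat:.3f} (true {TRUE_SIGMA})")
```

Because the ground truth is known, any gap between the estimates and `TRUE_MU`/`TRUE_SIGMA` is a direct measure of estimation error; changing the parameters shows how the procedure responds.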

2b. How can we simulate unknown, random processes?

A random process is a collection of random variables indexed by time, and an observed series is one realization drawn from their joint distribution. According to Wikipedia, random or stochastic processes are widely used as mathematical models of systems and phenomena that appear to vary in a random manner. The simplest example is white noise, a sequence of independent and identically distributed random variables. In Python we can simulate Gaussian white noise by drawing samples from a normal distribution with $\mu = 0$ and a constant variance. The three main characteristics of a white noise process are:

Its mean is zero
Its standard deviation is constant over time
Observations are not correlated with past observations (no significant autocorrelation)
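The three properties can be checked numerically. The sketch below is a rough, assumption-laden check rather than a formal test: it splits the series into a hypothetical number of windows (`n_windows`) to see whether the standard deviation stays stable, and compares the sample autocorrelations with the usual approximate 95% band $\pm 1.96/\sqrt{n}$.

```python
import numpy as np

def white_noise_summary(x, n_windows=5, max_lag=10):
    """Rough numerical check of the three white-noise properties."""
    x = np.asarray(x, dtype=float)
    # 2. Standard deviation within consecutive windows (should be similar)
    window_stds = [w.std(ddof=1) for w in np.array_split(x, n_windows)]
    # 3. Sample autocorrelation at lags 1..max_lag
    centred = x - x.mean()
    denom = np.sum(centred**2)
    acf = [np.sum(centred[k:] * centred[:-k]) / denom for k in range(1, max_lag + 1)]
    # 1. Overall mean (should be close to 0)
    return x.mean(), window_stds, acf

rng = np.random.default_rng(42)
noise = rng.normal(0.0, 1.0, size=5000)
mean, stds, acf = white_noise_summary(noise)
print(f"mean={mean:.3f}")
print("window stds:", np.round(stds, 3))
# For white noise, |acf| should mostly stay inside the band 1.96 / sqrt(n)
print("max |acf|:", round(float(np.max(np.abs(acf))), 3),
      "band:", round(1.96 / np.sqrt(len(noise)), 3))
```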

Let's generate a random process in Python, and also apply the Ljung–Box test, a statistical test often used in practice to check for white noise. Its null hypothesis $H_0$ is that the observations show no autocorrelation up to a given lag, as expected for white noise; a small p-value is evidence against $H_0$.

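import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact
from statsmodels.stats.diagnostic import acorr_ljungbox

# Sliders for the mean, standard deviation and length of the process
# (strict white noise requires mu = 0)
@interact(mu=(0, 10), sigma=(1, 20), size=(100, 1000))
def random_process(mu=0, sigma=1, size=500):
    # Gaussian noise: i.i.d. draws from N(mu, sigma^2)
    white_noise = np.random.normal(mu, sigma, size=size)
    plt.plot(white_noise)

    # Ljung-Box test on autocorrelations up to lag 40; H0: no autocorrelation
    pvalue = acorr_ljungbox(white_noise, lags=[40], return_df=True).loc[40, "lb_pvalue"]
    print("P-value of the Ljung-Box test: ", pvalue)

    plt.title("Random process")
    plt.show()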
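To see what the Ljung–Box test computes, here is a from-scratch sketch using only NumPy and the standard library. It computes $Q = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2/(n-k)$, which under $H_0$ approximately follows a $\chi^2_h$ distribution. To avoid a SciPy dependency, the sketch assumes an even number of lags $h$, for which the $\chi^2$ survival function has a closed form; `ljung_box` is a hypothetical helper name. With default settings its result is expected to agree closely with `acorr_ljungbox`, but treat it as an illustration, not a replacement.

```python
import math
import numpy as np

def ljung_box(x, h=40):
    """Ljung-Box Q statistic and p-value for lags 1..h (this sketch needs an even h)."""
    if h % 2:
        raise ValueError("the closed-form p-value below requires an even h")
    x = np.asarray(x, dtype=float)
    n = len(x)
    centred = x - x.mean()
    denom = np.sum(centred**2)
    # Sample autocorrelations rho_1..rho_h
    rho = np.array([np.sum(centred[k:] * centred[:-k]) / denom for k in range(1, h + 1)])
    q = n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, h + 1)))
    # Under H0, Q ~ chi-square(h). For even h = 2m the survival function is
    # exp(-q/2) * sum_{i=0}^{m-1} (q/2)^i / i!
    half = float(q) / 2
    p_value = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(h // 2))
    return float(q), p_value

rng = np.random.default_rng(0)
noise = rng.normal(size=1000)
# AR(1) process x_t = 0.8 * x_{t-1} + e_t: autocorrelated, so not white noise
ar = np.zeros(1000)
for t in range(1, 1000):
    ar[t] = 0.8 * ar[t - 1] + noise[t]

print("white noise:", ljung_box(noise))   # p-value expected to be large
print("AR(1):      ", ljung_box(ar))      # p-value expected to be close to 0
```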

3. What is a Monte Carlo simulation?

Monte Carlo methods are a collection of computational techniques for finding approximate solutions to mathematical problems that make fundamental use of random samples. They rely on repeated random sampling from a given set of probability distributions. Monte Carlo methods invert the usual problem of statistics: rather than estimating random quantities in a deterministic manner, random quantities are employed to provide estimates of deterministic quantities.

Common steps in constructing a Monte Carlo experiment are:

Define a domain of possible inputs. This simulated “universe” should be similar to the universe whose behavior we wish to describe and investigate.
Generate inputs randomly from a probability distribution over the domain. These inputs should be generated so that their characteristics are similar to the real universe we are trying to simulate (in particular, dependencies between the inputs should be represented).
Perform a deterministic computation on the inputs.
Aggregate the results to obtain the output of interest (typically histograms, summary statistics, confidence intervals).
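The four steps can be illustrated on a toy problem whose exact answer is known: the probability that two fair dice sum to at least 10, which is $6/36 = 1/6$. The dice example, the function name and the normal-approximation confidence interval are illustrative choices for this sketch.

```python
import numpy as np

def prob_two_dice_at_least(target=10, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Domain: ordered pairs of faces {1..6} x {1..6}
    # 2. Random inputs: n independent throws of two fair dice
    dice = rng.integers(1, 7, size=(n, 2))     # upper bound 7 is exclusive
    # 3. Deterministic computation on each input: does the sum reach the target?
    hits = dice.sum(axis=1) >= target
    # 4. Aggregate: proportion of hits plus a normal-approximation 95% CI
    p_hat = hits.mean()
    half_width = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - half_width, p_hat + half_width)

estimate, ci = prob_two_dice_at_least()
print(f"estimate={estimate:.4f}, 95% CI=({ci[0]:.4f}, {ci[1]:.4f}), exact={6/36:.4f}")
```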

4. Numeric integration by the Monte Carlo method

According to Wikipedia, numerical integration comprises a broad family of algorithms for calculating the numerical value of a definite integral. One of the methods used for numerical integration is the trapezoidal rule. For a partition $a = x_0 < x_1 < \dots < x_N = b$ of the interval, it approximates the definite integral as:

$$\int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right)$$
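A direct translation of this formula into NumPy, as a minimal sketch assuming an evenly spaced grid with `n` sub-intervals (`trapezoid` is our own helper name). NumPy also ships a built-in version (`np.trapezoid` in NumPy 2.x, `np.trapz` in older releases).

```python
import numpy as np

def trapezoid(f, a, b, n=1000):
    """Composite trapezoidal rule on n equal sub-intervals of [a, b]."""
    x = np.linspace(a, b, n + 1)      # grid points x_0 = a, ..., x_N = b
    y = f(x)
    # 1/2 * sum_k (x_k - x_{k-1}) * (f(x_k) + f(x_{k-1}))
    return 0.5 * np.sum(np.diff(x) * (y[1:] + y[:-1]))

print(trapezoid(np.sin, 0.0, np.pi))         # exact value: 2
print(trapezoid(lambda x: x**2, 0.0, 1.0))   # exact value: 1/3
```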

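Deterministic rules such as the trapezoidal rule become impractical for high-dimensional integrals, because the number of grid points grows exponentially with the dimension. In such situations, stochastic simulation (“Monte Carlo”) methods allow us to approximate the integral by evaluating the function at a large number of randomly selected points. In the “hit-or-miss” variant, we scatter random points over a box enclosing the graph of $f$, count the proportion that fall below the curve, and multiply that proportion by the area of the box. The larger the number of simulations we run, the better the approximation: the statistical error typically shrinks in proportion to $1/\sqrt{N}$.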
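A minimal sketch of hit-or-miss integration and of the comparison asked for in the task (accuracy and execution time versus the trapezoidal rule). It assumes $0 \le f(x) \le$ `f_max` on $[a, b]$ and gives both methods the same budget `n`; the function names and test integrands are our own choices.

```python
import time
import numpy as np

def mc_hit_or_miss(f, a, b, f_max, n=100_000, seed=0):
    """Hit-or-miss Monte Carlo estimate; assumes 0 <= f(x) <= f_max on [a, b]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, n)            # random abscissae in [a, b]
    y = rng.uniform(0.0, f_max, n)      # random heights in [0, f_max]
    # Fraction of points below the curve times the area of the bounding box
    return (b - a) * f_max * np.mean(y < f(x))

def trapezoid(f, a, b, n=100_000):
    """Composite trapezoidal rule with n equal sub-intervals."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    return 0.5 * np.sum(np.diff(x) * (y[1:] + y[:-1]))

def compare(f, a, b, f_max, exact, n=100_000):
    """Print value, absolute error and wall-clock time of both methods."""
    methods = {
        "monte carlo": lambda: mc_hit_or_miss(f, a, b, f_max, n),
        "trapezoid": lambda: trapezoid(f, a, b, n),
    }
    results = {}
    for name, method in methods.items():
        start = time.perf_counter()
        value = method()
        elapsed = time.perf_counter() - start
        error = abs(value - exact)
        results[name] = (value, error, elapsed)
        print(f"{name:>11}: value={value:.6f}  abs. error={error:.2e}  "
              f"time={elapsed * 1e3:.2f} ms")
    return results

compare(np.sin, 0.0, np.pi, f_max=1.0, exact=2.0)
compare(lambda x: x**2, 0.0, 1.0, f_max=1.0, exact=1 / 3)
```

For smooth one-dimensional integrands like these, the trapezoidal rule is expected to be far more accurate for the same number of function evaluations; the appeal of Monte Carlo is that its $1/\sqrt{N}$ rate does not depend on the dimension. Timings vary by machine, so none are quoted here.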

5. Applications

5.1. Estimate the value of $\pi$

Consider the largest circle that fits in the square $[-1, 1]^2 \subset \mathbb{R}^2$. The circle has radius $r = 1$ and area $\pi$. The square has an area of $2^2 = 4$. The ratio between the two areas is thus $\frac{\pi}{4}$.

The steps of a Monte Carlo simulation include:

draw the square over $[-1,1]^2$, then draw the largest circle that fits inside the square
randomly scatter a large number $N$ of grains of rice over the square
count how many grains fell inside the circle
the count divided by $N$ and multiplied by 4 is an approximation of $\pi$
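Written out, the last step follows from the area ratio: the fraction of uniformly scattered grains that land inside the circle estimates the ratio of the two areas, with $N_{\text{inside}}$ denoting the number of grains counted inside the circle.

$$\frac{A_{\text{circle}}}{A_{\text{square}}} = \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4} \approx \frac{N_{\text{inside}}}{N} \quad\Longrightarrow\quad \pi \approx 4\,\frac{N_{\text{inside}}}{N}$$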

We can simulate this procedure in numpy by drawing random numbers from a uniform distribution between -1 and 1 to represent the $x$ and $y$ positions of our grains of rice, and checking whether each point lies within the circle using Pythagoras’ theorem. This procedure is an adaptation of what is called Buffon’s needle problem, named after the 18th-century French mathematician Georges-Louis Leclerc, Comte de Buffon. It belongs to a topic called geometric probability.
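A minimal NumPy sketch of the procedure just described (`estimate_pi` and its parameters are our own naming). Each grain is a uniform point in $[-1, 1]^2$, and by Pythagoras’ theorem it lies inside the unit circle when $x^2 + y^2 \le 1$.

```python
import numpy as np

def estimate_pi(n=1_000_000, seed=0):
    rng = np.random.default_rng(seed)
    # x and y positions of n grains of rice, uniform over the square [-1, 1]^2
    x = rng.uniform(-1.0, 1.0, n)
    y = rng.uniform(-1.0, 1.0, n)
    # Pythagoras: a grain is inside the unit circle if x^2 + y^2 <= 1
    inside = x**2 + y**2 <= 1.0
    # The fraction inside approximates pi / 4
    return 4.0 * np.mean(inside)

for n in (100, 10_000, 1_000_000):
    print(f"N={n:>9}: pi ~ {estimate_pi(n):.5f}")
```

Larger $N$ should bring the estimate closer to $\pi$, with the error shrinking roughly like $1/\sqrt{N}$.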

Mathematics in Machine Learning