{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Programme inspecteur modèles BNP Paribas - module Monte-Carlo methods - 12 décembre 2019" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# TP 1\n", "\n", "## Warm-up: Simulation of random variables with Python and fundamental limit theorems\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1. Simulation of a probability density with the rejection method\n", "\n", "We consider the probability distribution with support $[-2, 2]$ and density function\n", "$$\n", "f(x) = \\frac 1{2\\pi} \\sqrt{4 - x^2} 1_{x \\in [-2,2]} ,\n", "$$\n", "usually called the Wigner law.\n", "\n", "$\\blacktriangleright$ Implement a simulation method allowing to draw i.i.d. samples from the Wigner law. Simulate a large number of samples, plot the histogram and compare with the density $f$.\n", "\n", "_Reminder on the rejection method_: in order to simulate a random variable with density $f$ of support $[x_0, x_1]$ and such that $0\\leq f\\leq M$, one can use a sequence of independent couples of random variables $(Z_k, U_k)_{k\\geq 1}$ such that, for every $k$, $Z_k$ and $U_k$ are independent and uniformly distributed respectively on $[x_0, x_1]$ and $[0, 1]$.\n", "\n", "If we set\n", "\n", "$$ \\nu = \\min\\{k\\geq 1 : \\, f(Z_k) > M U_k \\}, $$\n", "\n", "the random variable $Z_\\nu$ is distributed according to the density $f$." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def wigner_density(x):\n", " return np.sqrt(4 - x*x) / (2*np.pi)\n", "\n", "N = int(1.e4)\n", "\n", "########################################\n", "# To Do: complete with an upper bound M\n", "# for the density f\n", "########################################\n", "upper_bound = 0\n", "\n", "X = np.array([])\n", "\n", "for _ in range(N):\n", " # There is no \"while ... do\" in Python\n", " # A possible solution is given below\n", " \n", " ########################################\n", " # To Do: sample the variables\n", " # z according to uniform distribution over [-2,2]\n", " # u according to uniform distribution over [0,1]\n", " ########################################\n", " z = ???\n", " u = ??? \n", " \n", " v = upper_bound * u\n", " \n", " while v >= wigner_density(z):\n", " #################################\n", " # To Do: complete the code below\n", " #################################\n", " z = ???\n", " v = ???\n", "\n", " X = np.append(X, z)\n", "\n", "##############################\n", "# Plot the histogram\n", "##############################\n", "n_columns = 2 * int(N**(1/3))\n", "\n", "plt.hist( ??? , density=True, bins=n_columns, label=\"Empirical law\")\n", " \n", "########################################\n", "# Plot the true density for comparison\n", "########################################\n", "x = np.linspace(-2., 2., 100)\n", "\n", "f_x = ???\n", "\n", "plt.plot( ???, ???, \"r\", label=\"Density of the Wigner law\", linewidth=1.5)\n", "\n", "plt.legend(loc='best')\n", "plt.title(\"Wigner law, N = %1.0e samples with rejection method\" %N)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2. Law of large numbers.\n", "\n", "Let $(X_i)_{i\\ge1}$ be a sequence of i.i.d. 
random variables such that $\mathbb{E}\bigl[ \bigl|X_1\bigr| \bigr]<\infty$.\n", "Recall that the law of large numbers states that the sequence of _empirical means_\n", "$$\n", "\hat m_n = \frac1n \sum_{i=1}^n X_i\n", "$$\n", "converges almost surely to $\mathbb{E}[X_1]$ as $n\to \infty$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.1 Convergence of the sequence $\hat m_n$ as $n$ becomes large\n", "\n", "Draw $N$ independent samples uniformly distributed on $[0,1]$ and plot the sequence $n \mapsto \hat m_n$ for $n$ ranging from $1$ to $N$.\n", "\n", "The following functions from `numpy` can be used: `numpy.random.rand` to draw the $N$ samples from the uniform distribution on $[0,1]$, and the function `numpy.cumsum` to compute the cumulative sums up to $n$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "N = 1000 # sample size\n", "\n", "########\n", "# N draws from the uniform distribution on [0,1]\n", "#\n", "X = np.random.rand(N)\n", "#\n", "########\n", "\n", "integers1toN = np.arange(1,N+1) # A numpy array containing the integers from 1 to N\n", "\n", "############\n", "# Store in the variable 'empMean' the sequence of empirical means for n ranging\n", "# from 1 to N: the cumulative sums divided by n\n", "#\n", "empMean = np.cumsum(X) / integers1toN\n", "#\n", "############\n", "\n", "############\n", "# Plotting\n", "############\n", "plt.plot(integers1toN
, empMean, color=\"b\", label=\"Empirical mean\")\n", "\n", "# A horizontal line for the reference theoretical value\n", "plt.axhline(0.5, color=\"r\", label=\"True expectation\")\n", "\n", "plt.legend(loc=\"best\")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.2 Another example\n", "\n", "The Cauchy distribution with parameter $a > 0$ is the probability distribution on $\mathbb R$ with density function\n", "$$\n", "f(x) = \frac a{\pi(a^2+x^2)}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### $\blacktriangleright$ Question (a):\n", "\n", "$F(x)=\int_{-\infty}^x f(y) dy $ is the cumulative distribution function (cdf) of the Cauchy distribution.\n", "Let $U$ be a r.v. uniformly distributed on $[0,1]$. \n", "\n", "What is the law of $F^{-1}(U)$?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### $\blacktriangleright$ Question (b):\n", "\n", "Noticing that $F(x) = \frac1{\pi}\arctan(\frac xa)+\frac12$, generate $N$ i.i.d. samples distributed according to the Cauchy distribution with parameter $a=1$, still using the function `numpy.random.rand`.\n", "\n", "Plot the sequence of the empirical means $\hat m_n$ for $n$ ranging from $1$ to $N$.\n", "\n", "What do you observe?"
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "N = 1000\n", "U = np.random.rand(N)\n", "\n", "a = 1.\n", "\n", "########\n", "# Stock in the variable X the N draws from the Cauchy distribution with parameter a,\n", "# exploiting the inverse of the cdf \n", "#\n", "X = ????\n", "#\n", "########\n", "\n", "integers1toN = np.arange(1,N+1)\n", "\n", "########\n", "# Stock in the variable 'empMean' the sequence of empirical means for n ranging\n", "# from 1 to N\n", "# \n", "empMean = ???\n", "########\n", "\n", "# Plot the sequence of the empirical means\n", "#\n", "plt.plot(...)\n", "#\n", "########" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3. Central limit theorem and confidence intervals.\n", "\n", "Let $(X_i)_{i\\ge1}$ be a sequence of i.i.d. random variables with $\\mathbb E[X_1^2]<\\infty$, and denote $\\hat m_n = \\frac1n \\sum_{i=1}^n X_i$ the sequence of their empirical means.\n", "\n", "The central limit theorem states that the sequence of _renormalized errors_\n", "\n", "$$\n", "e_n = \\frac{\\sqrt n}{\\sqrt{Var(X_1)}} \\bigl(\\hat m_n - \\mathbb E[X_1] \\bigr)\n", "$$\n", "\n", "converges in law towards a Gaussian distribution $\\mathcal{N}(0,1)$ as $n \\to \\infty$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.1 Convergence in law of the error" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this first part is to display the closeness of the distribution of the renormalized error $e_n$ with the Gaussian distribution, when $n$ is sufficiently large.\n", " \n", "We consider a sequene of i.i.d. 
random variables with exponential distribution of parameter $\lambda = 2$.\n", "\n", "For fixed $n$, draw a sample of $M$ values $ \bigl(\hat m_n^j \bigr)_{j=1,\dots,M}$ of the empirical mean.\n", "\n", "Plot the histogram of the corresponding sample of values of the renormalized error\n", "\n", "$$\n", "e^j_n = \sqrt{\frac n{Var(X_1)}} \bigl(\hat m^j_n - \mathbb E[X_1] \bigr),\n", "\qquad j = 1, \dots, M\n", "$$\n", "\n", "and compare with the standard Gaussian distribution." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = 3000 # n draws of the exponential distribution to compute the empirical mean\n", "M = 5000 # number of samples of the empirical mean\n", "\n", "####################################################\n", "# n*M independent draws of the exponential distribution\n", "# with parameter lambda=2\n", "#\n", "lambd = 2\n", "\n", "# Note: numpy.random.exponential takes the scale 1/lambda\n", "# as its first argument, not the rate lambda\n", "X = np.random.exponential(1/lambd, (M, n))\n", "####################################################\n", "\n", "# True known values of expectation and variance (as a benchmark)\n", "esp = 1/lambd\n", "var = (1/lambd)**2\n", "\n", "####################################################\n", "# Store in the variable 'empMean' the sample of M independent \n", "# values of the empirical mean \hat m_n (one mean per row of X)\n", "#\n", "empMean = np.mean(X, axis=1)\n", "#\n", "# Store in the variable 'renormError_n' the sample of M independent\n", "# values of the renormalized error \n", "#\n", "renormError_n = np.sqrt(n / var) * (empMean - esp)\n", "####################################################\n", "\n", "\n", "####################################################\n", "# Plotting\n", "plt.hist(renormError_n
, density=True, bins=int(np.sqrt(M)), label=\"Renormalized error\")\n", "\n", "####################################################\n", "# Standard Gaussian density for comparison\n", "x = np.linspace(-5, 5, 100)\n", "\n", "gaussianDensity = np.exp(-x**2 / 2) / np.sqrt(2*np.pi)\n", "\n", "####################################################\n", "# Plotting\n", "plt.plot(x, gaussianDensity, color=\"red\", label=\"Standard Gaussian density\", linewidth=2.0)\n", "\n", "plt.legend(loc=\"best\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.2 Confidence intervals\n", "\n", "Unlike in the previous example, in practice one does not know the value of $Var(X_1)$ (which therefore cannot be used to build the confidence intervals).\n", "\n", "Yet, it is possible to estimate $Var(X_1)$ with the same random sample used to estimate $\mathbb E[X_1]$, using the empirical variance as an estimator:\n", "\n", "$$\n", "\sigma_n^2 = \frac1n \sum_{i=1}^n X_i^2 - (\hat m_n)^2.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### $\blacktriangleright$ Question (a):\n", "\n", "Which result allows one to state that the random sequence\n", "$$\n", "\frac{\sqrt n}{\sigma_n} \left(\hat m_n - \mathbb E[X_1]\right)\n", "$$\n", "converges in law to $\mathcal{N}(0,1)$, when $n \to \infty$?"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### $\\blacktriangleright$ Question (b):\n", "\n", "Defining for $\\delta > 0$ the (random) interval\n", "$$\n", "I_n^{\\delta}=\\Bigl[\\overline{X}_n-\\delta\\frac{\\sigma_n}{\\sqrt n},\\ \\overline{X}_n+\\delta\\frac{\\sigma_n}{\\sqrt n}\\Bigr],\n", "$$\n", "the previous question entails that\n", "\n", "$$\n", "\\mathbb P\\Bigl(\\mathbb E[X_1] \\in I_n^{\\delta}\\Bigr)\n", "=\n", "\\mathbb P\\biggl(\\frac{\\sqrt n}{\\sigma_n} \\bigl|\\overline{X}_n - \\mathbb E[X_1]\\bigr| \\le \\delta \\biggr)\n", "\\longrightarrow \\mathbb P \\bigl(|\\mathcal{N}(0,1)|\\le \\delta \\bigr) = 2\\int_0^{\\delta} \\frac{e^{-x^2/2}}{\\sqrt{2\\pi}} dx\n", "$$\n", "as $n \\to \\infty$.\n", "\n", "When $\\delta=1.96$, the last term on the right hand side is approximately equal to $0.95$, which allows to identify $I_n^{1.96}$ as the $95\\%$ _asymptotic_ confidence interval for the true value of the expectation $\\mathbb E[X_1]$.\n", "\n", "In the cell below: give an estimate of the expectation of a random variable uniformly distributed on $[0,1]$ with the $95\\%$ _asymptotic_ confidence interval obtained above." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "N = 1000\n", "X = np.random.rand(N)\n", "\n", "empMean = np.sum(X) / N\n", "\n", "################################################################\n", "# Stock in the variable 'mean_of_squares' the empirical mean\n", "# of the X_i^2\n", "#\n", "mean_of_squares = ???\n", "#\n", "# Stock in the variable 'emp_variance' the empirical variance,\n", "# and in radius_CI the half lenght of the 95% confidence interval\n", "# for the expectation of X\n", "#\n", "emp_variance = ???\n", "# \n", "radius_CI = ???\n", "# \n", "########\n", "\n", "# Theoretical true value for comparison\n", "esp = 0.5\n", "\n", "print(\"True expectation: %1.4f \\n\" %esp)\n", "print(\"Empirical mean: %1.4f \\n\" %empMean)\n", "\n", "print(\"0.95 Confidence interval: [%1.4f, %1.4f] \\n\" %(empMean - radius_CI, empMean + radius_CI) )\n", "\n", "print(\"Relative error*100: %1.1f\" %(radius_CI/empMean * 100) )" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }