{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Programme inspecteur modèles BNP Paribas - module Monte-Carlo methods - 12 décembre 2019" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# TP 1\n", "\n", "## Warm-up: Simulation of random variables with Python and fundamental limit theorems\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1. Simulation of a probability density with the rejection method\n", "\n", "We consider the probability distribution with support $[-2, 2]$ and density function\n", "$$\n", "f(x) = \\frac 1{2\\pi} \\sqrt{4 - x^2} 1_{x \\in [-2,2]} ,\n", "$$\n", "usually called the Wigner law.\n", "\n", "$\\blacktriangleright$ Implement a simulation method allowing to draw i.i.d. samples from the Wigner law. Simulate a large number of samples, plot the histogram and compare with the density $f$.\n", "\n", "_Reminder on the rejection method_: in order to simulate a random variable with density $f$ of support $[x_0, x_1]$ and such that $0\\leq f\\leq M$, one can use a sequence of independent couples of random variables $(Z_k, U_k)_{k\\geq 1}$ such that, for every $k$, $Z_k$ and $U_k$ are independent and uniformly distributed respectively on $[x_0, x_1]$ and $[0, 1]$.\n", "\n", "If we set\n", "\n", "$$ \\nu = \\min\\{k\\geq 1 : \\, f(Z_k) > M U_k \\}, $$\n", "\n", "the random variable $Z_\\nu$ is distributed according to the density $f$." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def wigner_density(x):\n", " return np.sqrt(4 - x*x) / (2*np.pi)\n", "\n", "N = int(1.e4)\n", "\n", "########################################\n", "# To Do: complete with an upper bound M\n", "# for the density f\n", "########################################\n", "upper_bound = 0\n", "\n", "X = np.array([])\n", "\n", "for _ in range(N):\n", " # There is no \"while ... do\" in Python\n", " # A possible solution is given below\n", " \n", " ########################################\n", " # To Do: sample the variables\n", " # z according to uniform distribution over [-2,2]\n", " # u according to uniform distribution over [0,1]\n", " ########################################\n", " z = ???\n", " u = ??? \n", " \n", " v = upper_bound * u\n", " \n", " while v >= wigner_density(z):\n", " #################################\n", " # To Do: complete the code below\n", " #################################\n", " z = ???\n", " v = ???\n", "\n", " X = np.append(X, z)\n", "\n", "##############################\n", "# Plot the histogram\n", "##############################\n", "n_columns = 2 * int(N**(1/3))\n", "\n", "plt.hist( ??? , density=True, bins=n_columns, label=\"Empirical law\")\n", " \n", "########################################\n", "# Plot the true density for comparison\n", "########################################\n", "x = np.linspace(-2., 2., 100)\n", "\n", "f_x = ???\n", "\n", "plt.plot( ???, ???, \"r\", label=\"Density of the Wigner law\", linewidth=1.5)\n", "\n", "plt.legend(loc='best')\n", "plt.title(\"Wigner law, N = %1.0e samples with rejection method\" %N)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2. Law of large numbers.\n", "\n", "Let $(X_i)_{i\\ge1}$ be a sequence of i.i.d. 
random variables such that $\mathbb{E}\bigl[ \bigl|X_1\bigr| \bigr]<\infty$.\n", "Recall that the law of large numbers states that the sequence of _empirical means_\n", "$$\n", "\hat m_n = \frac1n \sum_{i=1}^n X_i\n", "$$\n", "converges almost surely to $\mathbb{E}[X_1]$ as $n\to \infty$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.1 Convergence of the sequence $\hat m_n$ as $n$ becomes large\n", "\n", "Draw $N$ independent samples uniformly distributed on $[0,1]$ and plot the sequence $n \mapsto \hat m_n$ for $n$ ranging from $1$ to $N$.\n", "\n", "The following functions from `numpy` can be used: `numpy.random.rand` to draw the $N$ samples from the uniform distribution on $[0,1]$, and the function `numpy.cumsum` to compute the cumulative sums up to $n$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "N = 1000 # sample size\n", "\n", "########\n", "# N draws from the uniform distribution on [0,1]\n", "#\n", "X = np.random.rand(N)\n", "#\n", "########\n", "\n", "integers1toN = np.arange(1,N+1) # A numpy array containing the integers from 1 to N\n", "\n", "############\n", "# Store in the variable 'empMean' the sequence of empirical means for n ranging\n", "# from 1 to N: the cumulative sums divided by n\n", "#\n", "empMean = np.cumsum(X) / integers1toN\n", "#\n", "############\n", "\n", "############\n", "# Plotting\n", "############\n", "plt.plot(integers1toN
, empMean, color=\"b\", label=\"Empirical mean\")\n", "\n", "# A horizontal line for the reference theoretical value\n", "plt.axhline(0.5, color=\"r\", label=\"True expectation\")\n", "\n", "plt.legend(loc=\"best\")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.2 Another example\n", "\n", "The Cauchy distribution with parameter $a > 0$ is the probability distribution on $\mathbb R$ with density function\n", "$$\n", "f(x) = \frac a{\pi(a^2+x^2)}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### $\blacktriangleright$ Question (a):\n", "\n", "$F(x)=\int_{-\infty}^x f(y) dy $ is the cumulative distribution function (cdf) of the Cauchy distribution.\n", "Let $U$ be a r.v. uniformly distributed on $[0,1]$. \n", "\n", "What is the law of $F^{-1}(U)$?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### $\blacktriangleright$ Question (b):\n", "\n", "Noticing that $F(x) = \frac1{\pi}\arctan(\frac xa)+\frac12$, generate $N$ i.i.d. samples distributed according to the Cauchy distribution with parameter $a=1$, still using the function `numpy.random.rand`.\n", "\n", "Plot the sequence of the empirical means $\hat m_n$ for $n$ ranging from $1$ to $N$.\n", "\n", "What do you observe?"
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "N = 1000\n", "U = np.random.rand(N)\n", "\n", "a = 1.\n", "\n", "########\n", "# Stock in the variable X the N draws from the Cauchy distribution with parameter a,\n", "# exploiting the inverse of the cdf \n", "#\n", "X = ????\n", "#\n", "########\n", "\n", "integers1toN = np.arange(1,N+1)\n", "\n", "########\n", "# Stock in the variable 'empMean' the sequence of empirical means for n ranging\n", "# from 1 to N\n", "# \n", "empMean = ???\n", "########\n", "\n", "# Plot the sequence of the empirical means\n", "#\n", "plt.plot(...)\n", "#\n", "########" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3. Central limit theorem and confidence intervals.\n", "\n", "Let $(X_i)_{i\\ge1}$ be a sequence of i.i.d. random variables with $\\mathbb E[X_1^2]<\\infty$, and denote $\\hat m_n = \\frac1n \\sum_{i=1}^n X_i$ the sequence of their empirical means.\n", "\n", "The central limit theorem states that the sequence of _renormalized errors_\n", "\n", "$$\n", "e_n = \\frac{\\sqrt n}{\\sqrt{Var(X_1)}} \\bigl(\\hat m_n - \\mathbb E[X_1] \\bigr)\n", "$$\n", "\n", "converges in law towards a Gaussian distribution $\\mathcal{N}(0,1)$ as $n \\to \\infty$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.1 Convergence in law of the error" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this first part is to display the closeness of the distribution of the renormalized error $e_n$ with the Gaussian distribution, when $n$ is sufficiently large.\n", " \n", "We consider a sequene of i.i.d. 
random variables with exponential distribution of parameter $\lambda = 2$.\n", "\n", "For fixed $n$, draw a sample of $M$ values $ \bigl(\hat m_n^j \bigr)_{j=1,\dots,M}$ of the empirical mean.\n", "\n", "Plot the histogram of the corresponding sample of values of the renormalized error\n", "\n", "$$\n", "e^j_n = \sqrt{\frac n{Var(X_1)}} \bigl(\hat m^j_n - \mathbb E[X_1] \bigr),\n", "\qquad j = 1, \dots, M\n", "$$\n", "\n", "and compare with the standard Gaussian distribution." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = 3000 # n draws of the exponential distribution to compute the empirical mean\n", "M = 5000 # number of samples of the empirical mean\n", "\n", "####################################################\n", "# n*M independent draws of the exponential distribution\n", "# with parameter lambda=2\n", "#\n", "lambd = 2\n", "\n", "# Note: numpy.random.exponential takes the scale 1/lambda\n", "# as its first argument, not the rate lambda\n", "X = np.random.exponential(1/lambd, (M, n))\n", "####################################################\n", "\n", "# True known values of expectation and variance (as a benchmark)\n", "esp = 1/lambd\n", "var = (1/lambd)**2\n", "\n", "####################################################\n", "# Store in the variable 'empMean' the sample of M independent \n", "# values of the empirical mean \hat m_n (one mean per row of X)\n", "#\n", "empMean = np.mean(X, axis=1)\n", "#\n", "# Store in the variable 'renormError_n' the sample of M independent\n", "# values of the renormalized error \n", "#\n", "renormError_n = np.sqrt(n / var) * (empMean - esp)\n", "####################################################\n", "\n", "\n", "####################################################\n", "# Plotting\n", "plt.hist(renormError_n
, density=True, bins=int(np.sqrt(M)), label=\"Renormalized error\")\n", "\n", "####################################################\n", "# Standard Gaussian density for comparison\n", "x = np.linspace(-5, 5, 100)\n", "\n", "gaussianDensity = np.exp(-x**2 / 2) / np.sqrt(2*np.pi)\n", "\n", "####################################################\n", "# Plotting\n", "plt.plot(x, gaussianDensity, color=\"red\", label=\"Standard Gaussian density\", linewidth=2.0)\n", "\n", "plt.legend(loc=\"best\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.2 Confidence intervals\n", "\n", "Unlike in the previous example, in practice one does not know the value of $Var(X_1)$ (which therefore cannot be used to build the confidence intervals).\n", "\n", "Yet, it is possible to estimate $Var(X_1)$ with the same random sample used to estimate $\mathbb E[X_1]$, using the empirical variance as an estimator:\n", "\n", "$$\n", "\sigma_n^2 = \frac1n \sum_{i=1}^n X_i^2 - (\hat m_n)^2.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### $\blacktriangleright$ Question (a):\n", "\n", "Which result allows one to state that the random sequence\n", "$$\n", "\frac{\sqrt n}{\sigma_n} \left(\hat m_n - \mathbb E[X_1]\right)\n", "$$\n", "converges in law to $\mathcal{N}(0,1)$, when $n \to \infty$?"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### $\\blacktriangleright$ Question (b):\n", "\n", "Defining for $\\delta > 0$ the (random) interval\n", "$$\n", "I_n^{\\delta}=\\Bigl[\\overline{X}_n-\\delta\\frac{\\sigma_n}{\\sqrt n},\\ \\overline{X}_n+\\delta\\frac{\\sigma_n}{\\sqrt n}\\Bigr],\n", "$$\n", "the previous question entails that\n", "\n", "$$\n", "\\mathbb P\\Bigl(\\mathbb E[X_1] \\in I_n^{\\delta}\\Bigr)\n", "=\n", "\\mathbb P\\biggl(\\frac{\\sqrt n}{\\sigma_n} \\bigl|\\overline{X}_n - \\mathbb E[X_1]\\bigr| \\le \\delta \\biggr)\n", "\\longrightarrow \\mathbb P \\bigl(|\\mathcal{N}(0,1)|\\le \\delta \\bigr) = 2\\int_0^{\\delta} \\frac{e^{-x^2/2}}{\\sqrt{2\\pi}} dx\n", "$$\n", "as $n \\to \\infty$.\n", "\n", "When $\\delta=1.96$, the last term on the right hand side is approximately equal to $0.95$, which allows to identify $I_n^{1.96}$ as the $95\\%$ _asymptotic_ confidence interval for the true value of the expectation $\\mathbb E[X_1]$.\n", "\n", "In the cell below: give an estimate of the expectation of a random variable uniformly distributed on $[0,1]$ with the $95\\%$ _asymptotic_ confidence interval obtained above." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "N = 1000\n", "X = np.random.rand(N)\n", "\n", "empMean = np.sum(X) / N\n", "\n", "################################################################\n", "# Stock in the variable 'mean_of_squares' the empirical mean\n", "# of the X_i^2\n", "#\n", "mean_of_squares = ???\n", "#\n", "# Stock in the variable 'emp_variance' the empirical variance,\n", "# and in radius_CI the half lenght of the 95% confidence interval\n", "# for the expectation of X\n", "#\n", "emp_variance = ???\n", "# \n", "radius_CI = ???\n", "# \n", "########\n", "\n", "# Theoretical true value for comparison\n", "esp = 0.5\n", "\n", "print(\"True expectation: %1.4f \\n\" %esp)\n", "print(\"Empirical mean: %1.4f \\n\" %empMean)\n", "\n", "print(\"0.95 Confidence interval: [%1.4f, %1.4f] \\n\" %(empMean - radius_CI, empMean + radius_CI) )\n", "\n", "print(\"Relative error*100: %1.1f\" %(radius_CI/empMean * 100) )" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }