# Probabilistic system analysis – Part III

## Limit theorem

Chebyshev’s inequality

\begin{align}
{\sigma ^2} &= \int {{{\left( {x – \mu } \right)}^2}{f_X}\left( x \right)dx} \\
&\ge \int_{ – \infty }^{u – c} {{{\left( {x – \mu } \right)}^2}{f_X}\left( x \right)dx} + \int_{u + c}^\infty {{{\left( {x – \mu } \right)}^2}{f_X}\left( x \right)dx} \\
&\ge {c^2}\int_{ – \infty }^{u – c} {{f_X}\left( x \right)dx} {\rm{ + }}{c^2}\int_{u + c}^\infty {{f_X}\left( x \right)dx} \\
&={c^2}{\rm{P}}\left( {\left| {X – \mu } \right| \ge c} \right)
\end{align}

${\rm{P}}\left( {\left| {X – \mu } \right| \ge c} \right) \le \frac{{{\sigma ^2}}}{{{c^2}}} \to {\rm{P}}\left( {\left| {X – \mu } \right| \ge k\sigma } \right) \le \frac{1}{{{k^2}}}$

Convergence

${M_n} = \frac{{{X_1} + \cdots + {X_n}}}{n}$

\begin{align}
{\rm{E}}\left[ {{M_n}} \right] &= \frac{{{\rm{E}}\left[ {{X_1}} \right] + \cdots + {\rm{E}}\left[ {{X_n}} \right]}}{n} = \frac{{n\mu }}{n} = \mu \\
{\mathop{\rm var}} \left( {{M_n}} \right) &= \frac{{{{\left( {{X_1} – \mu } \right)}^2} + \cdots + {{\left( {{X_n} – \mu } \right)}^2}}}{{{n^2}}} = \frac{{n{\sigma ^2}}}{{{n^2}}} = \frac{{{\sigma ^2}}}{n} \to 0\\
{\rm{P}}\left( {\left| {{M_n} – \mu } \right| \ge \varepsilon } \right) &\le \frac{{{\mathop{\rm var}} \left( {{M_n}} \right)}}{{{\varepsilon ^2}}} = \frac{{{\sigma ^2}}}{{n{\varepsilon ^2}}} \to 0
\end{align}

Example

Bernoulli process ${\sigma _x} = p\left( {1 – p} \right) = \frac{1}{4}$

95% confidence of error smaller than 1% ${\rm{P}}\left( {\left| {{M_n} – \mu } \right| \ge 0.01} \right) \le 0.05$

\begin{align}
{\rm{P}}\left( {\left| {{M_n} – \mu } \right| \ge 0.01} \right) &\le \frac{{\sigma _{{M_n}}^2}}{{{{0.01}^2}}} = \frac{{\sigma _x^2}}{{{{0.01}^2}n}}\\
&\le \frac{1}{{4n{{\left( {0.01} \right)}^2}}} \le 0.05
\end{align}

Scaling of $M_n$

\begin{align}
{S_n} = {X_1} + \cdots + {X_n} &\to {\mathop{\rm var}} \left( {{S_n}} \right) = n{\sigma ^2}\\
{M_n} = \frac{{{S_n}}}{n} &\to {\mathop{\rm var}} \left( {{M_n}} \right) = \frac{{{\sigma ^2}}}{n}\\
\frac{{{S_n}}}{{\sqrt n }} &\to {\mathop{\rm var}} \left( {\frac{{{S_n}}}{{\sqrt n }}} \right) = \frac{{{\mathop{\rm var}} \left( {{S_n}} \right)}}{n} = {\sigma ^2}
\end{align}

Central limit theorem

Standardized ${S_n} = {X_1} + \cdots + {X_n}$

${Z_n} = \frac{{{S_n} – {\rm{E}}\left[ {{S_n}} \right]}}{{{\sigma _{{S_n}}}}} = \frac{{{S_n} – n{\rm{E}}\left[ X \right]}}{{\sqrt n \sigma }} \sim N\left( {0,1} \right)$

${\rm{P}}\left( {{Z_n} \le c} \right) \to {\rm{P}}\left( {Z \le c} \right) = \Phi \left( c \right)$

CDF of ${Z_n}$ converges to normal CDF, not about convergence of PDF or PMF

Example:

$n = 36,p = 0.5$, find ${\rm{P}}\left( {{S_n} \le 21} \right)$

$\sum\limits_{k = 0}^{21} {\left( {\begin{array}{*{20}{c}} {36}\\ k \end{array}} \right)} {\left( {\frac{1}{2}} \right)^{36}} = 0.8785$

Method 1: Central limit theorem

${Z_n} = \frac{{{S_n} – n{\rm{E}}\left[ X \right]}}{{\sqrt n \sigma }} = \frac{{{S_n} – np}}{{\sqrt {np\left( {1 – p} \right)} }}$

\begin{align}
{\rm{E}}\left[ {{S_n}} \right] &= np = 36 \cdot 0.5 = 18\\
\sigma _{{S_n}}^2 &= np\left( {1 – p} \right) = 9\\
{\rm{P}}\left( {{S_n} \le 21} \right) &\approx {\rm{P}}\left( {\frac{{{S_n} – 18}}{3} \le \frac{{21 – 18}}{3}} \right) = {\rm{P}}\left( {{Z_n} \le 1} \right) = 0.843
\end{align}

Method 2: 1/2 correction for binomial approximation

\begin{align}
{\rm{P}}\left( {{S_n} \le 21} \right) &= {\rm{P}}\left( {{S_n} < 22} \right)\ \ \ {S_n}{\rm{\ is\ an\ integer}}\\
{\rm{P}}\left( {{S_n} \le 21.5} \right) &\approx {\rm{P}}\left( {{Z_n} \le \frac{{21.5 – 18}}{3}} \right) = {\rm{P}}\left( {{Z_n} \le 1.17} \right) = 0.879
\end{align}

The solution is closer to the exact answer.

De Moivre-Laplace CLT

When 1/2 correction is used, CLT can also approximate the binomial pmf, not just the binomial CDF.

\begin{align}
{\rm{P}}\left( {{S_n} = 19} \right) &= {\rm{P}}\left( {18.5 \le {S_n} \le 19.5} \right) \approx {\rm{P}}\left( {0.17 \le {Z_n} \le 0.5} \right)\\
&= {\rm{P}}\left( {{Z_n} \le 0.5} \right) – {\rm{P}}\left( {{Z_n} \le 0.17} \right) = 0.124
\end{align}

Exact answer: ${\rm{P}}\left( {{S_n} = 19} \right) = \left( {\begin{array}{*{20}{c}} {36}\\ {19} \end{array}} \right){\left( {\frac{1}{2}} \right)^{36}} = 0.125$

Poisson arrivals during unit interval equals: sum of $n$ (independent) Poisson arrivals during $n$ intervals of length $1/n$. $X = {X_1} + \cdots + {X_n}$, ${\rm{E}}\left[ X \right] = 1$. Fix $p$ of $X_i$, when $n \to \infty$, the mean and variance are changing, so CLT cannot be applied here. For Binomial$\left( {n,p} \right)$:

\begin{align}
p \ {\rm{fixed}},n &\to \infty : \ {\rm{normal}}\\
np \ {\rm{fixed}},n &\to \infty : \ {\rm{Poisson}}
\end{align}

## Bayesian statistical inference

Types of Inference models/approaches

$X=aS+W$

#1. Model building: know “signal” $S$, observe $X$, infer $a$

#2. Inferring unknown variables: know $a$, observe $X$, infer $S$

#3. Hypothesis testing: unknown takes one of few possible values, aim at small probability of incorrect decision

#4. Estimation: aim at a small estimation error

$\theta$ is an unknown parameter and the difference between classical statistics and Bayesian is that $\theta$ is constant in classical statistics, but a random variable in Bayesian.

\begin{align}
{p_{\Theta \left| X \right.}}\left( {\theta \left| x \right.} \right) = \frac{{{p_\Theta }\left( \theta \right){f_{\Theta \left| X \right.}}\left( {\theta \left| x \right.} \right)}}{{{f_X}\left( x \right)}}& \qquad {\rm{Hypothesis \ testing}}\\
{f_{\Theta \left| X \right.}}\left( {\theta \left| x \right.} \right) = \frac{{{f_\Theta }\left( \theta \right){f_{\Theta \left| X \right.}}\left( {\theta \left| x \right.} \right)}}{{{f_X}\left( x \right)}}& \qquad {\rm{Estimation}}
\end{align}

Maximum a posteriori probability (MAP): choose a single value that gives a maximum probability, often used in hypothesis testing, but may be misleading when look into the conditional expectation.

Least mean squares estimation (LMS): find $c$ to minimize ${\rm{E}}\left[ {{{\left( {\Theta – c} \right)}^2}} \right]$, in which $c = {\rm{E}}\left[ \Theta \right]$, then the optimal mean squared error is ${\mathop{\rm var}} \left( \Theta \right) = {\rm{E}}\left[ {{{\left( {\Theta – {\rm{E}}\left[ \Theta \right]} \right)}^2}} \right]$, summarized as

${\rm{minimize \ E}}\left[ {{{\left( {\Theta – c} \right)}^2}} \right] \to c = {\rm{E}}\left[ \Theta \right] \to {\mathop{\rm var}} \left( \Theta \right) = {\rm{E}}\left[ {{{\left( {\Theta – {\rm{E}}\left[ \Theta \right]} \right)}^2}} \right]$

\begin{align}
{\rm{E}}\left[ {{{\left( {\Theta – c} \right)}^2}} \right] &= {\rm{E}}\left[ {{\Theta ^2}} \right] – 2c{\rm{E}}\left[ \Theta \right] + {c^2}\\
\frac{d}{{dc}}{\rm{E}}\left[ {{{\left( {\Theta – c} \right)}^2}} \right] &= – 2{\rm{E}}\left[ \Theta \right] + 2c = 0
\end{align}

LMS estimation of two random variables

\begin{align}
{\rm{minimize \ E}}\left[ {{{\left( {\Theta – c} \right)}^2}\left| {X = x} \right.} \right] &\to c = {\rm{E}}\left[ {\Theta \left| {X = x} \right.} \right]\\
{\mathop{\rm var}} \left( {\Theta \left| {X = x} \right.} \right) = {\rm{E}}\left[ {{{\left( {\Theta – {\rm{E}}\left[ {\Theta \left| {X = x} \right.} \right]} \right)}^2}\left| {X = x} \right.} \right] &\le {\rm{E}}\left[ {{{\left( {\Theta – g\left( x \right)} \right)}^2}\left| {X = x} \right.} \right]\\
{\rm{E}}\left[ {{{\left( {\Theta – {\rm{E}}\left[ {\Theta \left| X \right.} \right]} \right)}^2}\left| X \right.} \right] &\le {\rm{E}}\left[ {{{\left( {\Theta – g\left( x \right)} \right)}^2}\left| X \right.} \right]\\
{\rm{E}}\left[ {{{\left( {\Theta – {\rm{E}}\left[ \Theta \right]} \right)}^2}} \right] &\le {\rm{E}}\left[ {{{\left( {\Theta – g\left( x \right)} \right)}^2}} \right]
\end{align}

LMS estimation with several measurements: ${\rm{E}}\left[ {\Theta \left| {{X_1}, \cdots ,{X_n}} \right.} \right]$, is hard to compute and involves multi-dimensional integrals, etc.

Estimator: $\widehat \Theta = {\rm{E}}\left[ {\Theta \left| X \right.} \right]$, with estimation error $\widetilde \Theta = \widehat \Theta – \Theta$

$\begin{array}{l} {\rm{E}}\left[ {\widetilde \Theta \left| X \right.} \right] = {\rm{E}}\left[ {\widehat \Theta – \Theta \left| X \right.} \right] = {\rm{E}}\left[ {\widehat \Theta \left| X \right.} \right] – {\rm{E}}\left[ {\Theta \left| X \right.} \right] = \widehat \Theta – \widehat \Theta = 0\\ {\rm{E}}\left[ {\widetilde \Theta h\left( x \right)\left| X \right.} \right] = h\left( x \right){\rm{E}}\left[ {\widetilde \Theta \left| X \right.} \right] = 0 \to {\rm{E}}\left[ {\widetilde \Theta h\left( x \right)} \right] = 0\\ {\mathop{\rm cov}} \left( {\widetilde \Theta ,\widehat \Theta } \right) = {\rm{E}}\left[ {\left( {\widetilde \Theta – {\rm{E}}\left[ {\widetilde \Theta } \right]} \right) \cdot \left( {\widehat \Theta – {\rm{E}}\left[ {\widehat \Theta } \right]} \right)} \right] = {\rm{E}}\left[ {\widetilde \Theta \cdot \left( {\widehat \Theta – \Theta } \right)} \right] = 0\\ \widetilde \Theta = \widehat \Theta – \Theta \to {\mathop{\rm var}} \left( \Theta \right) = {\mathop{\rm var}} \left( {\widehat \Theta } \right) + {\mathop{\rm var}} \left( {\widetilde \Theta } \right) \end{array}$

Linear LMS

Consider estimator of $\Theta$ of the form $\widehat \Theta = aX + b$ and minimize ${\rm{E}}\left[ {{{\left( {\Theta – aX – b} \right)}^2}} \right]$

${\widehat \Theta _L} = {\rm{E}}\left[ \Theta \right] + \frac{{{\mathop{\rm cov}} \left( {X,\Theta } \right)}}{{{\mathop{\rm var}} \left( X \right)}}\left( {X – {\rm{E}}\left[ X \right]} \right)$

${\rm{E}}\left[ {{{\left( {{{\widehat \Theta }_L} – \Theta } \right)}^2}} \right] = \left( {1 – {\rho ^2}} \right)\sigma _\Theta ^2$

Consider estimators of the form $\widehat \Theta = {a_1}{X_1} + \cdots + {a_n}{X_n} + b$ and minimize ${\rm{E}}\left[ {{{\left( {{a_1}{X_1} + \cdots + {a_n}{X_n} + b – \Theta } \right)}^2}} \right]$. Set derivatives to zero and linear system in $b$ and the $a_i$, only means, variances, covariances matter.

Cleanest linear LMS example

${X_i} = \Theta + {W_i}$, $\Theta ,{W_1}, \cdots ,{W_n}$ are independent, $\Theta \sim \mu ,\sigma _0^2$ and ${W_i} \sim 0,\sigma _i^2$, then

${\widehat \Theta _L} = \frac{{\mu /\sigma _0^2 + \sum\limits_{i = 1}^n {{X_i}/\sigma _i^2} }}{{\sum\limits_{i = 1}^n {1/\sigma _i^2} }}$

## Classical statistical inference

Problem type

#1: hypothesis testing ${H_0}:\theta = 1/2 \ {\rm{versus}} \ {H_1}:\theta = 3/4$

#2: composite hypotheses ${H_0}:\theta = 1/2 \ {\rm{versus}} \ {H_1}:\theta \ne 1/2$

#3: estimation, design an estimator $\widehat \Theta$ to keep estimation error $\widehat \Theta – \theta$ small

Maximum likelihood estimation: pick $\theta$ that makes data most likely

${\widehat \theta _{MAP}} = \arg \mathop {\max }\limits_\theta {p_\Theta }\left( {x;\theta } \right)$

Example: $\mathop {\max }\limits_\theta \mathop \Pi \limits_{i = 1}^n \theta {e^{ – \theta {x_i}}}$

\begin{align}
\mathop {\max }\limits_\theta \mathop \Pi \limits_{i = 1}^n \theta {e^{ – \theta {x_i}}} &\to \mathop {\max }\limits_\theta \left( {n\log \theta – \theta \sum\limits_{i = 1}^n {{x_i}} } \right)\\
\frac{n}{\theta } – \sum\limits_{i = 1}^n {{x_i}} = 0 &\to {\widehat \theta _{ML}} = \frac{n}{{{x_1} + \cdots + {x_n}}}
\end{align}

desirable properties of estimators

(1) unbiased: ${\rm{E}}\left[ {{{\widehat \Theta }_n}} \right] = \theta$

(2) consistent: ${\widehat \Theta _n} \to \theta$ in probability

(3) small mean squared error (MSE): ${\rm{E}}\left[ {{{\left( {\widehat \Theta – \theta } \right)}^2}} \right] = {\mathop{\rm var}} \left( {\widehat \Theta – \theta } \right) + \left( {{\rm{E}}\left[ {\widehat \Theta – \theta } \right]} \right) = {\mathop{\rm var}} \left( {\widehat \Theta } \right) + {\left( {{\rm{biased}}} \right)^2}$

Confidence intervals (CIs)

random interval $\left[ {\widehat \Theta _n^ – ,\widehat \Theta _n^ + } \right]$ with an $1 – \alpha$ confidence interval

${\rm{P}}\left( {\widehat \Theta _n^ – \le \theta \le \widehat \Theta _n^ + } \right) \ge 1 – \alpha$

CI in estimation of the mean ${\widehat \Theta _n} = \left( {{X_1} + \cdots + {X_n}} \right)/n$

$\Phi \left( {1.96} \right) = 1 – 0.05/2$

${\rm{P}}\left( {\frac{{\left| {{{\widehat \Theta }_n} – \theta } \right|}}{{\sigma /\sqrt n }} \le 1.96} \right) \approx 0.95$

${\rm{P}}\left( {{{\widehat \Theta }_n} – \frac{{1.96\sigma }}{{\sqrt n }} \le \theta \le {{\widehat \Theta }_n} + \frac{{1.96\sigma }}{{\sqrt n }}} \right) \approx 0.95$

more generally

$\Phi \left( z \right) = 1 – \alpha /2 \to {\rm{P}}\left( {{{\widehat \Theta }_n} – \frac{{z\sigma }}{{\sqrt n }} \le \theta \le {{\widehat \Theta }_n} + \frac{{z\sigma }}{{\sqrt n }}} \right) \approx 1 – \alpha$

Since the $\sigma$ is unknown, then we should estimate the value of $\sigma$

option 1: upper bound on $\sigma$, for example Bernoulli $\sigma = \sqrt {p\left( {1 – p} \right)} \le 1/2$

option 2: ad hoc estimate of $\sigma$, for example Bernoulli$\theta$, $\widehat \sigma = \sqrt {\widehat \Theta \left( {1 – \widehat \Theta } \right)}$

option 3: generic estimation of the variance

$\begin{array}{l} {\sigma ^2} = {\rm{E}}\left[ {{{\left( {{X_i} – \theta } \right)}^2}} \right]\\ \widehat \sigma _n^2 = \frac{1}{n}\sum\limits_{i = 1}^n {{{({X_i} – \theta )}^2}} \to {\sigma ^2}\\ \widehat S_n^2 = \frac{1}{{n – 1}}\sum\limits_{i = 1}^n {{{({X_i} – {{\widehat \Theta }_n})}^2}} \to {\sigma ^2} \end{array}$