First part

Exercise 2

Joint distribution

In this exercise, we reuse the results of the first exercise. Indeed, we can draw an analogy between \({X_1\sim\mathcal{N}_q(\mu,\Sigma)}\) in the first exercise and \({X \sim\mathcal{N}_2\left(\begin{pmatrix}1\\2\end{pmatrix},\begin{pmatrix}2 & 1\\1 & 2\end{pmatrix}\right)}\),

where \(\mu = \begin{pmatrix} 1\\2 \end{pmatrix}\), \(\Sigma = \begin{pmatrix} 2&1\\1&2 \end{pmatrix}\) and \(q=2\).

In the same way, \((X_2 \mid X_1=x_1) \sim \mathcal{N}_{p-q}(\mathcal{A}x_1 + b, \Omega)\) corresponds to \((Y \mid X) \sim\mathcal{N}_2\left(\begin{pmatrix}X_1\\X_1+X_2\end{pmatrix},\begin{pmatrix}1 & 0\\0 & 1\end{pmatrix}\right)\),

where \(\mathcal{A} = \begin{pmatrix} 1&0\\1&1 \end{pmatrix}\), \(b = 0\), \(\Omega = \begin{pmatrix} 1&0\\0&1 \end{pmatrix}\) and \(p=4\), so that \(p-q=2\).

Therefore, using the first exercise’s results, we obtain: \((X\ Y)^T \sim \mathcal{N}\left(\begin{pmatrix} \mu \\ \mathcal{A}\mu + b \end{pmatrix}, \begin{pmatrix} \Sigma & \Sigma \mathcal{A}^T \\ \mathcal{A} \Sigma & \Omega + \mathcal{A} \Sigma \mathcal{A}^T \end{pmatrix}\right)\)
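With the numerical values above, the individual blocks evaluate to:

\[
\mathcal{A}\mu + b = \begin{pmatrix}1\\3\end{pmatrix}, \quad
\Sigma\mathcal{A}^T = \begin{pmatrix}2&3\\1&3\end{pmatrix}, \quad
\mathcal{A}\Sigma = \begin{pmatrix}2&1\\3&3\end{pmatrix}, \quad
\Omega + \mathcal{A}\Sigma\mathcal{A}^T = \begin{pmatrix}3&3\\3&7\end{pmatrix}.
\]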

Assembling these blocks, it follows that \((X\ Y)^T = (X_1 \ X_2 \ Y_1 \ Y_2)^T \sim \mathcal{N}_4 \left(\begin{pmatrix} 1 \\ 2 \\ 1 \\3 \end{pmatrix}, \begin{pmatrix} 2 &1&2&3\\1&2&1&3\\2&1&3&3\\3&3&3&7 \end{pmatrix}\right)\)
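As a quick sanity check, the same moments can be assembled numerically (a minimal base R sketch; the names A, mu_joint and Sigma_joint are ours):

A <- matrix(c(1, 1, 0, 1), nrow = 2)   # A = [[1,0],[1,1]] (column-major input)
b <- c(0, 0)
mu <- c(1, 2)
Sigma <- matrix(c(2, 1, 1, 2), nrow = 2)
Omega <- diag(2)
mu_joint <- c(mu, A %*% mu + b)        # (1, 2, 1, 3)
Sigma_joint <- rbind(cbind(Sigma,        Sigma %*% t(A)),
                     cbind(A %*% Sigma,  Omega + A %*% Sigma %*% t(A)))
Sigma_joint                            # reproduces the 4x4 matrix above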

Distribution of \(Y_2\) given \(Y_1\)

Therefore, we can derive the conditional distribution of \(Y_2\) given \(Y_1\) as follows:

\[ (Y_2 \mid Y_1 = y_1) \sim \mathcal{N}\left(\mu_4 + \Sigma_{43}\Sigma_{33}^{-1}(y_1 - \mu_3),\ \Sigma_{44} - \Sigma_{43}\Sigma_{33}^{-1}\Sigma_{34}\right) \] \[ (Y_2 \mid Y_1 = y_1) \sim \mathcal{N}\left(3 + \tfrac{3}{3}(y_1 - 1),\ 7 - \tfrac{3 \times 3}{3}\right) = \mathcal{N}(3 + (y_1 - 1),\ 4) \]
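These conditional moments are straightforward to compute from the mu_joint and Sigma_joint of the sketch above (cond_mean and cond_var are our illustrative names):

cond_mean <- function(y1) mu_joint[4] + Sigma_joint[4, 3] / Sigma_joint[3, 3] * (y1 - mu_joint[3])
cond_var <- Sigma_joint[4, 4] - Sigma_joint[4, 3] * Sigma_joint[3, 4] / Sigma_joint[3, 3]
cond_mean(0.7)   # 2.7
cond_var         # 4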

Distribution of \(W = X - Y\)

Reading off the lower-right blocks of the joint distribution, we know that \(Y \sim \mathcal{N}\left(\begin{pmatrix} 1\\3\end{pmatrix}, \begin{pmatrix} 3&3\\3&7\end{pmatrix}\right)\)

Then \(E[W] = E[X-Y] = E[X] - E[Y] = \begin{pmatrix} 0\\-1 \end{pmatrix}\)

On the other hand, since \(X\) and \(Y\) are vectors:

\(Var[W] = Var[X-Y] = Var(X) + Var(Y) - Cov(X,Y) - Cov(Y,X)\)

\(Var[W] = Var(X) + Var(Y) - (\Sigma_{XY} + \Sigma_{XY}^T)\)

\(Var[W] = \begin{pmatrix} 2&1\\1&2 \end{pmatrix} + \begin{pmatrix} 3&3\\3&7 \end{pmatrix} - \left(\begin{pmatrix} 2&3\\1&3 \end{pmatrix} + \begin{pmatrix} 2&1\\3&3 \end{pmatrix}\right)\)

\(Var[W] = \begin{pmatrix} 1&0\\0&3 \end{pmatrix}\)

Moreover, \(W\) is a linear transformation of the Gaussian vector \((X\ Y)^T\), so \(W \sim \mathcal{N}_2\left(\begin{pmatrix}0\\-1\end{pmatrix}, \begin{pmatrix}1&0\\0&3\end{pmatrix}\right)\).
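The same result follows exactly from the linear-transform formula \(Var[PV] = P\,Var[V]\,P^T\) with \(P = (I_2\ \ {-I_2})\), reusing mu_joint and Sigma_joint from the sketch above:

P <- cbind(diag(2), -diag(2))   # W = P %*% (X1, X2, Y1, Y2)
drop(P %*% mu_joint)            # (0, -1)
P %*% Sigma_joint %*% t(P)      # diag(1, 3)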

Empirical verification

library(mvtnorm)  # needed for rmvnorm() and dmvnorm() below

Sigma_XY = matrix(c(2,1,2,3,1,2,1,3,2,1,3,3,3,3,3,7), nrow = 4)
Sigma_XY
##      [,1] [,2] [,3] [,4]
## [1,]    2    1    2    3
## [2,]    1    2    1    3
## [3,]    2    1    3    3
## [4,]    3    3    3    7
mu_XY = c(1,2,1,3)
mu_XY
## [1] 1 2 1 3
XY = rmvnorm(5000,mu_XY,Sigma_XY)  # 5000 draws from the joint distribution of (X, Y)
X = XY[,1:2]
shapiro.test(X[1:5000])  # first 5000 elements (column-major) = the X1 marginal; shapiro.test() accepts at most 5000 values
## 
##  Shapiro-Wilk normality test
## 
## data:  X[1:5000]
## W = 0.99977, p-value = 0.8828
Y = XY[,3:4]
shapiro.test(Y[1:5000])  # first column of Y, i.e. the Y1 marginal
## 
##  Shapiro-Wilk normality test
## 
## data:  Y[1:5000]
## W = 0.99936, p-value = 0.07392
W = X - Y
round(mean(W[,1]),2)
## [1] -0.02
round(mean(W[,2]),2)
## [1] -0.95
round(cov(W),2)
##       [,1]  [,2]
## [1,]  0.97 -0.03
## [2,] -0.03  3.00
shapiro.test(W[1:5000])  # first column of W, i.e. the W1 marginal
## 
##  Shapiro-Wilk normality test
## 
## data:  W[1:5000]
## W = 0.99954, p-value = 0.2734
Sigma_Y = Sigma_XY[3:4,3:4]
mu_Y = mu_XY[3:4]

x <- seq(-5,5,0.01)
contour(x,x,outer(x,x,function(x,y){dmvnorm(cbind(x,y),mean=mu_Y,sigma=Sigma_Y)}), col = "blue")  # density of Y, centered at its mean (1,3)

abline(v=.7, lwd=2, lty=2, col = "red")
abline(v=4, lwd=2, lty=2, col = "grey")
abline(v=-3, lwd=2, lty=2, col = "purple")
text(0.75, -2, labels=expression(y[1]==0.7), col = "red", pos = 4)
text(4.05, -2, labels=expression(y[1]==4), col = "grey", pos = 4)
text(-3.05, -2, labels=expression(y[1]==-3), col = "purple", pos = 4)


### conditional distribution of Y2 | Y1 = 0.7
y <- dnorm(x, mean =  3 + (0.7-1), sd = sqrt(4))
lines(y-abs(min(x)),x,lty=2,lwd=2, col = "red")  # density drawn sideways, shifted to the left edge of the plot

### conditional distribution of Y2 | Y1 = 4
y2 <- dnorm(x, mean =  3 + (4-1), sd = sqrt(4))
lines(y2-abs(min(x)),x,lty=2,lwd=2, col = "grey")

### conditional distribution of Y2 | Y1 = -3
y3 <- dnorm(x, mean =  3 + (-3-1), sd = sqrt(4))
lines(y3-abs(min(x)),x,lty=2,lwd=2, col = "purple")
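We can also check the conditional moments by Monte Carlo, keeping only the simulated points whose \(Y_1\) falls in a narrow window around the conditioning value (a rough sketch reusing the sample XY; the window width 0.1 is an arbitrary choice of ours):

idx <- abs(XY[,3] - 0.7) < 0.1   # points with Y1 close to 0.7
round(mean(XY[idx,4]),2)         # should be close to the conditional mean 2.7
round(var(XY[idx,4]),2)          # should be close to the conditional variance 4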

Second part

Theoretical analysis

We are going to show that even if two random variables each follow a normal distribution, their joint distribution need not be normal.

Let \(X\) be a random variable following a standard normal distribution, \(X \sim \mathcal{N}(0,1)\).

We introduce \(\mathcal{E}\), independent of \(X\), such that \(P(\mathcal{E} = -1) = 1/2\) and \(P(\mathcal{E} = 1) = 1/2\).

Finally, we define \(Y = \mathcal{E}X\). Let us show that \(Y\) follows a normal distribution.

For all \(t \in \mathbb{R}\), \(P(Y \leq t) = P(\mathcal{E}X \leq t)\). Using the law of total probability and the independence of \(X\) and \(\mathcal{E}\), we have:

\[P(Y\leq t) = P(\mathcal{E} = -1)P(-X\leq t) + P(\mathcal{E} = 1)P(X \leq t) = \frac{1}{2}\left(P(-X\leq t) + P(X \leq t)\right)\]

Because \(X\) and \(-X\) follow the same distribution:

\[P(Y\leq t) = \frac{1}{2} \times 2\,P(X \leq t) = P(X \leq t)\]

Hence \(Y\) follows a standard normal distribution as well.

Finally, we know that if \((X\ Y)^T\) were a Gaussian vector, every linear combination of its components would have to follow a normal distribution.

However, \(P(X+Y = 0) = P((1+\mathcal{E})X = 0) = P(\mathcal{E} = -1) = 1/2\), since \(X\) is continuous and therefore \(P(X = 0) = 0\).

This shows that \(X+Y\) cannot follow a normal distribution (a normal random variable has no point mass), so \((X\ Y)^T\) cannot be a Gaussian vector by the property above.

Empirical verification

First of all, we create \(X \sim \mathcal{N}(0,1)\). Then, we construct \(\mathcal{E}\) and check what it looks like. Finally, we construct \(Y = \mathcal{E}X\).

The “shapiro.test” function in R performs the Shapiro–Wilk normality test on a numeric vector. A high p-value means that the normality hypothesis cannot be rejected; here, \(Y\) is compatible with a normal distribution. We can verify this by plotting its density.

n = 1000
X <- rnorm(n, 0, 1)        # standard normal sample
eps <- rbinom(n, 1, 1/2)   # Bernoulli(1/2) draws in {0, 1}
eps[which(eps == 0)] <- -1 # recode 0 as -1 to get the Rademacher variable
head(eps)
## [1]  1  1  1 -1  1  1
Y <- eps * X               # Y = eps * X

shapiro.test(Y) # normal ok
## 
##  Shapiro-Wilk normality test
## 
## data:  Y
## W = 0.99866, p-value = 0.6604
Y2 <- seq(min(Y), max(Y), length = 40)

# Normal curve
fun <- dnorm(Y2, mean = mean(Y), sd = sd(Y))

# Histogram
hist(Y, prob = TRUE, col = "white",
     ylim = c(0, max(fun)),
     main = "Histogram with normal curve")
lines(Y2, fun, col = 2, lwd = 2) 
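As a complementary visual check (a minimal sketch using base R’s qqnorm/qqline), the normal Q–Q plot of \(Y\) should be close to a straight line:

qqnorm(Y)                   # sample quantiles against theoretical normal quantiles
qqline(Y, col = 2, lwd = 2) # reference line through the quartiles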

Now, we can plot the density of \(X+Y\) to verify it does not follow a normal distribution.

Finally, we perform the Shapiro–Wilk test to confirm this. It gives a very low p-value, which means \(X+Y\) does not follow a normal distribution; therefore \((X\ Y)^T\) is not a Gaussian vector either.

X_plus_Y = X+Y

X_plus_Y2 <- seq(min(X_plus_Y), max(X_plus_Y), length = 40)

# Normal curve
fun <- dnorm(X_plus_Y2, mean = mean(X_plus_Y), sd = sd(X_plus_Y))

# Histogram
hist(X_plus_Y, prob = TRUE, col = "white",
     ylim = c(0, max(fun)),
     main = "Histogram with normal curve")
lines(X_plus_Y2, fun, col = 2, lwd = 2) 

shapiro.test(X_plus_Y)
## 
##  Shapiro-Wilk normality test
## 
## data:  X_plus_Y
## W = 0.86519, p-value < 2.2e-16
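The point mass at zero can also be observed directly: whenever \(\mathcal{E} = -1\) we have \(Y = -X\) exactly, so \(X+Y\) is exactly zero for about half of the sample:

round(mean(X_plus_Y == 0),2)  # should be close to 0.5, the proportion of eps == -1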