First part

Exercise 2

Joint distribution

In this exercise, we reuse the results of the first exercise. Indeed, we can draw an analogy between \({X_1\sim\mathcal{N}_q(\mu,\Sigma)}\) in the first exercise and \({X \sim\mathcal{N}_2\left(\begin{pmatrix}1\\2\end{pmatrix},\begin{pmatrix}2 & 1\\1 & 2\end{pmatrix}\right)}\),

where \(\mu = \begin{pmatrix} 1\\2 \end{pmatrix}\), \(\Sigma = \begin{pmatrix} 2&1\\1&2 \end{pmatrix}\) and \(q=2\).

In the same way, \((X_2 \mid X_1=x_1) \sim \mathcal{N}_{p-q}(\mathcal{A}x_1 + b, \Omega)\) corresponds to \((Y \mid X) \sim\mathcal{N}_2\left(\begin{pmatrix}X_1\\X_1+X_2\end{pmatrix},\begin{pmatrix}1 & 0\\0 & 1\end{pmatrix}\right)\),

where \(\mathcal{A} = \begin{pmatrix} 1&0\\1&1 \end{pmatrix}\), \(b = 0\), \(\Omega = \begin{pmatrix} 1&0\\0&1 \end{pmatrix}\) and \(p=4\), so that \(p-q=2\).

Therefore, using the first exercise’s results, we obtain: \((X\ Y)^T \sim \mathcal{N}\left(\begin{pmatrix} \mu \\ \mathcal{A}\mu + b \end{pmatrix}, \begin{pmatrix} \Sigma & \Sigma \mathcal{A}^T \\ \mathcal{A} \Sigma & \Omega + \mathcal{A} \Sigma \mathcal{A}^T \end{pmatrix}\right)\)
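With the numerical values above, the individual blocks evaluate to:

\[
\mathcal{A}\mu + b = \begin{pmatrix}1\\3\end{pmatrix}, \quad
\Sigma\mathcal{A}^T = \begin{pmatrix}2&3\\1&3\end{pmatrix}, \quad
\mathcal{A}\Sigma = \begin{pmatrix}2&1\\3&3\end{pmatrix}, \quad
\Omega + \mathcal{A}\Sigma\mathcal{A}^T = \begin{pmatrix}3&3\\3&7\end{pmatrix}.
\]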

Assembling these blocks, it follows that \((X\ Y)^T = (X_1 \ X_2 \ Y_1 \ Y_2)^T \sim \mathcal{N}_4 \left(\begin{pmatrix} 1 \\ 2 \\ 1 \\3 \end{pmatrix}, \begin{pmatrix} 2 &1&2&3\\1&2&1&3\\2&1&3&3\\3&3&3&7 \end{pmatrix}\right)\)
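As a quick sanity check, the same moments can be assembled numerically (a minimal base R sketch; the names A, mu_joint and Sigma_joint are ours):

A <- matrix(c(1, 1, 0, 1), nrow = 2)   # A = [[1,0],[1,1]] (column-major input)
b <- c(0, 0)
mu <- c(1, 2)
Sigma <- matrix(c(2, 1, 1, 2), nrow = 2)
Omega <- diag(2)
mu_joint <- c(mu, A %*% mu + b)        # (1, 2, 1, 3)
Sigma_joint <- rbind(cbind(Sigma,        Sigma %*% t(A)),
                     cbind(A %*% Sigma,  Omega + A %*% Sigma %*% t(A)))
Sigma_joint                            # reproduces the 4x4 matrix above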

Distribution of \(Y_2\) given \(Y_1\)

Therefore, we can derive the conditional distribution of \(Y_2\) given \(Y_1\) as follows:

\[ (Y_2 \mid Y_1 = y_1) \sim \mathcal{N}\left(\mu_4 + \Sigma_{43}\Sigma_{33}^{-1}(y_1 - \mu_3),\ \Sigma_{44} - \Sigma_{43}\Sigma_{33}^{-1}\Sigma_{34}\right) \] \[ (Y_2 \mid Y_1 = y_1) \sim \mathcal{N}\left(3 + \tfrac{3}{3}(y_1 - 1),\ 7 - \tfrac{3 \times 3}{3}\right) = \mathcal{N}(3 + (y_1 - 1),\ 4) \]
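These conditional moments are straightforward to compute from the mu_joint and Sigma_joint of the sketch above (cond_mean and cond_var are our illustrative names):

cond_mean <- function(y1) mu_joint[4] + Sigma_joint[4, 3] / Sigma_joint[3, 3] * (y1 - mu_joint[3])
cond_var <- Sigma_joint[4, 4] - Sigma_joint[4, 3] * Sigma_joint[3, 4] / Sigma_joint[3, 3]
cond_mean(0.7)   # 2.7
cond_var         # 4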

Distribution of \(W = X - Y\)

Reading off the lower-right blocks of the joint distribution, we know that \(Y \sim \mathcal{N}\left(\begin{pmatrix} 1\\3\end{pmatrix}, \begin{pmatrix} 3&3\\3&7\end{pmatrix}\right)\)

Then \(E[W] = E[X-Y] = E[X] - E[Y] = \begin{pmatrix} 0\\-1 \end{pmatrix}\)

On the other hand, since \(X\) and \(Y\) are vectors:

\(Var[W] = Var[X-Y] = Var(X) + Var(Y) - Cov(X,Y) - Cov(Y,X)\)

\(Var[W] = Var(X) + Var(Y) - (\Sigma_{XY} + \Sigma_{XY}^T)\)

\(Var[W] = \begin{pmatrix} 2&1\\1&2 \end{pmatrix} + \begin{pmatrix} 3&3\\3&7 \end{pmatrix} - \left(\begin{pmatrix} 2&3\\1&3 \end{pmatrix} + \begin{pmatrix} 2&1\\3&3 \end{pmatrix}\right)\)

\(Var[W] = \begin{pmatrix} 1&0\\0&3 \end{pmatrix}\)

Moreover, \(W\) is a linear transformation of the Gaussian vector \((X\ Y)^T\), so \(W \sim \mathcal{N}_2\left(\begin{pmatrix}0\\-1\end{pmatrix}, \begin{pmatrix}1&0\\0&3\end{pmatrix}\right)\).
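The same result follows exactly from the linear-transform formula \(Var[PV] = P\,Var[V]\,P^T\) with \(P = (I_2\ \ {-I_2})\), reusing mu_joint and Sigma_joint from the sketch above:

P <- cbind(diag(2), -diag(2))   # W = P %*% (X1, X2, Y1, Y2)
drop(P %*% mu_joint)            # (0, -1)
P %*% Sigma_joint %*% t(P)      # diag(1, 3)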

Empirical verification

library(mvtnorm)  # needed for rmvnorm() and dmvnorm() below

Sigma_XY = matrix(c(2,1,2,3,1,2,1,3,2,1,3,3,3,3,3,7), nrow = 4)
Sigma_XY
##      [,1] [,2] [,3] [,4]
## [1,]    2    1    2    3
## [2,]    1    2    1    3
## [3,]    2    1    3    3
## [4,]    3    3    3    7
mu_XY = c(1,2,1,3)
mu_XY
## [1] 1 2 1 3
XY = rmvnorm(5000,mu_XY,Sigma_XY)  # 5000 draws from the joint distribution of (X, Y)
X = XY[,1:2]
shapiro.test(X[1:5000])  # first 5000 elements (column-major) = the X1 marginal; shapiro.test() accepts at most 5000 values
## 
##  Shapiro-Wilk normality test
## 
## data:  X[1:5000]
## W = 0.99977, p-value = 0.8828
Y = XY[,3:4]
shapiro.test(Y[1:5000])  # first column of Y, i.e. the Y1 marginal
## 
##  Shapiro-Wilk normality test
## 
## data:  Y[1:5000]
## W = 0.99936, p-value = 0.07392
W = X - Y
round(mean(W[,1]),2)
## [1] -0.02
round(mean(W[,2]),2)
## [1] -0.95
round(cov(W),2)
##       [,1]  [,2]
## [1,]  0.97 -0.03
## [2,] -0.03  3.00
shapiro.test(W[1:5000])  # first column of W, i.e. the W1 marginal
## 
##  Shapiro-Wilk normality test
## 
## data:  W[1:5000]
## W = 0.99954, p-value = 0.2734
Sigma_Y = Sigma_XY[3:4,3:4]
mu_Y = mu_XY[3:4]

x <- seq(-5,5,0.01)
contour(x,x,outer(x,x,function(x,y){dmvnorm(cbind(x,y),mean=mu_Y,sigma=Sigma_Y)}), col = "blue")  # density of Y, centered at its mean (1,3)

abline(v=.7, lwd=2, lty=2, col = "red")
abline(v=4, lwd=2, lty=2, col = "grey")
abline(v=-3, lwd=2, lty=2, col = "purple")
text(0.75, -2, labels=expression(y[1]==0.7), col = "red", pos = 4)
text(4.05, -2, labels=expression(y[1]==4), col = "grey", pos = 4)
text(-3.05, -2, labels=expression(y[1]==-3), col = "purple", pos = 4)


### conditional distribution of Y2 | Y1 = 0.7
y <- dnorm(x, mean =  3 + (0.7-1), sd = sqrt(4))
lines(y-abs(min(x)),x,lty=2,lwd=2, col = "red")  # density drawn sideways, shifted to the left edge of the plot

### conditional distribution of Y2 | Y1 = 4
y2 <- dnorm(x, mean =  3 + (4-1), sd = sqrt(4))
lines(y2-abs(min(x)),x,lty=2,lwd=2, col = "grey")

### conditional distribution of Y2 | Y1 = -3
y3 <- dnorm(x, mean =  3 + (-3-1), sd = sqrt(4))
lines(y3-abs(min(x)),x,lty=2,lwd=2, col = "purple")
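We can also check the conditional moments by Monte Carlo, keeping only the simulated points whose \(Y_1\) falls in a narrow window around the conditioning value (a rough sketch reusing the sample XY; the window width 0.1 is an arbitrary choice of ours):

idx <- abs(XY[,3] - 0.7) < 0.1   # points with Y1 close to 0.7
round(mean(XY[idx,4]),2)         # should be close to the conditional mean 2.7
round(var(XY[idx,4]),2)          # should be close to the conditional variance 4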

Second part

Theoretical analysis

We are going to show that even if two random variables each follow a normal distribution, their joint distribution need not be normal.

Let \(X\) be a random variable following a standard normal distribution, \(X \sim \mathcal{N}(0,1)\).

We introduce \(\mathcal{E}\), independent of \(X\), such that \(P(\mathcal{E} = -1) = 1/2\) and \(P(\mathcal{E} = 1) = 1/2\).

Finally, we define \(Y = \mathcal{E}X\). Let us show that \(Y\) follows a normal distribution.

For all \(t \in \mathbb{R}\), \(P(Y \leq t) = P(\mathcal{E}X \leq t)\). Using the law of total probability and the independence of \(X\) and \(\mathcal{E}\), we have:

\[P(Y\leq t) = P(\mathcal{E} = -1)P(-X\leq t) + P(\mathcal{E} = 1)P(X \leq t) = \frac{1}{2}\left(P(-X\leq t) + P(X \leq t)\right)\]

Because \(X\) and \(-X\) follow the same distribution:

\[P(Y\leq t) = \frac{1}{2} \times 2\,P(X \leq t) = P(X \leq t)\]

Hence \(Y\) follows a standard normal distribution as well.

Finally, we know that if \((X\ Y)^T\) were a Gaussian vector, every linear combination of its components would have to follow a normal distribution.

However, \(P(X+Y = 0) = P((1+\mathcal{E})X = 0) = P(\mathcal{E} = -1) = 1/2\), since \(X\) is continuous and therefore \(P(X = 0) = 0\).

This shows that \(X+Y\) cannot follow a normal distribution (a normal random variable has no point mass), so \((X\ Y)^T\) cannot be a Gaussian vector by the property above.

Empirical verification

First of all, we create \(X \sim \mathcal{N}(0,1)\). Then, we construct \(\mathcal{E}\) and check what it looks like. Finally, we construct \(Y = \mathcal{E}X\).

The “shapiro.test” function in R performs the Shapiro–Wilk normality test on a numeric vector. A high p-value means that the normality hypothesis cannot be rejected; here, \(Y\) is compatible with a normal distribution. We can verify this by plotting its density.

n = 1000
X <- rnorm(n, 0, 1)        # standard normal sample
eps <- rbinom(n, 1, 1/2)   # Bernoulli(1/2) draws in {0, 1}
eps[which(eps == 0)] <- -1 # recode 0 as -1 to get the Rademacher variable
head(eps)
## [1]  1  1  1 -1  1  1
Y <- eps * X               # Y = eps * X

shapiro.test(Y) # normal ok
## 
##  Shapiro-Wilk normality test
## 
## data:  Y
## W = 0.99866, p-value = 0.6604
Y2 <- seq(min(Y), max(Y), length = 40)

# Normal curve
fun <- dnorm(Y2, mean = mean(Y), sd = sd(Y))

# Histogram
hist(Y, prob = TRUE, col = "white",
     ylim = c(0, max(fun)),
     main = "Histogram with normal curve")
lines(Y2, fun, col = 2, lwd = 2) 
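As a complementary visual check (a minimal sketch using base R’s qqnorm/qqline), the normal Q–Q plot of \(Y\) should be close to a straight line:

qqnorm(Y)                   # sample quantiles against theoretical normal quantiles
qqline(Y, col = 2, lwd = 2) # reference line through the quartiles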

Now, we can plot the density of \(X+Y\) to verify it does not follow a normal distribution.

Finally, we perform the Shapiro–Wilk test to confirm this. It gives a very low p-value, which means \(X+Y\) does not follow a normal distribution; therefore \((X\ Y)^T\) is not a Gaussian vector either.

X_plus_Y = X+Y

X_plus_Y2 <- seq(min(X_plus_Y), max(X_plus_Y), length = 40)

# Normal curve
fun <- dnorm(X_plus_Y2, mean = mean(X_plus_Y), sd = sd(X_plus_Y))

# Histogram
hist(X_plus_Y, prob = TRUE, col = "white",
     ylim = c(0, max(fun)),
     main = "Histogram with normal curve")
lines(X_plus_Y2, fun, col = 2, lwd = 2) 

shapiro.test(X_plus_Y)
## 
##  Shapiro-Wilk normality test
## 
## data:  X_plus_Y
## W = 0.86519, p-value < 2.2e-16
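The point mass at zero can also be observed directly: whenever \(\mathcal{E} = -1\) we have \(Y = -X\) exactly, so \(X+Y\) is exactly zero for about half of the sample:

round(mean(X_plus_Y == 0),2)  # should be close to 0.5, the proportion of eps == -1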