The data

Once again, we use the uscomp dataset.

data(uscomp)
str(uscomp)
## 'data.frame':    79 obs. of  7 variables:
##  $ Assets      : int  19788 5074 13621 1117 1633 5651 5835 3494 1654 1679 ...
##  $ Sales       : Factor w/ 79 levels "1037","1038",..: 78 32 55 2 71 6 48 18 52 10 ...
##  $ Market Value: int  10636 1892 4572 478 679 2002 1601 1442 779 687 ...
##  $ Profits     : num  1092.9 239.9 485 59.7 74.3 ...
##  $ Cash Flow   : num  2576.8 578.3 898.9 91.7 135.9 ...
##  $ Employees   : num  79.4 21.9 23.4 3.8 2.8 6.2 10.8 6.4 1.6 4.6 ...
##  $ Sector      : Factor w/ 9 levels "Communication",..: 1 1 2 2 2 2 2 2 2 2 ...
#uscomp$Sales <- as.integer(uscomp$Sales)

On our previous PCA, we transformed the “Sales” variable into numerical. However, by inspecting the data, we realized that was not a good idea and even if we do not have clear explanations about this variable, we will let it be factorial.

Empirical correlations

Let’s use the corrplot library to visualize the correlations between covariates.

corrplot(cor(uscomp[,c(1,3,4,5,6)]), method = "number")

Once again, they are all positively correlated. Furthermore, we notice that Market Value, Profits and Cash Flow are highly correlated, with a coefficient near to one.

Factor Analysis

fa_1 <- factanal(uscomp[,c(1,3,4,5,6)], factors = 2, rotation = "varimax")
print(fa_1, digits=2, cutoff=.6, sort=TRUE)
## 
## Call:
## factanal(x = uscomp[, c(1, 3, 4, 5, 6)], factors = 2, rotation = "varimax")
## 
## Uniquenesses:
##       Assets Market Value      Profits    Cash Flow    Employees 
##         0.40         0.02         0.00         0.01         0.28 
## 
## Loadings:
##              Factor1 Factor2
## Market Value 0.79           
## Profits      0.90           
## Cash Flow    0.87           
## Assets               0.70   
## Employees            0.67   
## 
##                Factor1 Factor2
## SS loadings       2.58    1.70
## Proportion Var    0.52    0.34
## Cumulative Var    0.52    0.86
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 7.13 on 1 degree of freedom.
## The p-value is 0.00757
fa_2 <- factanal(uscomp[,c(1,3,4,5,6)], factors = 1, rotation = "varimax")
print(fa_2, digits=2, cutoff=.6, sort=TRUE)
## 
## Call:
## factanal(x = uscomp[, c(1, 3, 4, 5, 6)], factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##       Assets Market Value      Profits    Cash Flow    Employees 
##         0.59         0.05         0.02         0.01         0.38 
## 
## Loadings:
## [1] 0.64 0.98 0.99 1.00 0.79
## 
##                Factor1
## SS loadings       3.96
## Proportion Var    0.79
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 39.27 on 5 degrees of freedom.
## The p-value is 2.09e-07

Both of the models are pertinent. However, to extract more information, we will use the model with 2 factors. This model contains 86% of the whole variance of the dataset. Moreover, we can distinguish the two factors. The first one seems to be related to variables about the economy such as “Market Value”, “Profits” or “Cash Flow”. On the other hand, the second factor seems to represent information about the company structure with variables such as “Assets”, “Employees”.

Graphical outputs

par(mfrow=c(1,1))
loadings = fa_1$loadings
plot(loadings[,1:2],type="n") 
text(loadings,labels=names(uscomp[,c(1,3,4,5,6)]),cex=.7, col="blue")

barplot(abs(loadings[,1]),  ylim = c(0,1), ylab = "Factor 1", cex.names = 0.8)

barplot(abs(loadings[,2]),  ylim = c(0,1), ylab = "Factor 2", cex.names = 0.8)

Once again, we can see how the variables are separated into two groups.

colnames(uscomp) <- c("Assets","Sales","MarketValue","Profits","CashFlow","Employees","Sector")

uscomp_fit = uscomp[,c(1,3,4,5,6)]
uscomp_fit=scale(uscomp_fit)
uscomp_fit <- as.data.frame(uscomp_fit)

We transform slightly the dataset by renaming the columns’ names and by centering and scaling the numerical variables.

#model <- ' economy  =~ MarketValue + Profits + CashFlow
#              structure =~ Assets + Employees'
#fit <- cfa(model, data=uscomp_fit)
#summary(fit, fit.measures=TRUE)
#graph_sem((fit))