Correlation matrix for factor analysis. Data preparation


Having become familiar with the concepts of factor loading and the area of joint variation, we can go further, again using the apparatus of matrices, whose elements this time will be correlation coefficients.

The matrix of correlation coefficients, usually obtained experimentally, is called a correlation matrix, or matrix of correlations.

The elements of this matrix are the correlation coefficients between all variables in a given population.

If we have, for example, a set consisting of n tests, then the number of correlation coefficients obtained experimentally will be n(n − 1)/2.

These coefficients fill the half of the matrix lying on one side of its main diagonal. On the other side are, obviously, the same coefficients, since rjk = rkj. Therefore, the correlation matrix is symmetric.

Scheme 3.2. Full correlation matrix

There are ones on the diagonal of this matrix because the correlation of each variable with itself is +1.

A correlation matrix in which the elements of the main diagonal are equal to 1 is called the full correlation matrix (Scheme 3.2).

It should be noted that by placing on the main diagonal the units, i.e. the correlations of each variable with itself, we take into account the total variance of each variable represented in the matrix. Thus, the influence not only of common but also of specific factors is taken into account.

If, on the contrary, the elements on the main diagonal of the correlation matrix correspond to the communalities and thus relate only to the common variance of the variables, then only the influence of the common factors is taken into account; the influence of specific factors and errors is eliminated, i.e., the specificity and error variance are discarded.

A correlation matrix in which the elements of the main diagonal correspond to the communalities is called the reduced matrix and is denoted R (Scheme 3.3).

Scheme 3.3. Reduced correlation matrix

We have already discussed factor loading, i.e. the saturation of a given variable with a particular factor. It was emphasized that a factor loading has the form of a correlation coefficient between the given variable and the given factor.

A matrix whose columns consist of the loadings of a given factor with respect to all variables of a given population, and whose rows consist of the factor loadings of a given variable, is called a factor matrix, or matrix of factor loadings. Here, too, one can speak of a full and a reduced factor matrix. The elements of the full factor matrix correspond to the total unit variance of each variable in the given population. If the loadings of the common factors are denoted by c and the loadings of the specific factors by their own symbol, the complete factor matrix can be represented in the following form:

Scheme 3.4. Full factor matrix for four variables

The factor matrix shown here has two parts. The first part contains elements related to four variables and three common factors, all of which are assumed to apply to all variables. This is not a necessary condition: some elements of the first part of the matrix may be equal to zero, which means that some factors do not extend to all variables. The elements of the first part of the matrix are the loadings of the common factors (for example, the element in the first row and second column shows the loading of the second common factor on the first variable).

In the second part of the matrix we see 4 loadings of characteristic factors, one in each row, which corresponds to their characteristic nature. Each of these factors relates to only one variable. All other elements of this part of the matrix are equal to zero. Characteristic factors can obviously be divided into specific and error-related.

A column of the factor matrix characterizes a factor and its influence on all variables. A row characterizes a variable and its saturation with the various factors, in other words, the factor structure of the variable.

When analyzing only the first part of the matrix, we are dealing with a factor matrix that reflects the common variance of each variable. This part of the matrix is called reduced and is denoted F. This matrix does not take into account the loadings of the characteristic factors, i.e., it disregards the specific variance. Recall that, in accordance with what was said above about common variances and factor loadings, the sum of the squares of the elements of each row of the reduced factor matrix F is equal to the communality h² of that variable.

Accordingly, the sum of the squares of all elements of a row of the complete factor matrix is equal to the total variance of the given variable (for standardized variables, to unity).

Since factor analysis focuses on common factors, in what follows we will mainly use the reduced correlation and reduced factor matrix.
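To make the relationship between loadings, communalities, the reduced and the full matrices concrete, here is a small sketch in Python with NumPy; the loading matrix F and its values are hypothetical and serve only as an illustration.

import numpy as np

# A hypothetical reduced factor matrix F: 4 variables x 2 common factors
# (the loading values are illustrative, not taken from the schemes above).
F = np.array([
    [0.7, 0.3],
    [0.6, 0.5],
    [0.2, 0.8],
    [0.4, 0.6],
])

# Communality of each variable = sum of squared loadings in its row.
communalities = (F ** 2).sum(axis=1)

# Specific and error variance ("uniqueness") completes the total unit variance.
uniqueness = 1.0 - communalities

# The reduced correlation matrix implied by the common factors alone:
R_reduced = F @ F.T                       # communalities appear on the diagonal
# The full correlation matrix puts the unique variance back on the diagonal:
R_full = R_reduced + np.diag(uniqueness)  # ones appear on the diagonal

print(communalities)      # [0.58 0.61 0.68 0.52]
print(np.diag(R_full))    # [1. 1. 1. 1.]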


If factor analysis is done properly, rather than being satisfied with the default settings (“little jiffy,” as methodologists derisively call the standard gentleman's set), the preferred method of factor extraction is either maximum likelihood or generalized least squares. This is where trouble can await us: the procedure produces an error message: the correlation matrix is not positive definite. What does this mean, why does it happen, and how do we deal with the problem?
The fact is that during factorization the procedure looks for the so-called inverse of the correlation matrix. There is an analogy here with ordinary real numbers: multiplying a number by its inverse gives one (for example, 4 and 0.25). However, some numbers have no inverse: zero cannot be multiplied by anything to obtain one. It is the same story with matrices. A matrix multiplied by its inverse gives the identity matrix (ones on the diagonal and zeros everywhere else). However, some matrices have no inverse, which means that factor analysis becomes impossible in such cases. This can be checked using a special number called the determinant: if the determinant of the correlation matrix is close to zero or negative, we have a problem.
What are the reasons for this situation? Most often it arises due to the existence of a linear relationship between variables. It sounds strange, since it is precisely such dependencies that we are looking for using multidimensional methods. However, in the case when such dependencies cease to be probabilistic and become strictly deterministic, multidimensional analysis algorithms fail. Consider the following example. Let us have the following data set:
data list free / V1 to V3.
begin data.
1 2 3
2 1 2
3 5 4
4 4 5
5 3 1
end data.
compute V4 = V1 + V2 + V3.
The last variable is the exact sum of the first three. When does such a situation arise in a real study? When we include in the set of variables the raw scores of subtests along with the total test score; when the number of variables is much larger than the number of subjects (especially if the variables are highly correlated or take a limited set of values); in the latter case exact linear relationships may arise purely by chance. Such dependencies are often an artifact of the measurement procedure, for example when percentages are computed within observations (say, the percentage of statements of a certain type), when ranking or constant-sum allocation is used, when restrictions are placed on the choice of alternatives, and so on. As you can see, these are quite common situations.
If, when conducting factor analysis in SPSS of the above array, you order the output of the determinant and the inverse correlation matrix, the package will report a problem.
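The same check can be reproduced outside SPSS; the sketch below (Python with NumPy) builds the correlation matrix for V1-V4 from the five cases above and shows that its determinant is essentially zero, so no inverse exists.

import numpy as np

# The five cases from the example above; V4 is the exact sum of V1-V3.
data = np.array([
    [1, 2, 3],
    [2, 1, 2],
    [3, 5, 4],
    [4, 4, 5],
    [5, 3, 1],
], dtype=float)
data = np.column_stack([data, data.sum(axis=1)])   # V4 = V1 + V2 + V3

R = np.corrcoef(data, rowvar=False)                # 4 x 4 correlation matrix
print(np.linalg.det(R))                            # ~0 within rounding error
print(np.linalg.eigvalsh(R).min())                 # smallest eigenvalue ~0
# np.linalg.inv(R) would fail or return a numerically meaningless result here.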
How do we identify the group of variables that creates the multicollinearity? It turns out that the good old principal component method, despite the linear dependence, keeps working and produces something. If you see that the communalities of some variables approach 0.90-0.99 and the eigenvalues of some factors become very small (or even negative), this is not a good sign. In addition, request a varimax rotation and see which group of variables the suspect variable ends up with; its loading on that factor is usually unusually large (0.99, for example). If this set of variables is small, heterogeneous in content, the possibility of an artifactual linear dependence is excluded, and the sample is large enough, then the discovery of such a relationship can be considered a result of no less value. You can also run such a group through regression analysis: make the variable with the highest loading the dependent variable and try all the others as predictors. R, i.e. the multiple correlation coefficient, should in this case be equal to 1. If the linear dependence is badly tangled, the regression will silently drop some of the predictors; look carefully at what is missing. By additionally requesting multicollinearity diagnostics, you can eventually find the ill-fated set that forms an exact linear relationship.
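A minimal sketch of this regression check (Python with NumPy, reusing the same illustrative data): regress the suspect variable on the others and look at the multiple R².

import numpy as np

# Columns V1, V2, V3 and the suspect V4 = V1 + V2 + V3 (hypothetical data).
X = np.array([[1, 2, 3], [2, 1, 2], [3, 5, 4], [4, 4, 5], [5, 3, 1]], dtype=float)
v4 = X.sum(axis=1)

# Regress V4 on V1-V3 (with an intercept) via least squares.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, v4, rcond=None)

pred = A @ coef
r_squared = 1 - ((v4 - pred) ** 2).sum() / ((v4 - v4.mean()) ** 2).sum()
print(coef.round(3))    # ~[0, 1, 1, 1]: V4 is reproduced exactly
print(r_squared)        # 1.0 -> exact linear dependence, i.e. multicollinearity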
And finally, there are several other, lesser reasons why a correlation matrix may not be positive definite. The first is a large number of missing responses. Sometimes, in order to make the most of the available information, the researcher requests pairwise deletion of missing values. The result may be such an “illogical” correlation matrix that the factor analysis model cannot handle it. Second, if you decide to factorize a correlation matrix reported in the literature, you may run into the negative effect of rounding.

Factor analysis of variance

Factor matrix

Variable | Factor A | Factor B

As can be seen from the matrix, the factor loadings (weights) on factors A and B differ significantly across the consumer requirements. The loading of factor A on requirement T1 corresponds to a correlation coefficient of 0.83, i.e., a good (close) relationship. The loading of factor B on the same requirement gives r = 0.3, which corresponds to a weak relationship. As expected, factor B correlates very well with consumer requirements T2, T4 and T6.

Considering that factors A and B load the consumer requirements outside their own group no more strongly than 0.4 (i.e., weakly), we can assume that the intercorrelation matrix presented above is determined by two independent factors, which in turn determine six of the consumer requirements (with the exception of T7).

The variable T7 could be singled out as an independent factor, since it does not have a significant correlation loading (more than 0.4) on any consumer requirement. But, in our opinion, this should not be done, since the factor "the door should not rust" is not directly related to the consumer requirements for the design of the doors.

Thus, when drawing up the terms of reference for designing the structure of car doors, it is the names of the obtained factors that will be entered as the consumer requirements for which a constructive solution in the form of engineering characteristics must be found.

Let us point out one fundamentally important property of the correlation coefficient between variables: squared, it shows what part of the variance (scatter) of an attribute is common to the two variables, i.e. to what extent these variables overlap. So, for example, if two variables T1 and T3 with a correlation of 0.8 overlap to a degree of 0.64 (0.8²), this means that 64% of the variances of the two variables coincide. It can also be said that the communality of these variables is 64%.

Let us recall that the factor loadings in the factor matrix are also correlation coefficients, but between factors and variables (consumer requirements).


Therefore, the squared factor loading characterizes the degree of commonality (overlap) of a given variable and a given factor. Let us determine the degree of overlap (the variance D) of both factors with the variable (consumer requirement) T1. To do this, we calculate the sum of the squares of the loadings of the factors on the first variable: 0.83 × 0.83 + 0.3 × 0.3 ≈ 0.78. Thus, the communality of the variable T1 with respect to both factors is about 78%. This is quite a significant overlap.


At the same time, low communality may indicate that a variable measures or reflects something qualitatively different from the other variables included in the analysis. This implies that the variable does not combine with the factors for one of several reasons: it measures a different concept (like the variable T7), it has a large measurement error, or there are features that distort its variance.

It should be noted that the significance of each factor is also determined by the amount of variance it accounts for in the variables, i.e., by the factor loadings (weights). To calculate the eigenvalue of a factor, you need to find, in the corresponding column of the factor matrix, the sum of the squares of the factor loadings over all variables. Thus, for example, the variance of factor A (DA) is 2.42 (0.83 × 0.83 + 0.3 × 0.3 + 0.83 × 0.83 + 0.4 × 0.4 + 0.8 × 0.8 + 0.35 × 0.35). The calculation of the significance of factor B shows that DB = 2.64, i.e., the importance of factor B is higher than that of factor A.

If the eigenvalue of a factor is divided by the number of variables (seven in our example), the resulting value shows what proportion of the variance (or amount of information) γ in the original correlation matrix this factor accounts for. For factor A, γ ≈ 0.34 (34%), and for factor B, γ ≈ 0.38 (38%). Summing the results, we get 72%. Thus, the two factors taken together account for only 72% of the variance of the original matrix indicators. This means that as a result of factorization some of the information in the original matrix was sacrificed to construct the two-factor model. As a result, 28% of the information was lost; it could have been recovered if the six-factor model had been adopted.
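The arithmetic of this paragraph is easy to reproduce; the sketch below (Python with NumPy) takes the two factor contributions quoted above and converts them into shares of the total variance.

import numpy as np

# Factor contributions (eigenvalues) taken from the example above.
eigenvalues = np.array([2.42, 2.64])   # factors A and B
n_variables = 7

gamma = eigenvalues / n_variables      # share of total variance per factor
print(gamma.round(2))                  # [0.35 0.38] (the text rounds to 34% and 38%)
print(gamma.sum().round(2))            # ~0.72 -> the two-factor model keeps ~72%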

Where did the error come from, given that all the variables relevant to the door design requirements were taken into account? Most likely, the values of the correlation coefficients of the variables belonging to a single factor are somewhat underestimated. Taking the analysis into account, one could return to the intercorrelation matrix and revise the values of the correlation coefficients (see Table 2.2).

In practice, we often encounter a situation in which the number of independent factors is too large for all of them to be taken into account, whether from a technical or an economic point of view. There are a number of ways to limit the number of factors. The best known of them is Pareto analysis: factors are selected, in decreasing order of significance, until they account for 80-85% of their total significance.

Factor analysis can be used to implement the quality function deployment (QFD) method, which is widely used abroad when creating technical specifications for a new product.

National Research Nuclear University "MEPhI"
Faculty of Business Informatics and Management of Complex Systems
Department of Economics and Management in Industry (No. 71)
Mathematical and Instrumental Methods of Processing Statistical Information
Kireev V.S., Ph.D., Associate Professor
Moscow, 2017

Normalization

Decimal scaling
Minimax normalization
Normalization using standard deviation
Normalization using element-wise transformations

Decimal scaling

V'i = Vi / 10^k, where k is the smallest integer such that max(|V'i|) ≤ 1.

Minimax normalization

V'i = (Vi − min(Vi)) / (max(Vi) − min(Vi)).

Normalization using standard deviation

V'i = (Vi − Vmean) / sV, where Vmean is the sample mean and sV is the sample standard deviation.

Normalization using element-wise transformations

V'i = f(Vi), where f is an element-wise transformation, for example a logarithm (V'i = log Vi), an exponential (V'i = exp Vi), or a power function (V'i = Vi^y).
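A compact sketch of the four normalizations listed above (Python with NumPy; the function names are mine, the formulas follow the slides):

import numpy as np

def decimal_scaling(v):
    # V'_i = V_i / 10^k, with the smallest k such that max|V'_i| <= 1
    k = int(np.ceil(np.log10(np.abs(v).max()))) if np.abs(v).max() > 0 else 0
    return v / 10 ** k

def minmax(v):
    # V'_i = (V_i - min V) / (max V - min V)
    return (v - v.min()) / (v.max() - v.min())

def zscore(v):
    # V'_i = (V_i - sample mean) / sample standard deviation
    return (v - v.mean()) / v.std(ddof=1)

def elementwise(v, f=np.log):
    # V'_i = f(V_i), e.g. log, exp or a power function
    return f(v)

v = np.array([12.0, 250.0, 43.0, 7.0])
print(decimal_scaling(v))            # [0.012 0.25  0.043 0.007]
print(minmax(v))                     # values rescaled to [0, 1]
print(zscore(v).mean().round(10))    # ~0 after standardization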

Factor analysis

Factor analysis (FA) is a set of methods that, on the basis of the actually existing connections between the analyzed features (or between the observed objects), make it possible to identify hidden (implicit, latent) generalizing characteristics of the organizational structure and of the mechanism of development of the phenomena and processes under study.

In research practice, factor analysis methods are used mainly for the purpose of compressing information, i.e., obtaining a small number of generalizing characteristics that explain the variability (dispersion) of the elementary features (R-technique of factor analysis) or the variability of the observed objects (Q-technique of factor analysis).

Factor analysis algorithms are based on the use of a reduced matrix of pairwise correlations (covariances). A reduced matrix is a matrix on whose main diagonal there are not units (estimates of complete correlation) or estimates of the total variance, but their reduced, somewhat smaller values. It is thereby postulated that the analysis will explain not all of the variance of the features (objects) under study but only a part of it, usually a large part. The remaining unexplained part of the variance arises from the specificity of the observed objects or from errors made when recording the phenomena and processes, i.e., from the unreliability of the input data.

Classification of FA methods


Principal component method

The principal component method (PCA) is used to reduce the dimensionality of the space of observed vectors without a significant loss of information content. A premise of PCA is the multivariate normal distribution of the vectors. In PCA, linear combinations of the random variables are determined by the characteristic (eigen)vectors of the covariance matrix. The principal components form an orthogonal coordinate system in which the variances of the components characterize their statistical properties. PCA is not classified as FA, although it has a similar algorithm and solves similar analytical problems. Its main difference is that it processes not the reduced but the ordinary matrix of pairwise correlations (covariances), on whose main diagonal units are located.

Let an initial set of vectors X of the linear space Lk be given. Applying the principal component method allows us to pass to a basis of the space Lm (m ≤ k) such that the first component (the first basis vector) corresponds to the direction along which the variance of the vectors of the original set is maximal. The direction of the second component (the second basis vector) is chosen so that the variance of the original vectors along it is maximal under the condition of orthogonality to the first basis vector. The remaining basis vectors are determined similarly. As a result, the directions of the basis vectors are chosen so as to maximize the variance of the original set along the first components, called the principal components (or principal axes). It turns out that the main variability of the vectors of the original set is represented by the first few components, and it becomes possible, by discarding the less essential components, to pass to a space of lower dimensionality.
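As an illustration of the procedure just described, here is a compact sketch in Python with NumPy; the function name pca and the random test data are mine, and the eigen-decomposition of the covariance matrix is one of several equivalent ways to compute the components.

import numpy as np

def pca(X, n_components):
    # Center the data (columns = variables, rows = observations).
    Xc = X - X.mean(axis=0)
    # Eigen-decomposition of the covariance matrix.
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort by decreasing variance and keep the leading directions.
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]          # directions of maximal variance
    scores = Xc @ loadings                # coordinates in the new basis
    explained = eigvals[order] / eigvals.sum()
    return scores, loadings, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
scores, loadings, explained = pca(X, n_components=2)
print(explained)   # share of variance captured by each retained component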

Principal component method. Scheme

Principal component method. Score matrix

The score matrix T gives the projections of the original samples (the J-dimensional vectors x1,…,xI) onto the subspace of the principal components (A-dimensional). The rows t1,…,tI of the matrix T are the coordinates of the samples in the new coordinate system. The columns t1,…,tA of the matrix T are orthogonal and represent the projections of all samples onto one new coordinate axis.

When studying data by the PCA method, special attention is paid to score plots. They carry information useful for understanding the structure of the data. On a score plot, each sample is depicted in the coordinates (ti, tj), most often (t1, t2), denoted PC1 and PC2. The proximity of two points means their similarity, i.e., a positive correlation. Points located at right angles are uncorrelated, and points located diametrically opposite have a negative correlation.

Principal component method. Loading matrix

The loading matrix P is the transition matrix from the original space of variables x1,…,xJ (J-dimensional) to the space of principal components (A-dimensional). Each row of the matrix P consists of the coefficients connecting the variables t and x. For example, the a-th row is the projection of all variables x1,…,xJ onto the a-th principal component axis. Each column of P is the projection of the corresponding variable xj onto the new coordinate system.

The loadings plot is used to examine the role of the variables. On this plot, each variable xj is represented by a point in the coordinates (pi, pj), for example (p1, p2). Analyzing it in the same way as the score plot, one can understand which variables are related and which are independent. A joint study of paired score and loading plots can also give much useful information about the data.

Features of the principal component method

The principal component method is based on two assumptions: that the dimensionality of the data can be effectively reduced by a linear transformation, and that most of the information is carried by those directions in which the variance of the input data is maximal.

It is easy to see that these conditions are not always met. For example, if the points of the input set lie on the surface of a hypersphere, no linear transformation can reduce the dimensionality (whereas a nonlinear transformation based on the distance from a point to the center of the sphere copes with this easily). This drawback is characteristic of all linear algorithms and can be overcome by using additional dummy variables that are nonlinear functions of the elements of the input data set (the so-called kernel trick).

The second disadvantage of the principal component method is that the directions maximizing the variance do not always maximize informativeness. For example, a variable with maximum variance may carry almost no information, while a variable with minimum variance may allow the classes to be separated completely. In this case the principal component method will prefer the first (less informative) variable. All additional information associated with a vector (for example, whether an image belongs to one of the classes) is ignored.

Example data for PCA

K. Esbensen. Multivariate Data Analysis, abridged translation from English, edited by O. Rodionova, Institute of Chemical Physics RAS, 2005.

Example data for PCA. Designations

Height: in centimeters
Weight: in kilograms
Hair: short: –1, long: +1
Shoes: European standard size
Age: in years
Income: in thousands of euros per year
Beer: consumption in liters per year
Wine: consumption in liters per year
Sex: male: –1, female: +1
Strength: index based on physical ability testing
Region: north: –1, south: +1
IQ: measured by a standardized test

Score matrix

Loading matrix

Objects of the sample in the space of the new components

Women (F) are indicated by circles and men (M) by squares. North (N) is shown in blue and south (S) in red. The size and shade of a symbol reflect income: the larger and lighter the symbol, the higher the income. The numbers indicate age.

Initial variables in the space of the new components

Scree plot

Principal factor method

In the paradigm of the principal factor method, the task of reducing the dimensionality of the feature space is stated as follows: n features can be explained by a smaller number m of latent features, the common factors, where m < n. The discrepancies between the initial features and the common factors (their linear combinations) are accounted for by so-called characteristic factors.

The ultimate goal of a statistical study carried out with the help of the factor analysis apparatus is, as a rule, to identify and interpret the latent common factors while striving to minimize both their number and their degree of dependence on the specific residual random component.

Each feature is the result of the influence of m hypothetical common factors and one characteristic factor:

X1 = a11 f1 + a12 f2 + … + a1m fm + d1 V1
X2 = a21 f1 + a22 f2 + … + a2m fm + d2 V2
…
Xn = an1 f1 + an2 f2 + … + anm fm + dn Vn

Rotation of factors

Rotation is a way of transforming the factors obtained at the previous step into more meaningful ones. Rotation is divided into: graphical (drawing the axes by hand, not used for more than two-dimensional analysis), analytical (a certain rotation criterion is chosen; orthogonal and oblique rotations are distinguished) and matrix approximation (the rotation consists in approaching a certain given target matrix).

The result of rotation is the secondary factor structure. The primary factor structure (consisting of the primary loadings obtained at the previous stage) is, in fact, the set of projections of the points onto orthogonal coordinate axes. Obviously, if the projections are zero, the structure is simpler, and a projection is zero if the point lies on an axis. Thus, rotation can be considered a transition from one coordinate system to another, with coordinates known in one system (the primary factors) and iteratively selected coordinates in the other system (the secondary factors). When obtaining the secondary structure, one tries to move to a coordinate system in which as many axes as possible pass through the points (objects), so that as many projections (and therefore loadings) as possible are zero. In doing so, the restrictions of orthogonality and of decreasing significance from the first factor to the last, which are characteristic of the primary structure, may be removed.

Orthogonal rotation

Orthogonal rotation implies that we rotate the factors without violating their orthogonality to each other. Orthogonal rotation amounts to multiplying the original matrix of primary loadings B by an orthogonal matrix R (a matrix such that R R^T = I):

V = BR.

The orthogonal rotation algorithm in the general case is as follows:
0. B is the matrix of primary factors.
1. Find an orthogonal 2×2 matrix R for two columns (factors) bi and bj of the matrix B such that the rotation criterion for this pair is maximal.
2. Replace the columns bi and bj with the rotated columns.
3. Check whether all pairs of columns have been processed. If not, go to step 1.
4. Check whether the criterion for the entire matrix has increased. If yes, go to step 1; if not, the algorithm terminates.

Varimax rotation

This criterion formalizes the complexity of a factor through the dispersion of the squared loadings of the variables on it; the criterion as a whole maximizes, over all factors, the variance of the squared loadings within each column of the factor matrix:

V = Σj [ (1/n) Σi bij⁴ − ((1/n) Σi bij²)² ],

where n is the number of variables and bij is the loading of the j-th factor on the i-th variable. The factor loadings may additionally be normalized (divided by the communality of the corresponding variable) to remove the influence of individual variables.
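For reference, a widely used SVD-based implementation of the varimax criterion looks roughly as follows (Python with NumPy; the function and the example loadings are illustrative, not code from the lecture):

import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    # Rotate the loading matrix so that the variance of squared loadings
    # within each column (factor) is maximized.
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0)))
        )
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return loadings @ R

# Hypothetical primary loadings for 4 variables and 2 factors.
B = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
print(varimax(B).round(2))   # rotated (secondary) structure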

Quartimax rotation

Let us formalize the concept of the complexity q of the i-th variable through the dispersion of its squared factor loadings:

qi = (1/r) Σj (bij² − mi)²,

where r is the number of columns of the factor matrix, bij is the loading of the j-th factor on the i-th variable, and mi is the mean of the squared loadings of the i-th variable. The quartimax criterion tries to maximize the complexity of the entire set of variables in order to make the factors easier to interpret:

Q = Σi qi → max.

Considering that Σi Σj bij² is a constant (the sum of the eigenvalues of the covariance matrix), expanding the mean, and taking into account that a power function grows monotonically with its argument, we obtain the final form of the criterion to be maximized:

Q′ = Σi Σj bij⁴.

Criteria for determining the number of factors

The main problem of factor analysis is the identification and interpretation of the main factors. When selecting components, the researcher usually faces significant difficulties, since there is no unambiguous criterion for identifying the factors, and subjectivity in the interpretation of the results is therefore inevitable. There are several widely used criteria for determining the number of factors. Some of them are alternatives to others, and some of the criteria can be used together so that one complements the other.

Kaiser criterion, or eigenvalue criterion. This criterion was proposed by Kaiser and is probably the most widely used. Only factors with eigenvalues equal to or greater than 1 are retained. This means that if a factor does not extract variance equivalent to at least the variance of one variable, it is omitted.

Scree criterion, or sifting criterion. This is a graphical method first proposed by the psychologist Cattell. The eigenvalues can be depicted as a simple graph. Cattell suggested finding the place on the graph where the decrease of the eigenvalues from left to right slows down the most. It is assumed that to the right of this point only the "factorial scree" remains ("scree" is a geological term for the rock debris that accumulates at the bottom of a rocky slope).

Criteria for determining the number of factors. Continuation

Significance criterion. It is especially effective when the model of the general population is known and there are no secondary factors. But the criterion is unsuitable for detecting changes in the model and is implemented only in factor analysis by the least squares or maximum likelihood method.

Criterion of the proportion of reproduced variance. The factors are ranked by the share of variance they account for; when the percentage of variance added by a factor becomes insignificant, extraction should be stopped. It is desirable that the identified factors explain more than 80% of the spread. Disadvantages of the criterion: first, the selection is subjective; second, the specifics of the data may be such that all the main factors together cannot explain the desired percentage of the spread. Therefore, the main factors should together explain at least 50.1% of the variance.

Criterion of interpretability and invariance. This criterion combines statistical precision with subjective interests. According to it, the main factors can be extracted as long as a clear interpretation of them is possible. The interpretation, in turn, depends on the magnitude of the factor loadings: if a factor contains at least one strong loading, it can be interpreted. The reverse is also possible: if there are strong loadings but interpretation is difficult, it is preferable to discard the component.
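The Kaiser and reproduced-variance criteria can be applied programmatically; a sketch follows (Python with NumPy; the correlation matrix R in the example is hypothetical):

import numpy as np

def choose_n_factors(R, var_threshold=0.8):
    # Eigenvalues of the correlation matrix, in decreasing order.
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    kaiser = int((eigvals >= 1.0).sum())              # eigenvalue >= 1 rule
    cum_share = np.cumsum(eigvals) / eigvals.sum()    # reproduced variance
    by_share = int(np.searchsorted(cum_share, var_threshold) + 1)
    return kaiser, by_share, eigvals

# Example: three strongly related variables and two weakly related ones.
R = np.array([
    [1.0, 0.8, 0.7, 0.1, 0.0],
    [0.8, 1.0, 0.75, 0.05, 0.1],
    [0.7, 0.75, 1.0, 0.0, 0.05],
    [0.1, 0.05, 0.0, 1.0, 0.2],
    [0.0, 0.1, 0.05, 0.2, 1.0],
])
kaiser, by_share, eigvals = choose_n_factors(R)
print(eigvals.round(2))   # the scree: look where the decrease levels off
print(kaiser, by_share)   # factor counts suggested by the two criteria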

Example of using PCA

Let the following indicators of the economic activity of an enterprise be available: labor intensity (x1), share of purchased items in production (x2), equipment shift ratio (x3), proportion of workers in the enterprise (x4), bonuses and rewards per employee (x5), profitability (y). The linear regression model has the form:

y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x5

x1    x2    x3    x4    x5    y
0.51  0.20  1.47  0.72  0.67  9.8
0.36  0.64  1.27  0.70  0.98  13.2
0.23  0.42  1.51  0.66  1.16  17.3
0.26  0.27  1.46  0.69  0.54  7.1
0.27  0.37  1.27  0.71  1.23  11.5
0.29  0.38  1.43  0.73  0.78  12.1
0.01  0.35  1.50  0.65  1.16  15.2
0.02  0.42  1.35  0.82  2.44  31.3
0.18  0.32  1.41  0.80  1.06  11.6
0.25  0.33  1.47  0.83  2.13  30.1

Example of using PCA (continued)

Building the regression model in a statistical package shows that the coefficient of x4 is not significant (p-value > α = 5%), and x4 can be excluded from the model. After eliminating x4, the model building process is started again.

Example of using PCA (continued)

The Kaiser criterion for PCA shows that 2 components can be retained, explaining about 80% of the original variance.

For the selected components, the equations in the original coordinate system can be written as:

U1 = 0.41*x1 - 0.57*x2 + 0.49*x3 - 0.52*x5
U2 = 0.61*x1 + 0.38*x2 - 0.53*x3 - 0.44*x5

Example of using PCA (continued)

Now a new regression model can be built in the new components:

y = 15.92 - 3.74*U1 - 3.87*U2
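The example can be reproduced approximately with the following sketch (Python with NumPy), using the data table above; the signs and exact coefficients of the components depend on standardization and sign conventions, so they may differ somewhat from the slide values.

import numpy as np

X = np.array([
    [0.51, 0.20, 1.47, 0.72, 0.67],
    [0.36, 0.64, 1.27, 0.70, 0.98],
    [0.23, 0.42, 1.51, 0.66, 1.16],
    [0.26, 0.27, 1.46, 0.69, 0.54],
    [0.27, 0.37, 1.27, 0.71, 1.23],
    [0.29, 0.38, 1.43, 0.73, 0.78],
    [0.01, 0.35, 1.50, 0.65, 1.16],
    [0.02, 0.42, 1.35, 0.82, 2.44],
    [0.18, 0.32, 1.41, 0.80, 1.06],
    [0.25, 0.33, 1.47, 0.83, 2.13],
])
y = np.array([9.8, 13.2, 17.3, 7.1, 11.5, 12.1, 15.2, 31.3, 11.6, 30.1])

# Drop x4 (found insignificant) and standardize the remaining variables.
Xs = X[:, [0, 1, 2, 4]]
Z = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0, ddof=1)

# Principal components of the standardized variables.
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print((eigvals / eigvals.sum()).cumsum().round(2))  # compare with ~80% for 2 components

# Regression of y on the first two components U1, U2.
U = Z @ eigvecs[:, :2]
A = np.column_stack([np.ones(len(y)), U])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(b.round(2))   # intercept ~ mean(y) = 15.92, as in the slide's equation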

Singular value decomposition (SVD)

Beltrami and Jordan are considered the founders of the theory of the singular value decomposition: Beltrami, for being the first to publish a work on the singular decomposition, and Jordan, for the elegance and completeness of his work. Beltrami's paper appeared in the Journal of Mathematics for the Use of the Students of the Italian Universities in 1873, whose main purpose was to familiarize students with bilinear forms. The essence of the method is the decomposition of a matrix A of size n x m with rank d = rank(A) <= min(n, m) into a product of matrices of smaller rank:

A = U D V^T,

where the matrices U of size n x d and V of size m x d consist of orthonormal columns, which are eigenvectors for the non-zero eigenvalues of the matrices A A^T and A^T A, respectively, with U^T U = V^T V = I, and D of size d x d is a diagonal matrix with positive diagonal elements sorted in descending order. The columns of the matrix U form an orthonormal basis of the column space of A, and the columns of V an orthonormal basis of the row space of A.

Singular value decomposition (SVD). Truncation

An important property of the SVD is that if only the k largest diagonal elements of D are kept, and only the first k columns are kept in the matrices U and V, then the matrix

Ak = Uk Dk Vk^T

is the best approximation of the matrix A with respect to the Frobenius norm among all matrices of rank k. This truncation, first, reduces the dimensionality of the vector space and thereby the storage and computational requirements of the model. Second, by discarding the small singular values, the small distortions caused by noise in the data are removed, leaving only the strongest effects and trends in the model.
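A minimal sketch of such a rank-k truncation (Python with NumPy):

import numpy as np

def truncated_svd(A, k):
    # Full SVD: A = U D V^T with singular values in decreasing order.
    U, d, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and the matching columns/rows.
    return U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
A2 = truncated_svd(A, k=2)
# A2 is the best rank-2 approximation of A in the Frobenius norm.
print(np.linalg.matrix_rank(A2), np.linalg.norm(A - A2))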

In general, to explain a correlation matrix, not one but several factors are required. Each factor is characterized by a column of the factor matrix, each variable by a row. A factor is called a general factor if all of its loadings are significantly different from zero, i.e., it has loadings on all variables; such a factor is shown schematically by the first column in Fig. 1. A factor is called a common factor if at least two of its loadings differ significantly from zero; the corresponding columns in Fig. 1 represent such common factors, which have loadings on more than two variables. If a factor has only one loading significantly different from zero, it is called a characteristic factor (see the corresponding columns in Fig. 1); each such factor represents only one variable. Common factors are of decisive importance in factor analysis: once the common factors are established, the characteristic factors are obtained automatically. The number of high loadings of a variable on common factors is called its complexity. For example, one of the variables in Fig. 1 has a complexity of 2, and another a complexity of 3.

Fig. 1. Schematic representation of the factor mapping. A cross indicates a high factor loading.

So, let us build the model

xi = ai1 f1 + ai2 f2 + … + aim fm + εi,  i = 1, …, k, (4)

where f1, …, fm are unobservable factors, m < k;

xi are the observed variables (initial characteristics);

aij are the factor loadings;

εi is the random error associated only with xi, having zero mean and variance σi²;

εi and fj are uncorrelated;

f1, …, fm are uncorrelated random variables with zero mean and unit variance.

D(xi) = hi² + σi², where hi² = ai1² + … + aim². (5)

Here hi² is the i-th communality, which represents the part of the variance due to the factors, and σi² is the part of the variance due to the error. In matrix notation, the factor model takes the form:

X = AF + ε, (6)

where A is the loading matrix, F is the vector of factors, and ε is the vector of errors.

The correlations between the variables expressed through the factors can be derived as follows:

R = A A^T + Ψ, (7)

where Ψ is a diagonal matrix of order k containing the error variances. The main conditions are: Ψ is diagonal, and A A^T is a non-negative definite matrix. An additional condition for the uniqueness of the solution is the diagonality of the matrix A^T Ψ⁻¹ A.

There are many methods for solving the factor equation. The earliest method of factor analysis is the principal factor method, in which the principal component technique is applied to the reduced correlation matrix with communalities on the main diagonal. To estimate the communalities, the multiple correlation coefficient between the corresponding variable and the set of all other variables is usually used.

Factor analysis is carried out on the basis of the characteristic equation, as in principal component analysis:

|R − λI| = 0. (8)

Solving it yields the eigenvalues λi and the matrix of normalized characteristic vectors V, from which the factor mapping matrix is found as A = V Λ^(1/2), where Λ is the diagonal matrix of the eigenvalues.

An empirical iterative algorithm is used to obtain estimates of the communalities and factor loadings that converge to the true parameter estimates. The essence of the algorithm is as follows: initial estimates of the factor loadings are determined by the principal factor method. Based on the correlation matrix R, estimates of the principal components and common factors are formally determined:

fj = (v1j x1 + v2j x2 + … + vkj xk) / √λj, (9)

where λj is the corresponding eigenvalue of the matrix R; x1, …, xk are the source data (column vectors); vij/√λj are the coefficients for the common factors; and v1j x1 + … + vkj xk are the principal components (column vectors).

The estimates of the factor loadings are the values aij = vij √λj.

The communality estimates are obtained as hi² = Σj aij².

At the next iteration the matrix R is modified: the communality estimates obtained at the previous iteration are substituted for the elements of the main diagonal. Based on the modified matrix R, the computational scheme of component analysis is used to repeat the calculation of the principal components (which are not principal components in the strict sense of component analysis), and estimates of the main factors, factor loadings, communalities and specificities are obtained. Factor analysis can be considered complete when the communality estimates change little over two adjacent iterations.

Note. The transformations of the matrix R (replacing its diagonal with communality estimates) may violate the positive definiteness of the reduced matrix R+, and, as a consequence, some eigenvalues of R+ may be negative.
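Below is a minimal sketch of this iterative scheme (Python with NumPy), under simplifying assumptions: the number of factors m is fixed in advance, squared multiple correlations are used as the initial communalities, and small negative eigenvalues of the reduced matrix are clipped to zero (cf. the note above). The function name and the example correlation matrix are illustrative.

import numpy as np

def principal_factor(R, m, n_iter=50, tol=1e-4):
    # Initial communality estimates: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)          # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(R_reduced)
        order = np.argsort(eigvals)[::-1][:m]
        # Loadings a_ij = v_ij * sqrt(lambda_j) for the m leading factors.
        lam = np.clip(eigvals[order], 0, None)   # guard against negative eigenvalues
        A = eigvecs[:, order] * np.sqrt(lam)
        h2_new = (A ** 2).sum(axis=1)            # new communality estimates
        if np.max(np.abs(h2_new - h2)) < tol:
            h2 = h2_new
            break
        h2 = h2_new
    return A, h2

# A hypothetical 5-variable correlation matrix.
R = np.array([
    [1.0, 0.8, 0.7, 0.1, 0.0],
    [0.8, 1.0, 0.75, 0.05, 0.1],
    [0.7, 0.75, 1.0, 0.0, 0.05],
    [0.1, 0.05, 0.0, 1.0, 0.2],
    [0.0, 0.1, 0.05, 0.2, 1.0],
])
loadings, communalities = principal_factor(R, m=2)
print(loadings.round(2))
print(communalities.round(2))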
