Factor analysis. The principal component method. Criteria for selecting principal components

The principal component method transforms a large number of related (dependent, correlated) variables into a smaller number of independent variables, since a large number of variables often complicates the analysis and interpretation of information. Strictly speaking, this method is not part of factor analysis, although it has much in common with it. What distinguishes it is, first, that all principal components are extracted at once during the computation and their number initially equals the number of original variables; and second, that it postulates the complete decomposition of the variance of all the original variables, i.e. its full explanation through latent factors (derived features).

Suppose, for example, we carried out a study in which students' intelligence was measured with the Wechsler test, the Eysenck test and the Raven test, and their grades in social, cognitive and general psychology were also recorded. It is quite possible that the scores of the various intelligence tests correlate with each other, since they ultimately measure one characteristic of the subject, intellectual ability, albeit in different ways. If there are too many variables in a study ( x 1 , x 2 , …, x p ) and some of them are interrelated, the researcher naturally wants to reduce the complexity of the data by reducing the number of variables. This is what the principal component method is for: it creates several new variables y 1 , y 2 , …, y p, each of which is a linear combination of the original variables x 1 , x 2 , …, x p :

y1 = a11 x1 + a12 x2 + … + a1p xp

y2 = a21 x1 + a22 x2 + … + a2p xp

… (1)

yp = ap1 x1 + ap2 x2 + … + app xp

The variables y1, y2, …, yp are called principal components, or factors. Thus a factor is an artificial statistical indicator obtained by special transformations of the correlation matrix. The procedure that extracts factors is called matrix factorization. As a result of factorization, anywhere from one factor up to a number of factors equal to the number of original variables can be extracted. However, the factors obtained by factorization are, as a rule, not of equal importance.

The coefficients a_ij defining a new variable are chosen so that the new variables (principal components, factors) describe the maximum amount of variability in the data and are uncorrelated with each other. It is often convenient to present the coefficients a_ij so that they are the correlation coefficients between an original variable and a new variable (factor); this is achieved by multiplying a_ij by the standard deviation of the factor. Most statistical packages do this (the STATISTICA program does too). The coefficients a_ij are usually presented as a table, with the factors arranged as columns and the variables as rows:

Such a table is called a table (matrix) of factor loadings. The numbers in it are the coefficients a_ij. A value of 0.86 means that the correlation between the first factor and the Wechsler test score is 0.86. The larger a factor loading is in absolute value, the stronger the relationship between the variable and the factor.

Principal component analysis (PCA) simplifies complex high-dimensional data while preserving trends and patterns. It does so by transforming the data into fewer dimensions that act as a summary of the features. Such data are very common in different fields of science and technology and arise whenever several features are measured for each sample, for example the expression levels of many genes. This type of data raises problems caused by the elevated error rate resulting from multiple comparison corrections.

The method is similar to clustering: it finds patterns without prior labels and analyzes them, checking whether the samples come from different study groups and whether there are substantial differences between them. As with all statistical methods, it can be misapplied. Scaling of the variables can lead to different analysis results, and it is important that the scaling is not adjusted to match prior expectations about the data.

Purpose of component analysis

The main goal of the method is to discover and reduce the variability in a data set and to identify new, meaningful underlying variables. For this purpose special tools are used; for example, multivariate data can be collected in a TableOfReal data matrix in which rows correspond to cases and columns to variables. The TableOfReal is therefore interpreted as numberOfRows data vectors, each vector having numberOfColumns elements.

Traditionally, the principal component method is carried out on a covariance matrix or a correlation matrix, which can be computed from the data matrix. The covariance matrix contains scaled sums of squares and cross-products. The correlation matrix is similar to the covariance matrix, but the variables, i.e. the columns, have first been standardized. Data sometimes have to be standardized when the variances of the variables differ greatly. To analyze the data, select the TableOfReal data matrix in the list of objects and run the analysis.

This creates a new principal-components object in the list of objects. You can then plot the curve of eigenvalues (a scree plot) to judge the importance of each one. The program can also suggest an action: extract the fraction of variance accounted for, or check the equality of a number of eigenvalues and obtain the corresponding test. Since the components are obtained by solving a specific optimization problem, they have some "built-in" properties, for example maximal variability. In addition, there are a number of other properties that factor analysis can provide:

  • the variance of each component, and its share of the total variance of the original variables, is given by the eigenvalues;
  • calculation of component scores, which illustrate the value of each component for the observations;
  • obtaining loadings, which describe the correlation between each component and each variable;
  • the correlations among the original variables reproduced with the help of the p components;
  • reproduction of the original data from the p components;
  • "rotation" of the components to improve their interpretability.

Choosing the number of components to retain

There are two common ways to choose the required number of components to retain. Both are based on relationships among the eigenvalues. The first is to plot the eigenvalues: if the points on the graph tend to level off and are close to zero, they can be ignored. The second is to limit the number of components to the number that accounts for a specified fraction of the total variance. For example, to be satisfied with 95% of the total variance, choose the number of components for which the variance accounted for (VAF) is 0.95.
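A minimal numpy sketch of this cumulative-variance criterion (the function name and the 0.95 threshold are illustrative assumptions, not part of the original example):

import numpy as np

def n_components_for_vaf(X, threshold=0.95):
    """Return how many principal components are needed to reach the given fraction of total variance (VAF)."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, largest first
    vaf = np.cumsum(eigvals) / eigvals.sum() # cumulative fraction of variance accounted for
    return int(np.searchsorted(vaf, threshold) + 1)

# usage: X is an (n_observations, p_variables) array
# k = n_components_for_vaf(X, threshold=0.95)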

Principal components are obtained by projecting the multivariate data vectors onto the space spanned by the eigenvectors. This can be done in two ways: directly from a TableOfReal, without first forming a PCA object, after which the resulting configuration or its numbers can be displayed; or by selecting a PCA object and a TableOfReal together and choosing "Configuration", in which case the analysis is carried out in the eigenvector space of the already computed components.

If the starting point happens to be a symmetric matrix, for example a covariance matrix, it is first reduced to tridiagonal form and then the QL algorithm with implicit shifts is applied. If, on the contrary, the starting point is the data matrix itself, the matrix of sums of squares should not be formed explicitly; instead, a numerically more stable approach is used and the decomposition is obtained via singular value decomposition. The resulting matrix then contains the eigenvectors, and the squared diagonal elements are the eigenvalues.

A principal component is a normalized linear combination of the original predictors in a data set. In the figure, PC1 and PC2 are the principal components. Suppose there is a set of predictors X1, X2, …, Xp.

The first principal component can be written as: Z1 = φ11 X1 + φ21 X2 + φ31 X3 + … + φp1 Xp

  • Z1 is the first principal component;
  • φp1 is the loading vector, consisting of the loadings (φ11, φ21, …) of the first principal component.

The loadings are constrained so that their sum of squares equals 1, since otherwise arbitrarily large loadings would lead to arbitrarily large variance. The loading vector also defines the direction of the principal component (Z1) along which the data vary the most. This direction is the line in p-dimensional space that is closest to the n observations.

Closeness is measured by the mean squared Euclidean distance. X1, …, Xp are normalized predictors. Normalized predictors have a mean equal to zero and a standard deviation equal to one. Thus, the first principal component is the linear combination of the original predictor variables that captures the maximum variance in the data set. It defines the direction of greatest variability in the data. The more variability captured by the first component, the more information it carries. No other component can have variability higher than the first principal component.

The first principal component yields the line that is closest to the data: it minimizes the sum of squared distances between the data points and the line. The second principal component (Z2) is also a linear combination of the original predictors; it captures the remaining variance in the data set and is uncorrelated with Z1. In other words, the correlation between the first and the second components is zero. It can be written as: Z2 = φ12 X1 + φ22 X2 + φ32 X3 + … + φp2 Xp.

Since the components are uncorrelated, their directions are orthogonal.
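As an illustration of these definitions, here is a small numpy sketch that builds Z1 and Z2 from the eigenvectors of the correlation matrix of standardized predictors (the random data and names are assumptions for illustration, not part of the text):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                 # hypothetical predictors, n x p

Xs = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize: zero mean, unit std

R = np.corrcoef(Xs, rowvar=False)             # correlation matrix of the predictors
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
phi1, phi2 = eigvecs[:, order[0]], eigvecs[:, order[1]]   # loading vectors

Z1 = Xs @ phi1   # first principal component scores
Z2 = Xs @ phi2   # second component, uncorrelated with Z1
print(np.round(np.corrcoef(Z1, Z2)[0, 1], 10))   # ~0, as stated above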

After the principal components have been calculated, the process of predicting on test data with their help begins. The principal component workflow, even for beginners, is simple.

For example, the same transformation, including centering and scaling, must be applied to the test set, using the R language (version 3.4.2) and its library rvest. R is a free programming language for statistical computing and graphics. It was created in 1992 for solving statistical tasks by its users. This constitutes the full modeling process after PCA.

To implement PCA in Python, import the data and use the sklearn library. The interpretation remains the same as in R. Only the data set used for Python is a cleaned version, in which missing values are imputed and categorical variables are converted into numeric ones. The modeling process remains the same as described above for the R example.
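A minimal sketch of the sklearn workflow described above (the array shapes, the scaling step and the choice of five components are assumptions for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# X_train, X_test stand for numeric arrays after imputation / encoding, as described above.
X_train = np.random.rand(200, 10)
X_test = np.random.rand(50, 10)

scaler = StandardScaler().fit(X_train)                 # center and scale using the training set only
pca = PCA(n_components=5).fit(scaler.transform(X_train))

train_scores = pca.transform(scaler.transform(X_train))
test_scores = pca.transform(scaler.transform(X_test))  # the same transformation applied to the test set
print(pca.explained_variance_ratio_)                   # share of variance per component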

The idea of the principal component method gives a useful approximate expression for factor analysis. Instead of summing from 1 to p, we now sum from 1 to m, ignoring the last p − m terms of the sum, and obtain the third expression. This can be rewritten as shown below, using the definition of the factor loading matrix L, which gives the final expression in matrix notation. If standardized measurements are used, the sample correlation matrix R replaces the sample covariance matrix.

This forms the matrix L of factor loadings in the factor analysis, multiplied by the transpose of L. To estimate the specific variances, the factor model for the variance-covariance matrix is used.

The specific variances are then the diagonal elements of the sample variance-covariance matrix minus LL'.

  • Xi is the vector of observations for the i-th subject.
  • S denotes the sample variance-covariance matrix.

Then there are p eigenvalues of this variance-covariance matrix, together with the corresponding eigenvectors.

Eigenvalues of S: λ̂1, λ̂2, …, λ̂p.

Eigenvectors of S: ê1, ê2, …, êp.
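A small numpy sketch of this step (the data matrix here is a random placeholder):

import numpy as np

X = np.random.rand(30, 5)                 # rows are subjects X_i, columns are variables

S = np.cov(X, rowvar=False)               # sample variance-covariance matrix
lam, e = np.linalg.eigh(S)                # eigenvalues (ascending) and eigenvectors
lam, e = lam[::-1], e[:, ::-1]            # reorder so that lambda_1 >= ... >= lambda_p
print(lam)        # estimated eigenvalues of S
print(e[:, 0])    # eigenvector e_1 paired with the largest eigenvalue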

PCA is one of the most widespread and popular methods of multivariate analysis; it allows the study of data sets with a large number of variables. For this reason the principal component method is widely used in bioinformatics, marketing, sociology and many other fields. XLSTAT provides a complete and flexible function for exploring the data directly in Excel and offers several standard and extended options that give a deep insight into the data right inside Excel.

The program can be run on incomplete data matrices, supplementary variables and observations can be added, and variables can be filtered by different criteria to optimize the readability of the maps. Moreover, rotations can be performed. Correlation circles and observation plots are easy to create as standard Excel charts. It is enough to transfer the result data in order to reuse them in further analysis.

XLSTAT offers several data preprocessing methods that are applied to the input data before the principal components are calculated:

  1. Pearson, the classical PCA, which automatically standardizes the data before the computation in order to avoid an inflated influence of variables with large variances on the result.
  2. Covariance, which works with unstandardized variances.
  3. Polychoric, for ordinal data.

An example of data analysis

The principal component method can be considered on a symmetric correlation or covariance matrix. This means that the matrix must be numeric and contain standardized data. Suppose there is a data set of dimension 300 (n) × 50 (p), where n is the number of observations and p is the number of predictors.

Since p = 50 is large, there are p(p−1)/2 = 1225 possible scatter plots. In this case a common approach is to choose a subset of the predictor space of dimension p′ << 50 that captures most of the information, and then to plot the observations in the resulting low-dimensional space. It should not be forgotten that each dimension is a linear combination of the p features.

An example with a matrix of two variables. In this example of the principal component method, a data set is formed from two variables (length and diagonal length) using Davis's artificial data.

The components can be drawn on the scatter plot as follows.

This plot illustrates the idea of the first, or principal, component, which provides an optimal summary of the data: no other line drawn on such a plot produces a set of predicted values of the data points on the line with smaller variance.

The first component also has an application in reduced major axis (RMA) regression, in which both the x and y variables are assumed to have errors or uncertainties, or in which there is no clear distinction between predictor and response.

The principal component method in econometrics is used to analyze variables such as GNP, inflation, exchange rates and so on, which are then estimated from the available data, mainly aggregate time series. However, econometric models are useful for many applications, not only macroeconomic ones. Thus, econometrics means economic measurement.

Applying statistical methods to econometric data reveals the relationships between economic variables. A simple example of an econometric model: it is assumed that households' monthly consumption depends linearly on households' income in the previous month. Then the model takes the form

The task of econometrics is to obtain estimates of the parameters a and b. The estimated model parameters, when used in the model equation, allow forecasting the future value of consumption, which depends on the income of the previous month. When developing these types of models, several points must be taken into account:

  • the nature of the random process that generates the data;
  • the level of knowledge about it;
  • the size of the system;
  • the form of analysis;
  • the forecast horizon;
  • the mathematical complexity of the system.

All these considerations are important because they determine the sources of error in the model. In addition, to address these problems a forecasting method has to be designed. It can be reduced to a linear model even if only a small sample is available. This type is one of the most common for which predictive analysis can be built.

Non-parametric statistics

The principal component method for non-parametric data applies to measurements in which the data are ranked. Non-parametric statistical methods are widely used in different types of studies. In practice, when the normality assumption is not met, parametric statistical methods can lead to misleading results. Non-parametric methods, by contrast, make much less severe assumptions about the distribution of the measurements.

They are reliable regardless of the underlying distributions of the observations. For this reason many different types of non-parametric tests have been developed for the analysis of different types of experimental designs. Such designs include one-sample designs, two-sample designs, and randomized block designs. Today a non-parametric Bayesian approach using the principal component method is applied to simplify the reliability analysis of railway systems.

The railway system is a typical large-scale complex system with interrelated subsystems containing numerous components. System reliability is maintained through appropriate maintenance, and cost-effective asset management requires an accurate assessment of reliability at the lowest level. However, real reliability data at the component level of the railway system are not always available in practice, let alone complete. The distribution of component life cycles reported by manufacturers is often hidden and complicated by actual use and the working environment. Thus, reliability analysis requires a suitable methodology for estimating component lifetimes in the absence of failure data.

The principal component method in the social sciences is used for two main tasks:

  • analysis of data from sociological studies;
  • construction of models of social phenomena.

Algorithms for estimating the model

PCA algorithms give additional insight into the structure of the model and its interpretation. They also show how PCA is used in different disciplines. The NIPALS (non-linear iterative partial least squares) algorithm computes the components sequentially. The calculation can be stopped early, when the components already computed are judged to be sufficient. Most computer packages tend to use the NIPALS algorithm, which has two main advantages:

  • it handles missing data;
  • it computes the components sequentially.

The purposes of considering this algorithm:

  • it gives additional insight into what the loadings and scores mean;
  • it shows how each component is orthogonal to, and independent of, the other components;
  • it shows how the algorithm can handle missing data.

The algorithm extracts each component sequentially, starting with the first direction of greatest variance, then the second, and so on. NIPALS computes one component at a time. The first computed component is equivalent to the t1 and p1 vectors that would be found from an eigenvalue or singular value decomposition, and the algorithm can handle missing data in X. It always converges, but convergence can sometimes be slow. It is also known as the power algorithm for computing eigenvectors and eigenvalues, and it works well for very large data sets. Google used this algorithm for early versions of its search engine.

The NIPALS algorithm is shown below.

Estimates of the score matrix T are then computed as T = XW, and the partial least squares regression coefficients B of Y on X are computed as B = WQ. An alternative approach for estimating the components of partial least squares regression can be described as follows.
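A compact numpy sketch of the NIPALS iteration described above, extracting one component at a time and deflating X after each (this simplified version omits the missing-data handling mentioned earlier; the tolerance and the names are assumptions):

import numpy as np

def nipals_pca(X, n_components, tol=1e-9, max_iter=500):
    """Sequential NIPALS: scores T and loadings P, one component at a time."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                        # work on centered data
    n, p = X.shape
    T = np.zeros((n, n_components))               # score vectors
    P = np.zeros((p, n_components))               # loading vectors
    for a in range(n_components):
        t = X[:, np.argmax(X.var(axis=0))].copy() # start from the column with largest variance
        for _ in range(max_iter):
            p_vec = X.T @ t / (t @ t)             # regress the columns of X on t
            p_vec /= np.linalg.norm(p_vec)        # unit-length loading vector
            t_new = X @ p_vec                     # new score vector
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        T[:, a], P[:, a] = t, p_vec
        X = X - np.outer(t, p_vec)                # deflate: remove the found component
    return T, P

# usage: T, P = nipals_pca(data, n_components=2)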

The principal component method is a tool for determining the main axes of variance in a data set; it makes it easy to identify the key variables in the data. Properly applied, it is one of the most powerful tools in the data-analysis toolkit.

Component analysis comprises various dimensionality-reduction methods; one of them is the principal component method. The principal components form an orthogonal coordinate system, and the variances of the components characterize their statistical properties.

Given that the objects studied in economics are characterized by a large number of features, which are in turn influenced by a large number of random causes, such dimensionality reduction is particularly relevant there.

Calculation of the principal components

The first principal component Z1 of the original system of features X1, X2, X3, X4, …, Xn is the centered and normalized linear combination of these features that, among all centered and normalized linear combinations, has the greatest variance.

As the second principal component Z2 we take the centered, normalized combination of the features that:

is uncorrelated with the first principal component, and

among all combinations uncorrelated with the first principal component, has the greatest variance.

The k-th principal component Zk (k = 1…m) is the centered, normalized combination of the features that:

is uncorrelated with the k−1 previous principal components, and,

among all possible combinations of the original features that

are uncorrelated with the k−1 previous principal components, has the greatest variance.

Let us introduce an orthogonal matrix U and pass from the variables X to the variables Z:

The first vector is chosen so that the variance is maximal. The next vector is chosen so that its variance is maximal under the condition that it is uncorrelated with the previous one, and so on.

Since the features are measured in incommensurable units, it is better to pass to centered and normalized values. The matrix of centered and normalized values of the original features is found from the relation:

where the first estimate is an unbiased, consistent and efficient estimate of the mathematical expectation,

and the second is an unbiased, consistent and efficient estimate of the variance.

The matrix of observed values of the original features is given in the Appendix.

Centering and standardization were performed with the "Stadia" program.

After the features have been centered and normalized, the estimate of the correlation matrix can be computed by the formula:


Before that, as we conduct a component analysis, we will analyze the independence of the external signs.

Revalidation of the significance of the matrix of male correlations for the additional criterion of Wilks.

We make a hypothesis:

H0: insignificant

H1: significant

125,7; (0,05;3,3) = 7,8

since > , then the hypothesis H0 is considered and the matrix is ​​significant, therefore, it is possible to conduct a component analysis.

Checking the hypothesis that the covariance matrix is diagonal

We state the hypotheses:

We construct a statistic distributed according to the χ² law with the corresponding number of degrees of freedom.

123.21; χ²(0.05; 10) = 18.307

Since the computed value exceeds the critical one, the hypothesis H0 is rejected and it makes sense to carry out the component analysis.

To construct the matrix factorization, we must find the eigenvalues of the matrix by solving the characteristic equation.

For this operation we use the eigenvals function of the MathCAD system, which returns the eigenvalues of a matrix:

Because we have obtained not the eigenvalues and eigenvectors of the matrix themselves but only their estimates, we are interested in how "well", from a statistical point of view, the sample characteristics describe the corresponding parameters of the general population.

The confidence interval for the i-th eigenvalue is found by the formula:

The confidence intervals for the eigenvalues then take the form:

The estimate of one eigenvalue falls into the confidence interval of another eigenvalue. We must therefore test the hypothesis that the eigenvalues are multiple.

The multiplicity is checked using the statistic

where r is the number of multiple roots.

Under the null hypothesis, this statistic is distributed according to the χ² law with the corresponding number of degrees of freedom. We state the hypotheses:

Since in each case the hypothesis is rejected, the eigenvalues are not multiple.

We extract the principal components at an informativeness level of 0.85. The measure of informativeness shows what part, or share, of the variance of the original features is explained by the first k principal components. We call the measure of informativeness the value:

At the given level of informativeness, three principal components were extracted.

Let us write out the matrix:

To obtain the normalized vector of the transition from the original features to the principal components, the system of equations must be solved. After finding the solution of the system, the resulting vector must be normalized.

To perform this task we use the eigenvec function of the MathCAD system, which returns the normalized eigenvector corresponding to a given eigenvalue.

In our case, the first four principal components are enough to reach the given level of informativeness, so the matrix U is formed as follows.

We construct the matrix U, whose columns are the eigenvectors:

The matrix of weight coefficients:

The coefficients of the matrix A are the correlation coefficients between the centered-normalized original features and the non-normalized principal components; they show the presence, strength and direction of the linear relationship between the corresponding original features and the corresponding principal components.

Principal Component Method

The principal component method (English: principal component analysis, PCA) is one of the main ways to reduce the dimensionality of data while losing the least amount of information. It was invented by Karl Pearson in 1901. It is applied in many areas, such as pattern recognition, computer vision, data compression and so on. The principal component method is also sometimes called the Karhunen-Loève transform (English: Karhunen-Loeve) or the Hotelling transform (English: Hotelling transform). Other ways of reducing data dimensionality are the method of independent components, multidimensional scaling, as well as numerous non-linear generalizations: the method of principal curves and manifolds, the method of elastic maps, the best projection search (English: Projection Pursuit), the "bottleneck" neural network method, and others.

Formal statement of the problem

The problem of principal component analysis has at least four basic versions:

  • approximate the data by linear manifolds of lower dimension;
  • find the subspace of lower dimension such that the orthogonal projection onto it has the largest possible spread of the data (i.e., the largest mean squared deviation from the mean value);
  • find the subspace of lower dimension such that the orthogonal projection onto it preserves the mean squared distance between points as much as possible;
  • for a given multidimensional random variable, construct an orthogonal transformation of coordinates such that, as a result, the correlations between individual coordinates become zero.

The first three versions operate with finite data sets. They are equivalent and make no hypothesis about the statistical generation of the data. The fourth version operates with random variables. Finite data sets appear here as samples from a given distribution, and the solution of the first three problems as an approximation to the "true" Karhunen-Loève transformation. This raises an additional and not entirely trivial question about the accuracy of this approximation.

Approximation of data by linear manifolds

Illustration of the famous work of K. Pearson (1901): given points on a plane, find the straight line that minimizes the sum of squared distances from the points to the line.

The principal component method began with the problem of the best approximation of a finite set of points by straight lines and planes (K. Pearson, 1901). Given a finite set of vectors x1, x2, …, xm, for each k = 0, 1, …, n−1 find, among all k-dimensional linear manifolds, the one for which the sum of squared deviations is minimal:

sum over i of dist²(xi, Lk) → min,

where dist(xi, Lk) is the Euclidean distance from the point to the linear manifold. Any k-dimensional linear manifold can be written as the set of linear combinations Lk = { a0 + β1 a1 + … + βk ak }, where the parameters βj run over the real line and {a1, …, ak} is an orthonormal set of vectors,

dist²(x, Lk) = || x − a0 − Σj aj (aj, x − a0) ||²,

where ||·|| is the Euclidean norm and (·,·) is the Euclidean scalar product; the same can be written in coordinate form.

The solution of the approximation problem for k = 0, 1, …, n−1 is given by a nested set of linear manifolds L0 ⊂ L1 ⊂ … . These linear manifolds are determined by an orthonormal set of vectors (the principal component vectors) and a vector a0. The vector a0 is sought as the solution of the minimization problem for L0:

that is, a0 is the sample mean.

The principal component vectors can be found as solutions of similar optimization problems:

1) Center the data (subtract the mean value): the mean of the data vectors is now zero. 2) Find the first principal component as the solution of the corresponding maximization problem; if the solution is not unique, choose one of them. 3) Subtract from the data the projection onto the first principal component. 4) Find the second principal component as the solution of the analogous problem; if the solution is not unique, choose one of them. … 2k−1) Subtract the projection onto the (k−1)-th principal component (recall that the projections onto the previous k−2 principal components have already been subtracted). 2k) Find the k-th principal component as the solution of the analogous problem; if the solution is not unique, choose one of them. …

At each preparatory step we subtract the projection onto the previous principal component. The found vectors are orthonormal simply as a result of solving the described optimization problem; however, to keep computation errors from destroying the mutual orthogonality of the principal component vectors, orthogonality can be built into the optimization problem itself.

The non-uniqueness in the definition, apart from the trivial arbitrariness in the choice of sign (a vector and its negative solve the same problem), can be more substantial and can arise, for example, from symmetries in the data. The last principal component is simply a unit vector orthogonal to all the previous ones.

Search for orthogonal projections with the greatest scatter

The first principal component maximizes the sample variance of the projection of the data.

Let a centered set of data vectors be given (the arithmetic mean value equals zero). The task is to find an orthogonal transformation to a new coordinate system for which the following conditions hold:

The theory of singular value decomposition was created by J. J. Sylvester (English: James Joseph Sylvester).

A simple iterative singular value decomposition algorithm

The basic procedure is the search for the best approximation of an arbitrary rectangular matrix by a matrix of the form a b^T (where a is a vector of one dimension and b a vector of the other) by the least squares method:

The solution of this problem is given by successive iterations using explicit formulas. For a fixed vector b, the values of a that deliver the minimum of the form are determined uniquely and explicitly from the equalities:

Similarly, for a fixed vector a, the values of b are determined:

As the initial approximation of the vector b we take a random vector of unit length, compute the vector a, then for this vector a compute the vector b, and so on. Each step decreases the value of the functional. The smallness of the relative decrease of the minimized functional per iteration step, or the smallness of the value itself, is used as a stopping criterion.

As a result, the best approximation of the matrix by a matrix of the form a(1) b(1)^T is obtained (here the superscript denotes the number of the approximation). Next, the obtained matrix is subtracted from the original one, and for the resulting residual matrix the best approximation of the same form is again sought, and so on, until, for example, the norm of the residual becomes sufficiently small. As a result, we obtain an iterative procedure of decomposing the matrix as a sum of rank-1 matrices. This yields an approximation of the singular values and singular vectors (right and left).
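A small numpy sketch of this alternating rank-1 procedure (the stopping thresholds and the names are illustrative assumptions):

import numpy as np

def rank_one_decomposition(X, n_terms, tol=1e-10, max_iter=1000):
    """Approximate X as a sum of rank-1 terms sigma_k * u_k v_k^T by alternating least squares."""
    R = np.array(X, dtype=float)              # current residual matrix
    sigmas, us, vs = [], [], []
    for _ in range(n_terms):
        v = np.random.default_rng().normal(size=R.shape[1])
        v /= np.linalg.norm(v)                 # random unit start vector, as in the text
        for _ in range(max_iter):
            u = R @ v / (v @ v)                # explicit least-squares update for fixed v
            v_new = R.T @ u / (u @ u)          # explicit least-squares update for fixed u
            if np.linalg.norm(v_new - v) < tol * np.linalg.norm(v_new):
                v = v_new
                break
            v = v_new
        u = R @ v / (v @ v)                    # re-sync u with the final v
        sigma = np.linalg.norm(u) * np.linalg.norm(v)
        u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)
        sigmas.append(sigma); us.append(u); vs.append(v)
        R = R - sigma * np.outer(u, v)         # subtract the found approximation and repeat
    return np.array(sigmas), np.array(us), np.array(vs)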

The advantages of this algorithm include its exceptional simplicity and the possibility of carrying it over almost without change to data with gaps, as well as to weighted data.

There are various modifications of the basic algorithm that improve its accuracy and stability. For example, the principal component vectors for different components should be orthogonal "by construction"; however, with a large number of iterations (high dimensionality, many components) small deviations from orthogonality accumulate, and a special correction at each step may be required, enforcing the orthogonality of each new principal component to those found earlier.

Singular value decomposition of tensors and the tensor method of principal components

Often a data vector has the additional structure of a rectangular table (for example, an image plane) or even of a multidimensional table, i.e. a tensor. In this case it is also effective to apply singular value decomposition. The definitions, basic formulas and algorithms carry over practically without change: instead of the data matrix we have a multi-index array, in which the first index is the number of the data point (tensor).

The basic procedure is the search for the best approximation of a tensor by a tensor of a product form (where the first factor is a vector whose dimension is the number of data points and the other factors are vectors of the corresponding dimensions) by the least squares method:

The solution of this problem is given by successive iterations using explicit formulas. If all the factor vectors except one are given, the remaining one is determined explicitly from sufficient conditions for a minimum.

As the initial approximation of the vectors we take random vectors of unit length, compute the next vector, then for this vector and the remaining ones compute the following vector, and so on (cycling through the indices). The algorithm obviously converges. The smallness of the relative decrease of the minimized functional per cycle, or the smallness of the value itself, serves as a stopping criterion. Next, the obtained approximation is subtracted from the tensor, and for the residual the best approximation of the same form is again sought, and so on, until, for example, the norm of the next residual becomes sufficiently small.

This multi-component singular decomposition (the tensor method of principal components) is successfully applied in processing images, video signals and, more broadly, any data that have a table or tensor structure.

The transformation matrix to principal components

The matrix of transformation of the data to principal components consists of the principal component vectors, arranged in order of decreasing eigenvalues:

(the superscript T denotes transposition),

that is, the matrix is orthogonal.

Most of the variation of the data will be concentrated in the first coordinates, which allows passing to a space of lower dimension.

Residual variance

Let the data be centered. When the data vectors are replaced by their projections onto the first k principal components, the mean squared error per data vector is:

where the eigenvalues of the empirical covariance matrix are arranged in decreasing order, taking multiplicity into account.

This quantity is called the residual variance. The quantity

is called the explained variance. Their sum equals the sample variance. The corresponding squared relative error is the ratio of the residual variance to the sample variance (i.e., the share of unexplained variance):

The relative error is used to evaluate the applicability of the principal component method with projection onto the first k components.

Note: in most computational algorithms the eigenvalues, with their corresponding eigenvectors (the principal components), are computed in order "from largest to smallest". To compute the relative error it is enough to compute the first k eigenvalues and the trace of the empirical covariance matrix (the sum of its diagonal elements, i.e., the variances along the axes). Then
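A short numpy sketch of this split into explained and residual variance (function and variable names are assumptions):

import numpy as np

def variance_split(X, k):
    """Shares of explained and residual variance when projecting centered data onto the first k components."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]   # lambda_1 >= ... >= lambda_p
    explained = eigvals[:k].sum()
    residual = eigvals[k:].sum()        # mean squared approximation error per data vector
    total = eigvals.sum()               # equals the trace of the covariance matrix
    return explained / total, residual / total

# usage: share_explained, share_unexplained = variance_split(X, k=2)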

Selection of principal components by the Kaiser rule

The target approach of estimating the number of principal components by the required share of explained variance is formally applicable always, but it implicitly assumes that there is no separation into "signal" and "noise" and that any accuracy specified in advance is meaningful. Therefore a different heuristic is often more productive, based on the hypothesis of the presence of a "signal" (comparatively small dimensionality, relatively large amplitude) and "noise" (large dimensionality, relatively small amplitude). From this point of view the principal component method works as a filter: the signal is contained mostly in the projection onto the first principal components, while in the remaining components the proportion of noise is much higher.

Question: how can the number of necessary principal components be estimated if the signal/noise ratio is not known in advance?

The simplest and oldest method of selecting principal components is Kaiser's rule (English: Kaiser's rule): those principal components are significant for which

the eigenvalue exceeds the mean value of the eigenvalues (the mean sample variance of the coordinates of the data vector). Kaiser's rule works well in simple cases, when there are a few principal components with eigenvalues far exceeding the mean value while the remaining eigenvalues are smaller than it. In more complex situations it can yield too many significant principal components. If the data are normalized to unit sample variance along the axes, Kaiser's rule takes an especially simple form: only those principal components are significant whose eigenvalue exceeds 1.
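A minimal numpy sketch of Kaiser's rule (names are assumptions):

import numpy as np

def kaiser_components(X):
    """Keep the components whose eigenvalue exceeds the mean eigenvalue (Kaiser's rule)."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
    return int((eigvals > eigvals.mean()).sum())

# For data normalized to unit variance along the axes the mean eigenvalue is 1,
# so the rule reduces to "keep the components with eigenvalue > 1".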

Estimation of the number of principal components by the broken-stick rule

Example: estimation of the number of principal components by the broken-stick rule in dimension 5.

One of the most popular heuristic approaches to estimating the number of necessary principal components is the broken stick rule (English: Broken stick model). The set of eigenvalues normalized to unit sum is compared with the distribution of lengths of the pieces of a stick of unit length broken at n−1 randomly chosen points (the break points are chosen independently and are uniformly distributed along the stick). Let l_i (i = 1, …, n) be the lengths of the obtained pieces of the stick, numbered in decreasing order of length: l_1 ≥ l_2 ≥ … ≥ l_n. It is not hard to find their expectations:

l_i = (1/n)(1/i + 1/(i+1) + … + 1/n).

According to the broken-stick rule, the k-th eigenvector (in decreasing order of eigenvalues) is kept in the list of principal components if the normalized eigenvalues of all the preceding components, and of the k-th itself, exceed the corresponding l values.

The figure shows an example for the 5-dimensional case:

l_1 = (1 + 1/2 + 1/3 + 1/4 + 1/5)/5; l_2 = (1/2 + 1/3 + 1/4 + 1/5)/5; l_3 = (1/3 + 1/4 + 1/5)/5; l_4 = (1/4 + 1/5)/5; l_5 = (1/5)/5.

For the example, the following normalized eigenvalues were selected:

λ_1 = 0.5; λ_2 = 0.3; λ_3 = 0.1; λ_4 = 0.06; λ_5 = 0.04.

According to the broken-stick rule, 2 principal components should be kept in this example (0.5 > l_1 ≈ 0.457 and 0.3 > l_2 ≈ 0.257, but 0.1 < l_3 ≈ 0.157):

According to user reports, the broken-stick rule tends to underestimate the number of significant principal components.
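A small numpy sketch of the broken-stick rule; run on the eigenvalues of the example above, it returns 2, as stated (the implementation details are assumptions):

import numpy as np

def broken_stick_components(eigvals):
    """Number of components kept by the broken-stick rule."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    lam = lam / lam.sum()                         # normalize to unit sum, as in the text
    n = len(lam)
    # expected ordered lengths of the pieces of a unit stick broken at n-1 random points
    l = np.array([sum(1.0 / j for j in range(i, n + 1)) / n for i in range(1, n + 1)])
    k = 0
    while k < n and lam[k] > l[k]:
        k += 1
    return k

print(broken_stick_components([0.5, 0.3, 0.1, 0.06, 0.04]))   # -> 2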

Normalization

Normalization after reduction to principal components

After projecting onto the first k principal components it is convenient to normalize to unit (sample) variance along the axes. The variance along the i-th principal component equals the corresponding eigenvalue, so for normalization the corresponding coordinate must be divided by its square root. This transformation is not orthogonal and does not preserve the scalar product. After normalization, the covariance matrix of the data projection becomes the identity, the projections onto any two orthogonal directions become independent quantities, and any orthonormal basis becomes a basis of principal components (recall that normalization changes the orthogonality relations between vectors). The mapping from the original data space onto the first k principal components, together with the normalization, is given by the matrix

It is this transformation that is most often called the Karhunen-Loève transformation. Here the rows are the principal component vectors and the superscript T denotes transposition.

Normalization before computing the principal components

Warning: one should not confuse the normalization carried out after the transformation to principal components with the normalization and "non-dimensionalization" during data preprocessing, carried out before the principal components are computed. Preliminary normalization is needed for a well-founded choice of the metric in which the best approximation of the data is computed, or in which the directions of greatest scatter are sought (which is equivalent). For example, if the data are three-dimensional vectors of "meters, liters and kilograms", then under the standard Euclidean distance a difference of 1 meter in the first coordinate makes the same contribution as a difference of 1 liter in the second, or of 1 kg in the third. Usually the systems of units in which the original data are presented do not reflect our notions of the natural scales along the axes accurately enough, so "non-dimensionalization" is carried out: each coordinate is divided by a scale determined by the data, by the purposes of their processing, and by the processes of measurement and data collection.

There are three substantially different standard approaches to such normalization: to unit variance along the axes (the scales along the axes equal the root mean square deviations; after this transformation the covariance matrix coincides with the matrix of correlation coefficients), to equal measurement accuracy (the scale along an axis is proportional to the accuracy of measurement of the given quantity), and to equal requirements in the problem (the scale along an axis is determined by the required forecast accuracy of the given quantity or by its admissible distortion, i.e., the tolerance). The choice of preprocessing is influenced by the meaningful statement of the problem, as well as by the conditions of data collection (for example, if the data collection is fundamentally incomplete and data will keep arriving, it is irrational to choose normalization strictly to unit variance, even if this corresponds to the meaning of the problem, since this would require renormalizing everything after each new portion arrives; it is more reasonable to choose a sensible scale that roughly estimates the standard deviation and not to change it afterwards).

Preliminary normalization to unit variance along the axes is destroyed by a rotation of the coordinate system if the axes are not the principal components, and normalization during data preprocessing does not replace normalization after reduction to the principal components.

Mechanical analogy and the principal component method for weighted data

If each data vector is assigned unit mass, the empirical covariance matrix coincides with the inertia tensor of this system of point masses (divided by the total mass), and the problem of principal components coincides with the problem of reducing the inertia tensor to its principal axes. Additional freedom in the choice of the mass values can be used to account for the importance of data points or for the reliability of their values (larger masses are assigned to important data or to data from more reliable sources). If a data vector is given a mass, then instead of the empirical covariance matrix we take

All further operations of reduction to principal components are performed in the same way as in the basic version of the method: we find the orthonormal eigenbasis, order it by decreasing eigenvalues, estimate the weighted mean error of approximating the data by the first k components (by the sums of eigenvalues), normalize, and so on.

A more general way of weighting is the maximization of the weighted sum of pairwise distances between projections. For each two data points a weight is introduced; it is symmetric and zero on the diagonal. Instead of the empirical covariance matrix we use

When the symmetric weight matrix is positive definite, a positive definite quadratic form is maximized:

Next we find the orthonormal eigenbasis, order it by decreasing eigenvalues, estimate the weighted mean error of approximating the data by the first k components, and so on, exactly as in the basic algorithm.

This method is applied when classes are available: for points from different classes the weight is chosen larger than for points of the same class. As a result, in the projection onto the weighted principal components the different classes are "pushed apart" by a greater distance.

Another application is the reduction of the influence of large outliers (English: outlier); in this way one obtains a modification of the principal component method that is more robust than the classical one.

Special terminology

In statistics, the principal component method uses a number of special terms.

Data matrix X: each row is a vector of preprocessed data (centered and, where appropriate, normalized); the number of rows is the number of data vectors, the number of columns is the dimension of the data space;

Loadings matrix P (Loadings): each column is a principal component vector; the number of rows is the dimension of the data space, the number of columns is the number of principal component vectors chosen for projection;

Scores matrix T (Scores): each row is the projection of a data vector onto the principal components; the number of rows is the number of data vectors, the number of columns is the number of principal component vectors chosen for projection;

Z-scores matrix Z (Z-scores): each row is the projection of a data vector onto the principal components, normalized to unit sample variance; the number of rows is the number of data vectors, the number of columns is the number of principal component vectors chosen for projection;

Error (or residual) matrix E (Errors or residuals): E = X − T P^T.

Basic formula: X = T P^T + E.
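A small numpy sketch illustrating this terminology and the basic relation on random placeholder data (the component count and the names are assumptions):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 6))
X = X - X.mean(axis=0)                              # centered (and, if needed, scaled) data matrix

k = 2                                               # number of components kept for projection
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
P = eigvecs[:, np.argsort(eigvals)[::-1][:k]]       # loadings matrix (p x k)
T = X @ P                                           # scores matrix (n x k)
Z = T / T.std(axis=0, ddof=1)                       # Z-scores: unit sample variance per component
E = X - T @ P.T                                     # error (residual) matrix
print(np.allclose(X, T @ P.T + E))                  # True: X = T P^T + E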

Limits of applicability and restrictions on the effectiveness of the method

The principal component method is always applicable. The widespread claim that it applies only to normally distributed data (or to distributions close to normal) is wrong: in K. Pearson's original formulation the problem is the approximation of a finite set of data, and there is not even a hypothesis about their statistical generation, let alone about their distribution.

However, the method does not always reduce dimensionality effectively under given accuracy constraints. Straight lines and planes do not always provide a good approximation. For example, the data may follow some curve with good accuracy, and that curve may be awkwardly located in the data space. In this case the principal component method will require several components (instead of one) for acceptable accuracy, or will give no dimensionality reduction with acceptable accuracy at all. To work with such "curved" principal components, the method of principal manifolds and various versions of the non-linear principal component method were invented. Data with a complex topology can cause even more trouble; various methods have also been invented to approximate them, such as self-organizing Kohonen maps, neural gas, or topological grammars. If the data are statistically generated with a distribution that differs strongly from a normal one, then to approximate the distribution it is useful to pass from principal components to independent components, which are no longer orthogonal in the original scalar product. Finally, for an isotropic distribution (even a normal one), instead of a scattering ellipsoid we obtain a ball, and it is impossible to reduce the dimensionality by approximation methods.

Examples of use

Visualization of data

Visualization of data is the presentation in a visual form of experimental data or of the results of theoretical research.

The first choice when visualizing a data set is the orthogonal projection onto the plane of the first two principal components (or the 3-dimensional space of the first three principal components). The projection plane is essentially a flat two-dimensional "screen" positioned so as to provide a "picture" of the data with the least distortion. Such a projection is optimal (among all orthogonal projections onto different two-dimensional screens) in three respects:

  1. The sum of squared distances from the data points to their projections onto the plane of the first principal components is minimal, i.e., the screen is positioned as close as possible to the cloud of points.
  2. The sum of distortions of the squared distances between all pairs of points from the data cloud after projecting the points onto the plane is minimal.
  3. The sum of distortions of the squared distances between the data points and their "center of gravity" is minimal.

Visualization of data is one of the most widely used applications of the principal component method and of its non-linear generalizations.

Image and video compression

To reduce the spatial redundancy of pixels when coding images and video, linear transformations of blocks of pixels are applied. Subsequent quantization of the obtained coefficients and lossless coding allow significant compression ratios to be achieved. Using the PCA transform as the linear transformation is optimal for some types of data in terms of the size of the compressed data for the same distortion, but at present this method is not actively used, mainly because of its great computational complexity. Data compression is achieved by discarding the last transform coefficients.
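A toy numpy sketch of block-wise PCA compression of a grayscale image in this spirit (the block size, the number of kept coefficients and the names are assumptions; real codecs add quantization and entropy coding):

import numpy as np

def pca_block_compress(img, block=8, keep=8):
    """Project non-overlapping blocks of a grayscale image onto their top PCA components and reconstruct."""
    h, w = (img.shape[0] // block) * block, (img.shape[1] // block) * block
    blocks = (img[:h, :w]
              .reshape(h // block, block, w // block, block)
              .swapaxes(1, 2)
              .reshape(-1, block * block))                  # each row = one flattened block
    mean = blocks.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(blocks - mean, rowvar=False))
    basis = eigvecs[:, np.argsort(eigvals)[::-1][:keep]]    # top `keep` components
    coeffs = (blocks - mean) @ basis                        # compressed representation
    restored = coeffs @ basis.T + mean                      # reconstruction from kept coefficients
    return (restored.reshape(h // block, w // block, block, block)
                    .swapaxes(1, 2).reshape(h, w))

# usage: img is a 2-D float array; compressed = pca_block_compress(img, block=8, keep=8)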

Suppressing noise in images

Chemometrics

The principal component method is one of the basic methods in chemometrics (English: Chemometrics). It allows the matrix of original data X to be divided into two parts: the "meaningful" part and the "noise". According to the most popular definition, "Chemometrics is a chemical discipline that applies mathematical, statistical and other methods based on formal logic to construct or select optimal measurement methods and experimental designs, and to extract the most important information from the analysis of experimental data."

Psychodiagnostics

  1. data analysis (description of the results of surveys and other studies presented as arrays of numerical data);
  2. description of social phenomena (construction of models of the phenomena, including mathematical models).

In political science, the principal component method was the main tool of the "Political Atlas of the World" project for the linear and non-linear analysis of the ratings of 192 countries of the world on five specially developed integral indices (standard of living, international influence, threats, statehood). For the cartography of the results of this analysis a special geoinformation system (GIS) was developed, combining geographical space with the feature space. Data maps of the political atlas were also created, using the two-dimensional principal manifolds in the five-dimensional space of countries as the underlying map. The difference between a data map and a geographical map is that on a geographical map objects with similar geographical coordinates appear next to each other, while on a data map objects (countries) with similar features (indices) appear next to each other.

In this article I want to explain how the principal component analysis (PCA) method actually works, from the point of view of the intuition behind its mathematical apparatus. As simply as possible, but in detail.

Mathematics in general is a beautiful and elegant science, but sometimes its beauty is hidden behind a pile of layers of abstraction. It is best shown on simple examples which, so to speak, can be twisted, broken and touched, because in the end everything turns out to be much simpler than it seems at first glance.

In data analysis, as in any other analysis, it is sometimes useful to build a simplified model that describes the real state of affairs as accurately as possible. It often happens that features depend on one another quite strongly, and their simultaneous presence is redundant.

For example, fuel consumption is measured here in liters per 100 km, and in the USA in miles per gallon. At first glance the quantities are different, but in fact they strictly depend on each other. A mile is 1.6 km and a gallon is 3.8 liters. One feature strictly depends on the other; knowing one, we know the other.

But much more often features depend on each other not so strictly and (importantly!) not so obviously. Engine displacement on the whole correlates positively with 0-to-100 km/h acceleration, but this is not always the case. It may also turn out that, taking into account factors not visible at first glance (such as improved fuel quality, the use of lightweight materials and other modern achievements), the year of the car also affects its acceleration, though not strongly.

Knowing these dependencies and their strength, we can express several features through one, merge them, so to speak, and work with a simpler model. Of course, it is most likely impossible to avoid losing information, but the PCA method will help us lose as little as possible.

Strictly speaking, this method approximates an n-dimensional cloud of observations by an ellipsoid (also n-dimensional), whose semi-axes will be the future principal components. And when projecting onto such axes (reducing the dimensionality), the largest amount of information is retained.

Step 1. Preparing the data

Here, for simplicity, I will not take a real training dataset with dozens of features and hundreds of observations, but will make my own, as simple as possible, toy example. Two features and ten observations will be quite enough to describe what, and most importantly why, the algorithm does.

Let us generate a sample:

import numpy as np

x = np.arange(1, 11)
y = 2 * x + np.random.randn(10) * 2
X = np.vstack((x, y))
print(X)

OUT:
[[  1.   2.   3.   4.   5.   6.   7.   8.   9.  10.]
 [  2.73446908   4.35122722   7.21132988  11.24872601   9.58103444  12.09865079  ... ]]

In this sample we have two features that strongly correlate with each other. With the help of the PCA algorithm we can easily find a combined feature and, at the cost of part of the information, express both features with one new one. So let's sort it out!

First, a little statistics. Recall that a random variable is described by its moments. We need the expectation and the variance. One can say that the expectation is the "center of gravity" of the variable and the variance is its "size". Roughly speaking, the expectation gives the position of the random variable and the variance its size.

The process of projecting onto a vector does not affect the mean values in any way, so to minimize information loss our vector should pass through the center of our sample. Therefore there is nothing to worry about if we center our sample, shifting it linearly so that the mean values of the features become 0.
The inverse operator, the vector of mean values, will be needed later to restore the sample in the original dimensionality.

Xcentered = (x - x.mean(), y - y.mean())
m = (x.mean(), y.mean())
print(Xcentered)
print("Mean vector: ", m)

OUT:
(array([-4.5, -3.5, -2.5, -1.5, -0.5,  0.5,  1.5,  2.5,  3.5,  4.5]),
 array([-8.44644233, -8.32845585, -4.93314426, -2.56723136,  1.01013491,  7.00558491,  0.58413491,  4.21440647,  9.59501658]))
Mean vector:  (5.5, 10.314393916)

The variance strongly depends on the order of magnitude of the random variable, i.e. it is sensitive to scale. Therefore, if the units of the features differ greatly in their orders of magnitude, it is strongly recommended to standardize them. In our case the values do not differ much in order of magnitude, so for simplicity of the example I will not perform this operation.

Step 2. The covariance matrix

In the case of a multidimensional random variable (a random vector), the position of the center will again be given by the expectations of its projections onto the axes. But to describe its shape, the variances along the axes alone are no longer enough. Look at these graphs: all three random variables have the same expectations and variances, and their projections onto the axes will, on the whole, look the same!


To describe the shape of a random vector, a covariance matrix is needed.

This is the matrix whose (i,j)-th element is the covariance of the features (X i, X j). Recall the covariance formula:

By the way, in our case it simplifies, since E(X i) = E(X j) = 0:

Note that when X i = X j:

and this is true for any random variables.

Thus, in our matrix the variances of the features will lie on the diagonal (since i = j), and the covariances of the corresponding pairs of features will occupy the remaining cells. And because of the symmetry of covariance, the matrix will also be symmetric.

Note: the covariance matrix is a generalization of the variance to the case of multidimensional random variables; it likewise describes the shape (spread) of the random variable, just as the variance does.

Indeed, the variance of a one-dimensional random variable is a 1×1 covariance matrix, whose single element is given by the formula Cov(X,X) = Var(X).

So, let us form the covariance matrix Σ for our sample. To do this we need the variances of X i and X j, as well as their covariance. We could use the formula written above, but since we are in Python, it would be a sin not to use the function numpy.cov(X). It takes as input a list of all features of the random variable and returns the covariance matrix, where X is an n-dimensional random vector (n is the number of rows). The function works perfectly both for computing the unbiased variance, and for the covariance of two quantities, and for constructing the covariance matrix.
(Recall that in Python a matrix is represented as an array of row-arrays.)

covmat = np.cov(Xcentered)
print(covmat, "\n")
print("Variance of X: ", np.cov(Xcentered)[0, 0])
print("Variance of Y: ", np.cov(Xcentered)[1, 1])
print("Covariance X and Y: ", np.cov(Xcentered)[0, 1])

OUT:
[[  9.16666667  17.93002811]
 [ 17.93002811  37.26438587]] 

Variance of X:  9.16666666667
Variance of Y:  37.26438587
Covariance X and Y:  17.93002811

Step 3. Eigenvectors and eigenvalues

OK, we now have a matrix that describes the shape of our random variable; from it we can get its spread along x and y (i.e., X 1 and X 2), as well as its approximate shape in the plane. Now we need to find a vector (in our case just one) that maximizes the size (variance) of the projection of our sample onto it.

Note: the generalization of variance to higher dimensions is the covariance matrix, and the two notions are equivalent. When projecting onto a vector, the variance of the projection is maximized; when projecting onto a higher-dimensional space, its entire covariance matrix is maximized.

So, take a unit vector v onto which our random vector X will be projected. Then the projection equals v^T·X, and the variance of the projection is accordingly Var(v^T·X). In general, in vector form (for centered variables), the variance is expressed as:

Var(X) = Σ = E(X·X^T)

Accordingly, the variance of the projection:

Var(v^T·X) = E(v^T·X·X^T·v) = v^T·E(X·X^T)·v = v^T·Σ·v

It is easy to see that the variance is maximized at the maximum value of v^T·Σ·v. Here the Rayleigh quotient will help us. Without going too deep into the mathematics, I will just say that the Rayleigh quotient

R(M, x) = (x^T·M·x) / (x^T·x)

has a special case for covariance matrices: if x is an eigenvector of Σ, i.e. Σ·x = λ·x, then

R(Σ, x) = (x^T·Σ·x) / (x^T·x) = λ

The last formula should be familiar from the topic of decomposing a matrix into eigenvectors and eigenvalues: x is an eigenvector and λ is the corresponding eigenvalue. The number of eigenvectors and eigenvalues equals the dimension of the matrix (and the eigenvalues may repeat).

By the way, in English these objects are called eigenvalues and eigenvectors.
To me that sounds much nicer (and more compact) than the corresponding terms in my native language.

Thus, the direction of maximum variance of the projection always coincides with the eigenvector that has the largest eigenvalue, and that eigenvalue equals the variance itself.

The same holds for projections onto more dimensions: the variance (covariance matrix) of the projection onto an m-dimensional space is maximal in the direction of the m eigenvectors with the largest eigenvalues.
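As a numerical illustration of this claim (my own sketch, reusing the covmat computed in Step 2): the projection variance v^T·Σ·v of a random unit vector never exceeds the largest eigenvalue, and the corresponding eigenvector attains it.

import numpy as np

w, U = np.linalg.eig(covmat)     # eigenvalues and unit-length eigenvectors (columns) of Σ
top = U[:, np.argmax(w)]         # the direction with the largest eigenvalue

rng = np.random.default_rng(0)
best = 0.0
for _ in range(1000):
    u = rng.normal(size=2)
    u /= np.linalg.norm(u)       # a random unit vector
    best = max(best, u @ covmat @ u)

print(best)                 # close to, but never above, w.max()
print(top @ covmat @ top)   # equals w.max() up to floating-point error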

The dimensionality of our sample is two, so it has 2 eigenvectors. Let's find them.

The numpy library provides the function numpy.linalg.eig(X), where X is a square matrix. It returns 2 arrays — an array of eigenvalues and an array of eigenvectors (column vectors). The vectors are normalized, i.e. their length equals 1 — exactly what we need. These 2 vectors define a new basis for the sample, such that its axes coincide with the semi-axes of the ellipse approximating our sample.
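Here is a minimal sketch of this step in code (the name vals for the eigenvalue array is my own choice; the projection code below assumes the eigenvector matrix is called vecs):

import numpy as np

vals, vecs = np.linalg.eig(covmat)   # eigenvalues and unit-length eigenvectors (as columns)
print("Eigenvalues: ", vals)
print("Eigenvectors: \n", vecs)

The i-th column vecs[:, i] pairs with the eigenvalue vals[i], and the sum of all eigenvalues equals the total variance of the centered sample.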



In this plot we approximated our sample with an ellipse with radii of 2 sigma (i.e. it should contain 95% of all observations — which is essentially what we see here). I inverted the larger vector (the eig(X) function had pointed it in the opposite direction) — what matters to us is the direction of the axis, not the orientation of the vector.

Step 4. Dimensionality reduction (projection)

The larger vector has a direction similar to the regression line, and by projecting our sample onto it we lose information comparable to the sum of the residual terms of the regression (only the distance is now Euclidean rather than the delta in Y). In our case the dependence between the features is very strong, so the loss of information will be minimal. The "price" of the projection — the variance along the smaller eigenvector — is, as the previous plot shows, very small.

Note: the diagonal elements of the covariance matrix give the variances in the original basis, while its eigenvalues give the variances in the new basis (along the principal components).

It is often necessary to estimate the amount of information lost (and retained). It is most convenient to express it as a percentage: take the variance along each axis and divide it by the total sum of variances along all axes (i.e., the sum of all eigenvalues of the covariance matrix).
Thus our larger vector describes 45.994/46.431*100% ≈ 99.06% of the information, and the smaller one, accordingly, about 0.94%. By discarding the smaller vector and projecting the data onto the larger one, we lose less than 1% of the information! An excellent result!
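In code this bookkeeping is a one-liner (a small sketch, assuming the vals array from the Step 3 sketch):

ratios = vals / vals.sum()     # share of the total variance captured by each eigenvector
for lam, r in zip(vals, ratios):
    print("eigenvalue %.3f -> %.2f%% of the variance" % (lam, 100 * r))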

Note: in practice, in most cases, if the total loss of information is no more than 10-20%, you can safely reduce the dimensionality.

To perform the projection, as mentioned earlier in Step 3, we need to carry out the operation v^T·X (the vector must have length 1). Or, if we have not a single vector but a hyperplane, then instead of the vector v^T we take the matrix of basis vectors V^T. The resulting vector (or matrix) is the array of projections of our observations.

i_max = np.argmax(vals)                    # index of the largest eigenvalue
v = (-vecs[0][i_max], -vecs[1][i_max])     # the corresponding eigenvector, sign-flipped as discussed above
Xnew = np.dot(v, Xcentered)

np.dot(X, Y) is the dot (matrix) product — this is how we multiply vectors and matrices in Python.

It is easy to see that the values of the projections correspond to the picture in the previous plot.

Step 5. Restoring the data

The projection is convenient to work with: you can build hypotheses on top of it and develop models. But the principal components obtained will not always have an obvious meaning that an outsider can grasp. It is sometimes useful, for example, to decode detected outliers in order to see which observations stand behind them.

That is very simple. We have all the necessary information, namely the coordinates of the basis vectors in the original basis (the vectors we projected onto) and the vector of means (to undo the centering). Take, for example, the largest value, 10.596…, and decode it: multiply it on the right by the transposed vector and add the vector of means, or, in general form for the whole sample: Xnew^T·v^T + m

Xrestored = np.dot(Xnew[9], v) + m
print("Restored: ", Xrestored)
print("Original: ", X[:, 9])

OUT:
Restored:  [ 10.13864361  19.84190935]
Original:  [ 10.          19.9094...]

The difference is small, but it is there — the lost information is not recovered. Nevertheless, if simplicity is more important than accuracy, the restored value approximates the original one quite well.

Instead of a conclusion: checking the algorithm

So, we have walked through the algorithm and shown how it works on a toy example; now all that remains is to compare it with the PCA implemented in sklearn — after all, that is the one we will actually be using.

from sklearn.decomposition import PCA

pca = PCA(n_components=1)
XPCAreduced = pca.fit_transform(np.transpose(X))

The n_components parameter specifies the number of dimensions onto which the projection is performed, i.e. to how many dimensions we want to reduce our dataset. In other words, these are the n eigenvectors with the largest eigenvalues. Let's check the result of the dimensionality reduction:

Print "Our reduced X: n", Xnew print "Sklearn reduced X: n", XPCAreduced OUT: Our reduced X: [-9.56404106 -9.02021625 -5.52974822 -2.96481262 0.68933859 0.74406645 2.33433492 7.39307974 5.3212742 10.59672425] Sklearn reduced X: [[-9.56404106 ] [ -9.02021625] [ -5.52974822] [ -2.96481262] [ 0.68933859] [ 0.74406645] [ 2.33433492] [ 7.39307974] [7] 5 5

We returned the result with observations laid out as columns (the more canonical layout from the point of view of linear algebra), while PCA in sklearn returns a vertical array with one row per observation.

In principle this is not critical; it is just worth noting that in linear algebra it is canonical to write matrices with observations as column vectors, while in data analysis (and other database-adjacent areas) observations (transactions, records) are usually stored as rows.
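If the difference in layout matters, it is easy to bring both results to a common shape and check that they agree (a small sketch using numpy):

import numpy as np

# Our Xnew is a flat array of projections; sklearn returns one row per observation.
print(np.allclose(Xnew.reshape(-1, 1), XPCAreduced))   # True for this example

Keep in mind that the sign of a principal component is arbitrary, so in general the two results may differ by a factor of -1; here they agree because we flipped the eigenvector by hand.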

Let's also check the other parameters of the model — the class has a number of attributes that give access to intermediate variables:

- Mean vector: mean_
- Projection vector (matrix): components_
- Variances along the projection axes (sample variances): explained_variance_
- Share of information (share of the total variance): explained_variance_ratio_

Note: explained_variance_ reports the sample variance, whereas the cov() function computes the unbiased variance when building the covariance matrix!

Let's obtain these values and compare them with the values computed by our own code.

Print "Mean vector: ", pca.mean_, m print "Projection: ", pca.components_, v print "Explained variance ratio: ", pca.explained_variance_ratio_, l/sum(l) OUT: Mean vector: [ 5.5 10.31439 ( 5.5, 10.314393916) Projection: [[0.43774316 0.89910006]] (0.43774316434772387, 0.89910006232167594) Explained Variance: [41.39455058] 45.9939450918 Explained Variance Ratio: [0.99058588] 0.99058588818

The only difference is in the variances, but as we already mentioned, we used the cov() function, which computes the unbiased variance, whereas the explained_variance_ attribute returns the sample variance. They differ only in that the former divides the sum by (n − 1), while the latter divides by n. It is easy to check that 45.99 ∙ (10 − 1) / 10 = 41.39.
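The conversion is easy to reproduce (a tiny check using the numbers printed above):

n = 10                            # number of observations in the sample
unbiased = 45.9939450918          # variance from np.cov, which divides by n - 1
print(unbiased * (n - 1) / n)     # 41.3945505826..., matching explained_variance_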

All the other values coincide, which means our algorithms are equivalent. Note also that the attributes of the library algorithm are shown with lower precision, since it is probably optimized for speed, or simply rounds the values for convenience (or maybe I have some glitch).

Note: the library method automatically projects onto the axes that maximize the variance. This is not always the rational choice. For example, in this figure a careless reduction of dimensionality would make classification impossible, whereas projecting onto the smaller vector would successfully reduce the dimensionality and preserve the classifier.

So, we have looked at the principles behind the PCA algorithm and its implementation in sklearn. I hope this article was clear enough for those just starting out with data analysis, and at least somewhat informative for those who already know the algorithm well. An intuitive picture is extremely useful for understanding how the method works, and that understanding is essential for tuning the chosen model correctly. Thank you for your attention!

PS: Please don't scold the author for possible inaccuracies. The author is himself in the process of learning data analysis and wants to help others who, like him, are mastering this wonderful field of knowledge! Constructive criticism and diverse experience are most welcome!