7 Correlation coefficient. Pearson correlation criterion

When studying public health and health care for scientific and practical purposes, the researcher often has to analyze statistically the relationship between factor and outcome characteristics of a statistical population (a causal relationship), or to determine how parallel changes in several characteristics of this population depend on some third quantity (their common cause). It is necessary to be able to study the features of this relationship, to determine its magnitude and direction, and to assess its reliability. Correlation methods are used for this purpose.

  1. Types of quantitative relationships between characteristics
    • functional relationship
    • correlation relationship
  2. Definitions of functional and correlation relationships

    Functional relationship: a relationship between two characteristics in which each value of one corresponds to a strictly defined value of the other (for example, the area of a circle depends on its radius). Functional relationships are typical of physical and mathematical processes.

    Correlation relationship: a relationship in which each value of one characteristic corresponds to several values of another, related characteristic (the relationship between a person's height and body weight, between body temperature and pulse rate, and so on). Correlation relationships are typical of medical and biological processes.

  3. The practical importance of establishing a correlation relationship. Detecting a causal relationship between factor and outcome characteristics (when assessing physical development; when determining the relationship between working and living conditions and health; when determining how the incidence of disease depends on age, length of service, occupational hazards, and so on).

    Detecting the dependence of parallel changes in several characteristics on some third quantity. For example, high temperature in a workshop causes changes in blood pressure, blood viscosity, pulse rate, and so on.

  4. The quantity that characterizes the direction and strength of the relationship between characteristics. The correlation coefficient expresses in a single number the direction and strength of the relationship between characteristics (phenomena); it ranges from 0 to ±1.
  5. Ways of presenting a correlation relationship
    • graph (scatter diagram)
    • correlation coefficient
  6. Direction of the correlation relationship
    • direct
    • inverse
  7. Strength of the correlation relationship
    • strong: ± 0.7 to ± 1
    • average: ± 0.3 to ± 0.699
    • weak: 0 to ± 0.299
  8. Methods for determining the correlation coefficient, and their formulas
    • method of squares (Pearson's method)
    • rank method (Spearman's method)
  9. Methodological requirements for using the correlation coefficient
    • the relationship can be measured only in qualitatively homogeneous populations (for example, the relationship between height and weight in populations homogeneous in sex and age)
    • the calculation can use absolute or derived values
    • ungrouped variation series are used to calculate the correlation coefficient (this requirement applies only when the coefficient is calculated by the method of squares)
    • the number of observations must be at least 30
  10. Recommendations for using the rank correlation method (Spearman's method)
    • when there is no need to establish the strength of the relationship precisely and approximate data are sufficient
    • when the characteristics are represented not only by quantitative but also by attributive (qualitative) values
    • when the distribution series have open-ended values (for example, length of service up to 1 year, etc.)
  11. Recommendations for using the method of squares (Pearson's method)
    • when the strength of the relationship between the characteristics must be established precisely
    • when the characteristics have only a quantitative expression
  12. Methodology and procedure for calculating the correlation coefficient

    1) Method of squares

    2) Rank method

  13. Scheme for assessing a correlation relationship by its correlation coefficient
  14. Calculation of the error of the correlation coefficient
  15. Assessment of the reliability of the correlation coefficient obtained by the rank method and by the method of squares

    Method 1.
    Reliability is determined from the t criterion, the ratio of the coefficient to its error:

    $$t = \frac{r_{xy}}{m_{r_{xy}}}$$

    The t criterion is evaluated against a table of t values, taking into account the number of degrees of freedom (n − 2), where n is the number of paired observations. The t criterion must be equal to or greater than the tabulated value corresponding to a probability p ≥ 99%.

    Method 2.
    Reliability is assessed using a special table of standard correlation coefficients. A correlation coefficient is considered reliable when, for the given number of degrees of freedom (n − 2), it is equal to or greater than the tabulated value corresponding to a probability of error-free prediction p ≥ 95%.
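The outline above can be condensed into a short Python sketch. It labels a coefficient by direction and strength using the thresholds from item 7, and computes the t criterion of Method 1. The error formula m_r = sqrt((1 − r²) / (n − 2)) is an assumption here: it is the usual small-sample formula, but the text above does not spell it out. The function names are illustrative only.

```python
import math


def describe_correlation(r: float) -> tuple[str, str]:
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        direction = "none"
    else:
        direction = "direct" if r > 0 else "inverse"
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "average"
    else:
        strength = "weak"
    return direction, strength


def t_criterion(r: float, n: int) -> float:
    """t = r / m_r, with the assumed error m_r = sqrt((1 - r^2) / (n - 2)).

    Undefined for |r| = 1 (the error is zero) and for n <= 2.
    """
    m_r = math.sqrt((1.0 - r * r) / (n - 2))
    return r / m_r


label = describe_correlation(0.99)   # direction and strength of r = +0.99
t_value = t_criterion(0.99, 6)       # six paired observations
```

The resulting t value is then compared with the tabulated t for n − 2 degrees of freedom, as Method 1 describes.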

Example of applying the method of squares

Task: calculate the correlation coefficient and determine the direction and strength of the relationship between the amount of calcium in water and water hardness from the data in Table 1. Assess the reliability of the relationship. Draw a conclusion.

Table 1

Justification of the choice of method. The method of squares (Pearson) is chosen because each characteristic (water hardness and the amount of calcium) has a numerical expression and there are no open-ended values.

Solution.
The calculation sequence is described in the text and the results are presented in the table. Having arranged the paired values in series, denote them by x (water hardness in degrees) and y (amount of calcium in water in mg/l).

Water hardness    Calcium in water    d_x    d_y     d_x · d_y    d_x²       d_y²
(degrees), x      (mg/l), y
(1)               (2)                 (3)    (4)     (5)          (6)        (7)
4                 28                  -16    -114    1824         256        12996
8                 56                  -12    -86     1032         144        7396
11                76                  -9     -66     594          81         4356
27                190                 +7     +48     336          49         2304
34                240                 +14    +98     1372         196        9604
36                262                 +16    +120    1920         256        14400
Σx = 120          Σy = 852                           Σ = 7078     Σ = 982    Σ = 51056
M_x = Σx / n = 120 / 6 = 20; M_y = Σy / n = 852 / 6 = 142
  1. Determine the mean M_x of the "x" series and the mean M_y of the "y" series by the formulas:
    M_x = Σx / n (column 1) and
    M_y = Σy / n (column 2)
  2. Find the deviations (d_x and d_y) of each value from the calculated mean in the "x" series and in the "y" series:
    d_x = x − M_x (column 3) and d_y = y − M_y (column 4).
  3. Find the products of the deviations d_x · d_y and sum them: Σ d_x · d_y = 7078 (column 5).
  4. Square each deviation d_x and d_y and sum the squares for the "x" series and for the "y" series: Σ d_x² = 982 (column 6) and Σ d_y² = 51056 (column 7).
  5. Find the product Σ d_x² · Σ d_y² and take its square root.
  6. Substitute the values Σ(d_x · d_y) and √(Σd_x² · Σd_y²) into the formula for the correlation coefficient:

    $$r_{xy} = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} = \frac{7078}{\sqrt{982 \cdot 51056}}$$

  7. Determine the reliability of the correlation coefficient:
    Method 1. Find the error of the correlation coefficient (m_r) and the t criterion by the formulas:

    $$m_{r_{xy}} = \sqrt{\frac{1 - r_{xy}^2}{n - 2}}, \qquad t = \frac{r_{xy}}{m_{r_{xy}}}$$

    The t criterion is 14.1, which corresponds to a probability of error-free prediction p > 99.9%.

    Method 2. The reliability of the correlation coefficient is assessed using the table "Standard correlation coefficients" (see Appendix 1). With n − 2 = 6 − 2 = 4 degrees of freedom, the calculated correlation coefficient r_xy = +0.99 is larger than the tabulated value (r_tab = 0.917 at p = 99%).

    Conclusion. The more calcium in the water, the harder it is (the relationship is direct, strong and reliable: r_xy = +0.99, p > 99.9%).
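As a cross-check, the seven steps of the method of squares can be sketched in Python. The x and y values below are an assumption: they are recovered from the tabulated deviations as x = M_x + d_x and y = M_y + d_y, so that they reproduce the sums 7078, 982 and 51056 used in the worked example. The function name is illustrative.

```python
import math


def pearson_by_deviations(xs, ys):
    """Pearson coefficient via deviations from the means (method of squares)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 pairs")
    m_x = sum(xs) / n                                   # step 1: means
    m_y = sum(ys) / n
    d_x = [x - m_x for x in xs]                         # step 2: deviations
    d_y = [y - m_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(d_x, d_y))     # step 3: sum of products
    sum_dx2 = sum(a * a for a in d_x)                   # step 4: sums of squares
    sum_dy2 = sum(b * b for b in d_y)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)      # steps 5-6


# Assumed data: means plus the deviations listed in the worked example.
hardness = [4, 8, 11, 27, 34, 36]        # degrees
calcium = [28, 56, 76, 190, 240, 262]    # mg/l

r_xy = pearson_by_deviations(hardness, calcium)
```

With these inputs the function evaluates 7078 / √(982 · 51056), a value just under 1, in line with the strong direct relationship reported in the conclusion.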

    Example of applying the rank method

    Task: determine the direction and strength of the relationship between length of service in years and the frequency of injuries from the following data:

    Justification of the choice of method: only the rank correlation method can be used, because the first series, "length of service in years", has open-ended values (up to 1 year; 7 years or more), which rules out the more accurate method of squares.

    Solution. The calculation sequence is described in the text and the results are presented in Table 2.

    Table 2

    Length of service    Number of     Ranks          Rank difference    Squared rank difference
    (years)              injuries      x      y       d = x − y          d²
    Up to 1 year         24            1      5       -4                 16
    1-2                  16            2      4       -2                 4
    3-4                  12            3      2.5     +0.5               0.25
    5-6                  12            4      2.5     +1.5               2.25
    7 or more            6             5      1       +4                 16
                                                                         Σ d² = 38.5
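The ranking and the final coefficient can be sketched in Python as follows. The sketch assumes the standard Spearman formula ρ = 1 − 6Σd² / (n(n² − 1)), which this section names but does not write out. Tied values receive the mean of their ranks, as in the table. The helper names are illustrative.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(rank_x, rank_y):
    """Spearman coefficient from two rank series (simple formula)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


experience_rank = [1, 2, 3, 4, 5]      # "up to 1 year" ... "7 or more"
injuries = [24, 16, 12, 12, 6]
injury_rank = average_ranks(injuries)  # ranks of the injury counts
rho = spearman_rho(experience_rank, injury_rank)
```

With Σd² = 38.5 and n = 5 the formula gives ρ = 1 − 231/120 = −0.925, that is, a strong inverse relationship: the longer the length of service, the fewer the injuries. With tied ranks the simple formula is only an approximation.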

    Standard correlation coefficients that are considered reliable (after L.S. Kaminsky)

    Number of degrees of freedom (n − 2) / Probability level p (%)
    n − 2 95% 98% 99%
    1 0,997 0,999 0,999
    2 0,950 0,980 0,990
    3 0,878 0,934 0,959
    4 0,811 0,882 0,917
    5 0,754 0,833 0,874
    6 0,707 0,789 0,834
    7 0,666 0,750 0,798
    8 0,632 0,716 0,765
    9 0,602 0,685 0,735
    10 0,576 0,658 0,708
    11 0,553 0,634 0,684
    12 0,532 0,612 0,661
    13 0,514 0,592 0,641
    14 0,497 0,574 0,623
    15 0,482 0,558 0,606
    16 0,468 0,542 0,590
    17 0,456 0,528 0,575
    18 0,444 0,516 0,561
    19 0,433 0,503 0,549
    20 0,423 0,492 0,537
    25 0,381 0,445 0,487
    30 0,349 0,409 0,449


The correlation coefficient reflects the degree of relationship between two indicators. It always takes a value from -1 to 1. If the coefficient is close to 0, there is said to be no relationship between the variables.

If the value is close to one (from 0.9, for example), there is a strong direct relationship between the observed objects. If the coefficient is close to the other end of the range (-1), there is a strong inverse relationship between the variables. A value somewhere in the middle between 0 and 1 or between 0 and -1 indicates a weak relationship (direct or inverse). Such a relationship is usually disregarded and treated as absent.

Calculation of the correlation coefficient in Excel

Let us use an example to look at ways of calculating the correlation coefficient and at the features of direct and inverse relationships between variables.

The values of the X and Y indicators:

Y is the independent variable, X the dependent one. We need to find the strength (strong / weak) and direction (direct / inverse) of the relationship between them. The formula of the correlation coefficient looks like this:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

To make it easier to understand, we break it down into several simple steps.

A strong direct relationship is found between the variables.

The built-in CORREL function avoids complex calculations. Let us calculate the pair correlation coefficient in Excel with its help. Open the Function Wizard and find the function. Its arguments are the array of Y values and the array of X values:

Show the values of the variables on a chart:


The strong relationship between Y and X is visible because the lines run almost parallel to each other. The relationship is direct: when Y grows, X grows; when Y decreases, X decreases.



Matrix of paired correlation coefficients in Excel

The correlation matrix is a table in which the cell at the intersection of a row and a column holds the correlation coefficient between the corresponding variables. It makes sense to build it when there are several variables.

The matrix of correlation coefficients in Excel is built with the "Correlation" tool from the Data Analysis add-in.


A strong direct relationship was found between Y and X1. Between X1 and X2 there is a strong inverse relationship. There is practically no relationship with the values in column X3.
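Outside Excel, the same kind of matrix can be built with a few lines of Python. The sketch below is illustrative only: the three short series are hypothetical data invented for the example, not the values from the worksheet discussed above, and the function names are not part of any library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    s_xy = sum((a - m_x) * (b - m_y) for a, b in zip(xs, ys))
    s_xx = sum((a - m_x) ** 2 for a in xs)
    s_yy = sum((b - m_y) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_matrix(columns):
    """Return {row_name: {col_name: r}} for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical data: y rises with x1, x2 falls as x1 rises.
data = {
    "y": [2.0, 4.1, 6.2, 7.9, 10.1],
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [9.8, 8.1, 6.0, 4.2, 1.9],
}
matrix = correlation_matrix(data)

for name, row in matrix.items():
    print(name, [round(value, 3) for value in row.values()])
```

The diagonal always equals 1 and the matrix is symmetric, so only the cells below (or above) the diagonal carry information.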


Task:
There is a paired sample of 26 pairs of values (x_k, y_k):

k    1 2 3 4 5 6 7 8 9 10
x_k  25.20000 26.40000 26.00000 25.80000 24.90000 25.70000 25.70000 25.70000 26.10000 25.80000
y_k  30.80000 29.40000 30.20000 30.50000 31.40000 30.30000 30.40000 30.50000 29.90000 30.40000

k    11 12 13 14 15 16 17 18 19 20
x_k  25.90000 26.20000 25.60000 25.40000 26.60000 26.20000 26.00000 22.10000 25.90000 25.80000
y_k  30.30000 30.50000 30.60000 31.00000 29.60000 30.40000 30.70000 31.60000 30.50000 30.60000

k    21 22 23 24 25 26
x_k  25.90000 26.30000 26.10000 26.00000 26.40000 25.80000
y_k  30.70000 30.10000 30.60000 30.50000 30.70000 30.80000

It is required to calculate / build:
- the correlation coefficient;
- a test of the hypothesis that the random variables X and Y are dependent, at significance level α = 0.05;
- the coefficients of the linear regression equation;
- the scatter diagram (correlation field) and the graph of the regression line.

SOLUTION:

1. Calculate the correlation coefficient.

The correlation coefficient is a measure of the mutual probabilistic influence of two random variables. The correlation coefficient r can take values from -1 to +1. If its absolute value is close to 1, this is evidence of a strong relationship between the variables; if it is close to 0, it indicates a weak relationship or none at all. If the absolute value of r equals one, we can speak of a functional relationship between the variables, that is, one variable can be expressed through the other by a mathematical function.


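The correlation coefficient can be calculated by the formula

$$r_{x,y} = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y} \qquad (1.1)$$

where

$$\operatorname{cov}(X, Y) = \frac{1}{n}\sum_{k=1}^{n}(x_k - M_x)(y_k - M_y) \qquad (1.2)$$

$$\sigma_x^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k - M_x)^2, \qquad \sigma_y^2 = \frac{1}{n}\sum_{k=1}^{n}(y_k - M_y)^2 \qquad (1.3)$$

$$M_x = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad M_y = \frac{1}{n}\sum_{k=1}^{n} y_k$$

or by the formula

$$r_{x,y} = \frac{M_{xy} - M_x M_y}{S_x S_y} \qquad (1.4)$$

where

$$M_x = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad M_y = \frac{1}{n}\sum_{k=1}^{n} y_k, \qquad M_{xy} = \frac{1}{n}\sum_{k=1}^{n} x_k y_k \qquad (1.5)$$

$$S_x^2 = \frac{1}{n}\sum_{k=1}^{n} x_k^2 - M_x^2, \qquad S_y^2 = \frac{1}{n}\sum_{k=1}^{n} y_k^2 - M_y^2 \qquad (1.6)$$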

In practice, formula (1.4) is used more often to calculate the correlation coefficient, because it requires less computation. However, if the covariance cov(X, Y) has already been calculated, it is more convenient to use formula (1.1), because the intermediate results can be reused along with the covariance itself.

1.1 Calculate the correlation coefficient by formula (1.4). To do this, calculate the values of x_k², y_k² and x_k·y_k and enter them in Table 1.

Table 1


k x_k y_k x_k² y_k² x_k·y_k
1 2 3 4 5 6
1 25.2 30.8 635.04000 948.64000 776.16000
2 26.4 29.4 696.96000 864.36000 776.16000
3 26.0 30.2 676.00000 912.04000 785.20000
4 25.8 30.5 665.64000 930.25000 786.90000
5 24.9 31.4 620.01000 985.96000 781.86000
6 25.7 30.3 660.49000 918.09000 778.71000
7 25.7 30.4 660.49000 924.16000 781.28000
8 25.7 30.5 660.49000 930.25000 783.85000
9 26.1 29.9 681.21000 894.01000 780.39000
10 25.8 30.4 665.64000 924.16000 784.32000
11 25.9 30.3 670.81000 918.09000 784.77000
12 26.2 30.5 686.44000 930.25000 799.10000
13 25.6 30.6 655.36000 936.36000 783.36000
14 25.4 31 645.16000 961.00000 787.40000
15 26.6 29.6 707.56000 876.16000 787.36000
16 26.2 30.4 686.44000 924.16000 796.48000
17 26 30.7 676.00000 942.49000 798.20000
18 22.1 31.6 488.41000 998.56000 698.36000
19 25.9 30.5 670.81000 930.25000 789.95000
20 25.8 30.6 665.64000 936.36000 789.48000
21 25.9 30.7 670.81000 942.49000 795.13000
22 26.3 30.1 691.69000 906.01000 791.63000
23 26.1 30.6 681.21000 936.36000 798.66000
24 26 30.5 676.00000 930.25000 793.00000
25 26.4 30.7 696.96000 942.49000 810.48000
26 25.8 30.8 665.64000 948.64000 794.64000


1.2. Calculate M_x by formula (1.5).

1.2.1. Sum all the elements x_k

x_1 + x_2 + ... + x_26 = 25.20000 + 26.40000 + ... + 25.80000 = 669.50000

1.2.2. Divide the sum by the number of sample elements

669.50000 / 26 = 25.75000

M_x = 25.75000

1.3. Calculate M_y in the same way.

1.3.1. Sum all the elements y_k

y_1 + y_2 + ... + y_26 = 30.80000 + 29.40000 + ... + 30.80000 = 793.00000

1.3.2. Divide the sum by the number of sample elements

793.00000 / 26 = 30.50000

M_y = 30.50000

1.4. Calculate M_xy in the same way.

1.4.1. Sum all the elements of the 6th column of Table 1

776.16000 + 776.16000 + ... + 794.64000 = 20412.83000

1.4.2. Divide the sum by the number of elements

20412.83000 / 26 = 785.10885

M_xy = 785.10885

1.5. Calculate the value of S_x² by formula (1.6).

1.5.1. Sum all the elements of the 4th column of Table 1

635.04000 + 696.96000 + ... + 665.64000 = 17256.91000

1.5.2. Divide the sum by the number of elements

17256.91000 / 26 = 663.72731

1.5.3. Subtract the square of M_x from this number to obtain S_x²

S_x² = 663.72731 - 25.75000² = 663.72731 - 663.06250 = 0.66481

1.6. Calculate the value of S_y² by formula (1.6).

1.6.1. Sum all the elements of the 5th column of Table 1

948.64000 + 864.36000 + ... + 948.64000 = 24191.84000

1.6.2. Divide the sum by the number of elements

24191.84000 / 26 = 930.45538

1.6.3. Subtract the square of M_y from this number to obtain S_y²

S_y² = 930.45538 - 30.50000² = 930.45538 - 930.25000 = 0.20538

1.7. Calculate the product of S_x² and S_y².

S_x² · S_y² = 0.66481 · 0.20538 = 0.136541

1.8. Taking the square root of this number gives the value of S_x · S_y.

S_x · S_y = 0.36951

1.9. Calculate the value of the correlation coefficient by formula (1.4).

r = (785.10885 - 25.75000 · 30.50000) / 0.36951 = (785.10885 - 785.37500) / 0.36951 = -0.72028

Answer: r_x,y = -0.720279
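Steps 1.2 to 1.9 can be reproduced with a short Python sketch of formula (1.4). The data are the 26 pairs from the task; the function name is illustrative.

```python
import math

x = [25.2, 26.4, 26.0, 25.8, 24.9, 25.7, 25.7, 25.7, 26.1, 25.8,
     25.9, 26.2, 25.6, 25.4, 26.6, 26.2, 26.0, 22.1, 25.9, 25.8,
     25.9, 26.3, 26.1, 26.0, 26.4, 25.8]
y = [30.8, 29.4, 30.2, 30.5, 31.4, 30.3, 30.4, 30.5, 29.9, 30.4,
     30.3, 30.5, 30.6, 31.0, 29.6, 30.4, 30.7, 31.6, 30.5, 30.6,
     30.7, 30.1, 30.6, 30.5, 30.7, 30.8]


def pearson_moments(xs, ys):
    """r = (M_xy - M_x * M_y) / (S_x * S_y), as in formula (1.4)."""
    n = len(xs)
    m_x = sum(xs) / n                                   # step 1.2
    m_y = sum(ys) / n                                   # step 1.3
    m_xy = sum(a * b for a, b in zip(xs, ys)) / n       # step 1.4
    s_x2 = sum(a * a for a in xs) / n - m_x ** 2        # step 1.5
    s_y2 = sum(b * b for b in ys) / n - m_y ** 2        # step 1.6
    return (m_xy - m_x * m_y) / math.sqrt(s_x2 * s_y2)  # steps 1.7-1.9


r = pearson_moments(x, y)
print(round(r, 5))
```

Formula (1.4) subtracts two nearly equal numbers (785.10885 and 785.37500), so for hand calculation it pays to carry at least five decimal places, as the worked solution does.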

2. Check the significance of the correlation coefficient (test the dependence hypothesis).

Since the estimate of the correlation coefficient is calculated from a finite sample and may therefore deviate from its population value, the significance of the correlation coefficient has to be tested. The test uses the t criterion:

$$t = \frac{r_{x,y}\sqrt{n - 2}}{\sqrt{1 - r_{x,y}^2}} \qquad (2.1)$$

The random variable t follows Student's t distribution, and the critical value of the criterion (t_cr.α) at the given significance level α is found from the table of the t distribution. If the absolute value of t calculated by formula (2.1) is less than t_cr.α, there is no dependence between the random variables X and Y. Otherwise, the experimental data do not contradict the hypothesis that the random variables are dependent.


2.1. Calculate the value of the t criterion by formula (2.1):

$$t = \frac{-0.72028\,\sqrt{26 - 2}}{\sqrt{1 - (-0.72028)^2}} = -5.08680$$

2.2. Find the critical value t_cr.α in the table of the t distribution

The required value of t_cr.α is located at the intersection of the row corresponding to the number of degrees of freedom and the column corresponding to the given significance level α.
In our case, the number of degrees of freedom is n - 2 = 26 - 2 = 24 and α = 0.05, which corresponds to the critical value t_cr.α = 2.064 (see Table 2)

Table 2. t distribution

Number of degrees of freedom (n - 2), then the critical values for each significance level:
n - 2 α = 0.1 α = 0.05 α = 0.02 α = 0.01 α = 0.002 α = 0.001
1 6.314 12.706 31.821 63.657 318.31 636.62
2 2.920 4.303 6.965 9.925 22.327 31.598
3 2.353 3.182 4.541 5.841 10.214 12.924
4 2.132 2.776 3.747 4.604 7.173 8.610
5 2.015 2.571 3.365 4.032 5.893 6.869
6 1.943 2.447 3.143 3.707 5.208 5.959
7 1.895 2.365 2.998 3.499 4.785 5.408
8 1.860 2.306 2.896 3.355 4.501 5.041
9 1.833 2.262 2.821 3.250 4.297 4.781
10 1.812 2.228 2.764 3.169 4.144 4.587
11 1.796 2.201 2.718 3.106 4.025 4.437
12 1.782 2.179 2.681 3.055 3.930 4.318
13 1.771 2.160 2.650 3.012 3.852 4.221
14 1.761 2.145 2.624 2.977 3.787 4.140
15 1.753 2.131 2.602 2.947 3.733 4.073
16 1.746 2.120 2.583 2.921 3.686 4.015
17 1.740 2.110 2.567 2.898 3.646 3.965
18 1.734 2.101 2.552 2.878 3.610 3.922
19 1.729 2.093 2.539 2.861 3.579 3.883
20 1.725 2.086 2.528 2.845 3.552 3.850
21 1.721 2.080 2.518 2.831 3.527 3.819
22 1.717 2.074 2.508 2.819 3.505 3.792
23 1.714 2.069 2.500 2.807 3.485 3.767
24 1.711 2.064 2.492 2.797 3.467 3.745
25 1.708 2.060 2.485 2.787 3.450 3.725
26 1.706 2.056 2.479 2.779 3.435 3.707
27 1.703 2.052 2.473 2.771 3.421 3.690
28 1.701 2.048 2.467 2.763 3.408 3.674
29 1.699 2.045 2.462 2.756 3.396 3.659
30 1.697 2.042 2.457 2.750 3.385 3.646
40 1.684 2.021 2.423 2.704 3.307 3.551
60 1.671 2.000 2.390 2.660 3.232 3.460
120 1.658 1.980 2.358 2.617 3.160 3.373
∞ 1.645 1.960 2.326 2.576 3.090 3.291


2.3. Compare the absolute value of the t criterion with t_cr.α

The absolute value of the t criterion is not less than the critical value (|t| = 5.08680, t_cr.α = 2.064); therefore, with probability 0.95 (1 - α), the experimental data do not contradict the hypothesis that the random variables X and Y are dependent.
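The significance check reduces to one formula and one comparison, sketched below in Python. The critical value 2.064 is copied from the t distribution table for 24 degrees of freedom and α = 0.05; the function and variable names are illustrative.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), formula (2.1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


r_xy = -0.72028      # correlation coefficient found in step 1
n = 26               # number of pairs in the sample
t_critical = 2.064   # table value for n - 2 = 24 and alpha = 0.05

t = t_statistic(r_xy, n)
# The dependence hypothesis is not rejected when |t| >= t_critical.
dependence_not_rejected = abs(t) >= t_critical
```

Only the absolute value of t matters for the comparison; the sign simply repeats the sign of the correlation coefficient.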

3. Calculate the coefficients of the linear regression equation.

The linear regression equation is the equation of a straight line that approximates (approximately describes) the dependence between the random variables X and Y. If we assume that X is the free variable and Y depends on X, the regression equation is written as


y = a + b x (3.1), where:

$$b = r_{x,y}\frac{\sigma_y}{\sigma_x} = r_{x,y}\frac{S_y}{S_x} \qquad (3.2)$$

$$a = M_y - b M_x \qquad (3.3)$$

The coefficient b calculated by formula (3.2) is called the linear regression coefficient. In some sources a is called the constant regression coefficient and b, accordingly, the variable one.

The errors of predicting Y at a given value of X are calculated by the formulas:

$$\sigma_{y/x} = S_y\sqrt{1 - r_{x,y}^2} \qquad (3.4)$$

$$\delta_{y/x} = \frac{\sigma_{y/x}}{M_y}\cdot 100\% \qquad (3.5)$$

The value σ_y/x (formula 3.4) is also called the residual standard deviation. It characterizes the scatter of Y around the regression line described by equation (3.1) at a fixed (given) value of X.

First find the ratio S_y / S_x.
S_y² / S_x² = 0.20538 / 0.66481 = 0.30894. Taking the square root of this number, we get:
S_y / S_x = 0.55582

3.3 Calculate the coefficient b by formula (3.2)

b = -0.72028 · 0.55582 = -0.40035

3.4 Calculate the coefficient a by formula (3.3)

a = 30.50000 - (-0.40035 · 25.75000) = 40.80894

3.5 Estimate the errors of the regression equation.

3.5.1 Taking the square root of S_y², we get S_y = 0.45319. Substituting S_y and r_x,y into formula (3.4) gives:

σ_y/x = 0.31437

3.5.4 Calculate the relative error by formula (3.5)

δ_y/x = (0.31437 / 30.50000) · 100% = 1.03073%
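The regression coefficients and the error estimates can be checked with the Python sketch below. It starts from the rounded intermediate results of the worked solution, so the last digits may differ slightly from the text. The error formulas σ_y/x = S_y·√(1 − r²) and δ = σ_y/x / M_y · 100% are assumptions: they are the standard expressions and they reproduce the values 0.31437 and 1.03073%, but the formulas themselves are not legible in this section. The name `predict` is illustrative.

```python
import math

# Rounded intermediate results from the worked solution.
m_x, m_y = 25.75, 30.5          # sample means
s_x2, s_y2 = 0.66481, 0.20538   # S_x^2 and S_y^2
r = -0.72028                    # correlation coefficient

b = r * math.sqrt(s_y2 / s_x2)                       # formula (3.2)
a = m_y - b * m_x                                    # formula (3.3)

# Assumed error formulas (see the note above).
sigma_yx = math.sqrt(s_y2) * math.sqrt(1.0 - r * r)  # residual standard deviation
delta_yx = sigma_yx / m_y * 100.0                    # relative error, percent


def predict(x_value: float) -> float:
    """Point on the regression line y = a + b * x."""
    return a + b * x_value


print(round(b, 5), round(a, 5), round(sigma_yx, 5), round(delta_yx, 5))
```

A relative error of about 1% means that, at a fixed x, the observed y values scatter around the regression line by roughly 1% of the mean of y.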

4. Build the scatter diagram (correlation field) and the graph of the regression line.

A scatter diagram is a graphical representation of the paired values (x_k, y_k) as points on a plane in rectangular coordinates with X and Y axes. The correlation field is one of the graphical representations of a paired sample. The regression line is drawn in the same coordinate system. The scale and the starting points on the axes should be chosen carefully so that the diagram is as clear as possible.

4.1. Find the minimum and maximum elements of the sample X. These are the 18th and 15th elements, respectively: x_min = 22.10000 and x_max = 26.60000.

4.2. Find the minimum and maximum elements of the sample Y. These are the 2nd and 18th elements, respectively: y_min = 29.40000 and y_max = 31.60000.

4.3. On the abscissa axis, choose a starting point slightly to the left of the point x_18 = 22.10000 and a scale such that the point x_15 = 26.60000 fits on the axis and the other points remain distinguishable.

4.4. On the ordinate axis, choose a starting point slightly below the point y_2 = 29.40000 and a scale such that the point y_18 = 31.60000 fits on the axis and the other points remain distinguishable.

4.5. Mark the values x_k on the abscissa axis and the values y_k on the ordinate axis.

4.6. Plot the points (x_1, y_1), (x_2, y_2), ..., (x_26, y_26) on the coordinate plane. The result is the scatter diagram (correlation field) shown in the figure below.

4.7. Draw the regression line.

To do this, find two different points with coordinates (x_r1, y_r1) and (x_r2, y_r2) that satisfy equation (3.6), plot them on the coordinate plane and draw a straight line through them. Take x_min = 22.10000 as the abscissa of the first point. Substituting x_min into equation (3.6) gives the ordinate of the first point. Thus we have the point with coordinates (22.10000, 31.96127). In the same way, taking x_max = 26.60000 as the abscissa, we obtain the second point: (26.60000, 30.15970).

The regression line is shown in red in the figure below.

Note that the regression line always passes through the point of the mean values of X and Y, that is, through the point with coordinates (M_x, M_y).
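The last remark follows directly from the definition of the intercept. A short derivation, using only equations (3.1) and (3.3), substitutes x = M_x into the regression equation:

```latex
y(M_x) = a + b\,M_x = (M_y - b\,M_x) + b\,M_x = M_y
```

With the coefficients found above this can be checked numerically: 40.80894 − 0.40035 · 25.75000 ≈ 30.50, which is M_y up to rounding.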

06.06.2018, Igor


Everything in the world is interrelated. At an intuitive level, each of us tries to find relationships between phenomena in order to be able to influence and control them. The concept that reflects such a relationship is called correlation. What does it mean in simple terms?


Concept of correlation

Correlation (from the Latin "correlatio", meaning ratio or relationship) is a mathematical term that denotes a measure of the statistical, probabilistic dependence between random variables.



Example: consider two types of relationship:

  1. The first is a pen in a person's hand. Whichever way the hand moves, the pen moves the same way. If the hand is at rest, the pen does not write. If the person presses a little harder, the trace on the paper becomes darker. This type of relationship reflects a strict dependence and is not a correlation. It is a functional relationship.
  2. The second is the dependence between a person's level of education and how much they read. It is not known in advance who reads more: people with higher education or people without it. This relationship is random, or stochastic, and is studied by statistics, which deals only with mass phenomena. If a statistical calculation demonstrates a correlation between level of education and reading, it becomes possible to make forecasts and predict a probable outcome. In this example we can say with high probability that people with higher education, those who are more educated, read more books. But since the relationship between these parameters is not functional, we may be mistaken. The probability of such an error, which should be small, can always be calculated; it is called the level of statistical significance (p).

Examples of interrelated natural phenomena are the food chain in nature and the human body, which consists of organ systems that are interconnected and function as a whole.

We encounter correlation every day in ordinary life: between the weather and a good mood, between setting goals properly and achieving them, between a positive attitude and luck, between a feeling of happiness and financial well-being. But we look for these relationships by relying on myths, intuition and superstition rather than on mathematical calculation. Such phenomena are very hard to translate into mathematical language, to express in numbers, to measure. It is a different matter when we analyze phenomena that can be counted and represented as numbers. In that case we can determine the correlation using the correlation coefficient (r), which reflects the strength, degree, closeness and direction of the correlation between random variables.

A strong correlation between random variables is evidence of some statistical relationship specifically between these phenomena, but this relationship cannot be carried over to the same phenomena in a different situation. Researchers who obtain a significant correlation between two variables often, relying on the simplicity of correlation analysis, make false intuitive assumptions about causal relationships between the characteristics, forgetting that the correlation coefficient is probabilistic in nature.

Example: the number of people injured in icy conditions and the number of road traffic accidents. These quantities will be correlated with each other, although neither causes the other; they are linked only through the common cause of these random events, the ice. Conversely, if the analysis does not reveal a correlation between phenomena, this is not yet evidence that they are unrelated: the relationship may be complex and nonlinear, and therefore not detectable by correlation calculations.
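The last point, that a zero coefficient does not rule out a relationship, is easy to demonstrate. In the Python sketch below, y is completely determined by x (y = x²), yet the Pearson coefficient is zero because the dependence is symmetric rather than linear. The data are a constructed illustration, not taken from the article.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    s_xy = sum((a - m_x) * (b - m_y) for a, b in zip(xs, ys))
    s_xx = sum((a - m_x) ** 2 for a in xs)
    s_yy = sum((b - m_y) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]       # a strict, but nonlinear, dependence

r = pearson(xs, ys)            # zero: no linear relationship
```

The coefficient measures only the linear part of a dependence, so a scatter diagram is worth inspecting before concluding that two variables are unrelated.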




The first to introduce the concept of correlation into scientific use was the French paleontologist Georges Cuvier. In the 18th century he formulated the law of correlation of the parts and organs of living organisms, which made it possible to reconstruct the appearance of an entire fossil animal from the parts of the body (remains) that had been found. In statistics, the term correlation was first used in 1886 by the English scientist Francis Galton. He was unable, however, to derive an exact formula for calculating the correlation coefficient; this was done by his student, the well-known mathematician and biologist Karl Pearson.

Types of correlation

By significance: highly significant, significant and insignificant.

Type                  What r corresponds to
Highly significant    r corresponds to a level of statistical significance p ≤ 0.01
Significant           r corresponds to p ≤ 0.05
Insignificant         r does not reach significance (p > 0.1)

By direction: negative (a decrease in one variable leads to an increase in the other: the more phobias a person has, the less likely they are to hold a leadership position) and positive (an increase in one variable entails an increase in the other: the more nervous a person is, the more likely they are to fall ill). If there is no relationship between the variables, the correlation is called zero.

By form: linear (when one variable increases or decreases, the second also increases or decreases proportionally) and nonlinear (when a change in one variable produces a change in the second that cannot be described by a linear dependence, so other mathematical laws are applied, such as a polynomial or hyperbolic dependence).

By strength.




Depending on the scale on which the studied variables are measured, different types of correlation coefficient are calculated:

  1. The Pearson correlation coefficient (the pair linear correlation coefficient, or product-moment correlation) is calculated for variables measured on an interval or quantitative scale.
  2. The Spearman or Kendall rank correlation coefficient is used when at least one of the variables is measured on an ordinal scale or is not normally distributed.
  3. The point-biserial correlation coefficient (the Fechner sign correlation coefficient) is used when one of the two variables is dichotomous.
  4. The fourfold correlation coefficient (the coefficient of multiple rank correlation, or concordance) is used when both variables are dichotomous.

The Pearson coefficient is a parametric measure of correlation; all the others are non-parametric.

The value of the correlation coefficient ranges from -1 to +1. For a perfect positive correlation r = +1, for a perfect negative correlation r = -1.

Formula and calculation





Examples

It is necessary to determine the relationship between two variables for schoolchildren: the level of intellectual development (from testing) and the number of late arrivals per month (from entries in the class register).

The initial data are presented in the table:

[Table: IQ data (x) and data on the number of late arrivals (y). Only the totals survive: sum 1122, average 112.2]


To interpret the resulting indicator correctly, it is necessary to analyze both the sign of the correlation coefficient (+ or -) and its absolute value (modulus).

According to the table that classifies the correlation coefficient by strength, r_xy = -0.827 is a strong negative correlation. Thus, the number of late arrivals depends very strongly on the schoolchildren's level of intellectual development. It can be said that students with a high IQ are late less often than students with a low IQ.



The correlation coefficient can be used both by scientists, to confirm or refute an assumption about the dependence of two quantities or phenomena and to measure its strength and significance, and by students, for empirical and statistical studies in various subjects. It must be remembered that this indicator is not an ideal tool: it measures only the strength of a linear dependence and is always a probabilistic quantity with a certain error.

Correlation analysis is applied in the following areas:

  • economic science;
  • astrophysics;
  • social sciences (sociology, psychology, pedagogy);
  • agrochemistry;
  • metal studies;
  • industry (for quality control);
  • hydrobiology;
  • biometrics, etc.

Causes of the popularity of the correlation analysis method:

  1. Correlation coefficients are relatively simple to calculate: no special mathematical education is needed.
  2. The method makes it possible to calculate relationships between mass random variables, which are the subject of statistical analysis. This is why it has become widespread in statistical research.

I hope you can now distinguish a functional relationship from a correlation, and that when you hear about a correlation on television or read about one in the press, you will know that it implies a positive and sufficiently significant interdependence between two phenomena.

In statistics, the correlation coefficient is used to test the hypothesis that a relationship exists between two random variables, and it also makes it possible to assess the strength of that relationship. In portfolio theory, this indicator is usually used to determine the nature and strength of the dependence between the return on a security (asset) and the return on the portfolio. If the distribution of these variables is normal or close to normal, the Pearson correlation coefficient should be used, which is calculated by the following formula:

$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$

The standard deviation of the return on Company A's shares is 0.6389, on Company B's shares 0.5241, and on the portfolio 0.5668.

The correlation coefficient between the return on Company A's shares and the return on the portfolio is -0.864, and for Company B's shares it is 0.816.

R_A = -0.313 / (0.6389 · 0.5668) = -0.864

R_B = 0.242 / (0.5241 · 0.5668) = 0.816

We can conclude that there is a fairly strong relationship between the portfolio return and the returns on the shares of Company A and Company B. The return on Company A's shares moves in the opposite direction to the portfolio return, while the return on Company B's shares moves in the same direction.
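The two coefficients can be recomputed with a small Python sketch. It assumes that the numerators -0.313 and 0.242 in the calculation above are the covariances of each share's return with the portfolio return, which is what the Pearson formula requires but which the text does not state explicitly. The function name is illustrative.

```python
def correlation_from_covariance(cov: float, sd_a: float, sd_b: float) -> float:
    """Pearson coefficient as covariance divided by the product of standard deviations."""
    if sd_a <= 0 or sd_b <= 0:
        raise ValueError("standard deviations must be positive")
    return cov / (sd_a * sd_b)


sd_portfolio = 0.5668

# Assumed covariances with the portfolio return: -0.313 (A) and 0.242 (B).
r_a = correlation_from_covariance(-0.313, 0.6389, sd_portfolio)
r_b = correlation_from_covariance(0.242, 0.5241, sd_portfolio)

print(round(r_a, 3), round(r_b, 3))
```

Because the inputs are rounded to three or four digits, the recomputed values can differ from the quoted ones in the third decimal place.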
