Variation indicators. Calculating the variance of an alternative attribute

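Average value of an alternative feature and its variance (p is the proportion of units that possess the feature, q = 1 - p is the proportion of units that do not):

Average value of the alternative feature: $\bar{x} = \frac{1 \cdot p + 0 \cdot q}{p + q} = p$

Variance of an alternative feature: $\sigma^2 = \frac{(1 - p)^2 p + (0 - p)^2 q}{p + q} = q^2 p + p^2 q = pq(p + q) = pq$

Substituting q = 1 - p into the variance formula, we get: $\sigma^2 = p(1 - p)$

Thus, the variance of an alternative feature is equal to the product of the proportion of units that possess this feature and the proportion of units that do not.

Standard deviation of an alternative feature: $\sigma = \sqrt{pq} = \sqrt{p(1 - p)}$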


Sample observation: advantages and disadvantages.

Sample observation is one of the most modern types of statistical observation, in which only a part of the units of the studied population is examined. The units are selected on the basis of scientifically developed principles that provide enough reliable data to characterize the entire population.

Averages and relative indicators obtained from sample data should reproduce the corresponding indicators of the whole population sufficiently fully.

The main advantages of sample observation are that it can be carried out according to a broader program, it is cheaper, and it can be organized in cases where reporting cannot be used.

The main disadvantages are that the data obtained always contain errors, so the results can only be judged with a certain degree of reliability. Sample observation also requires qualified personnel.

Methods for forming a sample.

In statistics, various methods of forming a sample are used. The choice is determined by the objectives of the research and depends on the specifics of the object of study.

The main condition for conducting a sample survey is to prevent systematic errors, which arise when the principle of equal opportunity for each unit of the general population to be included in the sample is violated. Systematic errors are prevented by using scientifically based methods of forming the sample population.

There are the following ways to select units from the general population:

1) individual selection - individual units are selected in the sample;

2) group selection - qualitatively homogeneous groups or series of the studied units fall into the sample;

3) combined selection is a combination of individual and group selection.

Selection methods are determined by the rules for the formation of the sample population.

The sample can be:

- strictly random (simple random): the sample population is formed by a random (unintentional) selection of individual units from the general population. The number of units selected is usually determined on the basis of the accepted sample proportion. The sample proportion is the ratio of the number of units in the sample n to the number of units in the general population N, i.e. n/N;

- mechanical: units are selected from a general population divided into equal intervals (groups). The size of the interval is equal to the reciprocal of the sample proportion. So, with a 2% sample every 50th unit is selected (1 : 0.02), with a 5% sample every 20th unit (1 : 0.05), etc. Thus, in accordance with the accepted sample proportion, the general population is, as it were, mechanically divided into groups of equal size, and only one unit is selected from each group;

- typical: the general population is first divided into homogeneous typical groups. Then, from each typical group, individual units are selected into the sample by simple random or mechanical sampling. An important feature of the typical sample is that it gives more accurate results than other methods of selecting units;

- serial: the general population is divided into groups of the same size (series). Series are selected for the sample, and within each selected series all units are observed;

- combined: the sample can be two-stage. In this case, the general population is first divided into groups. Then groups are selected, and within the selected groups individual units are selected.
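The mechanical selection rule described above (interval equal to the reciprocal of the sample proportion, one unit per interval) can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function name `mechanical_sample`, the population of 1000 numbered units, and the choice of a random starting position inside the first interval are all assumptions made for the example.

```python
import random

def mechanical_sample(population, share, seed=None):
    """Take one unit from each interval of size 1/share (hypothetical helper)."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    step = round(1 / share)        # e.g. a 2% sample -> every 50th unit
    rng = random.Random(seed)
    start = rng.randrange(step)    # assumption: random position within the first interval
    return population[start::step]

units = list(range(1, 1001))       # hypothetical general population, N = 1000
sample = mechanical_sample(units, 0.02, seed=1)
print(len(sample))                 # number of selected units
```

With N = 1000 and a 2% sample, the sketch selects 20 units that are exactly 50 positions apart.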

In statistics, the following methods of selecting units for a sample are distinguished:

- one-stage sampling: each selected unit is immediately examined according to a given criterion (simple random and serial sampling);

- multistage sampling: individual groups are selected from the general population, and individual units are then selected from the groups (typical sampling with mechanical selection of units into the sample population).

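In addition, a distinction is made between:

- repeated selection (sampling with replacement), which follows the "returned ball" scheme: each unit or series that got into the sample is returned to the general population and therefore has a chance to be selected again;

- non-repeated selection (sampling without replacement): a selected unit or series is not returned to the general population, so it can enter the sample only once.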
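The difference between repeated selection (the "returned ball" scheme) and its counterpart, selection without return, can be illustrated with Python's standard `random` module. The population of 20 numbered units and the fixed seed are assumptions made only to keep the sketch small and repeatable.

```python
import random

rng = random.Random(42)                       # fixed seed so the sketch is repeatable
population = list(range(1, 21))               # hypothetical general population, N = 20

# Repeated selection: each drawn unit is returned, so it may be drawn again.
repeated = rng.choices(population, k=5)

# Non-repeated selection: a drawn unit is not returned, so all units are distinct.
non_repeated = rng.sample(population, k=5)

print(repeated)
print(non_repeated)
```

`random.choices` draws with replacement and `random.sample` draws without replacement, which is why only the second list is guaranteed to contain no duplicates.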

Variation is the difference in the individual values of a trait among the units of the studied population. The study of variation is of great practical importance and is a necessary link in economic analysis. It is needed because the average, being a resultant value, fulfills its main task with varying degrees of accuracy: the smaller the differences in the individual values being averaged, the more homogeneous the population and, consequently, the more accurate and reliable the average, and vice versa. Accordingly, the degree of variation makes it possible to judge the limits of variation of a trait, the homogeneity of the population with respect to this trait, the typicality of the average, and the relationship of the factors determining the variation.

The variation of a feature in a population is measured using absolute and relative indicators.

Absolute indicators of variation include:

Range of variation (R)

The range of variation is the difference between the maximum and minimum values of the feature:

$R = x_{max} - x_{min}$

It shows the limits within which the value of the trait varies in the studied population.

Example. The work experience of five applicants in their previous jobs is 2, 3, 4, 7 and 9 years.
Solution: range of variation R = 9 - 2 = 7 years.

For a generalized characteristic of the differences in the values of the feature, average indicators of variation are calculated from the deviations from the arithmetic mean. The difference $x_i - \bar{x}$ is taken as the deviation from the mean.

Because the sum of the deviations of the values from the mean is always zero (the zero property of the mean), one has to either ignore the signs of the deviations, that is, sum their absolute values, or square the deviations.

Average linear and standard deviation

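The average linear deviation is the arithmetic mean of the absolute deviations of the individual values of the feature from the mean.

Simple average linear deviation:

$\bar{d} = \frac{\sum |x_i - \bar{x}|}{n}$

The work experience of five applicants in their previous jobs is 2, 3, 4, 7 and 9 years.

In our example: $\bar{x} = \frac{2 + 3 + 4 + 7 + 9}{5} = 5$ years; $\bar{d} = \frac{|2 - 5| + |3 - 5| + |4 - 5| + |7 - 5| + |9 - 5|}{5} = \frac{12}{5} = 2.4$ years.

Answer: 2.4 years.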
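The range and the average linear deviation for the five applicants can be checked with a few lines of Python. The function name `mean_linear_deviation` is chosen here for illustration only.

```python
def mean_linear_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

experience = [2, 3, 4, 7, 9]                 # years, from the example in the text
print(max(experience) - min(experience))     # range of variation: 7
print(mean_linear_deviation(experience))     # 2.4
```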

The weighted average linear deviation applies to grouped data:

$\bar{d} = \frac{\sum |x_i - \bar{x}| f_i}{\sum f_i}$

Because it relies on the convention of ignoring the signs of the deviations, the average linear deviation is used relatively rarely in practice (in particular, to characterize the fulfillment of contractual obligations for uniformity of delivery, and in the analysis of product quality with regard to the technological features of production).

Standard deviation

The most refined characteristic of variation is the mean square deviation, which is called the standard deviation. The standard deviation ($\sigma$) is equal to the square root of the mean square of the deviations of the individual values of the feature from the arithmetic mean.

Simple standard deviation:

$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$

The weighted standard deviation is used for grouped data:

$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$

Under a normal distribution, the standard deviation and the average linear deviation are related as follows: $\sigma \approx 1.25 \, \bar{d}$.

The standard deviation, being the main absolute measure of variation, is used to determine the ordinates of the normal distribution curve, in calculations related to organizing sample observation and establishing the accuracy of sample characteristics, and when assessing the limits of variation of a feature in a homogeneous population.

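Variance

The variance is the mean square of the deviations of the individual values of the trait from their mean.

Simple variance:

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$

In our example (work experience of 2, 3, 4, 7 and 9 years, mean 5 years): $\sigma^2 = \frac{(2 - 5)^2 + (3 - 5)^2 + (4 - 5)^2 + (7 - 5)^2 + (9 - 5)^2}{5} = \frac{34}{5} = 6.8$

Weighted variance:

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$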

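It is more convenient to calculate the variance by the formula:

$\sigma^2 = \overline{x^2} - \bar{x}^2$,

which is obtained from the main one by simple transformations. In this case, the mean square of the deviations is equal to the mean of the squares of the feature values minus the square of the mean.

For ungrouped data: $\sigma^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$

For grouped data: $\sigma^2 = \frac{\sum x_i^2 f_i}{\sum f_i} - \left(\frac{\sum x_i f_i}{\sum f_i}\right)^2$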
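The equivalence of the basic formula and the "mean of squares minus square of the mean" formula can be verified numerically. The sketch below uses the work-experience data from the text; the function names are illustrative, and the small grouped data set in the last line is hypothetical.

```python
import math

def variance(values):
    """Basic formula: mean square of deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def variance_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2

def weighted_variance(values, freqs):
    """Basic formula for grouped data with frequencies as weights."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total

experience = [2, 3, 4, 7, 9]
print(round(variance(experience), 4), round(variance_shortcut(experience), 4))
print(round(math.sqrt(variance(experience)), 4))   # standard deviation, years
print(weighted_variance([1, 3], [1, 3]))           # hypothetical grouped data
```

Both formulas give 6.8 for the example, up to floating-point rounding.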

The variation of an alternative feature consists in the presence or absence of the studied property in the units of the population. Quantitatively, it is expressed in two values: the presence of the studied property in a unit is denoted by one (1), and its absence by zero (0). The proportion of units that have the property is denoted by p, and the proportion of units that do not have it by q. Considering that p + q = 1 (hence q = 1 - p), the average value of the alternative feature is

$\bar{x} = \frac{1 \cdot p + 0 \cdot q}{p + q} = p$,

and the mean square of deviations is

$\sigma^2 = \frac{(1 - p)^2 p + (0 - p)^2 q}{p + q} = pq(p + q) = pq$.

Thus, the variance of an alternative feature is equal to the product of the proportion of units with this property (p) and the proportion of units without it (q).

The variance reaches its maximum when the shares are equal, i.e. when p = q = 0.5, in which case $\sigma^2 = 0.25$. The lower bound of this indicator is zero, which corresponds to a population with no variation. Standard deviation of an alternative feature: $\sigma = \sqrt{pq}$

So, if 3% of the products in a manufactured batch turned out to be non-standard, then the variance of the share of non-standard products is $\sigma^2 = 0.03 \cdot 0.97 = 0.0291$, and the standard deviation is $\sigma = \sqrt{0.0291} \approx 0.171$, or 17.1%.

The standard deviation is equal to the square root of the mean square of the deviations of the individual values of the feature from the arithmetic mean.
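The calculation for the batch with 3% non-standard products can be reproduced with a short Python sketch. The function names `alternative_variance` and `alternative_std` are chosen for this illustration.

```python
import math

def alternative_variance(p):
    """Variance of a 0/1 feature with share p of units having the feature."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return p * (1 - p)

def alternative_std(p):
    """Standard deviation of a 0/1 feature."""
    return math.sqrt(alternative_variance(p))

print(round(alternative_variance(0.03), 4))   # 0.0291
print(round(alternative_std(0.03), 3))        # 0.171
print(alternative_variance(0.5))              # 0.25, the maximum
```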

Relative indicators of variation

Comparing the variation of several populations for the same feature, and even more so for different features, is not possible using absolute indicators. In these cases, relative indicators of variation are constructed for a comparative assessment of the degree of difference. They are calculated as the ratio of an absolute indicator of variation to the mean, usually expressed as a percentage:

coefficient of oscillation: $V_R = \frac{R}{\bar{x}} \cdot 100\%$

linear coefficient of variation: $V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%$

coefficient of variation: $V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%$

Other relative characteristics are also calculated. For example, to estimate the variation in the case of an asymmetric distribution, the ratio of the mean linear deviation from the median to the median is calculated,

since, by the property of the median, the sum of the absolute deviations of the values from the median is always less than from any other value.

As a relative measure of dispersion that estimates the variation in the central part of the population, the relative quartile deviation is calculated. It is based on the mean quartile deviation $Q = \frac{Q_3 - Q_1}{2}$, that is, half the difference between the third (upper) quartile ($Q_3$) and the first (lower) quartile ($Q_1$).

In practice, the coefficient of variation is calculated most often. Its lower limit is zero and it has no upper limit; its value increases as the variation of the feature increases. The coefficient of variation is, in a sense, a criterion for the homogeneity of the population (in the case of a normal distribution).

Let's calculate the coefficient of variation based on the standard deviation for the following example. The consumption of raw materials per unit of product (kg) was recorded for two technologies, each characterized by its mean and standard deviation. A direct comparison of the standard deviations could lead to the misconception that the variation in raw material consumption is more intense for the first technology than for the second. The relative measure of variation, the coefficient of variation, allows us to draw the opposite conclusion.
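The effect described above can be illustrated with a small Python sketch. The means and standard deviations below are hypothetical values chosen only for illustration (they are not the figures of the original example), and `coefficient_of_variation` is an illustrative name.

```python
def coefficient_of_variation(std, mean):
    """Standard deviation as a percentage of the mean."""
    return std / mean * 100

# Hypothetical raw-material consumption per unit of product, kg
tech_1 = {"mean": 10.0, "std": 4.0}
tech_2 = {"mean": 6.0, "std": 3.0}

cv_1 = coefficient_of_variation(tech_1["std"], tech_1["mean"])
cv_2 = coefficient_of_variation(tech_2["std"], tech_2["mean"])
print(cv_1, cv_2)
```

In this illustration the first technology has the larger standard deviation (4 kg against 3 kg) but the smaller coefficient of variation, so the relative measure reverses the ranking suggested by the absolute one.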

An example of calculating the indicators of variation

At the stage of selecting candidates for participation in a complex project, the firm announced a competition for professionals. The distribution of applicants by work experience showed the following results:

Let's calculate the average production experience, years

Let's calculate the variance based on the duration of work experience

The same result is obtained if we use a different formula for calculating the variance for the calculation

Let's calculate the standard deviation, years:

Determine the coefficient of variation,%:

Variance addition rule

To assess the influence of the factors determining the variation, a grouping technique is used: the population is divided into groups, with one of the determining factors chosen as the grouping attribute. Then, along with the total variance calculated for the entire population, the intra-group variance (the average of the group variances) and the inter-group variance (the variance of the group means) are calculated.

The total variance $\sigma^2$ characterizes the variation of a feature in the entire population, which has developed under the influence of all factors and conditions.

The intergroup variance measures the systematic variation due to the influence of the factor by which the grouping is made:

$\delta^2 = \frac{\sum_j (\bar{x}_j - \bar{x})^2 n_j}{\sum_j n_j}$,

where $\bar{x}_j$ is the mean of group j, $\bar{x}$ is the overall mean and $n_j$ is the number of units in group j.

The intra-group variance assesses the variation of the trait that has developed under the influence of other factors, which are not considered in this study, and is independent of the grouping factor. It is defined as the average of the group variances $\sigma_j^2$:

$\overline{\sigma_j^2} = \frac{\sum_j \sigma_j^2 n_j}{\sum_j n_j}$

All three variances ($\sigma^2$, $\delta^2$, $\overline{\sigma_j^2}$) are related by the following equality, known as the variance addition rule:

$\sigma^2 = \delta^2 + \overline{\sigma_j^2}$

This relation is used to construct indicators that assess the influence of the grouping attribute on the formation of the total variation. These include the empirical coefficient of determination ($\eta^2$) and the empirical correlation ratio ($\eta$).

The empirical coefficient of determination ($\eta^2$) characterizes the share of intergroup variance in the total variance:

$\eta^2 = \frac{\delta^2}{\sigma^2}$

and shows how much of the variation of the trait in the population is due to the grouping factor.

The empirical correlation ratio $\eta = \sqrt{\frac{\delta^2}{\sigma^2}}$

assesses the closeness of the relationship between the studied and grouping characteristics. Its limit values are zero and one. The closer it is to one, the closer the relationship.
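The variance addition rule and the two indicators built on it can be checked on a small data set. The sketch below is only an illustration: the two groups of values are hypothetical, and the helper names (`mean`, `variance`, `decompose`) are chosen for this example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def decompose(groups):
    """Return (total, intergroup, intragroup) variances for a dict of groups."""
    all_values = [x for g in groups.values() for x in g]
    n = len(all_values)
    overall = mean(all_values)
    # Intergroup: variance of group means, weighted by group size
    between = sum((mean(g) - overall) ** 2 * len(g) for g in groups.values()) / n
    # Intragroup: average of group variances, weighted by group size
    within = sum(variance(g) * len(g) for g in groups.values()) / n
    return variance(all_values), between, within

groups = {"near": [10, 12, 14], "far": [4, 6, 8]}   # hypothetical data
total, between, within = decompose(groups)
eta_squared = between / total         # empirical coefficient of determination
eta = math.sqrt(eta_squared)          # empirical correlation ratio
print(round(total, 4), round(between, 4), round(within, 4))
print(round(eta_squared, 4), round(eta, 4))
```

For any grouping, the total variance returned by `decompose` equals the sum of the intergroup and intragroup components, up to floating-point rounding.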

Example. The cost of 1 square meter of total area (conventional units) in the housing market for ten 17 houses with improved planning was:

At the same time, it is known that the first five houses were built near the business center, and the rest - at a considerable distance from it.

To calculate the total variance, let's first calculate the average cost of 1 sq. m of total area. The total variance is determined by the formula:

Let's calculate the average cost of 1 sq. m and the variance of this indicator for each group of houses, grouped by location relative to the city center:

a) for houses built near the center:

b) for houses built far from the center:

The variation in the cost of 1 sq. m of total area caused by differences in the location of houses is measured by the intergroup variance:

The variation in the cost of 1 sq. m of total area due to the remaining factors, which we do not take into account, is measured by the intragroup variance:

The variances found add up to the total variance:

The empirical coefficient of determination, $\eta^2 = 0.818$,

shows that 81.8% of the variance of the cost of 1 sq. m of total area in the housing market is explained by differences in the location of new buildings relative to the business center, and 18.2% by other factors.

The empirical correlation ratio, $\eta = \sqrt{0.818} \approx 0.904$, indicates that the location of houses has a significant impact on the cost of housing.

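The variance addition rule for the share (proportion) of a feature is written as follows:

$\sigma_p^2 = \delta_p^2 + \overline{\sigma_{p_j}^2}$,

and the three types of share variance for grouped data are determined by the following formulas.

Total variance:

$\sigma_p^2 = \bar{p}(1 - \bar{p})$, where $\bar{p} = \frac{\sum_j p_j n_j}{\sum_j n_j}$ is the share of the feature in the whole population, $p_j$ is its share in group j and $n_j$ is the number of units in group j.

Intergroup and intragroup variance formulas:

$\delta_p^2 = \frac{\sum_j (p_j - \bar{p})^2 n_j}{\sum_j n_j}$, $\overline{\sigma_{p_j}^2} = \frac{\sum_j p_j (1 - p_j) n_j}{\sum_j n_j}$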

Distribution shape characteristics

To get an idea of the shape of a distribution, indicators of the average level, indicators of variation, asymmetry and kurtosis are used.

In symmetric distributions, the arithmetic mean, mode and median coincide ($\bar{x} = Mo = Me$). If this equality is violated, the distribution is asymmetric.

The simplest indicator of asymmetry is the difference $\bar{x} - Mo$, which is positive in the case of right-sided asymmetry and negative in the case of left-sided asymmetry.

Asymmetric distribution

To compare the asymmetry of several series, a relative indicator is calculated:

$\frac{\bar{x} - Mo}{\sigma}$

Central moments of the k-th order are used as generalizing characteristics of variation. The order corresponds to the power to which the deviations of the individual values of the feature from the arithmetic mean are raised.

For ungrouped data: $\mu_k = \frac{\sum (x_i - \bar{x})^k}{n}$

For grouped data: $\mu_k = \frac{\sum (x_i - \bar{x})^k f_i}{\sum f_i}$

The moment of the first order, according to the property of the arithmetic mean, is equal to zero.

The second order moment is the variance.

The moments of the third and fourth orders are used to construct indicators that assess the features of the shape of empirical distributions.

The third-order moment is used to measure the degree of skewness, or asymmetry, of the distribution:

$A_s = \frac{\mu_3}{\sigma^3}$ is the coefficient of asymmetry.

In symmetric distributions $\mu_3 = 0$, as are all central moments of odd order. A non-zero central moment of the third order indicates that the distribution is asymmetric. If $A_s > 0$, the asymmetry is right-sided and the right branch is elongated relative to the maximum ordinate; if $A_s < 0$, the asymmetry is left-sided (on the graph this corresponds to an elongated left branch).

To characterize the peakedness or flatness of the distribution, the ratio of the fourth-order moment ($\mu_4$) to the fourth power of the standard deviation ($\sigma^4$) is calculated. For a normal distribution $\frac{\mu_4}{\sigma^4} = 3$, therefore the kurtosis is found by the formula:

$E_x = \frac{\mu_4}{\sigma^4} - 3$

For a normal distribution, $E_x$ vanishes. For peaked distributions $E_x > 0$, for flat-topped ones $E_x < 0$.
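The moment-based indicators described above can be sketched in Python for ungrouped data. The function names and both small data sets are illustrative assumptions; the formulas are the central moment of order k, the third moment divided by the cubed standard deviation, and the fourth moment divided by the fourth power of the standard deviation minus 3.

```python
def central_moment(values, k):
    """Central moment of order k for ungrouped data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def asymmetry(values):
    """Third central moment divided by the cube of the standard deviation."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis_excess(values):
    """Fourth central moment over sigma^4, minus 3 (zero for a normal law)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]            # hypothetical symmetric series
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical series with a long right tail

print(asymmetry(symmetric), asymmetry(right_skewed) > 0)
print(round(kurtosis_excess(symmetric), 4))
```

The first-order central moment of any series is zero, and the second-order moment is the variance, which is why `central_moment(values, 2)` stands in for the squared standard deviation here.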

Distribution kurtosis

In addition to the indicators considered above, a generalizing characteristic of variation in a homogeneous population is a certain order in the change of the distribution frequencies as the value of the studied attribute changes. This order is called the distribution pattern.

The nature (type) of the distribution pattern can be identified by constructing a variation series from a large number of observations, and by choosing the number of groups and the width of the intervals so that the pattern shows most clearly.

The analysis of a variation series involves identifying the nature of the distribution (as a result of the mechanism of variation), establishing the distribution function, and checking whether the empirical distribution corresponds to the theoretical one.

Empirical distribution, obtained on the basis of observation data, is graphically depicted by the empirical distribution curve using a polygon.

In practice, there are various types of distributions, among which one can distinguish symmetric and asymmetric, single-vertex and multi-vertex.

To establish the type of distribution means to express the mechanism of the formation of the pattern in analytical form. Many phenomena and their features have characteristic forms of distribution, which are approximated by the corresponding curves. Among the variety of distribution forms, the most widespread theoretical ones are the normal distribution, the Poisson distribution, the binomial distribution, etc.

A special place in the study of variation belongs to the normal law, due to its mathematical properties. For the normal law the three sigma rule holds, according to which practically all individual values of a feature lie within $\pm 3\sigma$ of the mean. About 68% of all units lie within $\bar{x} \pm \sigma$, and about 95% within $\bar{x} \pm 2\sigma$.

The conformity assessment of empirical and theoretical distributions is carried out using goodness-of-fit criteria, among which Pearson's, Romanovsky's, Yastremsky's, Kolmogorov's criteria are widely known.

INDICATORS OF VARIATION

Methodical instructions for solving problems

On the topic "Indicators of variation"

To measure the degree of variation (variability) of a trait, indicators of variation are used: the range of variation, the mean linear deviation, the standard deviation, the mean square of deviations (variance), and the coefficient of variation.

Range of variation

The range of variation (R) characterizes the limits of variation (change) of the individual values (variants) of the feature (x) in the statistical population:

$R = x_{max} - x_{min}$,

where $x_{max}$ is the largest and $x_{min}$ the smallest value of the feature.

Average linear deviation

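The average linear deviation is calculated using the arithmetic mean formulas:

Simple (unweighted)

$\bar{d} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$,

Weighted

$\bar{d} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}| f_i}{\sum_{i=1}^{n} f_i}$,

where $x_i$ is the i-th value of the feature x;

$\bar{x}$ is the average value of the feature x;

$f_i$ is the statistical weight of the i-th value of the feature;

n is the number of members of the population.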

Standard deviation

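The standard deviation is calculated using the formulas:

Unweighted

$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$

Weighted

$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 f_i}{\sum_{i=1}^{n} f_i}}$

Variance of a quantitative trait

The variance of a quantitative trait is determined by the arithmetic mean formulas:

Unweighted

$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$

Weighted

$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 f_i}{\sum_{i=1}^{n} f_i}$

The variance can also be calculated as follows:

$\sigma^2 = \overline{x^2} - \bar{x}^2$,

where $\overline{x^2}$ is the mean square of the values of the feature;

$\bar{x}^2$ is the square of the average value of the feature.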

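Properties of the variance of a quantitative trait

1. If the weights (frequencies) of the varying feature are decreased or increased K times, the variance does not change.

2. If each value of the feature is decreased or increased by the same constant A, the variance does not change:

$\sigma^2_{(x - A)} = \frac{\sum ((x_i - A) - \overline{(x - A)})^2 f_i}{\sum f_i} = \sigma_x^2$,

where $\overline{(x - A)} = \bar{x} - A$ is the average value of the feature (x - A).

3. If each value of the feature is decreased or increased K times, the variance decreases or increases $K^2$ times, and the standard deviation K times:

$\sigma^2_{xK} = \frac{\sum (x_i K - \overline{xK})^2 f_i}{\sum f_i} = K^2 \sigma_x^2$,

where $\overline{xK} = K \bar{x}$ is the average value of the feature xK.

4. The variance of a feature about an arbitrary value A (with A different from the mean) is always greater than the variance about the arithmetic mean by the square of the difference between the mean and that value:

$\sigma_A^2 = \sigma^2 + (\bar{x} - A)^2$

Proof:

$\sigma_A^2 = \frac{\sum (x_i - A)^2}{n} = \frac{\sum ((x_i - \bar{x}) + (\bar{x} - A))^2}{n} = \sigma^2 + 2(\bar{x} - A)\frac{\sum (x_i - \bar{x})}{n} + (\bar{x} - A)^2 = \sigma^2 + (\bar{x} - A)^2$,

since the sum of the deviations from the mean is zero.

Variance about the mean: $\sigma^2 = \sigma_A^2 - (\bar{x} - A)^2$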
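Properties 2 to 4 can be checked numerically. The sketch below reuses the work-experience values 2, 3, 4, 7 and 9 as sample data; the constants A = 3 and K = 2 and the helper names are arbitrary choices made for illustration.

```python
import math

def variance(values):
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def mean_square_about(values, a):
    """Mean square of deviations from an arbitrary value a."""
    return sum((x - a) ** 2 for x in values) / len(values)

x = [2, 3, 4, 7, 9]
A, K = 3, 2                           # arbitrary constants for the check
x_mean = sum(x) / len(x)

shifted = [v - A for v in x]          # property 2: subtract a constant
scaled = [v * K for v in x]           # property 3: multiply by a constant

print(math.isclose(variance(shifted), variance(x)))
print(math.isclose(variance(scaled), K ** 2 * variance(x)))
print(math.isclose(mean_square_about(x, A), variance(x) + (x_mean - A) ** 2))
```

Each of the three lines prints `True` for this data set, which is consistent with the properties but is of course a check on one example, not a proof.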

Calculating variance by the method of moments

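The simplified calculation of the variance is carried out according to the formula

$\sigma^2 = K^2 (m_2 - m_1^2)$

and is called the method of moments.

The indicators $m_1$ and $m_2$ are the moments of the first and second order of the transformed values $\frac{x_i - A}{K}$ and are calculated as follows:

$m_1 = \frac{\sum \frac{x_i - A}{K} f_i}{\sum f_i}$, $m_2 = \frac{\sum \left(\frac{x_i - A}{K}\right)^2 f_i}{\sum f_i}$,

where A is an arbitrary constant and K is a common divisor of the deviations (for an interval series, usually the interval width).

Proof: by properties 2 and 3 of the variance, the variance of the transformed values equals $\sigma^2 / K^2$; by the formula "mean of squares minus square of the mean" it also equals $m_2 - m_1^2$. Hence $\sigma^2 = K^2 (m_2 - m_1^2)$.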
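The method of moments can be compared with the direct calculation on a small grouped series. In this sketch the values are first reduced by a constant A and divided by a constant K, the first two moments m1 and m2 of the reduced values are computed, and the variance is recovered as K^2 (m2 - m1^2). The data, the constants A = 20 and K = 10, and the function names are hypothetical choices for the example.

```python
import math

def variance_by_moments(values, freqs, a, k):
    """Variance via the first two moments of the reduced values (x - a) / k."""
    total = sum(freqs)
    reduced = [(x - a) / k for x in values]
    m1 = sum(t * f for t, f in zip(reduced, freqs)) / total
    m2 = sum(t * t * f for t, f in zip(reduced, freqs)) / total
    return k ** 2 * (m2 - m1 ** 2)

def weighted_variance(values, freqs):
    """Direct calculation for comparison."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total

x = [10, 20, 30, 40]      # hypothetical values (e.g. interval midpoints)
f = [2, 3, 4, 1]          # hypothetical frequencies

print(round(variance_by_moments(x, f, a=20, k=10), 4))
print(round(weighted_variance(x, f), 4))
```

Both calculations give a variance of 84 for this series, up to floating-point rounding; the reduced values (-1, 0, 1, 2) are what makes the manual arithmetic simpler.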

Variance of a quantitative trait in a population divided into groups

To analyze the relationships of quantitative characteristics in a statistical population divided into groups, the following variances are calculated: group, intergroup, intragroup and total.

The group (partial) variance characterizes the variation of a trait within a group due to all factors acting on it except the trait underlying the grouping (the grouping trait):

$\sigma_j^2 = \frac{\sum_{i=1}^{k_j} (x_{ij} - \bar{x}_j)^2 f_{ij}}{\sum_{i=1}^{k_j} f_{ij}}$,

where $x_{ij}$ is the i-th value of the feature in the j-th group;

$\bar{x}_j$ is the partial (group) average value of the feature in the j-th group;

$f_{ij}$ is the statistical weight of the i-th value of the feature in the j-th group;

$k_j$ is the number of different values of the feature in the j-th group.

The intergroup variance measures the degree of variability (variation) of a trait in the entire statistical population due to the factor underlying the grouping (the grouping trait):

$\delta^2 = \frac{\sum_{j=1}^{J} (\bar{x}_j - \bar{x})^2 n_j}{\sum_{j=1}^{J} n_j}$,

where $\bar{x}$ is the average value of the feature in the population (the overall mean);

$n_j$ is the weight of the j-th group, i.e. the number of units in the j-th group;

J is the number of groups in the statistical population.

The intragroup variance (the average of the group variances) measures the degree of variability of a trait in the population as a whole due to all factors (traits) acting on it except the grouping trait:

$\overline{\sigma_j^2} = \frac{\sum_{j=1}^{J} \sigma_j^2 n_j}{\sum_{j=1}^{J} n_j}$

The total variance measures the degree of variability of a trait due to the influence of all factors acting on it:

$\sigma^2 = \delta^2 + \overline{\sigma_j^2}$

The total variance of a feature in a statistical population divided into groups can also be determined by the basic variance formula:

$\sigma^2 = \frac{\sum_j \sum_i (x_{ij} - \bar{x})^2 f_{ij}}{\sum_j \sum_i f_{ij}}$

The intergroup and total variances are used to determine indicators of the closeness of the relationship between features in a population divided into groups.

Variance of a qualitative alternative feature

To determine the variance of an alternative feature, let us assume that the total number of population units is n. If the number of units with the studied trait is f, then the number of units that do not possess it is (n - f). The distribution series of a qualitative (alternative) feature is as follows:

Value of the feature    Frequency
1                       f
0                       n - f
Total                   n

The arithmetic mean of such a series is:

$\bar{x} = \frac{1 \cdot f + 0 \cdot (n - f)}{n} = \frac{f}{n}$,

that is, it is equal to the relative frequency of the appearance of the trait under study, which can be denoted by p; then

$\bar{x} = p$

The proportion of units with the studied trait is p, and the proportion of units that do not possess it is q, so p + q = 1.

Variance of an alternative feature

A special case of an attributive (non-quantitative) feature is an alternative feature, where the units of the population either have the studied attribute or do not have it. Examples of such features are: the presence of defective products, university teachers holding an academic degree, working in the specialty one was trained for, average per capita income exceeding the all-Russian level, the presence of children in the family, etc.

If a population unit has the alternative feature, it is assigned the value "1"; if not, the value "0".

The weights in the calculations are:

p, the proportion of units with this feature;

q, the proportion of units that do not have this feature.

Then the average value of the alternative feature is equal to:

$\bar{x} = \frac{1 \cdot p + 0 \cdot q}{p + q} = p$,

and the variance takes the form:

$\sigma^2 = \frac{(1 - p)^2 p + (0 - p)^2 q}{p + q} = pq$

The variance of an alternative feature ranges from 0 to 0.25. The maximum value of 0.25 is reached at p = 0.5.

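Example 4.11. In a sample survey of 300 residents of Kursk, 60 of them spoke positively about keeping personal savings in the city's commercial banks.

Determine the average level, variance and standard deviation of the trait.

Solution: $\bar{x} = p = \frac{60}{300} = 0.2$; $\sigma^2 = pq = 0.2 \cdot 0.8 = 0.16$; $\sigma = \sqrt{0.16} = 0.4$.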
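Example 4.11 can be worked through in a few lines of Python. The variable names are illustrative; the figures (300 respondents, 60 positive answers) come from the example itself.

```python
import math

n = 300              # residents surveyed
favourable = 60      # residents who answered positively

p = favourable / n               # average level of the alternative feature
q = 1 - p                        # share of units without the feature
variance = p * q                 # variance of the alternative feature
std = math.sqrt(variance)        # standard deviation

print(round(p, 2), round(variance, 2), round(std, 2))   # 0.2 0.16 0.4
```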

The practical application of variation of an alternative feature mainly consists in constructing confidence intervals when conducting a sample observation.

Study of the shape of the feature distribution. Main characteristics of distribution patterns

An indispensable condition for the success of constructions, calculations and conclusions based on the series of variations is the homogeneity of the aggregates summarized in them, established on the basis of deep theoretical analysis.

A clearly expressed order of frequency change in accordance with a change in the value of a feature is called a distribution pattern.

Knowledge of the type of distribution pattern (and, consequently, the shape of the curve) is necessary first of all:

1. To clarify the typical conditions for obtaining primary statistical material. Thus, the appearance of a multi-vertex or substantially asymmetric curve indicates a diverse composition of the population and the need to regroup the data in order to identify more homogeneous groups.

2. To ensure the correctness of practical calculations and forecasts. Thus, the use of H. Sturges's formula for calculating the optimal number of groups in an interval series, of the "three sigma" rule, and of the coefficient of variation as an indicator of homogeneity is justified only for the normal distribution and distributions close to it.

The patterns of the variational series, which are expressed in the type of distribution of their frequencies, clearly appear on the graphs - the histogram and the frequency distribution polygon. Their consideration shows that in the histogram there is a large discontinuity in the distribution, and in the polygon there is a gradual transition from one group to another. Polygon polyline partially smoothes the discontinuity of the histogram; it is a more generalized technique for analyzing the distribution.

As the number of groups in an interval variation series increases and the width of its intervals correspondingly decreases, the number of sides of the distribution polygon grows and the broken line tends, in the limit, to turn into a smooth curve. This curve is called the distribution curve. In it the data are freed from the influence of random factors to the greatest extent. It reveals and shows in the most generalized form the nature of the variation and the pattern of frequency distribution within a qualitatively homogeneous set of phenomena.

Distribution curves can be of different types. In the practice of socio-economic research, the normal distribution curve is widely used. It is a single-vertex symmetric bell-shaped figure, the right and left branches of which decrease uniformly and symmetrically, asymptotically approaching the abscissa axis.

A distinctive feature of this curve is that the arithmetic mean, mode and median coincide. If the entire area between the curve and the abscissa is taken as 100%, then 68.3% of the frequencies lie within $\bar{x} \pm \sigma$, 95.4% within $\bar{x} \pm 2\sigma$, and 99.7% within $\bar{x} \pm 3\sigma$ (the "three sigma rule").
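The three percentages of the "three sigma rule" can be reproduced from the normal distribution itself. The sketch below relies on the standard identity that, for a normal law, the share of values within k standard deviations of the mean equals erf(k / sqrt(2)); the function name `share_within` is an illustrative choice.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```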

Although the normal, or symmetric, distribution corresponds to the nature of a number of phenomena, however, for social phenomena it is uncharacteristic, since it reflects the differences caused by external influences, inherent not in a developing, but only in a fluctuating set of units. Development and dynamism are characteristic of social phenomena. Therefore, the series and curves of the distribution of the frequencies of social phenomena, as a rule, are asymmetric, in them the frequencies increase to a maximum and decrease from it unevenly. It is the presence of asymmetry, or skewness, in the rows of homogeneous aggregates that serves as an indirect indication that the process under study is undergoing an active stage of development.

Asymmetric series and corresponding curves have different forms of distributions, investigated by mathematical statistics. Such forms are Poisson distribution, Maxwell distribution, Pearson distribution, etc. Here asymmetry is considered as a whole as a single type of distribution. In this case, a distinction is made between right-side and left-side asymmetries (skewness).

If the long branch of the curve is located to the right of the vertex, then the asymmetry is called right-sided, if this branch is located to the left of the vertex - left-sided. With right-sided asymmetry with left-sided. Therefore, the difference between them, referred to, is called the K. Pearson coefficient and is used as the asymmetry coefficient:

With right-sided asymmetry, this coefficient is positive, with left-sided - negative. If = 0, the variation series is symmetrical. The greater the absolute value of the coefficient, the greater the degree of skew.

The most accurate indicator of distribution skewness is the skewness coefficient based on the third central moment, calculated by the formula

$$As = \frac{\mu_3}{\sigma^3}, \qquad \mu_3 = \frac{\sum (x_i - \bar{x})^3}{n}$$

where n is the number of units in the population. As with the Pearson coefficient, for $As > 0$ there is right-sided asymmetry, and for $As < 0$ left-sided asymmetry. In symmetric distributions $As = 0$.

The larger the absolute value $|As|$, the more asymmetric the distribution. The following grading scale for asymmetry has been established:

$|As| \le 0.25$ - insignificant asymmetry;

$0.25 < |As| \le 0.5$ - noticeable (moderate) asymmetry;

$|As| > 0.5$ - significant asymmetry.
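As a rough illustration of how the skewness coefficient and the grading scale fit together, the sketch below computes the coefficient as the third central moment divided by the cube of the standard deviation and then grades it. The function names and the sample values are hypothetical, and assigning the boundary values 0.25 and 0.5 to the lower grade is an assumption.

```python
def skewness(values: list[float]) -> float:
    """Moment coefficient of skewness: third central moment over sigma cubed."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    mu3 = sum((x - mean) ** 3 for x in values) / n
    return mu3 / variance ** 1.5


def grade(as_coef: float) -> str:
    """Grade the asymmetry using the 0.25 and 0.5 thresholds of the scale."""
    magnitude = abs(as_coef)
    if magnitude <= 0.25:
        return "insignificant"
    if magnitude <= 0.5:
        return "noticeable (moderate)"
    return "significant"


# Illustrative data only (not taken from a real survey).
sample = [80, 100, 120, 200, 300]
coef = skewness(sample)
print(f"As = {coef:.3f}: {grade(coef)} asymmetry")
```

Because the long branch of this sample stretches towards large values, the coefficient is positive, which corresponds to right-sided asymmetry.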

Since both the Pearson coefficient and the moment-based skewness coefficient are relative, dimensionless quantities, they are often used for comparative analysis of the asymmetry of different distribution series.

The nature of the asymmetry sometimes indicates the direction of development. For characteristics that one wants to increase (fulfillment of norms, production output, etc.), right-sided asymmetry indicates progressive development, i.e. movement towards higher values of the indicator, while left-sided asymmetry indicates a large number of lagging units.

For characteristics that one wants to reduce (cost, labor intensity, consumption of raw materials per unit of output, etc.), right-sided asymmetry indicates shortcomings in the development of the process under study, and left-sided asymmetry indicates progressive development, i.e. movement towards lower values of the indicator. In the distribution of employees by seniority (see example 4.9, where Mo = 5.75), right-sided asymmetry is observed, since the asymmetry coefficient is positive: (5.955 - 5.75) : 2.47 ≈ 0.083. This asymmetry is progressive for this series: it indicates that the series is developing towards an increase in the studied indicator.
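The Pearson coefficient for the seniority example can be recomputed with a few lines of code. This is a minimal sketch: the function name is hypothetical, and reading the quoted figures as mean 5.955, mode 5.75 and standard deviation 2.47 is an assumption based on the form of the calculation.

```python
def pearson_asymmetry(mean: float, mode: float, sigma: float) -> float:
    """Pearson coefficient: (mean - mode) / standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (mean - mode) / sigma


# Figures quoted in the seniority example; reading 5.75 as the mode is an assumption.
coef = pearson_asymmetry(mean=5.955, mode=5.75, sigma=2.47)
side = "right-sided" if coef > 0 else "left-sided" if coef < 0 else "symmetric"
print(f"Pearson coefficient = {coef:.3f} ({side})")
```

With these figures the coefficient comes out small and positive, in line with the right-sided asymmetry noted in the text.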

The shape of the distribution can be roughly determined directly by examining the empirical data of the series, especially if they are depicted by a histogram and a polygon. To make sure that the approximate definition of the distribution shape is correct, the empirical data of the series are examined for their closeness to the theoretical distribution, which is established by constructing the corresponding distribution curve. However, in many cases, neither theory nor direct consideration of empirical data provide answers to the question about the shape of the distribution. Then, a study is usually conducted on the closeness of empirical data to the normal distribution, since distributions with small or moderate asymmetry in most cases are normal by their type.

For an objective judgment about how well the empirical distribution corresponds to the normal one, statistics uses a number of criteria, called goodness-of-fit criteria (criteria of agreement).

These include the criteria of Pearson, Romanovsky, Yastremsky and Kolmogorov, which are based on different theoretical concepts.

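For example, the most commonly used criterion, Pearson's chi-square, is determined by the formula:

$$\chi^2 = \sum \frac{(f_i - f_i')^2}{f_i'}$$

where $f_i$ are the empirical frequencies (relative frequencies) and $f_i'$ are the theoretical frequencies (relative frequencies).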

To assess the closeness of the empirical distribution to the theoretical one, the probability $P(\chi^2)$ of obtaining the given value of the criterion is determined. If this probability exceeds 0.05, the deviations of the actual frequencies from the theoretical ones are considered random and insignificant. If $P(\chi^2) < 0.05$, the deviations are considered significant, and the empirical distribution differs fundamentally from the theoretical one.
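The chi-square criterion can be sketched in code as follows. The frequencies are hypothetical, and so is the setting used for the decision: instead of computing the probability directly, the statistic is compared with the table value 5.991, which is assumed here for a significance level of 0.05 and 2 degrees of freedom. A statistic below the table value corresponds to a probability above 0.05.

```python
def chi_square(empirical: list[float], theoretical: list[float]) -> float:
    """Pearson chi-square statistic: sum of (f - f')^2 / f' over all groups."""
    if len(empirical) != len(theoretical):
        raise ValueError("both series must have the same number of groups")
    return sum((f - ft) ** 2 / ft for f, ft in zip(empirical, theoretical))


# Hypothetical frequencies for five groups (both series sum to 100).
empirical = [5, 18, 42, 27, 8]
theoretical = [6, 20, 38, 28, 8]

# Assumed table value for significance level 0.05 and 2 degrees of freedom.
CRITICAL_VALUE = 5.991

statistic = chi_square(empirical, theoretical)
verdict = "random" if statistic < CRITICAL_VALUE else "significant"
print(f"chi-square = {statistic:.3f}: deviations are {verdict}")
```

The number of degrees of freedom depends on the number of groups and on how many parameters of the theoretical curve were estimated from the data, so the table value has to be chosen for each case.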

To characterize the degree to which a symmetric distribution deviates from the normal one, the kurtosis index is calculated. It can be determined approximately using the Lindbergh coefficient:

$$Ek = P - 38.29$$

where $P$ is the share (in %) of the variants lying within half a standard deviation on either side of the mean (that is, within $\bar{x} \pm 0.5\sigma$) in the total number of variants of the given series;

38.29 is the corresponding share (in %) for a normal distribution series.
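A minimal sketch of the Lindbergh coefficient, under stated assumptions: the share is counted over individual values, values lying exactly on the boundary of the half-sigma interval are counted as inside, and the series is illustrative. The function name is hypothetical.

```python
import math

NORMAL_SHARE = 38.29  # % of a normal series within mean +/- 0.5 sigma


def lindbergh_index(values: list[float]) -> float:
    """Share (in %) of values within half a sigma of the mean, minus 38.29."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    inside = sum(1 for x in values if abs(x - mean) <= 0.5 * sigma)
    return 100 * inside / n - NORMAL_SHARE


# Illustrative series of eight values.
series = [145, 150, 155, 160, 160, 162, 168, 180]
index = lindbergh_index(series)
print(f"Lindbergh coefficient = {index:.2f}")
```

Here four of the eight values fall inside the interval, so the share is 50% and the coefficient is positive, which points to a peaked distribution.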

The kurtosis can be positive, negative or zero.

For peaked (high-vertex) curves the kurtosis index is positive, for flat-topped (low-vertex) curves it is negative. For the normal distribution curve its value is zero.

For a more accurate characterization of the degree to which a symmetric distribution deviates from the normal one, the kurtosis index (Ek) is calculated by the formula:

$$Ek = \frac{\mu_4}{\sigma^4} - 3, \qquad \mu_4 = \frac{\sum (x_i - \bar{x})^4}{n}$$

Like the Lindbergh coefficient, it can be positive, negative or zero. The kurtosis index, like the asymmetry index, is a dimensionless number. The limiting value of negative kurtosis is Ek = -2; positive kurtosis is unbounded.
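The kurtosis index can be sketched as the fourth central moment divided by the fourth power of the standard deviation, minus 3. The example below is illustrative: the function name and both data series are hypothetical. The first series shows the lower limit of -2, which is reached when the population consists of two equally frequent values.

```python
def kurtosis(values: list[float]) -> float:
    """Kurtosis index: fourth central moment over sigma^4, minus 3."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((x - mean) ** 2 for x in values) / n
    mu4 = sum((x - mean) ** 4 for x in values) / n
    return mu4 / mu2 ** 2 - 3


# Two equally frequent values give the lower limit of the index.
print(kurtosis([0, 1]))

# A hypothetical series with most values bunched at the centre is peaked.
print(kurtosis([1, 5, 5, 5, 5, 5, 5, 9]))
```

The first call returns -2 and the second a positive value, matching the signs described for flat-topped and peaked distributions.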

Determining the indicators of asymmetry and kurtosis is not only of descriptive value; their values often point the way for further study of the phenomena. For example, a significant negative kurtosis may indicate qualitative heterogeneity of the studied population.

Modern computer technology makes it much easier to carry out the cumbersome calculations needed to analyze variation series. If the material has been thought through theoretically and a reasonable hypothesis about the shape of the distribution has been put forward (which, incidentally, computers can also test), software can quickly calculate various summary indicators and criteria, build graphs, etc. This is all the more feasible because the indicators of variation are relatively simple and well formalized.

Variation concept

The average gives a generalizing characteristic of the entire set of the studied phenomenon.

Variation of a feature is the difference in the individual values of that feature within the studied population.

The average value is an abstract, generalizing characteristic of a feature of the studied population, but it does not show the structure of the population.

The average value gives no idea of how the individual values of the feature are grouped around it: whether they are concentrated near the average or deviate from it significantly.

If the individual values of the feature are close to the arithmetic mean, the mean represents the entire population well. Conversely, if they deviate from it strongly, the mean is a poor representative of the population.

The fluctuation of individual values is characterized by indicators of variation.

The term "variation" comes from the Latin variatio - change, fluctuation, difference. However, not all differences are usually called variation.

In statistics, variation means those quantitative changes in the value of the studied feature within a homogeneous population that are caused by the intersecting influence of various factors. The variation of a feature is measured in absolute and relative values. The absolute indicators are R, L, σ and σ².

Variation indicators

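Population 1 (n = 5): 80, 100, 120, 200, 300
Population 2 (n = 8): 145, 150, 155, 160, 160, 162, 168, 180

In both populations the arithmetic mean is the same, $\bar{x} = 160$, but the individual values are scattered around it very differently.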

Therefore, in this case, it becomes necessary to determine the variation of the feature, i.e. how the individual values of the series are scattered relative to one another.

Variation indicators

1. The range of variation is the difference between the maximum and minimum value of the feature.

$R = x_{\max} - x_{\min}$

$R_1 = 300 - 80 = 220$; $R_2 = 180 - 145 = 35$
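The range calculation for the two populations can be reproduced with a short sketch. The function and variable names are hypothetical; the data are the two populations listed in the text.

```python
def variation_range(values: list[float]) -> float:
    """Range of variation: largest value minus smallest value."""
    return max(values) - min(values)


population_1 = [80, 100, 120, 200, 300]
population_2 = [145, 150, 155, 160, 160, 162, 168, 180]

for name, data in (("population 1", population_1), ("population 2", population_2)):
    mean = sum(data) / len(data)
    print(f"{name}: mean = {mean}, R = {variation_range(data)}")
```

Both populations have the same mean, yet their ranges differ several times over, which is why the mean alone does not describe a population. Note that the range depends only on the two extreme values.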

In practice it is used for homogeneous populations, for example in product quality control.

2. Indicators that take into account the deviations of all options from the arithmetic mean.

a) Average linear deviation

b) Standard deviation

The average linear deviation is the arithmetic mean of the absolute values of the deviations of the individual variants from the mean.

for ungrouped data:

$$L = \frac{\sum |x_i - \bar{x}|}{n}$$

for grouped data:

$$L = \frac{\sum |x_i - \bar{x}| f_i}{\sum f_i}$$

In practice it is used to analyze:

1. Composition of employees

2. Rhythm of production

3. Uniform supply of materials

Drawback: this indicator complicates probabilistic calculations and makes it difficult to apply the methods of mathematical statistics.
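A minimal sketch of the average linear deviation for ungrouped and grouped data. The function name is hypothetical; treating ungrouped data as grouped data with every frequency equal to 1 is a design choice that lets one function cover both cases.

```python
def mean_linear_deviation(values, frequencies=None):
    """Average linear deviation; frequencies default to 1 (ungrouped data)."""
    if frequencies is None:
        frequencies = [1] * len(values)
    total = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / total
    return sum(abs(x - mean) * f for x, f in zip(values, frequencies)) / total


# Ungrouped data.
print(mean_linear_deviation([80, 100, 120, 200, 300]))

# Grouped data: each distinct value with its frequency.
values = [145, 150, 155, 160, 162, 168, 180]
frequencies = [1, 1, 1, 2, 1, 1, 1]
print(mean_linear_deviation(values, frequencies))
```

The absolute value inside the sum is what makes the indicator awkward for analytical work: it cannot be manipulated algebraically as easily as a square.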

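The mean square (standard) deviation is the square root of the mean squared deviation of the individual values from the arithmetic mean:

for ungrouped data

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

for grouped data

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$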

For moderately skewed distributions, $\sigma \approx 1.25 L$.

The standard deviation, like the average linear deviation, is an absolute indicator, expressed in the same units as the arithmetic mean.

The standard deviations or average linear deviations of two populations are not comparable if the characteristics themselves differ between the populations. Nor are these indicators compared across different characteristics of the same population. That is, comparison is possible, and reflects differences in the variation of the feature, only when the means of both populations are expressed in the same units of measurement and are equal.

The standard deviation is a measure of the reliability of the mean. The smaller the σ, the better the arithmetic mean reflects the entire represented population.
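The following sketch computes the standard deviation for ungrouped and grouped data and applies it to two illustrative populations that share the same mean. The function name is hypothetical, and the population form of the formula (division by the number of units, not by n - 1) is used.

```python
import math


def standard_deviation(values, frequencies=None):
    """Population standard deviation; frequencies default to 1 (ungrouped)."""
    if frequencies is None:
        frequencies = [1] * len(values)
    total = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / total
    variance = sum((x - mean) ** 2 * f for x, f in zip(values, frequencies)) / total
    return math.sqrt(variance)


# Two illustrative populations with the same mean of 160.
population_1 = [80, 100, 120, 200, 300]
population_2 = [145, 150, 155, 160, 160, 162, 168, 180]

sigma_1 = standard_deviation(population_1)
sigma_2 = standard_deviation(population_2)
print(f"sigma_1 = {sigma_1:.2f}, sigma_2 = {sigma_2:.2f}")
```

The second population has the much smaller standard deviation, so its mean is the more reliable summary of the individual values.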

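3. Variance (dispersion) is used to measure the variability of a feature. This indicator reflects the measure of variation more objectively.

for ungrouped data

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$

for grouped data

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$$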

A distinctive feature of this indicator is that, when the deviations are squared, the share of small deviations in the total falls and the share of large deviations rises.

This is also an absolute indicator.

Variance has a number of properties, some of which make it easier to calculate:

1. The variance of a constant is 0.

2. If all values of the feature (x) are decreased by the same number, the variance does not change.

3. If all values are decreased by the same factor (K times), the variance decreases by K² times.

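For example, suppose every value x in a table of values x, frequencies f and reduced values x' has been reduced 100 times: x' = x / 100. If the variance of the reduced values is 0.909, the variance of the original values is $\sigma^2 = 0.909 \cdot 100^2 = 0.909 \cdot 10000 = 9090$.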
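The three properties of variance can be checked numerically. This is a sketch with illustrative data; the constant A and the divisor K are hypothetical values chosen for the demonstration.

```python
import math


def variance(values):
    """Population variance: mean squared deviation from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


data = [80, 100, 120, 200, 300]  # illustrative values
A, K = 100, 100  # hypothetical constant and divisor

shifted = [x - A for x in data]
scaled = [x / K for x in data]

# Property 1: the variance of a constant is zero.
assert variance([7, 7, 7]) == 0

# Property 2: subtracting a constant leaves the variance unchanged.
assert math.isclose(variance(shifted), variance(data))

# Property 3: dividing by K divides the variance by K squared,
# so the original variance is recovered by multiplying back by K ** 2.
assert math.isclose(variance(scaled) * K ** 2, variance(data))

print(variance(data), variance(scaled))
```

This is the basis of the simplified calculation: the values are first reduced to convenient small numbers, the variance is computed, and the result is then multiplied by K².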

The calculation of the indicators of variation for quantitative characteristics was considered above, but the task of assessing variation can also be posed for qualitative features. For example, when studying the quality of manufactured products, the output can be divided into good and defective items.

In this case, we are talking about alternative features.

Variance of an alternative feature

Alternative features are those that some units of the population have and others do not. Examples are work experience among applicants, an academic degree among university professors, etc. The presence of the feature in a unit of the population is conventionally denoted by 1 and its absence by 0: $x_1 = 1$, $x_2 = 0$. The share of units that have the feature (in the total population) is denoted by p, and the share of units that do not have it by q. That is, p + q = 1 and q = 1 - p.

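Let us calculate the average value of the alternative feature and its variance:

$$\bar{x} = \frac{1 \cdot p + 0 \cdot q}{p + q} = p; \qquad \sigma_p^2 = \frac{(1 - p)^2 p + (0 - p)^2 q}{p + q} = q^2 p + p^2 q = pq(p + q) = pq$$

That is, the average value of an alternative feature is equal to the proportion of units that have the feature, and its variance is equal to the product of the proportion of units that have the feature and the proportion of units that do not.

The standard deviation is equal to $\sigma_p = \sqrt{pq}$.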

A quality check covers 1000 finished products, of which 20 are defective.

Find the share of defective products: p = (20 / 1000) * 100% = 2%
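The quality check example can be carried through to the variance and standard deviation of the alternative feature. The sketch below is illustrative and the function name is hypothetical; it relies only on the relations p + q = 1, variance = pq and standard deviation = sqrt(pq).

```python
import math


def alternative_feature_stats(units_with_feature: int, total_units: int):
    """Return share p, variance p*q and standard deviation sqrt(p*q)."""
    if total_units <= 0 or not 0 <= units_with_feature <= total_units:
        raise ValueError("need total_units > 0 and 0 <= units_with_feature <= total_units")
    p = units_with_feature / total_units
    q = 1 - p
    variance = p * q
    return p, variance, math.sqrt(variance)


# Quality check from the text: 20 defective items out of 1000.
p, variance, sigma = alternative_feature_stats(20, 1000)
print(f"p = {p:.2%}, variance = {variance:.4f}, sigma = {sigma:.2f}")
```

The variance pq is largest when p = q = 0.5, where it equals 0.25, and falls to zero when all units either have or lack the feature.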

Variance has a number of properties that simplify its calculation.

1. If a constant number A is subtracted from all values of the variants, the standard deviation does not change.
