Variation series in statistics. The variation series and its characteristics

Distribution series built on a quantitative characteristic are called variation series.

A distribution series consists of variants (the values of the characteristic) and frequencies (the number of units in each group). Frequencies expressed as relative values (fractions or percentages) are called relative frequencies. The sum of all frequencies is called the volume of the distribution series.

By type, distribution series are divided into discrete (built on discontinuous values of the characteristic) and interval (built on continuous values of the characteristic).

A variation series consists of two columns (or rows). One gives the individual values of the varying characteristic, called variants and denoted by x; the other gives absolute numbers showing how many times each variant occurs. The entries of the second column are called frequencies and are conventionally denoted by f. The second column can also hold relative indicators, each showing the share of a variant's frequency in the total sum of frequencies. These are called relative frequencies and are conventionally denoted by ω; their sum is equal to one. Relative frequencies can also be expressed as percentages, in which case they sum to 100%.

If the variants of the variation series are expressed as discrete quantities, then such a variation series is called discrete.

For continuous characteristics, variation series are constructed as interval series, that is, the values of the characteristic are expressed as "from ... to ...". The minimum value of the characteristic in such an interval is called the lower boundary of the interval, and the maximum value is called the upper boundary.

Interval variation series are also constructed for discrete characteristics that vary over a wide range. Interval series can have equal or unequal intervals.

Consider how the size of equal intervals is determined. Let us introduce the following notation:

i is the size of the interval;

$x_{\max}$ is the maximum value of the characteristic among the units of the population;

$x_{\min}$ is the minimum value of the characteristic among the units of the population;

n is the number of groups.

$$i = \frac{x_{\max} - x_{\min}}{n},$$ if n is known.

If the number of groups is difficult to determine in advance, then, given a sufficiently large population, the formula proposed by Sturges in 1926 can be recommended for choosing the number of groups, and hence the interval size:

n = 1 + 3.322 lg N, where N is the number of units in the population.
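The two calculations above can be sketched in Python. This is a minimal illustration, not part of the original text: the function names are hypothetical, rounding the Sturges value up to a whole number of groups is an assumption, and the width is taken as the usual i = (x_max - x_min) / n. The sample values 120.8 and 12.0 are purely illustrative.

```python
import math

def sturges_groups(N):
    """Number of groups by the Sturges formula n = 1 + 3.322 lg N."""
    # Assumption of this sketch: round up to a whole number of groups.
    return math.ceil(1 + 3.322 * math.log10(N))

def equal_interval_width(x_max, x_min, n):
    """Width of equal intervals: i = (x_max - x_min) / n."""
    return (x_max - x_min) / n

n_groups = sturges_groups(100)                 # 1 + 3.322 * 2 = 7.644, rounded up
width = equal_interval_width(120.8, 12.0, 8)   # illustrative extremes, 8 groups
print(n_groups, round(width, 1))
```

In practice the computed width is often rounded to a convenient number, which is a judgment call rather than part of the formula.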

The size of the unequal intervals is determined in each individual case, taking into account the characteristics of the object of study.

The statistical distribution of a sample is a list of variants and their corresponding frequencies (or relative frequencies).

The statistical distribution of a sample can be given as a table whose first column lists the variants and whose second column lists the corresponding frequencies $n_i$ or relative frequencies $P_i$.

Statistical distribution of the sample

Interval variation series are series in which the values of the characteristic underlying them are expressed within certain limits (intervals). In this case the frequencies refer not to individual values of the characteristic but to the entire interval.

Interval distribution series are built for continuous quantitative characteristics, as well as for discrete characteristics that vary within wide limits.

An interval series can be represented as a statistical distribution of the sample that lists the intervals and their corresponding frequencies. The frequency of an interval is taken to be the sum of the frequencies of the variants that fall into it.

When grouping by quantitative continuous characteristics, it is important to determine the size of the interval.

In addition to the sample mean and sample variance, other characteristics of the variation series are also used.

The mode is the variant that has the highest frequency.

RUSSIAN PRESIDENTIAL ACADEMY OF NATIONAL ECONOMY AND PUBLIC ADMINISTRATION

ORYOL BRANCH

Department of Mathematics and Mathematical Methods in Management

Independent work

Mathematics

on the topic "Variation series and its characteristics"

for full-time students of the Faculty of Economics and Management

field of study "Personnel Management"


Purpose of the work: mastering the concepts of mathematical statistics and the methods of primary data processing.

Examples of solving typical problems.

Problem 1.

The following data were obtained by polling ():

1 2 3 2 2 4 3 3 5 1 0 2 4 3 2 2 3 3 1 3 2 4 2 4 3 3 3 2 0 6

3 3 1 1 2 3 1 4 3 1 7 4 3 4 2 3 2 3 3 1 4 3 1 4 5 3 4 2 4 5

3 6 4 1 3 2 4 1 3 1 0 0 4 6 4 7 4 1 3 5

Required:

1) Compile a variation series (the statistical distribution of the sample), having first written out the ranked discrete series of variants.

2) Construct the frequency polygon and the cumulative curve.

3) Compile the distribution series of relative frequencies.

4) Find the main numerical characteristics of the variation series (using the simplified formulas): a) the arithmetic mean $\bar{x}$, b) the median Me and the mode Mo, c) the variance $s^2$, d) the standard deviation s, e) the coefficient of variation V.

5) Explain the meaning of the results obtained.

Solution.

1) To compile the ranked discrete series of variants, we sort the survey data and arrange them in ascending order:

0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2

3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

5 5 5 5 6 6 6 7 7.

Let us compile the variation series by writing the observed values (variants) in the first row of the table and the corresponding frequencies in the second (Table 1).

Table 1.

x_i                      0    1    2    3    4    5    6    7
n_i                      4   13   14   24   16    4    3    2
accumulated frequency    4   17   31   55   71   75   78   80
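The counting step can be sketched in Python as follows. This is an illustration only: it takes the 80 survey values exactly as listed in the problem statement, and the variable names are hypothetical.

```python
from collections import Counter

survey = [
    1, 2, 3, 2, 2, 4, 3, 3, 5, 1, 0, 2, 4, 3, 2, 2, 3, 3, 1, 3, 2, 4, 2, 4, 3, 3, 3, 2, 0, 6,
    3, 3, 1, 1, 2, 3, 1, 4, 3, 1, 7, 4, 3, 4, 2, 3, 2, 3, 3, 1, 4, 3, 1, 4, 5, 3, 4, 2, 4, 5,
    3, 6, 4, 1, 3, 2, 4, 1, 3, 1, 0, 0, 4, 6, 4, 7, 4, 1, 3, 5,
]

ranked = sorted(survey)                      # ranked series in ascending order
counts = Counter(ranked)                     # how many times each variant occurs
variation_series = sorted(counts.items())    # pairs (variant x_i, frequency n_i)

for x_i, n_i in variation_series:
    print(x_i, n_i)
print("n =", len(survey))
```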

2) The frequency polygon is a broken line connecting the points $(x_i; n_i)$, i = 1, 2, …, m, where m is the number of distinct values of the characteristic X.

Let us draw the frequency polygon of the variation series (Fig. 1).

Fig. 1. Frequency polygon

The cumulative curve (cumulate) for a discrete variation series is a broken line connecting the points $(x_i; n_i^{\text{acc}})$, i = 1, 2, …, m.

Let us find the accumulated frequencies $n_i^{\text{acc}}$ (the accumulated frequency shows how many variants were observed with a value of the characteristic less than x). The values found are entered in the third row of Table 1.

Let us construct the cumulative curve (Fig. 2).

Fig. 2. Cumulative curve

3) Let us find the relative frequencies $\omega_i = n_i / n$, i = 1, 2, …, m, where m is the number of distinct values of the characteristic X; all of them are calculated to the same accuracy.

Let us write the distribution series of relative frequencies in the form of Table 2.

Table 2
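As a sketch of steps 2 and 3, relative and accumulated frequencies can be computed from the frequencies counted in the survey data of Problem 1. The variable names are hypothetical, and the running total here includes the frequency of the variant itself, which is one common convention and an assumption of this example.

```python
from itertools import accumulate

variants = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [4, 13, 14, 24, 16, 4, 3, 2]       # n_i counted from the survey data
n = sum(freqs)

rel_freqs = [n_i / n for n_i in freqs]     # relative frequencies n_i / n
cum_freqs = list(accumulate(freqs))        # running totals of n_i

for x_i, n_i, w_i, c_i in zip(variants, freqs, rel_freqs, cum_freqs):
    print(x_i, n_i, round(w_i, 4), c_i)
```

The relative frequencies sum to one, and the last accumulated frequency equals the sample size, which gives two quick checks on the table.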

4) Let's find the main numerical characteristics of the variation series:

a) We find the arithmetic mean using the simplified formula:

$$\bar{x} = \frac{\sum_{i=1}^{m} u_i n_i}{n}\, k + c,$$

where $u_i = \frac{x_i - c}{k}$ are the conditional variants.

We set c = 3 (one of the middle observed values) and k = 1 (the difference between two adjacent variants) and draw up a calculation table (Table 3).

Table 3.

x_i    n_i    u_i    u_i n_i    u_i^2 n_i
0        4     -3      -12         36
1       13     -2      -26         52
2       14     -1      -14         14
3       24      0        0          0
4       16      1       16         16
5        4      2        8         16
6        3      3        9         27
7        2      4        8         32
Sum     80             -11        193

Then the arithmetic mean is

$$\bar{x} = \frac{-11}{80} \cdot 1 + 3 \approx 2.86.$$

b) The median Me of a variation series is the value of the characteristic that falls in the middle of the ranked series of observations. This discrete variation series contains an even number of terms (n = 80), so the median is equal to the half-sum of the two middle variants, the 40th and the 41st. Both are equal to 3, so Me = 3.

The mode Mo of a variation series is the variant that corresponds to the highest frequency. For this variation series, the highest frequency $n_{\max} = 24$ corresponds to the variant x = 3, so Mo = 3.

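c) The variance $s^2$, which is a measure of the spread of the possible values of the indicator X around its mean, is found using the simplified formula:

$$s^2 = \left( \frac{\sum_{i=1}^{m} u_i^2 n_i}{n} - \left( \frac{\sum_{i=1}^{m} u_i n_i}{n} \right)^2 \right) k^2,$$ where $u_i$ are the conditional variants.

The intermediate calculations are also entered in Table 3.

Then the variance is

$$s^2 = \left( \frac{193}{80} - \left( \frac{-11}{80} \right)^2 \right) \cdot 1^2 \approx 2.39.$$

d) The standard deviation s is found by the formula:

$$s = \sqrt{s^2} \approx 1.55.$$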

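e) The coefficient of variation V:

$$V = \frac{s}{\bar{x}} \cdot 100\% \quad (\bar{x} \neq 0).$$

The coefficient of variation is dimensionless, so it is suitable for comparing the spread of variation series whose variants have different units of measurement.

The coefficient of variation is

$$V = \frac{1.55}{2.86} \cdot 100\% \approx 54\%.$$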

5) The meaning of the results is as follows. The value $\bar{x}$ characterizes the average value of the characteristic X within the sample considered; here the average is 2.86. The standard deviation s describes the absolute spread of the values of the indicator X; here s ≈ 1.55. The coefficient of variation V characterizes the relative variability of the indicator X, that is, its relative spread around the mean; here V ≈ 54%.

Answer: $\bar{x} \approx 2.86$; $s^2 \approx 2.39$; $s \approx 1.55$; $V \approx 54\%$.
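The whole of step 4 can be checked with a short Python sketch. It assumes the simplified formulas take the usual conditional-variant form with c = 3 and k = 1, and that the variance divides by n, which agrees with the value s ≈ 1.55 quoted in the text. The variable names are hypothetical.

```python
import math

variants = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [4, 13, 14, 24, 16, 4, 3, 2]
c, k = 3, 1                                   # false zero and step, as in the text

n = sum(freqs)
u = [(x - c) / k for x in variants]           # conditional variants u_i
sum_un = sum(u_i * n_i for u_i, n_i in zip(u, freqs))
sum_u2n = sum(u_i ** 2 * n_i for u_i, n_i in zip(u, freqs))

mean = c + k * sum_un / n
variance = k ** 2 * (sum_u2n / n - (sum_un / n) ** 2)
std = math.sqrt(variance)
cv = std / mean * 100                         # coefficient of variation, %

# Median of an even-sized ranked series: half-sum of the two middle values.
ranked = [x for x, n_i in zip(variants, freqs) for _ in range(n_i)]
median = (ranked[n // 2 - 1] + ranked[n // 2]) / 2
mode = variants[freqs.index(max(freqs))]      # variant with the highest frequency

print(round(mean, 2), median, mode, round(variance, 2), round(std, 2), round(cv, 1))
```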

Problem 2.

The following data are available on the equity capital of the 40 largest banks in Central Russia:

12.0 49.4 22.4 39.3 90.5 15.2 75.0 73.0 62.3 25.2
70.4 50.3 72.0 71.6 43.7 68.3 28.3 44.9 86.6 61.0
41.0 70.9 27.3 22.9 88.6 42.5 41.9 55.0 56.9 68.1
120.8 52.4 42.0 119.3 49.6 110.6 54.5 99.3 111.5 26.1

Required:

1) Construct an interval variation series.

2) Calculate the sample mean and the sample variance.

3) Find the standard deviation and the coefficient of variation.

4) Construct a histogram of the frequency distribution.

Solution.

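1) Let us choose an arbitrary number of intervals, for example 8. Then the width of the interval is:

$$\frac{x_{\max} - x_{\min}}{n} = \frac{120.8 - 12.0}{8} = 13.6.$$

The calculation table below uses a rounded width of 15, with the first interval starting at 10.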

Let us draw up a calculation table:

Interval x_k – x_{k+1}   Frequency n_i   Midpoint x_i   Conditional variant u_i   u_i n_i   u_i^2 n_i   (u_i + 1)^2 n_i
10 – 25                        4             17.5               -3                  -12        36            16
25 – 40                        5             32.5               -2                  -10        20             5
40 – 55                       11             47.5               -1                  -11        11             0
55 – 70                        6             62.5                0                    0         0             6
70 – 85                        6             77.5                1                    6         6            24
85 – 100                       4             92.5                2                    8        16            36
100 – 115                      2            107.5                3                    6        18            32
115 – 130                      2            122.5                4                    8        32            50
Sum                           40                                                     -5       139           169

The value c = 62.5 was chosen as the false zero (this variant lies approximately in the middle of the variation series).

The conditional variants are determined by the formula

$$u_i = \frac{x_i - c}{k}, \quad k = 15.$$
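The grouping and the conditional-variant calculation for Problem 2 can be sketched in Python. The source breaks off before giving the results, so the numbers this sketch prints are not quoted from it. Two assumptions are made: a value lying exactly on a boundary is assigned to the interval it opens, and the variance divides by n. The variable names are hypothetical.

```python
import math

capital = [
    12.0, 49.4, 22.4, 39.3, 90.5, 15.2, 75.0, 73.0, 62.3, 25.2,
    70.4, 50.3, 72.0, 71.6, 43.7, 68.3, 28.3, 44.9, 86.6, 61.0,
    41.0, 70.9, 27.3, 22.9, 88.6, 42.5, 41.9, 55.0, 56.9, 68.1,
    120.8, 52.4, 42.0, 119.3, 49.6, 110.6, 54.5, 99.3, 111.5, 26.1,
]
start, k, groups = 10, 15, 8          # intervals 10-25, 25-40, ..., 115-130
c = 62.5                              # false zero (midpoint of 55-70)

# Assumption: each interval is half-open, [lower, upper).
freqs = [0] * groups
for value in capital:
    freqs[int((value - start) // k)] += 1

mids = [start + k * (j + 0.5) for j in range(groups)]   # interval midpoints
u = [(m - c) / k for m in mids]                         # conditional variants u_i
n = sum(freqs)
sum_un = sum(ui * ni for ui, ni in zip(u, freqs))
sum_u2n = sum(ui ** 2 * ni for ui, ni in zip(u, freqs))

mean = c + k * sum_un / n
variance = k ** 2 * (sum_u2n / n - (sum_un / n) ** 2)
std = math.sqrt(variance)
print(freqs, mean, round(variance, 2), round(std, 2), round(std / mean * 100, 1))
```

Because the mean and variance are computed from interval midpoints, they approximate the values that would be obtained from the raw data.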

A group of numbers united by some common characteristic is called a population (aggregate).

As noted above, the primary statistical material in sport is a group of scattered numbers that does not give the coach an idea of the essence of a phenomenon or process. The task is to turn this collection into a system and use its indicators to obtain the required information.

Compiling a variation series is precisely the formation of such a mathematical system.

Example 2. For 34 skiers, the following pulse recovery times after completing the distance were recorded (in seconds):

81; 78; 84; 90; 78; 74; 84; 85; 81; 84; 79; 84; 74; 84; 84;

85; 81; 84; 78; 81; 74; 84; 81; 84; 85; 81; 78; 81; 81; 84;

As you can see, this group of numbers does not carry any information.

To compile a variation series, we first perform ranking, that is, we arrange the numbers in ascending or descending order. For example, ranking in ascending order gives:

74; 74; 74; 74;

78; 78; 78; 78; 78; 78;

81; 81; 81; 81; 81; 81; 81; 81; 81;

84; 84; 84; 84; 84; 84; 84; 84; 84; 84; 84;

85; 85; 85;

90.

Ranking in descending order gives the following group of numbers:

90;

85; 85; 85;

84; 84; 84; 84; 84; 84; 84; 84; 84; 84; 84;

81; 81; 81; 81; 81; 81; 81; 81; 81;

78; 78; 78; 78; 78; 78;

74; 74; 74; 74.

After ranking, the inefficiency of this way of writing the group of numbers becomes obvious: the same numbers are repeated many times. It is therefore natural to rewrite the record so that it shows how many times each number is repeated. For example, for the ranking in ascending order:

74 - 4; 78 - 6; 81 - 9; 84 - 11; 85 - 3; 90 - 1.

Here the number on the left is the athlete's pulse recovery time, and the number on the right is the number of times this reading occurs in the group of 34 athletes.

In accordance with the mathematical notation introduced above, we denote the group of measurements considered by a letter, for example x. Taking the numbers of this group in ascending order, $x_1$ = 74 s; $x_2$ = 78 s; $x_3$ = 81 s; $x_4$ = 84 s; $x_5$ = 85 s; $x_6$ = 90 s, each of them can be denoted by the symbol $x_i$.

Let us denote the number of repetitions of each measurement by the letter n. Then:

$n_1$ = 4; $n_2$ = 6; $n_3$ = 9; $n_4$ = 11; $n_5$ = 3; $n_6$ = 1, and each number of repetitions can be denoted by $n_i$.

The total number of measurements, as follows from the statement of the example, is 34. This means that the sum of all $n_i$ is 34, or in symbols:

$$\sum n_i = n_1 + n_2 + \dots + n_6 = 34.$$

Let us denote this sum by the single letter n. Then the initial data of the example can be written in the following form (Table 1).

The resulting group of numbers is a transformed version of the chaotically scattered readings that the coach obtained at the start of the work.

Table 1

x_i    n_i
74       4
78       6
81       9
84      11
85       3
90       1
      n = 34

Such a group is a system whose parameters characterize the measurements carried out. The numbers representing the results of measurements ($x_i$) are called variants; the numbers of their repetitions $n_i$ are called frequencies; and n, the sum of all frequencies, is the volume of the population.

The whole resulting system is called a variation series. Such series are sometimes also called empirical or statistical.

It is easy to see that a special case of a variation series is possible in which all frequencies are equal to one ($n_i = 1$), that is, each measurement in the group occurs only once.

The resulting variation series, like any other, can be represented graphically. To plot the resulting series, you must first of all agree on the scale on the horizontal and vertical axes.

In this problem, we plot the values of the pulse recovery time ($x_i$) on the horizontal axis so that an arbitrarily chosen unit of length corresponds to one second. We start plotting these values from 70 seconds, stepping back a conventional distance from the intersection of the two axes, 0.

On the vertical axis we plot the frequencies of our series ($n_i$), taking the scale so that the unit of length is equal to one unit of frequency.

Having thus prepared the conditions for plotting the graph, we proceed to work with the variation series obtained.

The first pair of numbers, $x_1$ = 74 and $n_1$ = 4, is plotted as follows: on the x-axis, find $x_1$ = 74 and erect a perpendicular from this point; on the n-axis, find $n_1$ = 4 and draw a horizontal line from it until it intersects the perpendicular. Both lines, the vertical and the horizontal, are auxiliary and are therefore drawn dotted. Their point of intersection represents, on the scale of this graph, the pair $x_1$ = 74, $n_1$ = 4.

All other points of the graph are plotted in the same way and are then connected by line segments. To give the graph a closed form, the extreme points are connected by segments to adjacent points on the horizontal axis.

The resulting figure is a graph of our variation series (Fig. 1).

It is quite clear that each variation series is represented by its own graph.

Fig. 1. Graphical representation of the variation series.

Fig. 1 shows that:

1) of all the surveyed, the largest group consisted of athletes, whose pulse recovery time was 84 s;

2) for many, this time is 81 s;

3) the smallest group consisted of athletes with a short pulse recovery time - 74 s and a long one - 90 s.

Thus, after performing a series of tests, the resulting numbers should be ranked and a variation series compiled, which is a certain mathematical system. For clarity, the variation series can be illustrated with a graph.

The variation series above is also called a discrete series, that is, one in which each variant is expressed by a single number.

Here are some more examples of how to compose variation series.

Example 3. Twelve shooters, performing a 10-shot exercise in the prone position, showed the following results (in points):

94; 91; 96; 94; 94; 92; 91; 92; 91; 95; 94; 94.

To form a variation series, we rank these numbers:

91; 91; 91;

92; 92;

94; 94; 94; 94; 94;

95;

96.

After ranking, we compose a variation series (Table 3).

As a result of mastering this chapter, the student must:

know

  • indicators of variation and their relationship;
  • the basic laws of distribution of a characteristic;
  • the essence of goodness-of-fit criteria;

be able to

  • calculate indicators of variation and goodness-of-fit criteria;
  • determine the characteristics of distributions;
  • evaluate the main numerical characteristics of statistical distribution series;

master

  • methods of statistical analysis of distribution series;
  • the basics of analysis of variance;
  • methods of checking statistical distribution series for compliance with the basic distribution laws.

Variation indicators

In the statistical study of characteristics of various statistical populations, it is of great interest to study the variation of a characteristic across individual units of the population, as well as the nature of the distribution of units by this characteristic. Variation is the difference in the individual values of a characteristic among the units of the population studied. The study of variation is of great practical importance. The degree of variation makes it possible to judge the limits of variation of a characteristic, the homogeneity of the population with respect to it, the typicality of the mean, and the interrelation of the factors that determine the variation. Indicators of variation are used to characterize and order statistical populations.

The results of the summary and grouping of statistical observation data, presented as statistical distribution series, are an ordered distribution of the units of the population into groups according to the grouping (varying) characteristic. If a qualitative characteristic is taken as the basis for grouping, the distribution series is called attributive (distribution by profession, gender, color, etc.). If a distribution series is built on a quantitative characteristic, it is called a variation series (distribution by height, weight, wages, etc.). To construct a variation series means to order the quantitative distribution of the population units by the values of the characteristic, count the number of units with these values (the frequency), and arrange the results in a table.

Instead of the frequency of a variant, one can use its ratio to the total number of observations, which is called the relative frequency.

There are two types of variation series: discrete and interval. A discrete series is a variation series based on characteristics that change discontinuously (discrete characteristics). These include the number of employees at an enterprise, the wage grade, the number of children in a family, etc. A discrete variation series is a table with two columns. The first column gives the specific value of the characteristic, and the second gives the number of population units with that value. If the characteristic changes continuously (income, length of service, the value of an enterprise's fixed assets, etc., which can take any value within certain limits), an interval variation series can be constructed for it. The table for an interval variation series also has two columns. The first gives the value of the characteristic as an interval "from - to" (the variants), and the second gives the number of units falling into the interval (the frequency). The frequency is the number of repetitions of an individual variant of the characteristic. Intervals can be closed or open. Closed intervals are bounded on both sides, i.e. they have both a lower ("from") and an upper ("to") boundary. Open intervals have only one boundary, either upper or lower. If the variants are arranged in ascending or descending order, the series is called ranked.

For variation series, there are two kinds of cumulative characteristics: the accumulated frequency and the accumulated relative frequency. The accumulated frequency shows in how many observations the characteristic took values less than a specified one. It is determined by adding the frequency of the given group to all the frequencies of the preceding groups. The accumulated relative frequency characterizes the share of observation units whose values of the characteristic do not exceed the upper boundary of the given group. Thus, it shows the share of variants in the population whose value does not exceed a given one. Frequency, relative frequency, absolute and relative density, and accumulated frequency and relative frequency are characteristics of the magnitude of a variant.

Variation of a characteristic across the units of a population, as well as the nature of the distribution, is studied using the indicators and characteristics of the variation series, which include the mean level of the series, the mean linear deviation, the standard deviation, the variance, and the coefficients of oscillation, variation, asymmetry, kurtosis, etc.

Mean values are used to characterize the center of the distribution. The mean is a generalizing statistical characteristic that quantifies the typical level of the characteristic possessed by the members of the population studied. However, arithmetic means may coincide for distributions of different shapes; therefore, the so-called structural averages are also calculated as statistical characteristics of variation series: the mode, the median, and the quantiles that divide the distribution series into equal parts (quartiles, deciles, percentiles, etc.).

The mode is the value of the characteristic that occurs in the distribution series more often than its other values. For discrete series, it is the variant with the highest frequency. In interval variation series, to determine the mode one must first determine the interval in which it lies, the so-called modal interval. In a variation series with equal intervals, the modal interval is determined by the highest frequency; in series with unequal intervals, by the highest distribution density. Then, to determine the mode in series with equal intervals, the following formula is used:

$$Mo = x_{Mo} + h \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})},$$

where Mo is the value of the mode; $x_{Mo}$ is the lower boundary of the modal interval; h is the width of the modal interval; $f_{Mo}$ is the frequency of the modal interval; $f_{Mo-1}$ is the frequency of the pre-modal interval; $f_{Mo+1}$ is the frequency of the post-modal interval. For a series with unequal intervals, the distribution densities of the modal, pre-modal and post-modal intervals should be used in this formula instead of the frequencies $f_{Mo}$, $f_{Mo-1}$, $f_{Mo+1}$.
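A minimal Python sketch of the mode calculation for a series with equal intervals follows. It assumes the usual formula Mo = x_Mo + h (f_Mo - f_prev) / ((f_Mo - f_prev) + (f_Mo - f_next)). The function name and the sample series are illustrative, and the case where the modal frequency equals both neighbours (zero denominator) is not handled.

```python
def interval_mode(lower_bounds, h, freqs):
    """Mode of an interval series with equal intervals of width h."""
    m = freqs.index(max(freqs))                           # modal interval
    f_mo = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0                 # pre-modal frequency
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0    # post-modal frequency
    return lower_bounds[m] + h * (f_mo - f_prev) / ((f_mo - f_prev) + (f_mo - f_next))

# Illustrative series: intervals 10-25, 25-40, ..., 115-130
lower_bounds = [10, 25, 40, 55, 70, 85, 100, 115]
freqs = [4, 5, 11, 6, 6, 4, 2, 2]
mo = interval_mode(lower_bounds, 15, freqs)
print(round(mo, 2))
```

The result always lies inside the modal interval and is pulled toward the neighbouring interval with the higher frequency.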

If there is a single mode, the probability distribution of the random variable is called unimodal; if there is more than one mode, it is called multimodal (polymodal), and in the case of two modes, bimodal. As a rule, multimodality indicates that the distribution studied does not follow the normal distribution law. Homogeneous populations are, as a rule, characterized by unimodal distributions. Multiple peaks also indicate heterogeneity of the population studied. The appearance of two or more peaks makes it necessary to regroup the data in order to single out more homogeneous groups.

In an interval variation series, the mode can be determined graphically using a histogram. To do this, two intersecting lines are drawn from the top corners of the tallest column of the histogram to the top corners of the two adjacent columns. Then a perpendicular is dropped from their point of intersection onto the abscissa axis. The value of the characteristic on the abscissa axis at the foot of the perpendicular is the mode. In many cases, when characterizing a population, the mode is preferred to the arithmetic mean as a generalizing indicator.

The median is the central value of the characteristic; it is the value possessed by the central member of the ranked distribution series. In discrete series, to find the median, its ordinal number is determined first. If the number of units is odd, one is added to the sum of all frequencies and the result is divided by two. If the number of units is even, there are two median units in the series, and the median is determined as the mean of their values. Thus, the median of a discrete variation series is the value that divides the series into two parts containing the same number of variants.

In an interval series, after the ordinal number of the median has been determined, the median interval is found from the accumulated frequencies (relative frequencies), and then the median itself is determined by the formula:

$$Me = x_{Me} + h \cdot \frac{\frac{1}{2}\sum f - S_{Me-1}}{f_{Me}},$$

where Me is the median; $x_{Me}$ is the lower boundary of the median interval; h is the width of the median interval; $\sum f$ is the sum of the frequencies of the distribution series; $S_{Me-1}$ is the accumulated frequency of the pre-median interval; $f_{Me}$ is the frequency of the median interval.
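The median of an interval series can be sketched in Python as follows, assuming the usual formula Me = x_Me + h (Σf / 2 - S_prev) / f_Me, where S_prev is the accumulated frequency before the median interval. The function name and the sample series are illustrative.

```python
from itertools import accumulate

def interval_median(lower_bounds, h, freqs):
    """Median of an interval series with intervals of width h."""
    half = sum(freqs) / 2
    cum = list(accumulate(freqs))                          # accumulated frequencies
    m = next(j for j, s in enumerate(cum) if s >= half)    # median interval
    s_prev = cum[m - 1] if m > 0 else 0                    # accumulated before it
    return lower_bounds[m] + h * (half - s_prev) / freqs[m]

# Illustrative series: intervals 10-25, 25-40, ..., 115-130
lower_bounds = [10, 25, 40, 55, 70, 85, 100, 115]
freqs = [4, 5, 11, 6, 6, 4, 2, 2]
me = interval_median(lower_bounds, 15, freqs)
print(me)
```

The median interval is taken as the first one whose accumulated frequency reaches half of the total, which is a common convention.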

The median can be found graphically using the cumulative curve. To do this, a straight line parallel to the abscissa axis is drawn from the point on the scale of accumulated frequencies (relative frequencies) that corresponds to the ordinal number of the median, until it intersects the cumulative curve. Then a perpendicular is dropped from the point of intersection onto the abscissa axis. The value of the characteristic on the abscissa axis at the foot of this perpendicular is the median.

The median is characterized by the following properties.

  • 1. It does not depend on the values of the characteristic located on either side of it.
  • 2. It has the property of minimality: the sum of the absolute deviations of the values of the characteristic from the median is the smallest possible, compared with the sum of their absolute deviations from any other value.
  • 3. When two distributions with known medians are combined, the median of the new distribution cannot be predicted in advance.
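The minimality property can be checked numerically. The sketch below uses the twelve shooting scores from Example 3 and compares the sum of absolute deviations from the median with the same sum for a grid of other points. The grid step of 0.5 and the helper name are choices made for this illustration.

```python
from statistics import median

scores = [94, 91, 96, 94, 94, 92, 91, 92, 91, 95, 94, 94]

def sum_abs_dev(data, a):
    """Sum of absolute deviations of the data from the point a."""
    return sum(abs(x - a) for x in data)

me = median(scores)
best = sum_abs_dev(scores, me)

# Candidate points from the minimum to the maximum score, in steps of 0.5.
candidates = [min(scores) + 0.5 * j for j in range(2 * (max(scores) - min(scores)) + 1)]
is_minimal = all(best <= sum_abs_dev(scores, a) for a in candidates)
print(me, best, is_minimal)
```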

These properties of the median are widely used in the design of the location of public service points - schools, clinics, gas stations, water standpipes, etc. For example, if it is planned to build a polyclinic in a certain quarter of the city, then it is more expedient to locate it at a point in the quarter that divides in half not the length of the quarter, but the number of inhabitants.

The relationship between the mode, the median and the arithmetic mean indicates the nature of the distribution of the characteristic in the population and makes it possible to assess the symmetry of the distribution. If $Mo < Me < \bar{x}$, the series has right-sided asymmetry; if $\bar{x} < Me < Mo$, it has left-sided asymmetry. For a normal distribution, $\bar{x} = Me = Mo$.

K. Pearson, by fitting various types of curves, established that for moderately asymmetric distributions the following approximate relations between the arithmetic mean, the median and the mode hold:

$$\bar{x} - Mo \approx 3(\bar{x} - Me), \qquad Mo \approx 3Me - 2\bar{x},$$

where Me is the median; Mo is the mode; $\bar{x}$ is the arithmetic mean.

If the structure of the variation series needs to be studied in more detail, values of the characteristic analogous to the median are calculated. Such values divide all units of the distribution into equal numbers; they are called quantiles or gradients. Quantiles are subdivided into quartiles, deciles, percentiles, etc.

Quartiles divide the population into four equal parts. The first quartile is calculated in the same way as the median, after the first quartile interval has been determined:

$$Q_1 = x_{Q_1} + h \cdot \frac{\frac{1}{4}\sum f_i - S_{Q_1-1}}{f_{Q_1}},$$

where $Q_1$ is the first quartile; $x_{Q_1}$ is the lower boundary of the first quartile interval; h is the width of the first quartile interval; $f_i$ are the frequencies of the interval series; $S_{Q_1-1}$ is the accumulated frequency of the interval preceding the first quartile interval; $f_{Q_1}$ is the frequency of the first quartile interval.

The first quartile shows that 25% of population units are less than its value, and 75% - more. The second quartile is equal to the median, i.e. Q 2 = Me.

The third quartile is calculated by analogy, after the third quartile interval has been found:

$$Q_3 = x_{Q_3} + h \cdot \frac{\frac{3}{4}\sum f_i - S_{Q_3-1}}{f_{Q_3}},$$

where $x_{Q_3}$ is the lower boundary of the third quartile interval; h is the width of the third quartile interval; $f_i$ are the frequencies of the interval series; $S_{Q_3-1}$ is the accumulated frequency of the interval preceding the third quartile interval; $f_{Q_3}$ is the frequency of the third quartile interval.

The third quartile shows that 75% of population units are less than its value, and 25% - more.

The difference between the third and first quartiles is the interquartile range:

$$\Delta_Q = Q_3 - Q_1,$$

where $\Delta_Q$ is the interquartile range; $Q_3$ is the third quartile; $Q_1$ is the first quartile.
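The quartile calculations can be sketched with one general function. Treating a quartile as a quantile of order p, located at the accumulated frequency p·Σf, is an assumption of this sketch; with p = 0.5 it reduces to the median formula. The function name and the sample series are illustrative.

```python
from itertools import accumulate

def interval_quantile(lower_bounds, h, freqs, p):
    """Quantile of order p (0 < p < 1) of an interval series with width h."""
    target = p * sum(freqs)
    cum = list(accumulate(freqs))                            # accumulated frequencies
    m = next(j for j, s in enumerate(cum) if s >= target)    # quantile interval
    s_prev = cum[m - 1] if m > 0 else 0
    return lower_bounds[m] + h * (target - s_prev) / freqs[m]

# Illustrative series: intervals 10-25, 25-40, ..., 115-130
lower_bounds = [10, 25, 40, 55, 70, 85, 100, 115]
freqs = [4, 5, 11, 6, 6, 4, 2, 2]
q1 = interval_quantile(lower_bounds, 15, freqs, 0.25)
q3 = interval_quantile(lower_bounds, 15, freqs, 0.75)
iqr = q3 - q1                                                # interquartile range
print(round(q1, 2), q3, round(iqr, 2))
```

The same function gives deciles and percentiles by passing p = 0.1, 0.9, 0.01 and so on.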

Deciles divide the population into 10 equal parts. A decile is a value of the characteristic in the distribution series that corresponds to a given number of tenths of the population size. By analogy with quartiles, the first decile shows that 10% of the population units are below its value and 90% are above, and the ninth decile shows that 90% of the population units are below its value and 10% are above. The ratio of the ninth decile to the first, the decile coefficient, is widely used in studying income differentiation to measure the ratio of the income levels of the richest 10% and the poorest 10% of the population. Percentiles divide the ranked population into 100 equal parts. The calculation, meaning and application of percentiles are similar to those of deciles.

Quartiles, deciles and other structural characteristics can be determined graphically, by analogy with the median, using the cumulative curve.

The following indicators are used to measure the size of the variation: the range of variation, the mean linear deviation, the standard deviation, and the variance. The range of variation depends entirely on the chance distribution of the extreme members of the series. This indicator is of interest when it is important to know the amplitude of fluctuations in the values of the characteristic:

$$R = x_{\max} - x_{\min},$$

where R is the range of variation; $x_{\max}$ is the maximum value of the characteristic; $x_{\min}$ is the minimum value of the characteristic.

When the range of variation is calculated, the values of the overwhelming majority of the members of the series are not taken into account, although variation is associated with every member of the series. This drawback is absent from indicators that are averages of the deviations of the individual values of the characteristic from their mean: the mean linear deviation and the standard deviation. There is a direct relationship between the individual deviations from the mean and the variability of a characteristic: the stronger the variability, the greater the absolute size of the deviations from the mean.

The mean linear deviation is the arithmetic mean of the absolute values of the deviations of the individual variants from their mean.

Mean linear deviation for ungrouped data:

$$\bar{l}_{\text{simple}} = \frac{\sum |x_i - \bar{x}|}{n},$$

where $\bar{l}_{\text{simple}}$ is the mean linear deviation; $x_i$ is the value of the characteristic; $\bar{x}$ is the mean value of the characteristic for the population studied; n is the number of units in the population.

Mean linear deviation for grouped data:

$$\bar{l}_{\text{weighted}} = \frac{\sum |x_i - \bar{x}| f_i}{\sum f_i},$$

where $\bar{l}_{\text{weighted}}$ is the mean linear deviation; $x_i$ is the value of the characteristic; $\bar{x}$ is the mean value of the characteristic for the population studied; $f_i$ is the number of population units in a given group.

The signs of the deviations are ignored here, since otherwise the sum of all deviations would be zero. Depending on whether the data analyzed are grouped, the mean linear deviation is calculated by different formulas: for grouped and for ungrouped data. Because of its conventional character, the mean linear deviation is used in practice relatively rarely in isolation from other indicators of variation (in particular, to characterize the fulfillment of contractual obligations with regard to uniformity of deliveries, and in the analysis of foreign trade turnover, the composition of employees, the rhythm of production, product quality with allowance for the technological features of production, etc.).
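Both forms of the mean linear deviation can be sketched in Python, assuming the usual definitions Σ|x_i - x̄| / n for ungrouped data and Σ|x_i - x̄| f_i / Σf_i for grouped data. The function names are hypothetical; the data are the shooting scores from Example 3, once as raw values and once as a variation series.

```python
def mean_linear_deviation(values):
    """Simple (ungrouped) form: sum(|x_i - mean|) / n."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

def mean_linear_deviation_grouped(values, freqs):
    """Weighted (grouped) form: sum(|x_i - mean| * f_i) / sum(f_i)."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum(abs(x - mean) * f for x, f in zip(values, freqs)) / total

scores = [94, 91, 96, 94, 94, 92, 91, 92, 91, 95, 94, 94]
d_simple = mean_linear_deviation(scores)
d_grouped = mean_linear_deviation_grouped([91, 92, 94, 95, 96], [3, 2, 5, 1, 1])
print(round(d_simple, 3), round(d_grouped, 3))
```

Both calls describe the same data, so the two results coincide; the grouped form simply avoids repeating equal values.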

The standard deviation characterizes how much, on average, the individual values of the trait under study deviate from the average value for the population, and it is expressed in the units of measurement of the trait. As one of the main measures of variation, the standard deviation is widely used in assessing the boundaries of variation of a trait in a homogeneous population, in determining the ordinates of the normal distribution curve, and in calculations related to organizing sample observation and establishing the accuracy of sample characteristics. The standard deviation of ungrouped data is calculated by the following algorithm: each deviation from the mean is squared, the squares are summed, the sum of squares is divided by the number of members of the series, and the square root is taken of the quotient:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

where $\sigma$ is the value of the standard deviation; $x_i$ is the value of the feature; $\bar{x}$ is the average value of the trait for the studied population; $n$ is the number of units in the population.

For grouped data, the standard deviation is calculated using the weighted formula

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$$

where $\sigma$ is the value of the standard deviation; $x_i$ is the value of the feature; $\bar{x}$ is the average value of the trait for the studied population; $f_i$ is the number of population units in a particular group.
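
The algorithm for the standard deviation (square each deviation, sum, divide by the number of members, take the root) can be sketched in Python as follows. The helper names and the data are hypothetical; `statistics.pstdev` from the standard library is used only as a cross-check, since it also divides by n.

```python
import math
import statistics


def std_simple(values):
    """Standard deviation of ungrouped data (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def std_weighted(variants, freqs):
    """Standard deviation of grouped data, weighted by frequencies."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(variants, freqs)) / total
    squares = sum((x - mean) ** 2 * f for x, f in zip(variants, freqs))
    return math.sqrt(squares / total)


# Hypothetical data: the same four observations, raw and grouped.
raw = [10, 20, 20, 30]
sigma_raw = std_simple(raw)
sigma_grouped = std_weighted([10, 20, 30], [1, 2, 1])
sigma_check = statistics.pstdev(raw)
```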

The expression under the root in both cases is called the variance. Thus, the variance is calculated as the mean square of the deviations of the feature values from their mean. For unweighted (simple) values of the characteristic, the variance is determined as follows:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$

For weighted characteristic values

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$$

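There is also a special simplified way of calculating the variance. In general form

$$\sigma^2 = \overline{x^2} - (\bar{x})^2$$

for unweighted (simple) characteristic values

$$\sigma^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$$

for weighted characteristic values

$$\sigma^2 = \frac{\sum x_i^2 f_i}{\sum f_i} - \left(\frac{\sum x_i f_i}{\sum f_i}\right)^2$$

using the conditional zero counting method

$$\sigma^2 = \frac{\sum \left(\frac{x_i - A}{h}\right)^2 f_i}{\sum f_i} \, h^2 - (\bar{x} - A)^2$$

where $\sigma^2$ is the value of the variance; $x_i$ is the value of the feature; $\bar{x}$ is the average value of the feature; $h$ is the group interval value; $f_i$ is the weight; $A$ is the conditional zero, a constant from which the deviations are counted.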
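
The following sketch checks numerically that the simplified calculations agree with the definition of the variance. It assumes the common textbook forms of the shortcuts (mean of squares minus the square of the mean, and the conditional zero method with a constant A and interval width h). The interval midpoints, frequencies, A and h are hypothetical.

```python
def variance_by_definition(variants, freqs):
    """Weighted mean square of deviations from the mean."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(variants, freqs)) / total
    return sum((x - mean) ** 2 * f for x, f in zip(variants, freqs)) / total


def variance_shortcut(variants, freqs):
    """Mean of squares minus the square of the mean."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(variants, freqs)) / total
    mean_sq = sum(x ** 2 * f for x, f in zip(variants, freqs)) / total
    return mean_sq - mean ** 2


def variance_conditional_zero(variants, freqs, A, h):
    """Conditional zero method: work with u = (x - A) / h, then rescale."""
    total = sum(freqs)
    m1 = sum(((x - A) / h) * f for x, f in zip(variants, freqs)) / total
    m2 = sum(((x - A) / h) ** 2 * f for x, f in zip(variants, freqs)) / total
    mean = A + h * m1  # the mean recovered from the first moment
    return m2 * h ** 2 - (mean - A) ** 2


# Hypothetical interval midpoints and frequencies.
mids = [15, 25, 35, 45]
freqs = [2, 5, 2, 1]
v_def = variance_by_definition(mids, freqs)
v_short = variance_shortcut(mids, freqs)
v_zero = variance_conditional_zero(mids, freqs, A=25, h=10)
```

All three values should coincide; the shortcut forms only rearrange the same sum.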

The variance has independent significance in statistics and is one of the most important indicators of variation. It is measured in units corresponding to the square of the units of measurement of the trait under study.

The variance has the following properties.

  • 1. The variance of a constant is zero.
  • 2. Decreasing all values of a feature by the same value A does not change the variance. This means that the mean square of deviations can be calculated not from the given values of the attribute, but from their deviations from some constant number.
  • 3. Decreasing all values of the attribute by a factor of k reduces the variance by a factor of k^2 and the standard deviation by a factor of k. That is, all values of the attribute can be divided by some constant number (say, by the value of the interval of the series), the standard deviation calculated, and the result then multiplied by that constant number.
  • 4. The mean square of deviations calculated from any value A that differs from the arithmetic mean is always greater than the mean square of deviations calculated from the arithmetic mean. It is greater by a quite definite amount: the square of the difference between the mean and this conventionally taken value.
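
The four properties can be checked numerically. The sketch below uses `statistics.pvariance` (variance with divisor n) on hypothetical data; the values of A and k are arbitrary choices made for illustration.

```python
import statistics

data = [4, 8, 6, 10, 12]  # hypothetical values
var = statistics.pvariance(data)  # variance with divisor n
mean = statistics.mean(data)

# Property 1: the variance of a constant is zero.
var_constant = statistics.pvariance([5, 5, 5])

# Property 2: subtracting the same value A leaves the variance unchanged.
A = 3
var_shifted = statistics.pvariance([x - A for x in data])

# Property 3: dividing all values by k divides the variance by k**2.
k = 2
var_scaled = statistics.pvariance([x / k for x in data])

# Property 4: the mean square of deviations from A exceeds the variance
# by exactly (mean - A)**2.
mean_sq_from_A = sum((x - A) ** 2 for x in data) / len(data)
```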

The variation of an alternative attribute consists in the presence or absence of the studied property in the units of the population. Quantitatively, the variation of an alternative feature is expressed by two values: the presence of the studied property in a unit is denoted by one (1), and its absence by zero (0). The fraction of units that have the property under study is denoted by $p$, and the fraction of units that do not have this property is denoted by $q$. The variance of an alternative feature is equal to the product of the fraction of units that have the property ($p$) and the fraction of units that do not have it ($q$). The greatest variation is reached when 50% of the population has the feature and the other 50% does not; the variance then reaches its maximum value of 0.25, i.e. $p = 0.5$, $q = 1 - p = 1 - 0.5 = 0.5$ and $\sigma^2 = 0.5 \cdot 0.5 = 0.25$. The lower bound of this indicator is zero, which corresponds to a population with no variation. In practice, the variance of an alternative feature is used to construct confidence intervals in sample observation.
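
A short sketch of the variance of an alternative attribute follows. It computes the product of the two fractions, compares it with the variance computed directly from 0/1 values, and scans fractions from 0 to 1 to confirm the maximum of 0.25. The function name and the 0/1 data are hypothetical.

```python
def alternative_variance(p):
    """Variance of a 0/1 attribute: product of the two fractions."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    q = 1.0 - p
    return p * q


# Hypothetical population: 1 = property present, 0 = property absent.
flags = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
p = sum(flags) / len(flags)
var_pq = alternative_variance(p)

# The same variance computed directly as the mean square deviation.
var_direct = sum((x - p) ** 2 for x in flags) / len(flags)

# Largest variance over fractions 0.00, 0.01, ..., 1.00.
max_var = max(alternative_variance(i / 100) for i in range(101))
```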

The smaller the variance and the standard deviation, the more homogeneous the population and the more typical the mean. In statistical practice, it is often necessary to compare the variation of different features. For example, it is of interest to compare the variation in the age of workers and their qualifications, length of service and wages, cost and profit, length of service and labor productivity, etc. Indicators of absolute variability are unsuitable for such comparisons: the variability of length of service, expressed in years, cannot be compared with the variation in wages, expressed in rubles. For such comparisons, as well as for comparing the variability of the same feature in several populations with different arithmetic means, relative indicators of variation are used: the oscillation coefficient, the linear coefficient of variation, and the coefficient of variation, which show the measure of fluctuation of extreme values around the mean.

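Oscillation coefficient:

$$V_R = \frac{R}{\bar{x}} \cdot 100\%$$

where $V_R$ is the value of the oscillation coefficient; $R$ is the value of the range of variation; $\bar{x}$ is the average value of the trait for the studied population.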

Linear coefficient of variation:

$$V_{\bar{l}} = \frac{\bar{l}}{\bar{x}} \cdot 100\%$$

where $V_{\bar{l}}$ is the value of the linear coefficient of variation; $\bar{l}$ is the value of the mean linear deviation; $\bar{x}$ is the average value of the trait for the studied population.

The coefficient of variation:

$$V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%$$

where $V_\sigma$ is the value of the coefficient of variation; $\sigma$ is the value of the standard deviation; $\bar{x}$ is the average value of the trait for the studied population.

The oscillation coefficient is the ratio of the range of variation to the mean value of the trait under study, expressed as a percentage, and the linear coefficient of variation is the ratio of the average linear deviation to the mean value of the trait, expressed as a percentage. The coefficient of variation is the ratio of the standard deviation to the mean of the trait, expressed as a percentage. As a relative value expressed as a percentage, the coefficient of variation is used to compare the degree of variation of different features. It is also used to assess the homogeneity of a statistical population. If the coefficient of variation is less than 33%, the studied population is homogeneous and the variation is weak. If the coefficient of variation is more than 33%, the studied population is heterogeneous, the variation is strong, and the average value is atypical and cannot be used as a generalizing indicator of this population. In addition, coefficients of variation are used to compare the variability of one trait in different populations, for example, to assess the variation in the length of service of employees at two enterprises. The higher the value of the coefficient, the more significant the variation of the feature.
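
The three relative indicators and the 33% homogeneity rule can be sketched as follows. The function names and the length-of-service data are hypothetical, and the standard deviation is computed with divisor n, as in the definitions above.

```python
import math


def variation_coefficients(values):
    """Return (V_R, V_l, V_sigma) in percent for ungrouped data."""
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)
    linear_dev = sum(abs(x - mean) for x in values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return (100 * value_range / mean,
            100 * linear_dev / mean,
            100 * sigma / mean)


def is_homogeneous(v_sigma, threshold=33.0):
    """Apply the 33% rule to the coefficient of variation."""
    return v_sigma < threshold


# Hypothetical lengths of service, in years.
service_years = [2, 4, 4, 6, 9]
v_r, v_l, v_sigma = variation_coefficients(service_years)
homogeneous = is_homogeneous(v_sigma)
```

For these values the coefficient of variation is well above 33%, so the rule classifies this small population as heterogeneous.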

Based on the calculated quartiles, the relative indicator of quartile variation can also be calculated using the formula

where Q 2 and

The interquartile range is determined by the formula

$$R_Q = Q_3 - Q_1$$

The quartile deviation is used in place of the range to avoid the disadvantages of relying on extreme values:

$$Q = \frac{Q_3 - Q_1}{2}$$

For variation series with unequal intervals, the distribution density is also calculated. It is defined as the quotient of the corresponding frequency or relative frequency divided by the width of the interval. In series with unequal intervals, absolute and relative distribution densities are used. The absolute density of the distribution is the frequency per unit length of the interval. The relative density of the distribution is the relative frequency per unit length of the interval.
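
As an illustration, the sketch below computes both densities for a series with unequal intervals. The intervals and frequencies are hypothetical, and the function name is an assumption.

```python
def distribution_densities(intervals, freqs):
    """Return (absolute, relative) densities for each interval."""
    total = sum(freqs)
    absolute = []
    relative = []
    for (lower, upper), f in zip(intervals, freqs):
        width = upper - lower
        absolute.append(f / width)            # frequency per unit length
        relative.append((f / total) / width)  # relative frequency per unit length
    return absolute, relative


# Hypothetical series with intervals of width 10, 20 and 40.
intervals = [(0, 10), (10, 30), (30, 70)]
freqs = [20, 40, 40]
abs_density, rel_density = distribution_densities(intervals, freqs)
```

The last two intervals have equal frequencies, but the wider one has half the density, which is why densities rather than raw frequencies are compared when the intervals are unequal.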

All of the above holds for distribution series whose distribution law is well described by the normal distribution law or is close to it.

An example of solving a test on mathematical statistics

Problem 1

Initial data: the students of a group of 30 people took the exam in the course "Informatics". The grades received by the students form the following series of numbers:

I. Let's compose a variation series

x        m_x    w_x      m_x cum.   w_x cum.
2        6      0.2000   6          0.2000
3        11     0.3667   17         0.5667
4        9      0.3000   26         0.8667
5        4      0.1333   30         1.0000
Total:   30     1.0000

II. Graphical presentation of statistical information.

III. Numerical characteristics of the sample.

1. Arithmetic mean

2. Geometric mean

3. Mode

4. Median

2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 | 3 3 4 4 4 4 4 4 4 4 4 5 5 5 5

5. Sample variance

6. Sample standard deviation

7. Coefficient of variation

8. Asymmetry

9. Asymmetry coefficient

10. Excess

11. The coefficient of kurtosis
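
The calculations for the characteristics listed above are not preserved in the text, so the following is only a sketch of how they could be computed. The grades are taken from the ordered series shown under "4. Median". Dividing by n rather than n - 1, and defining the asymmetry and excess coefficients through the third and fourth central moments, are assumptions; the original solution may have used other conventions.

```python
import math
import statistics

# Grades recovered from the ordered series shown under "4. Median".
grades = [2] * 6 + [3] * 11 + [4] * 9 + [5] * 4

n = len(grades)
mean = statistics.mean(grades)
geometric_mean = statistics.geometric_mean(grades)
mode = statistics.mode(grades)
median = statistics.median(grades)

# Assumption: divide by n (not n - 1), as in the variance formula of the text.
variance = statistics.pvariance(grades)
std_dev = math.sqrt(variance)
coeff_variation = 100 * std_dev / mean

# Central moments of order 3 and 4 for the asymmetry and excess coefficients.
m3 = sum((x - mean) ** 3 for x in grades) / n
m4 = sum((x - mean) ** 4 for x in grades) / n
asymmetry_coeff = m3 / std_dev ** 3
excess_coeff = m4 / std_dev ** 4 - 3
```

Under these assumptions the mean is 101/30 (about 3.37), the mode and the median are both 3, and the variance is 809/900 (about 0.90).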

Problem 2

Initial data: the students of a group wrote their final test. The group consists of 30 people. The points scored by the students form the following series of numbers

Solution

I. Since the feature takes on many different values, we will construct an interval variation series for it. To do this, we first set the value of the interval $h$. We will use Sturges' formula

$$h = \frac{x_{\max} - x_{\min}}{1 + 3.322 \lg n}$$
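
A minimal sketch of the interval width calculation with Sturges' formula is given below. The scores are taken from the ordered series shown under "4. Median" later in this problem; the variable names are assumptions, and the original solution may have rounded the result differently.

```python
import math

# Scores from the ordered series shown under "4. Median" of this problem.
scores = [10, 11, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15,
          15, 15, 15, 16, 16, 16, 16, 16, 17, 17, 18, 19, 19, 20, 20]

n = len(scores)
# Number of intervals suggested by Sturges' formula (not yet rounded).
k = 1 + 3.322 * math.log10(n)
# Interval width: range of variation divided by the number of intervals.
h = (max(scores) - min(scores)) / k
```

For these 30 values the formula gives about 5.9 intervals and a width of roughly 1.69.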

Let's compose a scale of intervals. In this case, we will take the value determined by the formula for the upper boundary of the first interval:

The upper bounds of the subsequent intervals are determined by the following recursive formula:

$$a_{i+1} = a_i + h$$

We finish building the scale of intervals once the upper bound of the next interval becomes greater than or equal to the maximum value of the sample.

II. Graphical display of interval variation series

III. Numerical characteristics of the sample

To determine the numerical characteristics of the sample, we will compose an auxiliary table

Sum:

1. Arithmetic mean

2. Geometric mean

3. Mode

4. Median

10 11 12 12 13 13 13 13 14 14 14 14 15 15 15 | 15 15 15 16 16 16 16 16 17 17 18 19 19 20 20

5. Sample variance

6. Sample standard deviation

7. Coefficient of variation

8. Asymmetry

9. Asymmetry coefficient

10. Excess

11. The coefficient of kurtosis

Problem 3

Condition: the scale division of the ammeter is 0.1 A. Readings are rounded to the nearest whole division. Find the probability that the reading error will exceed 0.02 A.

Solution.

The rounding error can be considered as a random variable $X$ that is distributed uniformly in the interval between two adjacent whole divisions. The density of the uniform distribution is

$$f(x) = \frac{1}{b - a},$$

where $b - a$ is the length of the interval that contains the possible values of $X$; outside this interval $f(x) = 0$. In this problem, the length of the interval containing the possible values of $X$ is 0.1, therefore

$$f(x) = \frac{1}{0.1} = 10$$

The reading error will exceed 0.02 if it lies in the interval (0.02; 0.08). Then

$$P(0.02 < X < 0.08) = \int_{0.02}^{0.08} 10 \, dx = 0.6$$

Answer: P = 0.6
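
The answer can be checked with a few lines of Python. The helper below is a hypothetical sketch: for a uniform distribution, the probability of an interval is its length, clipped to the range of possible values, divided by the length of that range.

```python
def uniform_interval_probability(a, b, lo, hi):
    """P(lo < X < hi) for X distributed uniformly on (a, b)."""
    lo_clipped = max(a, lo)
    hi_clipped = min(b, hi)
    return max(0.0, hi_clipped - lo_clipped) / (b - a)


# Error between two adjacent divisions 0.1 A apart, exceeding 0.02 A.
p = uniform_interval_probability(0.0, 0.1, 0.02, 0.08)
```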

Problem 4

Initial data: the mathematical expectation and the standard deviation of a normally distributed feature $X$ are 10 and 2, respectively. Find the probability that, as a result of the test, $X$ will take a value in the interval (12, 14).

Solution.

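Let's use the formula

$$P(\alpha < X < \beta) = \Phi\left(\frac{\beta - a}{\sigma}\right) - \Phi\left(\frac{\alpha - a}{\sigma}\right)$$

where $a$ is the mathematical expectation, $\sigma$ is the standard deviation, and $\Phi$ is the Laplace function.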
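
The rest of the solution is not preserved in the text, so the following is only a sketch of the calculation. It uses the distribution function of the normal law, written with `math.erf`, instead of a table of the Laplace function; the difference of the two values at the ends of the interval gives the same probability. The function name is an assumption.

```python
import math


def normal_cdf(x, mean, sigma):
    """Distribution function of the normal law N(mean, sigma)."""
    return 0.5 * (1 + math.erf((x - mean) / (sigma * math.sqrt(2))))


# Expectation 10, standard deviation 2, interval (12, 14).
p = normal_cdf(14, 10, 2) - normal_cdf(12, 10, 2)
```

Under these conditions the probability is approximately 0.1359.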
