Bootstrap, small samples, application in data analysis. Types of samples

When evaluating how representative the data of a sample survey are, the size of the sample becomes important.

The sample size determines not only the limits that the sampling error will not exceed with a given probability, but also the methods used to find these limits.

With a large number of units in the sample ($n > 100$), the distribution of random errors of the sample mean, in accordance with Lyapunov's theorem, is normal or approaches normal as the number of observations increases.

The probability that the error falls outside given limits is estimated from tables of the Laplace integral. The sampling error is calculated from the population variance $\sigma^2$, since for large $n$ the coefficient $n/(n - 1)$, by which the sample variance is multiplied to obtain the population variance, plays no significant role.
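A minimal Python sketch of the two large-sample facts above, using only the standard library: the coefficient $n/(n - 1)$ approaches 1 as $n$ grows, and the doubled Laplace integral, i.e. the probability that a standard normal deviation stays within $\pm t$, can be computed with `math.erf`. The function names are illustrative.

```python
import math

def correction_factor(n: int) -> float:
    """Coefficient n / (n - 1) by which the sample variance is multiplied."""
    return n / (n - 1)

def laplace_probability(t: float) -> float:
    """P(|Z| < t) for a standard normal Z (the doubled Laplace integral)."""
    return math.erf(t / math.sqrt(2))

for n in (10, 30, 100, 1000):
    print(f"n = {n:5d}: n/(n-1) = {correction_factor(n):.4f}")

print(f"P(|Z| < 1.96) = {laplace_probability(1.96):.4f}")
print(f"P(|Z| < 3)    = {laplace_probability(3.0):.4f}")
```

For $n = 100$ the factor is only about 1.01, which is why it is neglected for large samples.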

In statistical practice one often has to deal with so-called small samples.

A small sample is a sample survey whose number of units does not exceed 30.

The theory of small samples was initiated by the English statistician W. S. Gosset, who published under the pseudonym Student. In 1908 he proved that the estimate of the discrepancy between the mean of a small sample and the population mean follows a special distribution law.

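To determine the possible limits of the error, the so-called Student criterion is used, defined by the formula

$$t = \frac{\tilde{x} - \bar{x}}{\mu_{MV}},$$

where $\tilde{x}$ is the sample mean, $\bar{x}$ is the population mean and $\mu_{MV}$ is the measure of random fluctuations of the sample mean in a small sample.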

The value $\sigma$ is calculated from the sample observation data:

$$\sigma = \sqrt{\frac{\sum (x_i - \tilde{x})^2}{n}}.$$

This value is used only for the sample under study, and not as an approximate estimate of $\sigma$ in the general population.

With a small sample size, Student's distribution differs from the normal distribution: large values of the criterion $t$ have a greater probability here than under the normal distribution.

The limiting error of a small sample, $\Delta_{MV}$, is expressed through the mean error $\mu_{MV}$ as

$$\Delta_{MV} = t\,\mu_{MV}.$$

In this case, however, the value of $t$ is related to the probability estimate differently than in a large sample.

According to Student's distribution, this probability estimate depends both on the value of $t$ and on the sample size $n$, for the case where the limiting error does not exceed $t$ times the mean error in small samples.

Table 3.1. Probability distribution in small samples depending on the confidence coefficient $t$ and the sample size $n$


As Table 3.1 shows, as $n$ increases this distribution tends to the normal one, and at $n = 20$ it already differs little from it.

Let us show how to use the table of Student's distribution.

Suppose that a sample survey of the workers of a small enterprise recorded the time $x_i$ (min.) that each of the $n$ surveyed workers spent on one of the production operations. The sample mean time is:

$$\tilde{x} = \frac{\sum x_i}{n}.$$

The sample variance:

$$\sigma^2 = \frac{\sum (x_i - \tilde{x})^2}{n}.$$

Hence the mean error of the small sample:

$$\mu_{MV} = \sqrt{\frac{\sigma^2}{n - 1}}.$$

From Table 3.1 we find the probability $P$ that corresponds to the chosen confidence coefficient $t$ and the size $n$ of the small sample.

Thus, with probability $P$ it can be asserted that the discrepancy between the sample mean and the population mean lies in the range from $-t\mu_{MV}$ to $+t\mu_{MV}$, i.e. the difference will not exceed $t\mu_{MV}$ in absolute value.

Consequently, the average time spent in the whole population will lie between $\tilde{x} - t\mu_{MV}$ and $\tilde{x} + t\mu_{MV}$.

The probability that this assumption is actually wrong, i.e. that the error due to random causes exceeds $t\mu_{MV}$, equals $1 - P$.
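The steps of the example can be sketched in Python. The timings below are invented for illustration only, and $t = 2.262$ is assumed as the standard two-sided Student value for $P = 0.95$ and $n - 1 = 9$ degrees of freedom; the variable names are hypothetical.

```python
import math

# Hypothetical timings (min.) for n = 10 workers, for illustration only.
times = [4.2, 4.8, 5.0, 5.5, 4.6, 5.1, 4.9, 5.3, 4.4, 5.2]

n = len(times)
mean = sum(times) / n                                  # sample mean x~
var_n = sum((x - mean) ** 2 for x in times) / n        # sigma^2 with divisor n
mu_mv = math.sqrt(var_n / (n - 1))                     # mean error of a small sample

t = 2.262                                              # Student value, nu = 9, P = 0.95
delta = t * mu_mv                                      # limiting error
low, high = mean - delta, mean + delta                 # bounds for the population mean

print(f"mean = {mean:.3f}, mu = {mu_mv:.4f}, delta = {delta:.4f}")
print(f"interval: ({low:.3f}, {high:.3f}), risk of being wrong: {1 - 0.95:.2f}")
```

Note that $\sqrt{\sigma^2/(n - 1)}$ gives the same number as the more familiar $s/\sqrt{n}$ with $s$ computed using $n - 1$.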

The table of Student probabilities is often given in a form different from that of Table 3.1. In some cases this form is considered more convenient for practical use (Table 3.2).

Table 3.2 gives, for each number of degrees of freedom, the limiting value $t_P$ which, with the given probability, will not be exceeded by random fluctuations of the sample results.

The values given in Table 3.2 determine the confidence interval for the population mean, with bounds $\tilde{x} - t_P\,\mu_{MV}$ and $\tilde{x} + t_P\,\mu_{MV}$.

This is the range of values of the population mean outside which the mean falls only with a very low probability, equal to $q = 1 - P$.

As the confidence probability in a two-sided test, $P = 0.95$ or $P = 0.99$ is used as a rule, which does not, however, exclude choosing other values not given in Table 3.2.

Table 3.2. Some values of Student's t-distribution

The probability that the estimated mean accidentally falls outside the confidence interval will then be 0.05 and 0.01 respectively, i.e. very small.

The choice between these probabilities is to some extent arbitrary. It is largely determined by the content of the problems that the small sample is meant to solve.

In conclusion, we note that the calculation of errors in a small sample differs little from the corresponding calculations for a large sample. The difference is that with a small sample the probability attached to our statement is somewhat lower than with a larger sample.

However, none of this means that a small sample can be used where a large one is needed. In many cases the discrepancies between the limits found can be substantial, which hardly satisfies researchers. Therefore, small samples should be applied in statistical studies of socio-economic phenomena with great care and with appropriate theoretical and practical justification.

Thus, conclusions based on the results of small samples are of practical value only if the distribution of the trait in the general population is normal or asymptotically normal. It should also be taken into account that the accuracy of small-sample results is still lower than that of a large sample.

In practice one quite often has to deal with samples of very small size, significantly fewer than twenty to thirty units. In statistics such samples are called small samples. They require special consideration because the methods of point and interval estimation of sample characteristics discussed above assume a sufficiently large sample size.

The concept of small samples. Student's distribution

With a large sample, the sample mean and, accordingly, its error are normally distributed, and the correction for the sample variance, $n/(n - 1)$, is very close to one and has no practical significance. Under these conditions the sampling error very rarely exceeds $3\mu$. The situation is different with a small sample. In small samples the sample variance is noticeably biased, so it would be wrong to use the normal distribution function for probabilistic conclusions about the possible size of the error. With a small sample, an unbiased estimate of the variance must always be used:

$$s^2 = \frac{\sum (x_i - \tilde{x})^2}{n - 1}.$$

Consequently, to obtain an unbiased estimate of the variance from a small sample, the sum of squared deviations must be divided by $n - 1$. This value is called the number of degrees of freedom of variation. For brevity, the number of degrees of freedom will henceforth be denoted by the Greek letter $\nu$ (nu).
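Python's standard `statistics` module provides both estimates: `pvariance` divides by $n$ and `variance` divides by $n - 1$. A small sketch with a hypothetical sample shows that the two differ exactly by the factor $n/(n - 1)$:

```python
import statistics

sample = [12, 15, 11, 14, 13]        # hypothetical small sample
n = len(sample)
nu = n - 1                           # degrees of freedom of variation

biased = statistics.pvariance(sample)    # sum of squared deviations / n
unbiased = statistics.variance(sample)   # sum of squared deviations / (n - 1)

print(f"n = {n}, nu = {nu}")
print(f"divide by n:     {biased}")
print(f"divide by n - 1: {unbiased}")
print(f"ratio: {unbiased / biased} vs n/(n-1) = {n / (n - 1)}")
```

With only five observations the unbiased estimate is 25 % larger, which is why the correction cannot be ignored in small samples.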

The problem of estimating sample characteristics from small samples was first investigated by the British mathematician and statistician W. S. Gosset, who published his works under the pseudonym Student (1908).

Assuming that the trait is normally distributed in the general population, and considering, instead of the absolute deviations, their ratio to the sample standard deviation, Student found a distribution that depends only on the sample size. Later (1925) R. Fisher gave a more rigorous proof of this distribution, which came to be called Student's distribution.

Student's $t$ is expressed as the following ratio:

$$t = \frac{\tilde{x} - \bar{x}}{s/\sqrt{n}}.$$

The numerator of this expression is a random variable reflecting the possible deviations of the sample mean from the population mean. It is normally distributed with a center of zero and a variance equal to the population variance divided by $n$.

It should be especially emphasized that the denominator of the expression cannot be regarded as the mean error of the variable in the numerator. The value $s$ is treated here as a random variable distributed independently of the numerator. It denotes the standard deviation of the given sample and is not an estimate for the general population, since Student's distribution does not depend on the parameters of the general population. It is determined from the sample as

$$s = \sqrt{\frac{\sum (x_i - \tilde{x})^2}{n - 1}}.$$

The distributions of the numerator and the denominator are independent of each other. Only under this condition, and for samples from normal populations, does Student's distribution hold.
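These conditions can be checked empirically. The sketch below assumes a hypothetical normal population (mean 50, standard deviation 8) and $n = 10$; it draws repeated samples, forms the ratio $t = (\tilde{x} - \bar{x})/(s/\sqrt{n})$ and counts how often $|t|$ exceeds a critical value. With the Student value 2.262 the share should be close to 0.05, whereas the normal value 1.96 is exceeded noticeably more often. The function names and simulation settings are illustrative assumptions.

```python
import math
import random

def t_statistic(sample, population_mean):
    """Student ratio (x~ - x_bar) / (s / sqrt(n)), with s computed using n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - population_mean) / (s / math.sqrt(n))

def share_exceeding(critical, n=10, reps=20_000, seed=0, mu=50.0, sigma=8.0):
    """Share of simulated normal samples whose |t| exceeds the critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu)) > critical:
            hits += 1
    return hits / reps

p_student = share_exceeding(2.262)  # Student value for nu = 9, two-sided P = 0.95
p_normal = share_exceeding(1.96)    # normal value, too small for n = 10
print(f"|t| > 2.262: {p_student:.3f}   |t| > 1.96: {p_normal:.3f}")
```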

The main advantage of Student's distribution is that it does not depend on the parameters of the general population and deals only with values obtained directly from the sample.

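The differential law of Student's distribution (probability density) has the form:

$$f(t) = B_n \left(1 + \frac{t^2}{n - 1}\right)^{-n/2},$$

where $n$ is the sample size and

$$B_n = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi (n - 1)}\;\Gamma\left(\frac{n - 1}{2}\right)}$$

is the value corresponding to the maximum ordinate of the distribution curve at $t = 0$.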

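Accordingly, Student's distribution function is expressed as:

$$S(t) = B_n \int_{-\infty}^{t} \left(1 + \frac{u^2}{n - 1}\right)^{-n/2} du.$$

In other words,

$$S(t_f) = P(t < t_f),$$

where $t_f$ is the standardized (normalized) difference calculated from the results of a small sample.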
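A compact Python sketch of $f(t)$ and $S(t)$ written directly from these formulas and parameterized by the sample size $n$, as in the text. Python's standard library has no Student distribution, so this sketch assumes a numerical approach: $S(t)$ is obtained by Simpson's rule on $[0, |t|]$ combined with the symmetry $S(0) = 0.5$. The function names are hypothetical.

```python
import math

def student_density(t: float, n: int) -> float:
    """Student density f(t) for a sample of size n (nu = n - 1 degrees of freedom)."""
    nu = n - 1
    b_n = math.gamma(n / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return b_n * (1 + t * t / nu) ** (-n / 2)

def student_cdf(t: float, n: int, steps: int = 2000) -> float:
    """S(t) = P(T < t): Simpson's rule on [0, |t|] plus symmetry around 0."""
    a = abs(t)
    h = a / steps                                   # steps must be even for Simpson
    total = student_density(0.0, n) + student_density(a, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_density(i * h, n)
    half_area = total * h / 3
    return 0.5 + half_area if t >= 0 else 0.5 - half_area

print(round(student_cdf(2.262, 10), 4))   # n = 10, i.e. nu = 9
```

For $n = 2$ the density reduces to $1/(\pi(1 + t^2))$, which gives a convenient check of the normalizing constant $B_n$.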

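The values $\Gamma\left(\frac{n}{2}\right)$ and $\Gamma\left(\frac{n - 1}{2}\right)$ are gamma functions. For any positive number $x$, the gamma function is expressed by an improper integral:

$$\Gamma(x) = \int_0^{\infty} u^{x - 1} e^{-u}\,du.$$

In small samples, $n$ is always a positive integer (the sample size).

For a positive integer argument the gamma function always has a finite value and is expressed through factorials:

$$\Gamma(n) = (n - 1)!,$$

hence:

$$\Gamma(n + 1) = n\,\Gamma(n) = n!.$$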

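When calculating gamma functions, it is useful to know the following properties:

1) $\Gamma(x) = (x - 1)\,\Gamma(x - 1)$ for $x > 1$;

2) $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$; for example, $\Gamma\left(\frac{5}{2}\right) = \frac{3}{2}\cdot\frac{1}{2}\sqrt{\pi} = \frac{3}{4}\sqrt{\pi}$.

Using this property, it is easy to calculate the values of $\Gamma\left(\frac{n}{2}\right)$ and $\Gamma\left(\frac{n - 1}{2}\right)$ in the expression for the distribution density;

3) The function reaches its minimum at a fractional value of the argument ($x \approx 1.46$).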
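These gamma-function properties can be checked with Python's `math.gamma` and `math.factorial`. The helper names are hypothetical, and the grid search assumes the minimum lies in $[1, 2]$:

```python
import math

def gamma_integer(n: int) -> int:
    """Gamma(n) = (n - 1)! for a positive integer n."""
    return math.factorial(n - 1)

def gamma_half_integer(k: int) -> float:
    """Gamma(k + 1/2) from Gamma(1/2) = sqrt(pi) and Gamma(x + 1) = x * Gamma(x)."""
    value, x = math.sqrt(math.pi), 0.5
    for _ in range(k):
        value *= x
        x += 1
    return value

def gamma_argmin(lo: float = 1.0, hi: float = 2.0, steps: int = 100_000) -> float:
    """Grid search for the point in [lo, hi] where Gamma is smallest."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return min(grid, key=math.gamma)

print(gamma_integer(5), math.gamma(5))
print(gamma_half_integer(2), math.gamma(2.5))
print(round(gamma_argmin(), 4))
```

The same recurrence is what makes the half-integer values $\Gamma\left(\frac{n}{2}\right)$ for odd $n$ easy to compute by hand.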

Figure 3.1

The general shape of the gamma function is shown in Fig. 3.1.

Of the properties of Student's distribution usually considered in a probability theory course, the following deserve attention:

1) Student's distribution is remarkable in that it depends on only one parameter, the sample size, and does not depend on the mean and variance of the general population (unlike the normal distribution, which depends on these two parameters).

2) Student's distribution is exact for any sample size, including small samples, which makes it possible to draw probabilistic conclusions from a small number of observations.

3) As the sample size increases, $B_n$ approaches $1/\sqrt{2\pi}$ and Student's distribution approaches the normal one; as $n \to \infty$ it becomes normal. In practice, $n > 30$ is considered sufficient for the normal approximation.

Figure 3.2.

Fig. 3.2 shows the relationship between Student's distribution and the normal distribution.

As Fig. 3.2 shows, the tails of the Student distribution curve (large absolute values of $t$) contain a considerably larger share of the area than the tails of the normal curve at the same values of $t$. This means that with a small sample the probability of making large errors increases significantly: for normalized deviations that are large in absolute value, the area under the Student curve is much larger than under the normal curve.
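The heavier tails can be quantified with a short sketch. It reuses the density formula, written here in terms of $\nu = n - 1$ degrees of freedom, integrates it with Simpson's rule, and compares the tail area beyond $|t| = 2$ with the normal value from `math.erf`. The threshold 2 and the chosen values of $\nu$ are illustrative assumptions.

```python
import math

def density(t: float, nu: int) -> float:
    """Student density with nu = n - 1 degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def tail_area(t: float, nu: int, steps: int = 2000) -> float:
    """P(|T| > t) = 1 - area under the density on [-t, t] (Simpson's rule, symmetry)."""
    h = t / steps
    s = density(0.0, nu) + density(t, nu)
    s += sum((4 if i % 2 else 2) * density(i * h, nu) for i in range(1, steps))
    return 1 - 2 * s * h / 3

def normal_tail_area(t: float) -> float:
    """P(|Z| > t) for a standard normal Z."""
    return 1 - math.erf(t / math.sqrt(2))

for nu in (1, 3, 9, 29):
    print(f"nu = {nu:2d}: P(|T| > 2) = {tail_area(2.0, nu):.4f}")
print(f"normal:  P(|Z| > 2) = {normal_tail_area(2.0):.4f}")
```

The tail area shrinks as $\nu$ grows but stays above the normal value, in line with Fig. 3.2.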

The size of the discrepancies between the values of Student's distribution function for different sample sizes and the values of the normal distribution function can be judged from Table 3.2, which gives the area under the distribution curve for different sample sizes.

Table 3.1. Values of the normal distribution function

Table 3.2. Probability values for different sample sizes
(columns: normalized deviation $t$; values for small samples with different $n$; value for large samples)

Table 3.2 shows that as the sample size increases, Student's distribution for a small sample quickly approaches the normal one. At the same time, for a very small $n$ the discrepancy between the values at the same $t$ is very significant.

Studies have found that Student's distribution is practically applicable not only when the trait is normally distributed in the general population. It turned out to lead to practically acceptable conclusions even when that distribution is not normal but merely symmetric, or even slightly asymmetric, provided the sample size is not too small.

The values of Student's distribution function have been tabulated for different values of $n$, so ready-made tables are used when evaluating sample characteristics:

Table 3.3. Table of function values

The values of Student's distribution function can be used in various ways, depending on the nature of the problem being solved, to determine the probability of a deviation of the sample mean from the population mean. The most common uses are:

1) Determining the probability that the difference between the sample mean and the population mean is less than a certain value. In normalized deviations, the task reduces to finding the probability that $t$ is less than the value $t_1$ specified by the conditions of the problem, i.e. to finding

$$P(t < t_1) = S(t_1).$$

Figure 3.3.

This is the probability of large negative deviations, which corresponds to the shaded area in Fig. 3.3.

2) Determining the probability that the difference between the sample mean and the population mean is at least a given value; in other words, finding

$$P(t \ge t_1) = 1 - S(t_1).$$

Figure 3.4.

This is the probability of large positive deviations, shown as the shaded area in Fig. 3.4. It is easy to find from the tables of $S(t)$.

3) Determining the probability that the normalized deviation is less than $t_1$ in absolute value:

$$P(|t| < t_1) = 2S(t_1) - 1.$$

This is the probability of deviations that are small in absolute value. It can be determined from the tables of $S(t)$. Since in practice this probability is needed most often, a special table of its values has been compiled (Table 3.3).

A graphical illustration of the probability of deviations that are small in absolute value is given in Fig. 3.5.

Figure 3.5

4) Determining the probability that the sampling error is at least a given value in absolute value. In normalized units, the probability that $|t|$ is no less than $t_1$ is

$$P(|t| \ge t_1) = 2\,[1 - S(t_1)].$$

This is the probability of deviations that are large in absolute value. It is illustrated graphically in Fig. 3.6.

Figure 3.6.

Special tables exist for the probability of deviations that are large in absolute value (Appendix 3). This probability can also be easily calculated from the tables of $S(t)$.
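The four typical questions can be answered from a single function $S(t)$. In this sketch $S(t)$ is computed numerically (Simpson's rule, as an assumption in place of printed tables) with $\nu = n - 1$ degrees of freedom, the threshold $t_1 > 0$ is given, and case 1 is evaluated at $-t_1$ to represent large negative deviations. The names are hypothetical.

```python
import math

def density(t: float, nu: int) -> float:
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def S(t: float, nu: int, steps: int = 2000) -> float:
    """Distribution function S(t) = P(T < t): Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps
    s = density(0.0, nu) + density(a, nu)
    s += sum((4 if i % 2 else 2) * density(i * h, nu) for i in range(1, steps))
    half = s * h / 3
    return 0.5 + half if t >= 0 else 0.5 - half

def four_probabilities(t1: float, nu: int) -> dict:
    """The four typical questions for a threshold t1 > 0."""
    return {
        "1) P(t < -t1)": S(-t1, nu),              # large negative deviations
        "2) P(t >= t1)": 1 - S(t1, nu),           # large positive deviations
        "3) P(|t| < t1)": 2 * S(t1, nu) - 1,      # small absolute deviations
        "4) P(|t| >= t1)": 2 * (1 - S(t1, nu)),   # large absolute deviations
    }

for name, p in four_probabilities(2.262, 9).items():
    print(f"{name} = {p:.4f}")
```

Cases 1 and 2 coincide by symmetry, and cases 3 and 4 always add up to one.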

18. The theory of small samples.

    With a large number of units in the sample ($n > 100$), the distribution of random errors of the sample mean, in accordance with A. M. Lyapunov's theorem, is normal or approaches normal as the number of observations increases.

    However, in statistical practice under the conditions of a market economy, one increasingly faces small samples.

    A small sample is a sample survey whose number of units does not exceed 30.

    When evaluating the results of a small sample, the population variance is not used. Student's criterion is used to determine the possible limits of the error.

    The value of σ is calculated from the sample observation data.

    This value is used only for the sample under study, and not as an approximate estimate of σ in the general population.

    The probabilistic assessment of the results of a small sample differs from that of a large sample in that, with a small number of observations, the probability distribution of the mean depends on the number of selected units.

    Thus, for a small sample the value of the confidence coefficient $t$ is related to the probability estimate differently than for a large sample, since the distribution law differs from the normal one.

    According to Student's distribution, the probable error depends both on the value of the confidence coefficient $t$ and on the sample size $n$.

    The mean error of a small sample is calculated by the formula:

    $$\mu_{MV} = \sqrt{\frac{\sigma^2}{n - 1}},$$

    where $\sigma^2$ is the variance of the small sample.

    For a small sample, the correction coefficient $n/(n - 1)$ must be taken into account. When determining the variance $s^2$, the number of degrees of freedom equals:

    $$\nu = n - 1.$$

    The limiting error of a small sample is determined by the formula $\Delta_{MV} = t\,\mu_{MV}$.

    Here the value of the confidence coefficient $t$ depends not only on the given confidence probability but also on the number of sample units $n$. For particular values of $t$ and $n$, the confidence probability of a small sample is determined from special Student tables, which give the distribution of standardized deviations:

    $$t = \frac{\tilde{x} - \bar{x}}{\mu_{MV}}.$$


    19. Methods for selecting units into a sample.

    1. The sample must be large enough.

    2. The structure of the sample should reflect the structure of the general population as closely as possible.

    3. The selection method must be random.

    Depending on whether selected units are returned to the population, non-repeated and repeated selection are distinguished.

    Non-repeated selection is selection in which a unit that has entered the sample is not returned to the population from which further selection is carried out.

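    Mean error of non-repeated random sampling:

    $$\mu = \sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)},$$

    where $N$ is the size of the general population.

    Limiting error of non-repeated random sampling:

    $$\Delta = t\sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)}.$$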

    In repeated selection, a unit that has entered the sample is, after its features are recorded, returned to the original (general) population to take part in the further selection.

    Mean error of repeated random sampling:

    $$\mu = \sqrt{\frac{\sigma^2}{n}}.$$

    Limiting error of repeated random sampling:

    $$\Delta = t\sqrt{\frac{\sigma^2}{n}}.$$
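The repeated and non-repeated formulas can be compared directly. This sketch uses hypothetical inputs ($\sigma^2 = 16$, $n = 100$, $N = 1000$, $t = 2$) and illustrative function names:

```python
import math

def mean_error_repeated(var: float, n: int) -> float:
    """Repeated selection: mu = sqrt(sigma^2 / n)."""
    return math.sqrt(var / n)

def mean_error_nonrepeated(var: float, n: int, N: int) -> float:
    """Non-repeated selection: the factor (1 - n/N) accounts for units not being returned."""
    return math.sqrt(var / n * (1 - n / N))

def limit_error(t: float, mu: float) -> float:
    """Limiting error Delta = t * mu."""
    return t * mu

var, n, N, t = 16.0, 100, 1000, 2.0     # hypothetical inputs
mu_rep = mean_error_repeated(var, n)
mu_non = mean_error_nonrepeated(var, n, N)
print(f"repeated:     mu = {mu_rep:.4f}, Delta = {limit_error(t, mu_rep):.4f}")
print(f"non-repeated: mu = {mu_non:.4f}, Delta = {limit_error(t, mu_non):.4f}")
```

With a 10 % sampling fraction, non-repeated selection gives a somewhat smaller error; if the whole population were surveyed ($n = N$), its error would be zero.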

    By the way the sample is formed, selection is divided into individual, group and combined.

    The selection method determines the specific mechanism for drawing units from the general population and is divided into: simple random, mechanical, typical, serial and combined.

    Simple random selection is the most common method of random sampling; it is also called the lottery method. A ticket with a serial number is prepared for each unit of the statistical population, and the required number of units is then drawn in random order. Under these conditions each unit has the same probability of entering the sample.
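The lottery method maps naturally onto Python's `random` module: `Random.sample` draws without replacement (non-repeated selection) and `Random.choices` draws with replacement (repeated selection). The function names and population size are hypothetical:

```python
import random

def lottery_sample(population_size: int, n: int, seed=None) -> list:
    """Draw n numbered 'tickets' without replacement: every unit has the same chance."""
    rng = random.Random(seed)
    tickets = range(1, population_size + 1)   # one ticket per unit
    return sorted(rng.sample(tickets, n))

def repeated_lottery_sample(population_size: int, n: int, seed=None) -> list:
    """Repeated selection: each drawn ticket is put back, so a unit may appear twice."""
    rng = random.Random(seed)
    return rng.choices(range(1, population_size + 1), k=n)

print(lottery_sample(200, 10, seed=42))
```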

    Mechanical sampling is applied when the general population is ordered in some way, i.e. there is a certain sequence in the arrangement of its units.

    To determine the mean error of a mechanical sample, the formula for the mean error of non-repeated simple random selection is used.
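A minimal sketch of mechanical selection, assuming the common convention of taking every $k$-th unit with $k = N / n$ (rounded down) and a random starting point within the first interval; the ordered list of 100 units is hypothetical:

```python
import random

def mechanical_sample(ordered_units: list, n: int, seed=None) -> list:
    """Select every k-th unit of an ordered population, k = N // n, from a random start."""
    k = len(ordered_units) // n                  # selection interval
    start = random.Random(seed).randrange(k)     # random start within the first interval
    return [ordered_units[start + i * k] for i in range(n)]

units = list(range(1, 101))                      # hypothetical ordered list of 100 units
print(mechanical_sample(units, 10, seed=3))
```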

    Typical selection is used when all units of the general population can be divided into several typical groups. Units are then selected from each group by random or mechanical selection.

    For typical sampling, the size of the standard error depends on the accuracy with which the group means are determined. Therefore, the formula for the limiting error of a typical sample uses the mean of the within-group variances $\bar{\sigma}^2$, i.e.

    $$\Delta = t\sqrt{\frac{\bar{\sigma}^2}{n}\left(1 - \frac{n}{N}\right)}.$$
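A sketch of the typical-sample error, assuming non-repeated selection and a mean of within-group variances weighted by the group sample sizes (proportional allocation); the groups and population size are hypothetical. For comparison it also computes the error for the same observations treated as a simple random sample, which uses the total variance:

```python
import math
import statistics

def typical_limit_error(groups: list, N: int, t: float = 2.0) -> float:
    """Limiting error of a non-repeated typical sample.

    groups holds the observed values of each typical group; the mean of the
    within-group variances is weighted by the group sample sizes.
    """
    n = sum(len(g) for g in groups)
    mean_within = sum(statistics.pvariance(g) * len(g) for g in groups) / n
    return t * math.sqrt(mean_within / n * (1 - n / N))

groups = [[1, 3], [10, 12, 14]]          # hypothetical sample from two typical groups
pooled = [x for g in groups for x in g]

typical = typical_limit_error(groups, N=50, t=1.0)
simple = math.sqrt(statistics.pvariance(pooled) / len(pooled) * (1 - len(pooled) / 50))
print(f"typical: {typical:.4f}, simple random: {simple:.4f}")
```

Because the between-group differences are removed, the typical sample is much more precise here than a simple random sample of the same size.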

    Serial selection is applied when the units of the population are combined into small groups or series. Serial sampling consists of random or mechanical selection of series, within which all units are surveyed.

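    In serial sampling, the sampling error depends not on the number of units studied but on the number of surveyed series $s$ and on the between-series (intergroup) variance $\delta^2$:

    $$\mu = \sqrt{\frac{\delta^2}{s}\left(1 - \frac{s}{S}\right)},$$

    where $S$ is the total number of series in the general population (non-repeated selection).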
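A sketch of the serial-sampling formula, assuming series of equal size (so $\delta^2$ is the plain variance of the series means) and non-repeated selection of series; the series means and $S = 40$ are hypothetical:

```python
import math
import statistics

def serial_mean_error(series_means: list, S: int) -> float:
    """Non-repeated serial sampling: s surveyed series out of S, between-series variance delta^2.

    Assumes series of equal size, so delta^2 is the plain variance of the series means.
    """
    s = len(series_means)
    delta2 = statistics.pvariance(series_means)
    return math.sqrt(delta2 / s * (1 - s / S))

means = [10.0, 12.0, 14.0, 12.0]   # hypothetical means of 4 surveyed series
print(round(serial_mean_error(means, S=40), 4))
```

Only the number of series enters the formula, so surveying more units inside the same series does not reduce this error.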

    Combined selection may pass through one or more stages. A sample is called single-stage if the units of the population are selected in a single step.

    A sample is called multistage if selection passes through several successive stages, each with its own unit of selection.

    In statistical practice one often deals with small samples, i.e. samples of fewer than 30 units. Large samples usually include more than 100 units.

    Small samples are usually used when a large sample is impossible or inappropriate, for example in surveys of tourists and hotel guests.

    The error of a small sample is determined by formulas that differ from those for a relatively large sample.

    With a small sample size $n$, the relationship between the sample and the population variance must be taken into account:

    $$\sigma^2_{\text{gen}} = \sigma^2 \cdot \frac{n}{n - 1}.$$

    Since with a small sample the fraction $n/(n - 1)$ is significant, the variance is calculated taking into account the so-called number of degrees of freedom, understood as the number of values that can take arbitrary values without changing the value of the mean.

    The mean error of a small sample is determined by the formula:

    $$\mu_{MV} = \sqrt{\frac{s^2}{n}}, \qquad s^2 = \frac{\sum (x_i - \tilde{x})^2}{n - 1}.$$

    The limiting error for the mean and for the share is computed as for a large sample:

    $$\Delta_{MV} = t\,\mu_{MV},$$

    where $t$ is the confidence coefficient, which depends on the specified significance level and the number of degrees of freedom (Appendix 5).

    The values of the coefficient $t$ depend not only on the given confidence probability but also on the sample size $n$. For particular values of $t$ and $n$, the confidence probability is determined from Student's distribution, which gives the distribution of standardized deviations:

    $$t = \frac{\tilde{x} - \bar{x}}{\mu_{MV}}.$$

    Comment. As the sample size increases, Student's distribution approaches the normal distribution: at $n = 20$ it already differs little from it. In small sample surveys it should be kept in mind that the smaller the sample size $n$, the greater the difference between Student's distribution and the normal distribution. For example, at $n_{\min} = 4$ this difference is very significant, which indicates the reduced accuracy of small-sample results.

    Extending sample characteristics to the general population on the basis of the law of large numbers presupposes a sufficiently large sample. However, in statistical practice it is often impossible, for one reason or another, to increase the number of units of a small sample. This applies to studies of enterprises, educational institutions, commercial banks, etc., whose number in a region is usually small, sometimes only 5-10 units.

    When the sample consists of a small number of units (fewer than 30), it is called small. In this case Lyapunov's theorem cannot be used to calculate the sampling error, since the sample mean depends heavily on the value of each randomly selected unit and its distribution may differ significantly from normal.

    In 1908 W. S. Gosset proved that the estimate of the discrepancy between the mean of a small sample and the population mean follows a special distribution law (see chapter 4). Addressing the problem of probabilistic estimation of the sample mean from a small number of observations, he showed that one must consider not the distribution of the sample means themselves but the distribution of their deviations from the mean of the original population. In this case the conclusions can be quite reliable.

    Student's discovery is known as the theory of small samples.

    When evaluating the results of a small sample, the population variance is not used in the calculations. In small samples, the "corrected" sample variance is used to calculate the mean sampling error:

    $$s^2 = \frac{\sum (x_i - \tilde{x})^2}{n - 1},$$

    i.e., unlike large samples, the denominator contains $n - 1$ instead of $n$. The calculation of the mean sampling error for a small sample is given in Table 5.7.

    Table 5.7. Calculation of the mean error of a small sample

    The limiting error of a small sample equals $\Delta_{MV} = t\,\mu_{MV}$, where $t$ is the confidence coefficient.

    The value of $t$ is related to the probability estimate differently than in a large sample. In accordance with Student's distribution, the probability estimate depends both on the value of $t$ and on the sample size $n$, for the case where the limiting error does not exceed $t$ times the mean error in small samples. In other words, it depends on the number of selected units.

    W. S. Gosset compiled a table of probabilities in small samples corresponding to given values of the confidence coefficient $t$ and different sizes $n$ of small samples; an excerpt from it is given in Table 5.8.

    Table 5.8. Fragment of Student's probability table (probabilities multiplied by 1000)

    The data of Table 5.8 show that as the sample size increases without limit ($n \to \infty$), Student's distribution tends to the normal distribution law, and at $n = 20$ it already differs little from it.

    The table of Student's distribution is often given in a different form, more convenient for practical use (Table 5.9).

    Table 5.9. Some values of Student's t-distribution
    (columns: number of degrees of freedom $k$; $t_P$ for a one-sided interval; $t_P$ for a two-sided interval, including $P = 0.99$)

    Consider how to use the distribution table. For each fixed value of $n$, the number of degrees of freedom $k = n - 1$ is calculated. For each number of degrees of freedom, the table gives the limiting value $t_P$ ($t_{0.95}$ or $t_{0.99}$) which, with the given probability $P$, will not be exceeded by random fluctuations of the sample results. The value $t_P$ determines the bounds of the confidence interval:

    $$\tilde{x} - t_P\,\mu_{MV} \le \bar{x} \le \tilde{x} + t_P\,\mu_{MV}.$$
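The table lookup and interval construction can be sketched in Python. The dictionary is an abbreviated excerpt of the standard two-sided Student values for $P = 0.95$ and $P = 0.99$ (it is not Table 5.9 itself); `confidence_interval` is a hypothetical helper, and values of $k$ missing from the excerpt raise `KeyError`. The mean and mean error in the usage line are hypothetical.

```python
# Two-sided Student critical values t_P for k = n - 1 degrees of freedom (abbreviated excerpt).
T_TABLE = {
    0.95: {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
           7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 20: 2.086, 30: 2.042},
    0.99: {1: 63.657, 2: 9.925, 3: 5.841, 4: 4.604, 5: 4.032, 6: 3.707,
           7: 3.499, 8: 3.355, 9: 3.250, 10: 3.169, 20: 2.845, 30: 2.750},
}

def confidence_interval(sample_mean: float, mu: float, n: int, P: float = 0.95):
    """Return (lower bound, upper bound, q = 1 - P) for the population mean."""
    k = n - 1                      # degrees of freedom
    t_p = T_TABLE[P][k]            # KeyError if k is not in this excerpt
    return sample_mean - t_p * mu, sample_mean + t_p * mu, 1 - P

low, high, q = confidence_interval(4.9, 0.1291, n=10)
print(f"P = 0.95: ({low:.3f}, {high:.3f}), q = {q:.2f}")
```

Choosing $P = 0.99$ instead lowers the risk $q$ to 0.01 at the cost of a wider interval.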

    As the confidence probability in a two-sided test, $P = 0.95$ or $P = 0.99$ is usually used, which does not exclude choosing other probability values. The probability is selected according to the specific requirements of the problem being solved with the small sample.

    The probability that the population mean falls outside the confidence interval equals $q = 1 - P$. This value is very small: for the probabilities $P$ considered, it is 0.05 and 0.01 respectively.

    Small samples are widely used in the technical sciences and in biology, but in statistical studies they must be applied with great care and only with appropriate theoretical and practical justification. A small sample can be used if the distribution of the trait in the general population is normal or close to it, and the mean is calculated from sample data obtained through independent observations. It should also be borne in mind that the accuracy of small-sample results is lower than that of a large sample.
