Probabilistic and Statistical Methods

Of particular interest is the quantitative assessment of entrepreneurial risk using the methods of mathematical statistics. The main tools of this assessment method are:

§ the probability of occurrence of a random variable,

§ the mathematical expectation (average value) of the investigated random variable,

§ the variance,

§ the standard (root-mean-square) deviation,

§ the coefficient of variation,

§ the probability distribution of the investigated random variable.

To make a decision, you need to know the magnitude (degree) of risk, which is measured by two criteria:

1) average expected value (mathematical expectation),

2) fluctuations (variability) of the possible result.

The average expected value is the weighted average of the values of the random variable, with weights given by the probabilities associated with the uncertainty of the situation:

x̄ = Σ x_i p_i ,

where x_i is a possible value of the random variable and p_i is the probability of its occurrence.

Average expected value measures the result that we expect on average.

The average value is a generalized quantitative characteristic and by itself does not allow a decision to be made in favor of any particular value of the random variable.

To make a decision, it is necessary to measure fluctuations in indicators, that is, to determine the measure of variability of a possible result.

The fluctuation of the possible result is the degree to which the possible values deviate from the average expected value.

For this, in practice, two closely related criteria are usually used: "variance" and "standard deviation".

Variance (dispersion) is the weighted average of the squared deviations of the actual results from the average expected value:

σ² = Σ (x_i − x̄)² p_i .

Standard deviation is the square root of the variance. It is a dimensional quantity and is measured in the same units as the investigated random variable:

σ = √( Σ (x_i − x̄)² p_i ) .

Variance and standard deviation are measures of absolute variability. For comparative analysis, the coefficient of variation is usually used.

The coefficient of variation is the ratio of the standard deviation to the average expected value, expressed as a percentage:

V = (σ / x̄) · 100% .

The coefficient of variation is a relative quantity, so it is not affected by the absolute values of the studied indicator.

Using the coefficient of variation, you can even compare the fluctuations of features expressed in different units of measurement. The coefficient of variation can vary from 0 to 100%. The larger the coefficient, the greater the fluctuation.


In economic statistics, the following assessment of the values of the coefficient of variation has been established:

up to 10% - weak fluctuation, 10 - 25% - moderate, over 25% - high.

Accordingly, the higher the fluctuations, the greater the risk.
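
These characteristics are easy to compute for any discrete random variable given by its values and probabilities. A minimal sketch in Python (the function name and the sample data are illustrative, not taken from the text):

# Sketch: mean, variance, standard deviation and coefficient of variation
# of a discrete random variable given by values x_i and probabilities p_i.
from math import sqrt

def risk_measures(values, probabilities):
    mean = sum(x * p for x, p in zip(values, probabilities))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probabilities))
    std_dev = sqrt(variance)
    cv = std_dev / mean * 100          # coefficient of variation, %
    return mean, variance, std_dev, cv

# Illustrative example: a random profit with three possible outcomes.
mean, var, sd, cv = risk_measures([100, 200, 300], [0.2, 0.5, 0.3])
print(mean, var, sd, round(cv, 1))
# cv up to 10% -> weak fluctuation, 10-25% -> moderate, over 25% -> high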

Example. At the beginning of each day, the owner of a small store buys a certain perishable product for sale. A unit of this product costs 200 UAH; the selling price is 300 UAH per unit. It is known from observations that the demand for this product during the day can be 4, 5, 6 or 7 units with probabilities 0.1, 0.3, 0.5 and 0.1 respectively. If a unit is not sold during the day, it can always be sold off at the end of the day at a price of 150 UAH. How many units of this product should the store owner buy at the start of the day?

Solution. Let us build the profit matrix for the store owner. Let us calculate, for example, the profit the owner receives if he buys 7 units of the product, sells 6 units during the day and 1 unit at the end of the day. Each unit sold during the day gives a profit of 100 UAH, and each unit sold off at the end of the day gives a loss of 200 − 150 = 50 UAH. Thus, the profit in this case is 6 · 100 − 1 · 50 = 550 UAH.

Calculations are carried out similarly for other combinations of supply and demand.

The expected profit is calculated as the mathematical expectation of the possible values of the profit in each row of the constructed matrix, taking into account the corresponding probabilities. As can be seen, the largest expected profit is 525 UAH; it corresponds to purchasing 6 units of the product.

To substantiate the final recommendation on the purchase of the required number of units of the product, we calculate the variance, standard deviation and coefficient of variation for each possible combination of supply and demand of the product (each row of the profit matrix):

Purchase of 4 units:
x_i (profit)   p_i    x_i·p_i    x_i²·p_i
400            0.1    40         16000
400            0.3    120        48000
400            0.5    200        80000
400            0.1    40         16000
Total          1.0    400        160000

Purchase of 5 units:
x_i (profit)   p_i    x_i·p_i    x_i²·p_i
350            0.1    35         12250
500            0.3    150        75000
500            0.5    250        125000
500            0.1    50         25000
Total          1.0    485        237250

Purchase of 6 units:
x_i (profit)   p_i    x_i·p_i    x_i²·p_i
300            0.1    30         9000
450            0.3    135        60750
600            0.5    300        180000
600            0.1    60         36000
Total          1.0    525        285750
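
The figures in this table, as well as the omitted case of a purchase of 7 units, can be reproduced with a short sketch (the profit rule of 100 UAH per unit sold during the day and a 50 UAH loss per unit sold off in the evening is taken from the example above; variable names are illustrative):

# Sketch: profit matrix and risk measures for the store owner example.
from math import sqrt

demand = [4, 5, 6, 7]
prob   = [0.1, 0.3, 0.5, 0.1]
DAY_PROFIT, EVENING_LOSS = 100, 50    # UAH per unit sold during the day / sold off in the evening

for purchase in (4, 5, 6, 7):
    profits = [min(d, purchase) * DAY_PROFIT
               - max(purchase - d, 0) * EVENING_LOSS for d in demand]
    mean = sum(x * p for x, p in zip(profits, prob))
    var = sum((x - mean) ** 2 * p for x, p in zip(profits, prob))
    sd = sqrt(var)
    cv = sd / mean * 100
    print(purchase, profits, round(mean), round(sd, 1), f"{cv:.1f}%")

# Output (rounded): 4 -> 400, CV 0%;  5 -> 485, CV 9.3%;
#                   6 -> 525, CV 19.2%;  7 -> 490, CV 24.5%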

Whether the owner should buy 6 units rather than 5 or 4 is not obvious, however, since the risk when purchasing 6 units (coefficient of variation 19.2%) is greater than when purchasing 5 units (9.3%), and all the more so than when purchasing 4 units (0%).

Thus, we have all the information about the expected profits and the risks, and the store owner decides how many units of the product to buy each morning, taking into account his experience and appetite for risk.

In our opinion, the store owner should be advised to buy 5 units of the product every morning; his average expected profit will then be 485 UAH. Compared with the purchase of 6 units, for which the average expected profit is 525 UAH (40 UAH more), the risk is 2.06 times lower.

How are probability theory and mathematical statistics used? These disciplines are the basis of probabilistic-statistical decision-making methods. To use their mathematical apparatus, decision-making problems must be expressed in terms of probabilistic-statistical models. The application of a specific probabilistic-statistical decision-making method consists of three stages:

  • transition from economic, managerial, technological reality to an abstract mathematical-statistical scheme, i.e. building a probabilistic model of the control system, the technological process, the decision-making procedure (in particular, based on the results of statistical control), etc.;
  • carrying out calculations and obtaining conclusions by purely mathematical means within the framework of the probabilistic model;
  • interpretation of the mathematical-statistical conclusions in relation to the real situation and making an appropriate decision (for example, on the conformity or non-conformity of product quality with the established requirements, on the need to adjust the technological process, etc.), in particular, conclusions on the proportion of defective units in a batch, on the specific form of the distribution laws of the monitored parameters of the technological process, etc.

Mathematical statistics uses the concepts, methods and results of probability theory. Let us consider the main issues of building probabilistic decision-making models in economic, managerial, technological and other situations. Active and correct use of normative-technical and instructional-methodological documents on probabilistic-statistical decision-making methods requires prior knowledge. Thus, one needs to know under what conditions a particular document should be applied, what initial information is necessary for its selection and application, what decisions should be made based on the results of data processing, and so on.

Examples of the application of probability theory and mathematical statistics. Let us consider a few examples in which probabilistic-statistical models are a good tool for solving managerial, production, economic and national-economic problems. Thus, for example, in A.N. Tolstoy's novel "The Road to Calvary" (vol. 1) it is said: "the workshop gives twenty-three percent of rejects, and you stick to this figure," Strukov said to Ivan Ilyich.

The question arises of how to understand these words in a conversation between factory managers, since a single unit of production cannot be 23% defective: it is either good or defective. Probably, Strukov meant that a large batch contains about 23% defective items. Then the question arises, what does "about" mean? Suppose that 30 out of 100 tested units of production turn out to be defective, or 300 out of 1,000, or 30,000 out of 100,000, etc. - should Strukov be accused of lying?

Another example. The coin to be used for drawing lots must be "symmetric": when it is tossed, heads should come up in half of the cases on average, and tails in the other half. But what does "on average" mean? If you carry out many series of 10 tosses each, there will often be series in which the coin comes up heads exactly 4 times. For a symmetric coin this will occur in 20.5% of the series. And if there are 40,000 heads per 100,000 tosses, can the coin be considered symmetric? The decision-making procedure here is built on probability theory and mathematical statistics.
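
Both figures quoted here - the 20.5% of series with exactly 4 heads and the verdict on 40,000 heads in 100,000 tosses - can be checked by elementary binomial calculations; a sketch (the normal approximation in the second check is an assumption of this sketch, not part of the text):

# Sketch: binomial calculations for checking the symmetry of a coin.
from math import comb, sqrt

# Probability of exactly 4 heads in a series of 10 tosses of a fair coin.
p_4_of_10 = comb(10, 4) * 0.5 ** 10
print(round(p_4_of_10, 3))            # 0.205, i.e. about 20.5% of the series

# 40,000 heads in 100,000 tosses: how many standard deviations from 50,000?
n, heads = 100_000, 40_000
sigma = sqrt(n * 0.5 * 0.5)           # std. deviation of the number of heads for a fair coin
z = (heads - n * 0.5) / sigma
print(round(z, 1))                    # about -63: such a coin is clearly not symmetric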

The example in question may not seem serious enough. However, it is. Drawing lots is widely used in organizing industrial technical and economic experiments, for example, when processing the results of measuring a quality indicator (friction moment) of bearings depending on various technological factors (the influence of the conservation environment, methods of preparing bearings before measurement, the effect of the bearing load during measurement, and so on). Suppose it is necessary to compare the quality of bearings depending on the results of their storage in different conservation oils, i.e. in oils of two different compositions. When planning such an experiment, the question arises of which bearings should be placed in the oil of the first composition and which in the oil of the second, in such a way as to avoid subjectivity and ensure the objectivity of the decision.

The answer to this question can be obtained by drawing lots. A similar example can be given with quality control of any product. To decide whether a controlled batch of products meets the established requirements, a sample is taken from it, and a conclusion about the whole batch is made based on the results of testing the sample. In this case, it is very important to avoid subjectivity in forming the sample, i.e. each unit in the controlled batch must have the same probability of being included in the sample. Under production conditions, the selection of units for the sample is usually carried out not by lot but with special tables of random numbers or computer random number generators.

Similar problems of ensuring the objectivity of comparison arise when comparing different schemes of organizing production or remuneration, during tenders and competitions, when selecting candidates for vacant positions, and so on. Everywhere a draw or a similar procedure is needed. Let us explain with the example of identifying the strongest and the second-strongest team when organizing a tournament according to the Olympic system (the loser is eliminated). Suppose the stronger team always beats the weaker one. It is clear that the strongest team will certainly become the champion. The second-strongest team will reach the final if and only if it has no game with the future champion before the final; if such a game is planned, the second-strongest team will not make it to the final. Whoever plans the tournament can either "knock out" the second-strongest team ahead of schedule by pairing it with the leader in the first round, or secure it second place by ensuring that it meets weaker teams until the final. To avoid such subjectivity, a draw is held. For an 8-team tournament, the probability that the two strongest teams will meet in the final is 4/7; accordingly, with probability 3/7 the second-strongest team will leave the tournament ahead of schedule.
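
The 4/7 figure is easy to verify: under a random draw the second-strongest team must land in the half of the bracket that does not contain the strongest team, i.e. in 4 of the 7 remaining slots. A Monte Carlo sketch under the stated assumption that the stronger team always wins (team numbering is illustrative):

# Sketch: probability that the two strongest of 8 teams meet in the final
# under a random draw in an Olympic (single-elimination) system.
import random

def strongest_two_meet_in_final(n_teams=8, trials=100_000):
    hits = 0
    for _ in range(trials):
        slots = list(range(n_teams))      # team 0 is strongest, team 1 second strongest
        random.shuffle(slots)
        half = n_teams // 2
        # Since the stronger team always wins, the two teams meet in the final
        # if and only if they land in different halves of the bracket.
        in_first = slots.index(0) < half
        in_second = slots.index(1) < half
        hits += in_first != in_second
    return hits / trials

print(strongest_two_meet_in_final())      # about 0.571 = 4/7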

Any measurement of product units (using a caliper, micrometer, ammeter, etc.) has errors. To find out if there are systematic errors, it is necessary to make multiple measurements of a unit of production, the characteristics of which are known (for example, a standard sample). It should be remembered that in addition to the systematic, there is also a random error.

Therefore, the question arises of how to find out from the measurement results whether there is a systematic error. If we only note whether the error obtained in the next measurement is positive or negative, then this problem can be reduced to the previous one. Indeed, let us compare a measurement with a coin toss: a positive error corresponds to heads, a negative one to tails (a zero error, with a sufficient number of scale divisions, practically never occurs). Then checking the absence of a systematic error is equivalent to checking the symmetry of the coin.

The purpose of these considerations is to reduce the problem of checking the absence of a systematic error to the problem of checking the symmetry of a coin. The above reasoning leads to the so-called "sign criterion" in mathematical statistics.
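
A minimal sketch of this sign criterion (the reference value and the measurements are invented for illustration): under the null hypothesis of no systematic error, positive and negative deviations are equally likely, so the number of positive deviations behaves like the number of heads for a symmetric coin.

# Sketch: sign test for a systematic measurement error.
from math import comb

reference = 10.00                              # known value of the standard sample (illustrative)
measurements = [10.02, 10.01, 9.99, 10.03, 10.02, 10.04, 10.01, 10.02, 9.98, 10.03]

deviations = [m - reference for m in measurements if m != reference]
n_pos, n = sum(d > 0 for d in deviations), len(deviations)

# Two-sided p-value: probability of a deviation from n/2 at least this large
# for a fair "coin" (i.e. when there is no systematic error).
k = max(n_pos, n - n_pos)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) * 0.5 ** n)
print(n_pos, "of", n, "positive; p =", round(p_value, 3))
# A small p-value is evidence of a systematic error.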

In the statistical regulation of technological processes, the methods of mathematical statistics are used to develop rules and plans for statistical process control, aimed at the timely detection of disruptions of technological processes, taking measures to adjust them and preventing the release of products that do not meet the established requirements. These measures are aimed at reducing production costs and losses from the supply of substandard units. In statistical acceptance control, the methods of mathematical statistics are used to develop quality control plans based on the analysis of samples from batches of products. The difficulty lies in being able to correctly build the probabilistic-statistical decision-making models on the basis of which the questions posed above can be answered. In mathematical statistics, probabilistic models and methods for testing hypotheses have been developed for this purpose, in particular hypotheses that the proportion of defective units of production is equal to a certain number (remember the words of Strukov from A.N. Tolstoy's novel).

Estimation problems. In a number of managerial, production, economic and national-economic situations, problems of a different type arise - problems of estimating the characteristics and parameters of probability distributions.

Let us consider an example. Suppose that a batch of N electric light bulbs has been received for inspection, and a sample of n light bulbs has been randomly selected from it. A number of natural questions arise. How can the average service life of the light bulbs be determined from the results of testing the elements of the sample, and with what accuracy can this characteristic be estimated? How does the accuracy change if a larger sample is taken? For what number of hours of operation can it be guaranteed that at least 90% of the light bulbs will last that long or longer?
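
A sketch of the reasoning behind the first two questions (the lifetimes are simulated from an assumed exponential model, so all numbers are purely illustrative): the point estimate is the sample mean, and its standard error shrinks like 1/sqrt(n) as the sample grows.

# Sketch: estimating the mean service life of bulbs from a sample
# and how the accuracy (standard error) depends on the sample size n.
import random
from math import sqrt

random.seed(0)

def draw_sample(n, true_mean=1000.0):
    # Illustrative model: exponentially distributed service life, mean 1000 hours.
    return [random.expovariate(1.0 / true_mean) for _ in range(n)]

for n in (25, 100, 400):
    sample = draw_sample(n)
    mean = sum(sample) / n
    s = sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    print(n, round(mean), "+/-", round(s / sqrt(n)))   # the margin shrinks like 1/sqrt(n)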

Suppose that when testing a sample of n light bulbs, some of them turned out to be defective. Then the following questions arise. What limits can be specified for the number of defective light bulbs in the whole batch, for the level of defectiveness, and so on?

Or, in the statistical analysis of the accuracy and stability of technological processes, it is required to assess such quality indicators as the average value of the monitored parameter and the degree of its spread in the process under consideration. According to probability theory, it is advisable to use the mathematical expectation as the mean value of a random variable, and the variance, standard deviation or coefficient of variation as characteristics of the spread. This raises the question of how to estimate these statistical characteristics from sample data and with what accuracy this can be done. There are many similar examples. Here it was important to show how probability theory and mathematical statistics can be used in production management when making decisions in the field of statistical management of product quality.

What is "mathematical statistics"? Mathematical statistics is understood as "a section of mathematics devoted to mathematical methods for collecting, systematizing, processing and interpreting statistical data, as well as using them for scientific or practical conclusions. The rules and procedures of mathematical statistics are based on the theory of probability, which makes it possible to assess the accuracy and reliability of conclusions obtained in each problem based on the available statistical material "[[2.2], p. 326]. In this case, statistical data is called information about the number of objects in some more or less extensive set that have certain characteristics.

According to the type of problems being solved, mathematical statistics is usually divided into three sections: data description, estimation and hypothesis testing.

By the type of processed statistical data, mathematical statistics is divided into four areas:

  • one-dimensional statistics (statistics of random variables), in which the observation result is described by a real number;
  • multivariate statistical analysis, where the result of observation over an object is described by several numbers (vector);
  • statistics of random processes and time series, where the observation result is a function;
  • statistics of objects of a non-numerical nature, in which the observation result is of a non-numerical nature, for example, it is a set (geometric figure), an ordering, or is obtained as a result of measurement by a qualitative criterion.

Historically, certain areas of the statistics of objects of a non-numerical nature (in particular, problems of estimating the proportion of defective items and testing hypotheses about it) and one-dimensional statistics were the first to appear. The mathematical apparatus is simpler for them, so they are usually used to demonstrate the basic ideas of mathematical statistics.

Only those data-processing methods that rely on probabilistic models of the corresponding real phenomena and processes - i.e. the methods of mathematical statistics - are evidence-based. We are talking about models of consumer behavior, the occurrence of risks, the functioning of technological equipment, the obtaining of experimental results, the course of a disease, and so on. A probabilistic model of a real phenomenon should be considered constructed if the quantities under consideration and the relations between them are expressed in terms of probability theory. The conformity of the probabilistic model to reality, i.e. its adequacy, is substantiated, in particular, with the help of statistical methods for testing hypotheses.

Data-processing methods that are not based on probabilistic models are exploratory: they can be used only for preliminary data analysis, since they do not make it possible to assess the accuracy and reliability of conclusions obtained from limited statistical material.

Probabilistic and statistical methods are applicable wherever it is possible to construct and substantiate a probabilistic model of a phenomenon or process. Their use is mandatory when conclusions drawn from a sample of data are transferred to the entire population (for example, from a sample to an entire batch of products).

In specific areas of application, both probabilistic-statistical methods of general use and specific ones are employed. For example, in the section of production management devoted to statistical methods of product quality management, applied mathematical statistics (including the design of experiments) is used. Its methods are used for the statistical analysis of the accuracy and stability of technological processes and for statistical quality assessment. Specific methods include the methods of statistical acceptance control of product quality, statistical regulation of technological processes, reliability assessment and control, etc.

Applied probabilistic-statistical disciplines such as reliability theory and queuing theory are widely used. The content of the first is clear from its name; the second studies systems such as a telephone exchange, which receives calls at random times - the requests of subscribers dialing numbers on their telephones. The duration of servicing these requests, i.e. the duration of the conversations, is also modeled by random variables. A great contribution to the development of these disciplines was made by Corresponding Member of the USSR Academy of Sciences A.Ya. Khinchin (1894-1959), Academician of the Academy of Sciences of the Ukrainian SSR B.V. Gnedenko (1912-1995) and other domestic scientists.

A brief history of mathematical statistics. Mathematical statistics as a science begins with the work of the famous German mathematician Carl Friedrich Gauss (1777-1855), who, on the basis of probability theory, investigated and substantiated the least squares method, which he created in 1795 and applied to the processing of astronomical data (to refine the orbit of the minor planet Ceres). One of the most popular probability distributions, the normal distribution, is often named after him, and in the theory of random processes the main object of study is Gaussian processes.

At the end of the 19th and the beginning of the 20th century, a major contribution to mathematical statistics was made by English researchers, above all K. Pearson (1857-1936) and R.A. Fisher (1890-1962). In particular, Pearson developed the chi-square test for statistical hypotheses, and Fisher developed analysis of variance, the theory of experimental design and the maximum likelihood method of parameter estimation.

In the 1930s, the Pole Jerzy Neyman (1894-1977) and the Englishman E. Pearson developed a general theory of testing statistical hypotheses, and the Soviet mathematicians Academician A.N. Kolmogorov (1903-1987) and Corresponding Member of the USSR Academy of Sciences N.V. Smirnov (1900-1966) laid the foundations of nonparametric statistics. In the 1940s, the Romanian-born A. Wald (1902-1950) built the theory of sequential statistical analysis.

Mathematical statistics is developing rapidly at the present time. So, over the past 40 years, four fundamentally new areas of research can be distinguished [[2.16]]:

  • development and implementation of mathematical methods for planning experiments;
  • development of statistics of objects of non-numerical nature as an independent direction in applied mathematical statistics;
  • development of statistical methods that are stable in relation to small deviations from the used probabilistic model;
  • widespread development of work on the creation of computer software packages intended for statistical analysis of data.

Probabilistic-statistical methods and optimization. The idea of optimization permeates modern applied mathematical statistics and other statistical methods - the methods of experimental design, statistical acceptance control, statistical regulation of technological processes, etc. On the other hand, optimization formulations in decision-making theory, for example in the applied theory of optimization of product quality and of the requirements of standards, provide for the wide use of probabilistic-statistical methods, primarily applied mathematical statistics.

In production management, in particular when optimizing product quality and the requirements of standards, it is especially important to apply statistical methods at the initial stage of the product life cycle, i.e. at the stage of the research preparation of experimental design work (developing promising requirements for products, preliminary design, technical specifications for experimental design development). This is due to the limited information available at the initial stage of the product life cycle and to the need to predict the technical capabilities and the economic situation for the future. Statistical methods should be applied at all stages of solving an optimization problem - when scaling variables, developing mathematical models of the functioning of products and systems, conducting technical and economic experiments, etc.

All areas of statistics are used in optimization problems, including the optimization of product quality and of the requirements of standards - namely, the statistics of random variables, multivariate statistical analysis, the statistics of random processes and time series, and the statistics of objects of a non-numerical nature. It is advisable to choose the statistical method for analyzing specific data according to the relevant recommendations.

Part 1. Foundations of Applied Statistics

1.2.3. The essence of probabilistic and statistical decision-making methods

How are the approaches, ideas and results of probability theory and mathematical statistics used in decision making?

The base is a probabilistic model of a real phenomenon or process, i.e. a mathematical model in which objective relationships are expressed in terms of probability theory. Probabilities are used primarily to describe uncertainties that need to be considered when making decisions. This refers to both unwanted opportunities (risks) and attractive ones ("lucky chance"). Sometimes randomness is deliberately introduced into a situation, for example, by drawing lots, randomly selecting units to control, holding lotteries or consumer surveys.

Probability theory allows one to calculate other probabilities that are of interest to the researcher. For example, from the probability of heads coming up one can calculate the probability that in 10 coin tosses at least 3 heads will come up. Such a calculation is based on a probabilistic model in which the coin tosses are described by a scheme of independent trials; in addition, heads and tails are equally likely, so the probability of each of these events is 1/2. A more complex model is one in which, instead of tossing a coin, the checking of the quality of a unit of output is considered. The corresponding probabilistic model is based on the assumption that the quality control of different units of production is described by a scheme of independent trials. In contrast to the coin-tossing model, a new parameter must be introduced - the probability R that a unit is defective. The model is fully described if it is assumed that all units have the same probability of being defective. If this last assumption is incorrect, the number of model parameters increases; for example, one can assume that each unit has its own probability of being defective.
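
For example, the probability mentioned above - at least 3 heads in 10 tosses of a symmetric coin - is computed directly from the independent-trials scheme; the same scheme works with a defectiveness probability R in place of 1/2. A sketch (the value of R is illustrative):

# Sketch: probability of at least 3 "successes" in 10 independent trials.
from math import comb

# Fair coin: at least 3 heads in 10 tosses.
p_at_least_3_heads = sum(comb(10, k) for k in range(3, 11)) * 0.5 ** 10
print(round(p_at_least_3_heads, 3))    # about 0.945

# The same scheme with a defectiveness probability R instead of 1/2.
R = 0.23                                # illustrative value
p_at_least_3_defects = sum(comb(10, k) * R ** k * (1 - R) ** (10 - k) for k in range(3, 11))
print(round(p_at_least_3_defects, 3))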

Let us discuss the quality control model with a common defectiveness probability R for all units of production. In order to "get down to numbers" when analyzing the model, R must be replaced by some specific value. To do this, it is necessary to go beyond the probabilistic model and turn to the data obtained during quality control. Mathematical statistics solves the problem inverse to that of probability theory: its purpose is to draw conclusions about the probabilities underlying the probabilistic model on the basis of the results of observations (measurements, analyses, tests, experiments). For example, from the frequency of occurrence of defective items during inspection, conclusions can be drawn about the probability of defectiveness (see Bernoulli's theorem above). On the basis of Chebyshev's inequality, conclusions were drawn about the correspondence of the frequency of occurrence of defective items to the hypothesis that the probability of defectiveness takes a certain value.
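
The inverse, statistical problem - estimating R from inspection data - can be sketched as follows (the sample sizes echo the figures from the Strukov discussion; the normal-approximation interval is an assumption of this sketch): the estimate is the observed defect frequency, and its accuracy improves as the sample grows.

# Sketch: estimating the defectiveness probability R from inspection results.
from math import sqrt

def estimate_defect_rate(n_checked, n_defective, z=1.96):
    p_hat = n_defective / n_checked                          # point estimate of R
    half_width = z * sqrt(p_hat * (1 - p_hat) / n_checked)   # rough 95% margin (normal approx.)
    return p_hat, half_width

for n, d in ((100, 30), (1000, 300), (100_000, 30_000)):
    p_hat, h = estimate_defect_rate(n, d)
    print(n, f"{p_hat:.3f} +/- {h:.3f}")
# The larger the sample, the narrower the interval around the true R.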

Thus, the application of mathematical statistics rests on a probabilistic model of the phenomenon or process. Two parallel series of concepts are used - those related to theory (the probabilistic model) and those related to practice (the sample of observation results). For example, the theoretical probability corresponds to the frequency found from the sample; the mathematical expectation (theoretical series) corresponds to the sample arithmetic mean (practical series). As a rule, sample characteristics are estimates of the theoretical ones. At the same time, the quantities belonging to the theoretical series "are in the heads of researchers", belong to the world of ideas (in the sense of the ancient Greek philosopher Plato) and are inaccessible to direct measurement. Researchers have only sample data, with the help of which they try to establish the properties of the theoretical probabilistic model that interest them.

Why is a probabilistic model needed? The fact is that only with its help it is possible to transfer the properties established from the results of the analysis of a particular sample to other samples, as well as to the entire so-called general population. The term “general population” is used when referring to a large but finite population of units under study. For example, about the aggregate of all residents of Russia or the aggregate of all consumers of instant coffee in Moscow. The goal of marketing or opinion polls is to transfer statements from a sample of hundreds or thousands of people to populations of several million people. In quality control, a batch of products acts as a general population.

To transfer conclusions from a sample to a larger population, one or another assumption is required about the relationship of the sample characteristics with the characteristics of this larger population. These assumptions are based on an appropriate probabilistic model.

Of course, it is possible to process sample data without using any probabilistic model: for example, to calculate the sample arithmetic mean, the frequency with which certain conditions are fulfilled, etc. However, the results of such calculations relate only to the specific sample; transferring the conclusions obtained with their help to any other population is incorrect. This activity is sometimes called "data analysis". Compared with probabilistic-statistical methods, it has limited cognitive value.

So, the use of probabilistic models, together with the estimation and testing of hypotheses with the help of sample characteristics, is the essence of probabilistic-statistical decision-making methods.

We emphasize that the logic of using sample characteristics for making decisions on the basis of theoretical models involves the simultaneous use of two parallel series of concepts, one corresponding to the probabilistic models and the other to the sample data. Unfortunately, in a number of sources, usually outdated or written in a cookbook spirit, no distinction is made between sample and theoretical characteristics, which leads readers to bewilderment and to errors in the practical use of statistical methods.


The phenomena of life, like all phenomena of the material world in general, have two inextricably linked sides: qualitative, perceived directly by the senses, and quantitative, expressed in numbers with the help of counting and measure.

In the study of various natural phenomena, both qualitative and quantitative indicators are used. There is no doubt that only in the unity of the qualitative and quantitative aspects is the essence of the phenomena under study revealed most fully. In reality, however, one often has to use one or the other of these indicators.

There is no doubt that quantitative methods, as more objective and accurate, have an advantage over the qualitative characteristics of objects.

The measurement results themselves, although of definite value, are still insufficient for drawing the necessary conclusions. Numerical data collected in the course of mass testing are only raw factual material that needs appropriate mathematical processing. Without processing - the ordering and systematization of the numerical data - it is impossible to extract the information contained in them, to assess the reliability of individual summary indicators, or to make sure that the differences observed between them are reliable. This work requires of specialists certain knowledge and the ability to correctly generalize and analyze the data collected in the experiment. The system of this knowledge constitutes the content of statistics - a science concerned mainly with the analysis of research results in the theoretical and applied fields of science.

It should be borne in mind that mathematical statistics and probability theory are purely theoretical, abstract sciences; they study statistical aggregates without regard to the specifics of their constituent elements. The methods of mathematical statistics and the theory of probability underlying it are applicable to a wide variety of fields of knowledge, including the humanities.

The study of phenomena is carried out not on individual observations, which may turn out to be random, atypical, or incompletely expressive of the essence of the given phenomenon, but on a set of homogeneous observations, which gives fuller information about the object under study. A certain set of relatively homogeneous objects, combined according to one or another criterion for joint study, is called a statistical aggregate. The aggregate combines a number of homogeneous observations or registrations.

The elements that make up an aggregate are called its members, or variants. Variants are individual observations or numerical values of a characteristic. Thus, if we denote a feature by X (capital letter), then its values, or variants, are denoted by x (small letter), i.e. x1, x2, etc.

The total number of options that make up a given population is called its volume and is denoted by the letter n (small).

When the entire set of homogeneous objects is examined as a whole, it is called the general population. Examples of this kind of complete description of a population are national population censuses and the general statistical registration of animals in a country. Of course, a complete survey of the general population provides the most complete information about its state and properties. Therefore it is natural for researchers to strive to bring together as many observations as possible.

In reality, however, it is rarely necessary to resort to surveying all members of the general population. First, this work requires a lot of time and labor, and second, it is not always feasible for a variety of reasons and circumstances. So, instead of a complete survey of the general population, some part of it, called the sample population or sample, is usually studied. It serves as the model by which the entire population as a whole is judged. For example, to find out the average height of the conscript population of a certain region or district, it is not at all necessary to measure all the conscripts living in the area; it is enough to measure some part of them.

The sample must meet certain requirements.

1. The sample must be representative, or typical, i.e. it must include mainly those variants that most fully reflect the general population. Therefore, before processing sample data, they are carefully reviewed and clearly atypical variants are removed. For example, when analyzing the cost of products manufactured by an enterprise, the cost in those periods when the enterprise was not fully provided with components or raw materials should be excluded.

2. The sample must be objective. When forming a sample, one should not act arbitrarily, including in it only those variants that seem typical and rejecting all the rest. A good-quality sample is formed without preconceived opinions, by drawing lots or by lottery, when none of the variants of the general population has any advantage over the others in being included or not included in the sample. In other words, the sample should be formed at random, without influencing its composition.

3. The sample should be qualitatively uniform. It is impossible to include in the same sample data obtained under different conditions, for example, the cost of products obtained with a different number of employees.

6.2. Grouping observation results

Usually the results of experiments and observations are entered as numbers in registration cards or a journal, and sometimes simply on sheets of paper, producing a statement or register. Such initial documents, as a rule, contain information not about one but about several attributes on which the observations were made. These documents serve as the main source for forming the sample. This is usually done as follows: the numerical values of the attribute by which the aggregate is formed are written out on a separate sheet of paper from the primary document, i.e. the card index, journal or statement. The variants in such a list are usually presented as a disorderly mass of numbers. Therefore, the first step in processing such material is to order and systematize it - to group the variants into statistical tables or series.

Statistical tables are one of the most common forms of grouping sample data. They are illustrative, showing some general results, the position of individual elements in the general series of observations.

Another form of primary grouping of sample data is the ranking method, i.e. arranging the variants in a certain order - by increasing or decreasing values of the attribute. The result is a so-called ranked series, which shows within what limits and how the attribute varies. For example, suppose there is a sample of the following composition:

5, 2, 1, 5, 7, 9, 3, 5, 4, 10, 4, 5, 7, 3, 5, 9, 4, 12, 7, 7

It can be seen that the attribute varies from 1 to 12 units. We arrange the variants in ascending order:

1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 7, 7, 7, 7, 9, 9, 10, 12

As a result, a ranked series of values of the varying attribute was obtained.

Obviously, the ranking method as shown here is applicable only to small samples. With a large number of observations, the ranking becomes difficult, because the row is so long that it loses its meaning.

With a large number of observations, it is customary to rank the sample in the form of a double series, i.e. indicating the frequencies (or relative frequencies) of the individual variants of the ranked series. Such a double series of ranked values of a feature is called a variation series or distribution series. The simplest example of a variation series is the data ranked above, if they are arranged as follows:

Characteristic values (variants):   1   2   3   4   5   7   9   10   12
Frequencies of the variants:        1   1   2   3   5   4   2    1    1

The variation series shows the frequency with which individual variants are found in a given population, how they are distributed, which is of great importance, allowing us to judge the patterns of variation and the range of variation of quantitative characteristics. The construction of variation series facilitates the calculation of total indicators - the arithmetic mean and variance or dispersion of the variant about their mean value - indicators that characterize any statistical population.
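
The ranked series and the variation series above can be obtained mechanically; a sketch using the same 20 observations:

# Sketch: building a ranked series and a variation (distribution) series.
from collections import Counter

sample = [5, 2, 1, 5, 7, 9, 3, 5, 4, 10, 4, 5, 7, 3, 5, 9, 4, 12, 7, 7]

ranked = sorted(sample)                 # ranked series
frequencies = Counter(ranked)           # variation series: value -> frequency

print(ranked)
for value in sorted(frequencies):
    print(value, frequencies[value])
# The frequencies sum to the sample size n = 20.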

Variational series are of two types: discontinuous and continuous. A discontinuous variation series is obtained from the distribution of discrete quantities, which include counting features. If the feature varies continuously, i.e. can take any values ​​in the range from the minimum to the maximum variant of the population, then the latter is distributed in a continuous variation series.

To construct a variation series of a discretely varying feature, it is sufficient to arrange the entire set of observations in the form of a ranked series, indicating the frequencies of the individual variants. As an example, we give data showing the size distribution of 267 parts (Table 6.1).

Table 6.1. Distribution of parts by size.

To build a variation series of a continuously varying feature, the entire range of variation from the minimum to the maximum variant must be divided into separate groups or intervals (from-to), called classes, and then all the variants of the population must be distributed among these classes. The result is a double variation series in which the frequencies no longer refer to individual specific variants but to whole intervals, i.e. they are frequencies of classes rather than of variants.

The division of the total variation into classes is carried out on the scale of the class interval, which should be the same for all classes of the variation series. The size of the class interval is denoted by i (from the word intervalum - interval, distance); it is determined by the following formula:

i = (x_max − x_min) / (1 + 3.32·lg n),     (6.1)

where i is the class interval, which is taken as a whole number; x_max and x_min are the maximum and minimum variants of the sample; lg n is the decimal logarithm of the number of observations, so that the denominator gives the number of classes into which the sample is divided.

The number of classes is set somewhat arbitrarily, but with the fact in mind that the number of classes depends to some extent on the sample size: the larger the sample, the more classes there should be, and vice versa - with smaller samples a smaller number of classes should be taken. Experience has shown that even for small samples, when it is necessary to group the variants into a variation series, one should not set fewer than 5-6 classes. If there are 100-150 variants, the number of classes can be increased to 12-15. If the population consists of 200-300 variants, it is divided into 15-18 classes, and so on. Of course, these recommendations are very conditional and cannot be taken as an established rule.

When breaking down into classes, in each specific case, you have to reckon with a number of different circumstances, ensuring that the processing of statistical material gives the most accurate results.

After the class interval has been established and the sample divided into classes, the variants are distributed among the classes and the number of variants (the frequency) of each class is determined. The result is a variation series in which the frequencies refer not to individual variants but to the established classes. The sum of all frequencies of the variation series must equal the sample size, that is

Σ p = n,     (6.2)

where Σ is the summation sign, p are the class frequencies, and n is the sample size.

If this equality does not hold, an error was made when distributing the variants among the classes, and it must be found and eliminated.

Usually, to distribute the variants among the classes, an auxiliary table with four columns is drawn up: 1) the classes for the given attribute (from - to); 2) the class midpoints; 3) the tally of variants by class; 4) the class frequencies (see Table 6.2).

Distributing the variants among the classes requires close attention: the same variant must not be tallied twice, and identical variants must not end up in different classes. To avoid errors, it is recommended not to search the aggregate for identical variants and count them, but to take the variants one by one and assign each to its class, which is not the same thing. Ignoring this rule, as happens in the work of inexperienced researchers, takes a lot of time when distributing the variants and, most importantly, leads to errors.
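
The whole grouping procedure can be sketched in a few lines (the data are simulated, so the class boundaries are illustrative); the number of classes follows the 1 + 3.32·lg n rule of formula (6.1), and the final check reproduces formula (6.2):

# Sketch: dividing a continuous sample into classes and counting class frequencies.
import random
from math import log10, ceil

random.seed(1)
sample = [round(random.gauss(10.0, 0.5), 2) for _ in range(150)]   # illustrative data

n = len(sample)
n_classes = ceil(1 + 3.32 * log10(n))                 # about 9 classes for n = 150
i = (max(sample) - min(sample)) / n_classes           # class interval, cf. formula (6.1)

total, lower = 0, min(sample)
for k in range(n_classes):
    upper = lower + i
    if k == n_classes - 1:
        freq = sum(lower <= x for x in sample)        # last class is closed on the right
    else:
        freq = sum(lower <= x < upper for x in sample)
    print(f"{lower:.2f} - {upper:.2f}", f"midpoint {(lower + upper) / 2:.2f}", freq)
    total, lower = total + freq, upper

print("sum of class frequencies =", total, "= n =", n)   # check of formula (6.2)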

Table 6.2. Distribution of the variants by class. Columns: class boundaries (from - to); class midpoints (x); tally of the variants; class frequencies (p), absolute and relative (%).

Having finished distributing the variants and counting their number for each class, we obtain a continuous variation series. It must be converted into a discontinuous variation series. For this, as already noted, we take the half-sums of the extreme values of the classes. Thus, for example, the midpoint of the first class, 8.8, is obtained as follows:

(8.6 + 9.0) : 2 = 8.8.

The second value in this column (9.3) is calculated in a similar way:

(9.01 + 9.59) : 2 = 9.3, etc.

As a result, a discontinuous variation series is obtained, showing the distribution according to the studied trait (Table 6.3.)

Table 6.3. Variational series

The grouping of sample data in the form of a variation series has a twofold purpose: firstly, as an auxiliary operation, it is necessary when calculating total indicators, and secondly, the distribution series show the regularity of the variation of features, which is very important. To express this pattern more clearly, it is customary to depict the variation series graphically in the form of a histogram (Figure 6.1.)


Figure 6.1 Distribution of enterprises by number of employees

The bar graph (histogram) depicts the distribution of the variants for continuous variation of the characteristic. The rectangles correspond to the classes, and their heights to the number of variants contained in each class. If we drop perpendiculars to the abscissa axis from the midpoints of the tops of the histogram rectangles and then connect these points, we obtain a graph of continuous variation called a polygon, or distribution density curve.

When conducting psychological and pedagogical research, an important role is given to mathematical methods for modeling processes and processing experimental data. These methods should include, first of all, the so-called probabilistic-statistical research methods. This is due to the fact that the behavior of both an individual person in the course of his activity and a person in a team is significantly influenced by many random factors. Randomness does not allow describing phenomena within the framework of deterministic models, since it manifests itself as insufficient regularity in mass phenomena and, therefore, does not make it possible to reliably predict the occurrence of certain events. However, when studying such phenomena, certain patterns are revealed. Irregularity inherent in random events, with a large number of tests, is usually compensated by the appearance of a statistical regularity, stabilization of the frequencies of occurrence of random events. Consequently, these random events have a certain probability. There are two fundamentally different probabilistic and statistical methods of psychological and pedagogical research: classical and non-classical. Let's carry out a comparative analysis of these methods.

The classical probabilistic-statistical method. The classical probabilistic-statistical method of research is based on probability theory and mathematical statistics. This method is used in the study of mass phenomena of a random nature; it includes several stages, the main of which are as follows.

1. Construction of a probabilistic model of reality based on the analysis of statistical data (determination of the distribution law of the random variable). Naturally, the larger the volume of statistical material, the more clearly the regularities of mass random phenomena are expressed. The sample data obtained in an experiment are always limited and, strictly speaking, random in nature. Therefore, an important role is played by generalizing the patterns found in the sample and extending them to the entire general population of objects. To solve this problem, a certain hypothesis is adopted about the nature of the statistical regularity manifested in the phenomenon under study, for example, the hypothesis that the phenomenon obeys the normal distribution law. Such a hypothesis is called the null hypothesis; it may turn out to be erroneous, so along with the null hypothesis an alternative, or competing, hypothesis is also put forward. How well the obtained experimental data correspond to a particular statistical hypothesis is checked with the help of so-called nonparametric statistical tests, or goodness-of-fit tests (a short sketch of such a check is given after this list). At present, the Kolmogorov, Smirnov and omega-square goodness-of-fit criteria, among others, are widely used. The main idea behind these criteria is to measure the distance between the empirical distribution function and the fully known theoretical distribution function. The methodology for testing statistical hypotheses is rigorously developed and set out in a large number of works on mathematical statistics.

2. Carrying out the necessary calculations by mathematical means within the framework of a probabilistic model. In accordance with the established probabilistic model of the phenomenon, characteristic parameters are calculated, for example, such as mathematical expectation or mean value, variance, standard deviation, mode, median, asymmetry index, etc.

3. Interpretation of probabilistic and statistical conclusions in relation to the real situation.
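
As promised above, a minimal sketch of such a goodness-of-fit check (the Kolmogorov-Smirnov test from scipy applied to simulated scores; the hypothesized normal law with fully specified parameters plays the role of the null hypothesis, and the availability of scipy is an assumption of this sketch):

# Sketch: testing the null hypothesis that data follow a fully specified
# normal distribution, using the Kolmogorov-Smirnov goodness-of-fit criterion.
import random
from scipy import stats          # assumes scipy is available

random.seed(2)
scores = [random.gauss(14.0, 3.0) for _ in range(200)]   # illustrative exam scores

# Distance between the empirical distribution function and the theoretical one.
statistic, p_value = stats.kstest(scores, "norm", args=(14.0, 3.0))
print(round(statistic, 3), round(p_value, 3))
# A large p-value means the data do not contradict the null hypothesis.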

At present, the classical probabilistic-statistical method is well developed and widely used in research in various fields of the natural, technical and social sciences. A detailed description of the essence of this method and of its application to specific problems can be found in a large number of sources.

Non-classical probabilistic-statistical method. The non-classical probabilistic-statistical method of research differs from the classical one in that it is applied not only to mass events but also to individual events that are fundamentally random in nature. This method can be used effectively to analyze the behavior of an individual in the process of performing a particular activity, for example, in the process of the assimilation of knowledge by students. We will consider the features of the non-classical probabilistic-statistical method of psychological and pedagogical research using the example of students' behavior in the process of assimilating knowledge.

A probabilistic-statistical model of student behavior in the process of assimilating knowledge was first proposed in earlier work and was developed further in subsequent work. Learning, as a type of activity whose purpose is the acquisition of knowledge, abilities and skills, depends on the level of development of the student's consciousness. The structure of consciousness includes such cognitive processes as sensation, perception, memory, thinking and imagination. An analysis of these processes shows that they contain elements of randomness, due to the random nature of the mental and somatic states of the individual, as well as to physiological, psychological and informational noise in the work of the brain. The latter led, in describing the processes of thinking, to the abandonment of the model of a deterministic dynamical system in favor of the model of a random dynamical system. This means that the determinism of consciousness is realized through randomness. Hence we can conclude that human knowledge, which is in fact a product of consciousness, also has a random character, and therefore a probabilistic-statistical method can be used to describe the behavior of each individual student in the process of assimilating knowledge.

In accordance with this method, the student is identified with a distribution function (probability density) that determines the probability of finding him in a unit region of the information space. In the learning process, the distribution function with which the student is identified evolves and moves through the information space. Each student has individual properties, and independent localization (spatial and kinematic) of individuals relative to one another is allowed.

On the basis of the law of conservation of probability, a system of differential equations is written down - continuity equations that relate the change of the probability density per unit time in the phase space (the space of coordinates, velocities and accelerations of various orders) to the divergence of the probability density flux in the phase space under consideration. Analytical solutions of a number of these continuity equations yield distribution functions characterizing the behavior of individual students in the learning process.

When conducting experimental studies of student behavior in the process of assimilating knowledge, probabilistic-statistical scaling is used, according to which the measurement scale is an ordered system (A, Ly, F, G, f, M), where A is some well-ordered set of objects (individuals) possessing the features of interest to us (an empirical system with relations); Ly is a functional space (a space of distribution functions) with relations; F is the operation of homomorphic mapping of A into the subsystem Ly; G is a group of admissible transformations; f is the operation of mapping distribution functions from the subsystem Ly into numerical systems with relations of the n-dimensional space M. Probabilistic-statistical scaling is used to find and process experimental distribution functions and includes three stages.

1. Finding the experimental distribution functions from the results of a control event, for example an exam. A typical form of the individual distribution functions found using a twenty-point scale is shown in Fig. 1. The method for finding such functions is described in the literature.

2. Mapping distribution functions to a number space. For this purpose, the moments of individual distribution functions are calculated. In practice, as a rule, it is sufficient to restrict ourselves to determining the moments of the first order (mathematical expectation), second order (variance), and third order, which characterizes the asymmetry of the distribution function.

3. Ranking students according to the level of knowledge based on a comparison of the moments of different orders of their individual distribution functions.

Fig. 1. Typical form of the individual distribution functions of students who received different grades in the general physics exam: 1 - traditional grade "2"; 2 - traditional grade "3"; 3 - traditional grade "4"; 4 - traditional grade "5"

Experimental distribution functions for the student cohort were found on the basis of the additivity of the individual distribution functions (Fig. 2).


Fig. 2. Evolution of the complete distribution function of the student cohort, approximated by smooth lines: 1 - after the first year; 2 - after the second year; 3 - after the third year; 4 - after the fourth year; 5 - after the fifth year

Analysis of the data presented in Fig. 2 shows that as one moves in the information space, the distribution functions spread out. This is due to the fact that the mathematical expectations of the distribution functions of individuals move at different speeds, and the functions themselves spread out due to dispersion. Further analysis of these distribution functions can be carried out within the framework of the classical probabilistic-statistical method.

Discussion of the results. Analysis of the classical and non-classical probabilistic-statistical methods of psychological and pedagogical research has shown that there is a significant difference between them. As can be understood from the above, it is that the classical method is applicable only to the analysis of mass events, whereas the non-classical method is applicable to the analysis of both mass and single events. Accordingly, the classical method can be conventionally called the mass probabilistic-statistical method (MVSM), and the non-classical method the individual probabilistic-statistical method (IVSM). In [4] it is shown that none of the classical methods of assessing students' knowledge can be applied for these purposes within the framework of the probabilistic-statistical model of the individual.

Let us consider the distinctive features of the MVSM and IVSM methods using the example of measuring the completeness of students' knowledge. For this purpose, we will conduct a thought experiment. Suppose that there is a large number of students absolutely identical in mental and physical characteristics, having the same prehistory, and let them, without interacting with each other, simultaneously participate in the same cognitive process, experiencing absolutely the same strictly deterministic influence. Then, in accordance with the classical ideas about the objects of measurement, all students should have received the same estimates of the completeness of knowledge with any given measurement accuracy. However, in reality, given a sufficiently high measurement accuracy, the assessments of the completeness of students' knowledge will differ. It is not possible to explain such a measurement result within the framework of the MVSM, since it is initially assumed that the impact on absolutely identical non-interacting students has a strictly deterministic nature. The classical probabilistic-statistical method does not take into account the fact that the determinism of the cognitive process is realized through randomness, inherent in each individual who cognizes the world around him.

The random nature of the student's behavior in the process of assimilating knowledge is taken into account by the IVSM. Application of the individual probabilistic-statistical method to the analysis of the behavior of the idealized group of students considered above would show that it is impossible to indicate the exact position of each student in the information space - one can only state the probability of finding him in one region of the information space or another. In fact, each student is identified with an individual distribution function, whose parameters, such as the mathematical expectation, the variance, etc., are individual for each student. This means that the individual distribution functions will be located in different regions of the information space. The reason for this behavior of students lies in the random nature of the learning process.

However, in a number of cases, the research results obtained within the framework of the MVSM can be interpreted within the framework of the IVSM. Suppose that the teacher uses a five-point measurement scale to assess the student's knowledge. In this case, the error in assessing knowledge is ± 0.5 points. Therefore, when a student is given a grade, for example, 4 points, this means that his knowledge is in the range from 3.5 points to 4.5 points. In fact, the position of an individual in the information space in this case is determined by a rectangular distribution function, the width of which is equal to the measurement error ± 0.5 points, and the estimate is the mathematical expectation. This error is so large that it does not allow observing the true form of the distribution function. However, despite such a rough approximation of the distribution function, the study of its evolution allows one to obtain important information, both about the behavior of an individual individual and the student body as a whole.

The result of measuring the completeness of a student's knowledge is directly or indirectly influenced by the consciousness of the teacher (meter), which is also characterized by randomness. In the process of pedagogical measurements, in fact, there is an interaction of two random dynamic systems that identify the behavior of a student and a teacher in this process. In the article, the interaction of the student subsystem with the faculty subsystem is considered and it is shown that the speed of movement of the mathematical expectation of individual distribution functions of students in the information space is proportional to the influence function of the teaching staff and is inversely proportional to the function of inertia, which characterizes the intractability to change the position of the mathematical expectation in space (analogue of Aristotle's law in mechanics).

At present, despite significant advances in the development of the theoretical and practical foundations of measurement in psychological and pedagogical research, the problem of measurement as a whole is still far from being solved. This is primarily due to the fact that there is still not enough information about the influence of consciousness on the measurement process. A similar situation arose in solving the measurement problem in quantum mechanics. Thus, in discussions of the conceptual problems of the quantum theory of measurement it is said that resolving some measurement paradoxes of quantum mechanics "... is hardly possible without the direct inclusion of the observer's consciousness in the theoretical description of the quantum measurement." It goes on to say that "... the assumption that consciousness can make a certain event probable is consistent, even if, according to the laws of physics (quantum mechanics), the probability of this event is small. Let us make an important clarification of the wording: the consciousness of a given observer can make it probable that he will see this event."
