How to find the mode if no numbers repeat. Statistics

The mode and the median are a special kind of average used to study the structure of a variation series. They are sometimes called structural averages, in contrast to the power means discussed earlier.

The mode is the value of a characteristic (variant) that occurs most often in a given population, i.e. the one with the highest frequency.

The mode has wide practical application, and in some cases only the mode can characterize a social phenomenon.

The median is the variant located in the middle of an ordered variation series.

The median shows the quantitative boundary of a varying characteristic that half of the units of the population have reached. Using the median alongside the mean, or instead of it, is advisable when the variation series has open-ended intervals: calculating the median does not require setting conditional boundaries for the open intervals, so the lack of information about them does not affect the accuracy of the result.

The median is also used when the values that would serve as weights are unknown. It is used instead of the arithmetic mean in statistical methods of product quality control. The sum of the absolute deviations of the variants from the median is no greater than the sum of their absolute deviations from any other number.

Consider the calculation of the mode and median in a discrete variation series.

Determine the mode and the median.

Mode: Mo = 4 years, since this value corresponds to the highest frequency, f = 5.

That is, the largest number of workers have 4 years of experience.

To calculate the median, first find half of the sum of the frequencies. If the sum of the frequencies is odd, first add one to it and then divide by two.

In this example the median is the eighth variant.

To find which variant is eighth in order, accumulate the frequencies until the cumulative sum is equal to or greater than half of the sum of all frequencies. The corresponding variant is the median.

Me = 4 years.

That is, half of the workers have less than four years of experience and half have more.

If the accumulated frequency for some variant is exactly equal to half of the sum of the frequencies, the median is the arithmetic mean of that variant and the next one.
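The accumulation rule can be sketched in Python. This is a minimal sketch rather than the author's own procedure: the function name `discrete_median` and the frequency table are hypothetical, and the table is chosen only so that it agrees with the figures quoted in the text (highest frequency f = 5 at 4 years, median at the eighth variant).

```python
from itertools import accumulate

def discrete_median(variants, freqs):
    """Median of a discrete variation series.

    variants: values of the characteristic, sorted in ascending order.
    freqs:    frequency of each variant (same order).
    """
    half = sum(freqs) / 2
    for i, cum in enumerate(accumulate(freqs)):
        if cum == half:
            # Accumulated frequency equals half the total:
            # average this variant and the next one.
            return (variants[i] + variants[i + 1]) / 2
        if cum > half:
            # First variant whose accumulated frequency exceeds half the total.
            return variants[i]

# Hypothetical table: years of experience and number of workers (total 15).
experience = [2, 3, 4, 5, 6]
workers = [2, 3, 5, 4, 1]

print(discrete_median(experience, workers))
```

For an odd total, "accumulated frequency greater than half the total" is the same condition as "accumulated frequency at least (total + 1) / 2", so the sketch follows the add-one-and-halve rule without a separate branch.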

Calculation of the mode and median in the interval variation series

The mode in an interval variation series is calculated by the formula

$$Mo = x_{Mo} + h_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$$

where $x_{Mo}$ is the lower boundary of the modal interval,

$h_{Mo}$ is the width of the modal interval,

$f_{Mo}$, $f_{Mo-1}$, $f_{Mo+1}$ are the frequencies of the modal interval, of the interval preceding it and of the interval following it, respectively.

The modal interval is the interval with the highest frequency.

Example 1

Seniority groups | Number of workers, people | Accumulated frequencies

Determine the mode and the median.

The modal interval is the one that corresponds to the highest frequency, f = 35. Then:

$x_{Mo} = 6$, $f_{Mo} = 35$

The median is the value of a characteristic that divides a ranked distribution series into two equal parts: one with values less than the median and one with values greater than the median. To find the median, find the value of the characteristic located in the middle of the ordered series.


For ranked, ungrouped data, finding the median reduces to finding its ordinal position. For an interval series, the median can be calculated using the following formula:

$$Me = X_m + i_m \cdot \frac{\frac{\sum f}{2} - S_{me}}{f_{me}}$$

where $X_m$ is the lower boundary of the median interval;
$i_m$ is the width of the median interval;
$\sum f$ is the total number of observations;
$S_{me}$ is the number of observations accumulated before the beginning of the median interval;
$f_{me}$ is the number of observations in the median interval.
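A minimal Python sketch of this interpolation, assuming equal-width intervals. The function name `interval_median` and the sample table are hypothetical and serve only to illustrate the roles of the lower boundary, the interval width, the accumulated count and the frequency of the median interval.

```python
def interval_median(lower_bounds, width, freqs):
    """Median of an interval series by linear interpolation.

    lower_bounds: lower boundary of each interval, ascending.
    width:        common width of the intervals (assumed equal).
    freqs:        number of observations in each interval.
    """
    half = sum(freqs) / 2
    accumulated = 0  # observations accumulated before the current interval
    for lower, f in zip(lower_bounds, freqs):
        if accumulated + f >= half:
            # This is the median interval: interpolate inside it.
            return lower + width * (half - accumulated) / f
        accumulated += f

# Hypothetical table: intervals 0-2, 2-4, 4-6, 6-8, 8-10.
bounds = [0, 2, 4, 6, 8]
counts = [5, 10, 20, 10, 5]

print(interval_median(bounds, 2, counts))
```

Because the hypothetical table is symmetric, the result falls at the center of the middle interval, which is a quick sanity check on the interpolation.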

Median properties

  1. The median does not depend on the values of the characteristic located on either side of it.
  2. Analytical operations with the median are very limited; for example, when two distributions with known medians are combined, the median of the new distribution cannot be predicted in advance.
  3. The median has the minimality property: the sum of the absolute deviations of the values of x from the median is the smallest compared with the sum of the absolute deviations of x from any other value.
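The minimality property can be checked numerically. The sketch below is an illustration on one small sample, not a proof; the helper name `sum_abs_dev` and the grid of candidate centers are assumptions made for the example.

```python
from statistics import median

def sum_abs_dev(values, center):
    """Sum of absolute deviations of the values from a chosen center."""
    return sum(abs(x - center) for x in values)

sample = [9, 3, 5, 8, 4, 11, 13]  # illustrative sample
me = median(sample)

# Candidate centers 0.0, 0.5, ..., 15.0 (an arbitrary grid for the check).
candidates = [c / 2 for c in range(0, 31)]
best = min(candidates, key=lambda c: sum_abs_dev(sample, c))

print(me, sum_abs_dev(sample, me), best)
```

On this grid no candidate gives a smaller sum of absolute deviations than the median itself.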

Graphical determination of the median

To determine the median graphically, use the accumulated frequencies, from which the cumulative curve is constructed. The tops of the ordinates corresponding to the accumulated frequencies are connected by straight line segments. Divide the last ordinate, which corresponds to the total sum of the frequencies, in half and draw a perpendicular from that point to its intersection with the cumulative curve; the abscissa of the intersection point is the desired median value.

Definition of the mode in statistics

The mode is the value of a characteristic that has the highest frequency in a statistical distribution series.

The mode is determined in different ways, depending on whether the varying characteristic is presented as a discrete or an interval series.

In a discrete series, the mode is found by simply scanning the frequency column: find the largest number in it, which is the highest frequency. The value of the characteristic that corresponds to it is the mode. In an interval variation series, the mode is approximately taken to be the central variant of the interval with the highest frequency. In such a series the mode is calculated by the formula:

$$Mo = X_{Mo} + i_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$$

where $X_{Mo}$ is the lower boundary of the modal interval;
$i_{Mo}$ is the width of the modal interval;
$f_{Mo}$, $f_{Mo-1}$, $f_{Mo+1}$ are the frequencies of the modal interval, of the interval preceding it and of the interval following it.

The modal interval is determined by the highest frequency.

The mode is widely used in statistical practice, for example in the analysis of consumer demand and in price registration.

Relationship between the arithmetic mean, median and mode

For a unimodal symmetric distribution series, the arithmetic mean, median and mode coincide. For skewed distributions they differ.

K. Pearson, by fitting various types of curves, determined that for moderately asymmetric distributions the following approximate relation between the arithmetic mean, median and mode holds:

$$\bar{x} - Mo \approx 3(\bar{x} - Me)$$

When the study load of students was examined, a group of 12 seventh-graders was selected. They were asked to record the time (in minutes) spent on their algebra homework on a given day. The following data were obtained: 23, 18, 25, 20, 25, 25, 32, 37, 34, 26, 34, 25.


The arithmetic mean of the series. The arithmetic mean of a series of numbers is the quotient of the sum of these numbers divided by the number of terms. (23 + 18 + 25 + 20 + 25 + 25 + 32 + 37 + 34 + 26 + 34 + 25) : 12 = 324 : 12 = 27


The range of the series. The range of a series is the difference between the largest and the smallest of its numbers. The longest time spent is 37 minutes and the shortest is 18 minutes. The range of the series is 37 - 18 = 19 (min).


The mode of the series. The mode of a series of numbers is the number that occurs in the series more often than the others. The mode of our series is 25. A series of numbers may have more than one mode or no mode at all. 1) 47, 46, 50, 52, 47, 52, 49, 45, 43, 53: two modes, 47 and 52. 2) 69, 68, 66, 70, 67, 71, 74, 63, 73, 72: no mode.
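Counting frequencies makes these cases easy to check. The sketch below is one possible implementation under the convention used in this text (a series in which every value is equally frequent has no mode); the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(series):
    """Sorted list of modes of a non-empty series.

    Returns an empty list when several distinct values all occur
    equally often (the "no mode" convention used in this text).
    """
    counts = Counter(series)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == top)

homework = [23, 18, 25, 20, 25, 25, 32, 37, 34, 26, 34, 25]
no_repeats = [69, 68, 66, 70, 67, 71, 74, 63, 73, 72]

print(modes(homework))
print(modes(no_repeats))
```

The standard library function `statistics.multimode` behaves differently in the last case: it returns every value of the series when all of them are equally frequent.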


The arithmetic mean, range and mode are used in statistics, a science that deals with obtaining, processing and analyzing quantitative data on various mass phenomena occurring in nature and society. Statistics studies the size of individual population groups of a country and its regions, the production and consumption of various types of products, the transportation of goods and passengers by various modes of transport, natural resources, etc.


1. Find the arithmetic mean and the range of the series: a) 24, 22, 27, 20, 16, 37; b) 30, 5, 23, 5, 28, ...
2. Find the arithmetic mean, range and mode of the series: a) 32, 26, 18, 26, 15, 21, 26; b) -21, -33, -35, -19, -20, -22; c) 61, 64, 64, 83, 61, 71, 70; d) -4, -6, 0, 4, 0, 6, 8, -12.
3. In the series 3, 8, 15, 30, __, 24 one number is missing. Find it if: a) the arithmetic mean of the series is 18; b) the range of the series is 40; c) the mode of the series is 24.


4. The secondary school certificates of four friends, graduates of the same school, contained the following marks:
Ilyin: 4, 4, 5, 5, 4, 4, 4, 5, 5, 5, 4, 4, 5, 4, 4;
Semyonov: 3, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 4, 4, 5, 4;
Popov: 5, 5, 5, 5, 5, 4, 4, 5, 5, 5, 5, 5, 4, 4, 4;
Romanov: 3, 3, 4, 4, 4, 4, 4, 3, 4, 4, 4, 5, 3, 4, 4.
With what average mark did each of these graduates finish school? Indicate the most typical mark in the certificate of each of them. Which statistical characteristics did you use when answering?


Independent work

Variant 1.
1. Given the series of numbers 35, 44, 37, 31, 41, 40, 31, 29. Find the arithmetic mean, range and mode of the series.
2. In the series 4, 9, 16, 31, _, 25 one number is missing. Find it if: a) the arithmetic mean is 19; b) the range of the series is 41.

Variant 2.
1. Given the series of numbers 38, 42, 36, 45, 48, 45, 45, 42. Find the arithmetic mean, range and mode of the series.
2. In the series 5, 10, 17, 32, _, 26 one number is missing. Find it if: a) the arithmetic mean is 19; b) the range of the series is 41.


The median of an ordered series with an odd number of terms is the number written in the middle, and the median of an ordered series with an even number of terms is the arithmetic mean of the two numbers written in the middle. The table shows the electricity consumption in January by the tenants of nine apartments:


Let's make an ordered series: 64, 72, 72, 75, 78, 82, 85, 91, 93. The number 78 is the median of this series. Given an ordered series: 64, 72, 72, 75, 78, 82, 85, 88, 91, 93. (78 + 82) : 2 = 80 is the median.
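The odd and even cases can be written as a short Python function. This is a sketch for illustration; the name `median_of` is hypothetical, and the standard library function `statistics.median` follows the same rule.

```python
def median_of(series):
    """Median of a non-empty series of numbers."""
    ordered = sorted(series)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of terms: the middle one.
        return ordered[mid]
    # Even number of terms: mean of the two middle ones.
    return (ordered[mid - 1] + ordered[mid]) / 2

nine_apartments = [64, 72, 72, 75, 78, 82, 85, 91, 93]
ten_apartments = [64, 72, 72, 75, 78, 82, 85, 88, 91, 93]

print(median_of(nine_apartments))
print(median_of(ten_apartments))
```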


1. Find the median of the series: a) 30, 32, 37, 40, 41, 42, 45, 49, 52; b) 102, 104, 205, 207, 327, 408, 417; c) 16, 18, 20, 22, 24, 26; d) 1.2, 1.4, 2.2, 2.6, 3.2, 3.8, 4.4, 5.6.
2. Find the arithmetic mean and median of the series: a) 27, 29, 23, 31, 21, 34; b) 56, 58, 64, 66, 62, 74; c) 3.8, 7.2, 6.4, 6.8, 7.2; d) 21.6, 37.3, 16.4, 12.6.


3. The table shows the number of visitors to the exhibition on different days of the week (Mon, Tue, Wed, Thu, Fri, Sat, Sun). Find the median of this data series. On which days of the week was the number of visitors to the exhibition higher than the median?


4. Below is the average daily sugar processing (in thousand centners) by the sugar factories of a certain region: 12.2, 13.2, 13.7, 18.0, 18.6, 12.2, 18.5, 12.4, 14.2, 17.8. For this series, find the arithmetic mean, mode, range and median.
5. An organization kept a daily record of the letters received during a month. The following data series was obtained: 39, 43, 40, 0, 56, 38, 24, 21, 35, 38, 0, 58, 31, 49, 38, 25, 34, 0, 52, 40, 42, 40, 39, 54, 0, 64, 44, 50, 38, 37, 32. For this series, find the arithmetic mean, mode, range and median.


Homework. At a figure skating competition, an athlete's performance was scored as follows: 5.2; 5.4; 5.5; 5.4; 5.1; 5.1; 5.4; 5.5; 5.3. For the resulting series of numbers, find the arithmetic mean, range and mode.



TEST

On the topic: "Mode. Median. Methods of their calculation"


Introduction

Averages and the associated measures of variation play a very important role in statistics because of the subject it studies. This topic is therefore one of the central topics of the course.

The average is a very common summary indicator in statistics, because only an average can characterize a population with respect to a quantitatively varying characteristic. In statistics, the average is a generalizing characteristic of a set of similar phenomena with respect to some quantitatively varying characteristic. It shows the level of that characteristic per unit of the population.

When studying social phenomena and seeking to identify their characteristic, typical features under specific conditions of place and time, statisticians make wide use of averages. Averages make it possible to compare different populations with respect to varying characteristics.

The averages used in statistics belong to the class of power means. Of these, the arithmetic mean is used most often and the harmonic mean less often; the geometric mean is used only when calculating average rates of change over time, and the quadratic mean (root mean square) only when calculating measures of variation.

The arithmetic mean is the quotient of the sum of the variants divided by their number. It is used when the total volume of a varying characteristic for the whole population is formed as the sum of its values for the individual units. The arithmetic mean is the most common type of average, since it corresponds to the nature of social phenomena, where the total volume of a varying characteristic is most often formed precisely as the sum of its values for the individual units of the population.

By its defining property, the harmonic mean should be used when the total volume of the characteristic is formed as the sum of the reciprocals of the variants. It is used when, depending on the data available, the weights have to be divided by the variants rather than multiplied by them or, equivalently, multiplied by their reciprocals. In these cases the harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the values of the characteristic.

The harmonic mean should be used when the weights are not the units of the population (the carriers of the characteristic) but the products of these units and the values of the characteristic.
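The choice between the two means can be illustrated with a small Python sketch. The data are hypothetical: three batches of goods for which the price and the sales revenue (price times quantity) are known, but not the quantity. The function names are assumptions made for the example.

```python
def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: the weights are divided by the variants."""
    return sum(weights) / sum(w / x for x, w in zip(values, weights))

def weighted_arithmetic_mean(values, weights):
    """Weighted arithmetic mean: the weights are multiplied by the variants."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Hypothetical data.
prices = [20, 25, 40]       # price per unit in each batch
revenue = [200, 500, 400]   # price * quantity for each batch
quantities = [r / p for r, p in zip(revenue, prices)]  # 10, 20, 10 units

# The weights are products of units and values, so the harmonic mean applies.
print(weighted_harmonic_mean(prices, revenue))
# The same average price from the unit counts, via the arithmetic mean.
print(weighted_arithmetic_mean(prices, quantities))
```

Both calls return the same average price, which shows that the harmonic form is the arithmetic mean rewritten for the case where only the products are known.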


1. Definition of the mode and median in statistics

The arithmetic and harmonic means are generalizing characteristics of a population with respect to a varying characteristic. The mode and median are auxiliary descriptive characteristics of the distribution of a varying characteristic.

In statistics, the mode is the value of a characteristic (variant) that occurs most often in a given population. In a variation series it is the variant with the highest frequency.

In statistics, the median is the variant located in the middle of the variation series. The median divides the series in half: on either side of it (above and below) there is the same number of population units.

Unlike the power means, the mode and median are concrete characteristics: their value is that of a particular variant in the variation series.

The mode is used when the most frequent value of a characteristic needs to be described. For example, to find the most common wage at an enterprise, the market price at which the largest quantity of goods was sold, or the shoe size in greatest demand among consumers, one resorts to the mode.

The median is interesting because it shows the quantitative boundary of the varying characteristic that half of the members of the population have reached. Suppose the average salary of bank employees is 650,000 rubles per month. This characteristic can be supplemented by stating that half of the employees received a salary of 700,000 rubles or more, i.e. by giving the median. The mode and median are typical characteristics when the population is homogeneous and large.


2. Finding the Mode and Median in a Discrete Variational Series

It is not difficult to find the mode and median in a variation series where the values of the characteristic are given as specific numbers. Consider Table 1, the distribution of families by number of children.

Table 1. Distribution of families by number of children

Obviously, in this example the mode is a family with two children, since the largest number of families corresponds to this variant. There may be distributions in which all variants occur equally often; in that case there is no mode or, put differently, all variants are equally modal. In other cases, not one but two variants may have the highest frequency. Then there are two modes and the distribution is bimodal. Bimodal distributions can indicate qualitative heterogeneity of the population with respect to the characteristic studied.

To find the median in a discrete variation series, halve the sum of the frequencies and add ½ to the result. In the distribution of 185 families by number of children, the position of the median is 185/2 + ½ = 93, i.e. the 93rd variant divides the ordered series in half. What is the value of the 93rd variant? To find out, accumulate the frequencies starting from the smallest variants. The sum of the frequencies of the 1st and 2nd variants is 40, so the 93rd variant is not among them. Adding the frequency of the 3rd variant to 40 gives 40 + 75 = 115. Therefore the 93rd variant corresponds to the third value of the varying characteristic, and the median is a family with two children.

The mode and median coincide in this example. If the sum of the frequencies were even (for example, 184), the same formula would give the position of the median as 184/2 + ½ = 92.5. Since there are no fractional positions, the result indicates that the median lies halfway between the 92nd and 93rd variants.
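The position rule can be sketched in Python as follows. The frequency table is hypothetical: only the total of 185 families, the sum of 40 for the first two variants and the frequency 75 of the third variant come from the text, and the remaining split is an assumption. The function names are also assumptions.

```python
from bisect import bisect_left
from itertools import accumulate

def variant_at(position, variants, freqs):
    """Value of the variant at a 1-based position in the ordered series."""
    cumulative = list(accumulate(freqs))
    # First variant whose accumulated frequency reaches the position.
    return variants[bisect_left(cumulative, position)]

def median_by_position(variants, freqs):
    """Median located through the position sum(freqs) / 2 + 1/2."""
    position = sum(freqs) / 2 + 0.5
    if position.is_integer():
        return variant_at(int(position), variants, freqs)
    # Fractional position: halfway between the two neighboring variants.
    lower = variant_at(int(position - 0.5), variants, freqs)
    upper = variant_at(int(position + 0.5), variants, freqs)
    return (lower + upper) / 2

children = [0, 1, 2, 3, 4, 5]
families = [10, 30, 75, 45, 20, 5]   # hypothetical split, total 185

print(median_by_position(children, families))
```

With an even total such as 184 the position is 92.5, and the function averages the 92nd and 93rd variants.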

3. Calculation of the mode and median in the interval variation series

The descriptive nature of the mode and median is due to the fact that they do not smooth out individual deviations. They always correspond to a specific variant. Therefore, if all values of the characteristic are known, finding the mode and median requires no calculation. In an interval variation series, however, calculations are needed to find the approximate value of the mode and median within a certain interval.

To calculate the modal value of a characteristic enclosed in an interval, use the formula:

$$Mo = X_{Mo} + i_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})},$$

where $X_{Mo}$ is the lower boundary of the modal interval;

$i_{Mo}$ is the width of the modal interval;

$f_{Mo}$ is the frequency of the modal interval;

$f_{Mo-1}$ is the frequency of the interval preceding the modal interval;

$f_{Mo+1}$ is the frequency of the interval following the modal interval.

Let us show the calculation of the mode using the example given in Table 2.


Table 2. Distribution of workers of the enterprise according to the fulfillment of production standards

To find the mode, we first determine the modal interval of the series. The example shows that the highest frequency corresponds to the interval in which the variant lies between 100 and 105. This is the modal interval. Its width is 5.

Substituting the numerical values from Table 2 into the formula above, we get:

Mo = 100 + 5 * (104 - 12) / ((104 - 12) + (104 - 98)) ≈ 104.7

The meaning of this formula is as follows: the part of the modal interval that must be added to its lower boundary is determined by the frequencies of the preceding and following intervals. In this case we add 4.7 to 100, i.e. more than half of the interval, because the frequency of the preceding interval is less than the frequency of the following interval.
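The substitution can be checked with a few lines of Python. The function name `interval_mode` is an assumption; the inputs are the numbers quoted in the text (lower boundary 100, interval width 5, frequencies 104, 12 and 98).

```python
def interval_mode(lower, width, f_mode, f_prev, f_next):
    """Mode of an interval series from the modal interval and its neighbors."""
    d_prev = f_mode - f_prev   # excess over the preceding interval
    d_next = f_mode - f_next   # excess over the following interval
    return lower + width * d_prev / (d_prev + d_next)

mo = interval_mode(lower=100, width=5, f_mode=104, f_prev=12, f_next=98)
print(round(mo, 1))
```

With these inputs the expression evaluates to about 104.7, which lies inside the modal interval from 100 to 105 and in its upper half, as expected when the following interval is much more frequent than the preceding one.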

Let us now calculate the median. To find the median in an interval variation series, first determine the interval in which it lies (the median interval). This is the interval whose cumulative frequency is equal to or greater than half of the sum of the frequencies. The cumulative frequencies are formed by successively summing the frequencies, starting from the interval with the smallest value of the characteristic. In our case half of the sum of the frequencies is 250 (500 : 2). Consequently, according to Table 3, the median interval is the interval with wages from 350,000 to 400,000 rubles.

Table 3. Calculation of the median in the interval variation series

Before this interval, the sum of the accumulated frequencies was 160. Therefore, to obtain the median value, another 90 units (250 - 160) must be added.

Basic concepts

For experimental data obtained from a sample, a number of numerical characteristics (measures) can be calculated.

The mode is the numerical value that occurs most frequently in a sample. The mode is sometimes denoted Mo.

For example, in the series of values (2 6 6 8 9 9 9 10), the mode is 9, because 9 occurs more often than any other number.

The mode is the most common value (in this example, 9), not the frequency of that value (in this example, 3).

The mode is found according to the following rules:

1. When all values in the sample occur equally often, the sample series is considered to have no mode.

For example, 5 5 6 6 7 7: this sample has no mode.

2. When two adjacent values have the same frequency and it is greater than the frequency of any other value, the mode is calculated as the arithmetic mean of these two values.

For example, in the sample 1 2 2 2 5 5 5 6 the frequencies of the adjacent values 2 and 5 coincide and are equal to 3. This frequency is greater than the frequency of the other values, 1 and 6 (each of which occurs once).

Therefore, the mode of this series is (2 + 5) / 2 = 3.5.

3. If two non-adjacent values in the sample have equal frequencies that are higher than the frequency of any other value, two modes are distinguished. For example, in the series 10 11 11 11 12 13 14 14 14 17, the modes are the values 11 and 14. In this case the sample is said to be bimodal.

There can also be so-called multimodal distributions, with more than two peaks (modes).

4. If the mode is estimated from grouped data, it is necessary to determine the group with the highest frequency of the characteristic. This group is called the modal group.

The median is denoted Me and is defined as a value such that at least 50% of the sample values are no greater than it and at least 50% are no less than it.

The median is the value that halves an ordered set of data.

Problem 1. Find the median of the sample 9 3 5 8 4 11 13

Solution. First, sort the sample in ascending order. We get 3 4 5 8 9 11 13. Since there are seven elements in the sample, the fourth element in order is greater than the first three and less than the last three. Thus, the median is the fourth element, 8.

Problem 2. Find the median of the sample 20, 9, 13, 1, 4, 11.

Arrange the sample in order: 1, 4, 9, 11, 13, 20. Since there is an even number of elements, there are two "midpoints", 9 and 11. In this case the median is the arithmetic mean of these values: (9 + 11) / 2 = 10.

Average


The arithmetic mean of a series of n numerical values is calculated as

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

To show how deceptive this indicator can be, here is a well-known example. A 60-year-old grandmother shares a train compartment with her four grandchildren: one is 4 years old, two are 5 and one is 6. The arithmetic mean of the ages of all passengers in this compartment is 80/5 = 16. In another compartment there is a group of young people: two 15-year-olds, one 16-year-old and two 17-year-olds. The average age of the passengers in this compartment is also 80/5 = 16. Thus, the arithmetic means for the two compartments do not differ. But if we turn to the standard deviation, it turns out that the average spread around the mean age is 24.6 in the first case and 1 in the second.
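The figures in this example can be reproduced with Python's standard `statistics` module. The sketch assumes that the quoted spread is the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes; the variable names are illustrative.

```python
from statistics import mean, stdev

compartment_1 = [60, 4, 5, 5, 6]       # grandmother and four grandchildren
compartment_2 = [15, 15, 16, 17, 17]   # group of young people

for ages in (compartment_1, compartment_2):
    # Same mean age, very different spread around it.
    print(mean(ages), round(stdev(ages), 1))
```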

In addition, the mean is quite sensitive to very small or very large values that differ from the bulk of the measured values. Suppose 9 people have incomes between $4,500 and $5,200 per month, with an average income of $4,900. If we add to this group a person with an income of $20,000 per month, the average of the whole group shifts to $6,410, although no one in the sample (except that one person) actually receives such an amount.

It is clear that a similar shift, but in the opposite direction, occurs if a person with a very low income is added to the group.
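The shift is easy to reproduce. In the sketch below the nine individual incomes are hypothetical; they are chosen only to lie between 4,500 and 5,200 and to average 4,900, as in the text. The median is printed alongside the mean to show how much less it reacts to the added value.

```python
from statistics import mean, median

# Hypothetical monthly incomes in dollars (mean 4900).
incomes = [4500, 4600, 4800, 4900, 4900, 5000, 5100, 5100, 5200]
with_outlier = incomes + [20000]

print(mean(incomes), median(incomes))
print(mean(with_outlier), median(with_outlier))
```

For this hypothetical group the mean moves from 4,900 to 6,410, while the median moves only from 4,900 to 4,950.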

Sample range

The range (spread) of a sample is the difference between the maximum and minimum values of the given variation series. It is denoted by the letter R.

Range = maximum value - minimum value

It is clear that the more the measured characteristic varies, the greater the value of R, and vice versa.

However, it may happen that two sample series have the same mean and the same range while the nature of their variation differs. For example, two samples are given:

Variance

Variance is the most commonly used measure of the scatter of a random variable.

Variance is the arithmetic mean of the squared deviations of the values of a variable from its mean.
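Both measures follow directly from their definitions. The sketch below uses the homework-time series quoted earlier in this text; the function names are assumptions, and the variance is computed with divisor n, as the definition above states (the standard library equivalent is `statistics.pvariance`).

```python
from statistics import mean

def sample_range(values):
    """Range R: maximum value minus minimum value."""
    return max(values) - min(values)

def variance(values):
    """Arithmetic mean of the squared deviations from the mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

homework_minutes = [23, 18, 25, 20, 25, 25, 32, 37, 34, 26, 34, 25]

print(sample_range(homework_minutes))
print(round(variance(homework_minutes), 2))
```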
