How to calculate the median for the exponential distribution. The MEDIAN function in Excel for statistical analysis

The median of the triangle, as well as the height, serves as a graphical parameter that determines the entire triangle, the value of its sides and angles. Three values: medians, heights and bisectors - this is like a barcode on a product, our task is simply to be able to read it.

Definition

The median is the line segment connecting a vertex with the midpoint of the opposite side. A triangle has three vertices, which means there are three medians. Medians do not always coincide with heights or bisectors. Most often these are separate segments.

Median properties

  • The median of an isosceles triangle drawn to the base coincides with the height and the bisector. In an equilateral triangle, all medians coincide with the bisectors and heights.
  • All medians of a triangle intersect at one point.
  • A median divides a triangle into two triangles of equal area, and the three medians divide it into 6 triangles of equal area.

Triangles are called equal-area if their areas are equal.

Fig. 1. Three medians form 6 triangles of equal area.

  • The point of intersection of the medians divides each of them in a ratio of 2:1, counting from the vertex.
  • The median drawn to the hypotenuse of a right-angled triangle is half the hypotenuse.

Tasks

All these properties are easy to remember, and practice helps to fix them in memory. For a better understanding of the topic, let us solve several problems:

  • In a right triangle, the legs are a = 3 and b = 4. Find the median m drawn to the hypotenuse c.

Fig. 2. Drawing for the problem.

To find the median, we first need the hypotenuse, since the median drawn to the hypotenuse is equal to half of it. The hypotenuse follows from the Pythagorean theorem: $$ a^2 + b^2 = c^2 $$

$$ c = \sqrt{a^2 + b^2} = \sqrt{9 + 16} = \sqrt{25} = 5 $$

Now find the median: $$ m = \frac{c}{2} = \frac{5}{2} = 2.5 $$

The medians of a triangle are generally not equal to one another, so it is important to be clear about exactly which one you need to find.

  • In a triangle, the sides are known: a = 7, b = 8, c = 9. Find the median drawn to side b.

Fig. 3. Drawing for the problem.

To solve this problem, use one of the three formulas that express a median through the sides of the triangle:

$$ m_b^2 = \frac{1}{4}\left(2a^2 + 2c^2 - b^2\right) $$

As you can see, the main thing here is to remember the coefficient in front of the brackets and the signs of the terms. The signs are the easiest to remember: the square of the side to which the median is drawn is always subtracted, while the squares of the other two sides are doubled and added. In our case, this side is b, but it can be any other.

Substitute the values into the formula and find the median: $$ m_b = \sqrt{\frac{1}{4}\left(2a^2 + 2c^2 - b^2\right)} $$

$$ m_b = \sqrt{\frac{1}{4}(98 + 162 - 64)} = \sqrt{49} = 7 $$

  • In an isosceles triangle, the median drawn to the base is 8, and the base itself is 6. Together with the other two medians, it divides the triangle into 6 triangles. Find the area of each of them.

The medians divide the triangle into six triangles of equal area. It is therefore enough to find the area of the whole triangle and divide it by 6.

In an isosceles triangle, the median drawn to the base is also the bisector and the height. This means that both the base and the height of the triangle are known, so the area can be found.

$$ S = \frac{1}{2} \cdot 6 \cdot 8 = 24 $$

Area of each of the small triangles: $$ \frac{24}{6} = 4 $$
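The result can be checked numerically. The sketch below is only an illustration: it places the isosceles triangle in coordinates chosen for convenience (base on the x-axis, apex at height 8), finds the intersection point of the medians, and computes the six small areas with the shoelace formula. The helper names `triangle_area` and `midpoint` are hypothetical.

```python
def triangle_area(p, q, r):
    """Area of triangle pqr by the shoelace formula."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2


def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


# Isosceles triangle from the task: base 6, median (= height) to the base 8.
A, B, C = (-3.0, 0.0), (3.0, 0.0), (0.0, 8.0)

# The medians intersect at the centroid, the average of the three vertices.
G = ((A[0] + B[0] + C[0]) / 3, (A[1] + B[1] + C[1]) / 3)
M_ab, M_bc, M_ca = midpoint(A, B), midpoint(B, C), midpoint(C, A)

total_area = triangle_area(A, B, C)
small_areas = [
    triangle_area(A, M_ab, G), triangle_area(M_ab, B, G),
    triangle_area(B, M_bc, G), triangle_area(M_bc, C, G),
    triangle_area(C, M_ca, G), triangle_area(M_ca, A, G),
]

print(total_area)    # 24.0
print(small_areas)   # six values, each approximately 4
```

Each of the six areas comes out as one sixth of the total, in line with the property used in the solution.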

What have we learned?

We learned what the median is. We determined the properties of the median, and found a solution to typical problems. We talked about basic mistakes and figured out how to easily and quickly memorize the formula for finding the median through the sides of a triangle.


Note. This lesson presents geometry problems on the median of a triangle. If you need to solve a geometry problem that is not covered here, write about it in the forum. The course will almost certainly be supplemented.

Task. Find the length of the median of a triangle from its sides

The sides of the triangle are 8, 9 and 13 centimeters long. The median is drawn to the largest side of the triangle. Determine the median of the triangle based on the dimensions of its sides.

Solution.

There are two ways to solve the problem. The first is the one that high-school teachers dislike, but it is the most versatile.

Method 1.

We apply Stewart's theorem, from which it follows that the square of a median equals one quarter of the following quantity: twice the sum of the squares of the two adjacent sides minus the square of the side to which the median is drawn.

$$ m_c^2 = \frac{2a^2 + 2b^2 - c^2}{4} $$

Accordingly,

$$ m_c^2 = \frac{2 \cdot 8^2 + 2 \cdot 9^2 - 13^2}{4} $$
$$ m_c^2 = 30.25 $$
$$ m_c = 5.5 \text{ cm} $$
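The formula translates directly into a short function. The following is a minimal Python sketch; the name `median_to_side` is an illustrative choice, and the function assumes that the three lengths really form a triangle.

```python
import math


def median_to_side(a: float, b: float, c: float) -> float:
    """Length of the median drawn to side c of a triangle with sides a, b, c.

    Uses m_c^2 = (2a^2 + 2b^2 - c^2) / 4. The sides are assumed to satisfy
    the triangle inequality; this sketch does not check it.
    """
    return math.sqrt(2 * a * a + 2 * b * b - c * c) / 2


print(median_to_side(8, 9, 13))  # 5.5  (the task above)
print(median_to_side(3, 4, 5))   # 2.5  (median to the hypotenuse of a 3-4-5 triangle)
```

The second call reproduces the right-triangle property: the median to the hypotenuse is half the hypotenuse.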

Method 2.

The second solution, which school teachers love, uses an additional construction: the triangle is completed to a parallelogram, and the problem is solved with the theorem on the diagonals of a parallelogram.

Let us complete the triangle to a parallelogram by extending the median. Then the median BO of triangle ABC is half of one diagonal of the resulting parallelogram, the two sides AB and BC of the triangle are its sides, and the third side AC, to which the median was drawn, is its second diagonal.

According to the theorem, the sum of the squares of the diagonals of a parallelogram is equal to twice the sum of the squares of its sides.

$$ 2(a^2 + b^2) = d_1^2 + d_2^2 $$

Let us denote by x the diagonal of the parallelogram formed by extending the median of the original triangle. We get:

$$ 2(8^2 + 9^2) = 13^2 + x^2 $$
$$ 290 = 169 + x^2 $$
$$ x^2 = 290 - 169 $$
$$ x^2 = 121 $$
$$ x = 11 $$

Since the required median is half of this diagonal, the median of the triangle is 11/2 = 5.5 cm.

Answer: 5.5 cm
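The parallelogram construction can also be checked in coordinates. This is only a sketch under an assumed placement of the triangle (A at the origin, C on the x-axis); the variable names are illustrative.

```python
import math

AB, BC, AC = 8.0, 9.0, 13.0

# Place A at (0, 0) and C at (AC, 0); B follows from the two distance equations.
bx = (AB**2 - BC**2 + AC**2) / (2 * AC)
by = math.sqrt(AB**2 - bx**2)

# O is the midpoint of AC, so BO is the median drawn to the largest side.
ox, oy = AC / 2, 0.0
median_BO = math.hypot(bx - ox, by - oy)

# D is B reflected through O, which makes ABCD a parallelogram
# with diagonals AC and BD.
dx, dy = 2 * ox - bx, 2 * oy - by
diagonal_BD = math.hypot(dx - bx, dy - by)

# Parallelogram law: 2(a^2 + b^2) = d1^2 + d2^2
lhs = 2 * (AB**2 + BC**2)
rhs = AC**2 + diagonal_BD**2

print(median_BO, diagonal_BD)  # approximately 5.5 and 11
print(lhs, rhs)                # both approximately 290
```

The diagonal BD is exactly twice the median, which is the fact Method 2 relies on.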

Averages are widely used to compare salaries in various sectors of the economy, temperature and precipitation in the same territory over comparable periods of time, the yield of crops grown in different geographical regions, and so on. However, the average is by no means the only summary indicator: in some cases the median is better suited for an accurate assessment. In statistics, it is widely used as an auxiliary descriptive characteristic of the distribution of an attribute in a given population. Let's see how it differs from the average and why it is needed.

Median in statistics: definition and properties

Imagine the following situation: a company employs 10 people, including the director. The ordinary workers receive UAH 1,000 each, and their manager, who is also the owner, receives UAH 10,000. If we calculate the arithmetic mean, the average salary at this enterprise comes to UAH 1,900. Is that a fair statement? Or take another example: a hospital ward holds nine people with a temperature of 36.6 °C and one person with a temperature of 41 °C. The arithmetic mean in this case is (36.6 * 9 + 41) / 10 = 37.04 °C, but this does not mean that everyone in the ward is ill. All this suggests that the average alone is often not enough, which is why the median is used in addition to it. In statistics, the median is the value located exactly in the middle of an ordered variation series. For our examples it equals UAH 1,000 and 36.6 °C respectively. In other words, the median is the value that divides a series in half so that the same number of units of the population lies on either side of it (above and below). Because of this property, it has several other names: the 50th percentile or the 0.5 quantile.
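Both examples are easy to reproduce. The sketch below uses Python's standard `statistics` module; the list names are illustrative.

```python
from statistics import mean, median

# Nine ordinary workers and the director (UAH).
salaries = [1000] * 9 + [10000]

# Nine patients at 36.6 °C and one at 41 °C.
temperatures = [36.6] * 9 + [41.0]

print(mean(salaries), median(salaries))
print(round(mean(temperatures), 2), median(temperatures))
```

The means come out as 1900 and 37.04, while the medians stay at 1000 and 36.6: a single extreme value shifts the mean but leaves the median where most of the data is.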

How to find the median in statistics

The method of calculating the median depends largely on the type of variation series we have: discrete or interval. In the first case, the median is quite easy to find. All you have to do is find the sum of the frequencies, divide it by 2, and add ½ to the result; this gives the serial number of the median unit. The principle is best explained with an example. Suppose we have data on families grouped by number of children and we want to find the median.

Family group number by number of children

Number of families

Having carried out some simple calculations, we find that the required position is 195/2 + ½ = 98, that is, the 98th value of the series. To find out which value this is, accumulate the frequencies in sequence, starting with the smallest values. The sum of the first two rows gives 30, so the 98th value is clearly not among them. But if you add the frequency of the third row (70), the sum becomes 100. This group contains the 98th value, which means that the median is a family with two children.

As for the interval series, the following formula is usually used here:

$$ Me = x_{Me} + i_{Me} \cdot \frac{\sum f / 2 - S_{Me-1}}{f_{Me}} $$, in which:

  • x_{Me} is the lower boundary of the median interval;
  • ∑f is the size of the series (the sum of its frequencies);
  • i_{Me} is the width of the median interval;
  • f_{Me} is the frequency of the median interval;
  • S_{Me-1} is the sum of the accumulated frequencies of the intervals preceding the median one.

Again, it is hard to make sense of this without an example. Suppose there is data on salaries.

Salary, thousand rubles

Accumulated frequencies

To use the above formula, we first need to determine the median interval. It is the first interval whose accumulated frequency equals or exceeds half of the total sum of frequencies. Dividing 510 by 2, we find that this criterion is met by the interval with salaries from 250,000 to 300,000 rubles. Now we can substitute all the data into the formula:

$$ Me = x_{Me} + i_{Me} \cdot \frac{\sum f / 2 - S_{Me-1}}{f_{Me}} = 250 + 50 \cdot \frac{510/2 - 170}{115} = 286.96 $$ thousand rubles.
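The same substitution can be written as a small function. This is a minimal sketch; `interval_median` and its parameter names are hypothetical, chosen to mirror the symbols of the formula.

```python
def interval_median(lower, width, total, cum_before, freq):
    """Median of an interval series.

    lower      - lower boundary of the median interval (x_Me)
    width      - width of the median interval (i_Me)
    total      - sum of all frequencies (sum f)
    cum_before - accumulated frequency before the median interval (S_Me-1)
    freq       - frequency of the median interval (f_Me)
    """
    return lower + width * (total / 2 - cum_before) / freq


me = interval_median(lower=250, width=50, total=510, cum_before=170, freq=115)
print(round(me, 2))  # 286.96 (thousand rubles)
```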

We hope our article was useful, and now you have a clear idea of what the median is in statistics and how it should be calculated.

Along with averages, two structural averages, the mode and the median, are calculated as statistical characteristics of a variational distribution series.
Mode (Mo) is the value of the attribute under study that occurs with the greatest frequency, i.e. the mode is the most common value of the attribute.
Median (Me) is the value of the attribute that falls in the middle of a ranked (ordered) population, i.e. the median is the central value of the variation series.
The main property of the median is that the sum of the absolute deviations of the attribute values from the median is smaller than from any other value: $$ \sum |x_i - Me| = \min $$

Determination of Mode and Median from Ungrouped Data

Consider the determination of the mode and median from ungrouped data. Suppose a work team of 9 people has the following wage categories: 4 3 4 5 3 3 6 2 6. Since the team has more workers of the 3rd category than of any other, this category is modal: Mo = 3.
To determine the median, the data must be ranked: 2 3 3 3 4 4 5 6 6. The central position in this series belongs to a worker of the 4th category, so this category is the median. If the ranked series contains an even number of units, the median is defined as the average of the two central values.
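The wage-category example can be reproduced with Python's standard `statistics` module, as a quick sketch. The eight-element list at the end is a hypothetical variation (one worker removed) added only to show the even case.

```python
from statistics import median, mode

categories = [4, 3, 4, 5, 3, 3, 6, 2, 6]

print(sorted(categories))  # [2, 3, 3, 3, 4, 4, 5, 6, 6]
print(mode(categories))    # 3
print(median(categories))  # 4

# Even number of units: the median is the average of the two central values.
even_team = [2, 3, 3, 3, 4, 4, 5, 6]   # hypothetical team of 8
print(median(even_team))   # 3.5
```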
While the mode reflects the most common value of the attribute, the median in practice plays the role of the average for a heterogeneous population that does not follow the normal distribution. Let us illustrate its usefulness with the following example.
Suppose we need to characterize the average income of a group of 100 people, 99 of whom have incomes between $100 and $200 per month, while the monthly income of the remaining one is $50,000 (Table 1).
Table 1 - Monthly incomes of the studied group of people.
If we use the arithmetic mean, we get an average income of about $600 to $700, which has little to do with the incomes of most of the group. The median, in this case Me = $163, gives an objective description of the income level of 99% of this group.
Consider the determination of the mode and median from grouped data (distribution series).
Suppose the distribution of workers of the entire enterprise as a whole according to the wage category has the following form (Table 2).
Table 2 - Distribution of workers of the enterprise by wage category

Calculation of the mode and median for a discrete series


Determination of the mode from a discrete variation series

To find the median, use the previously built series of attribute values sorted by size. If the sample size is odd, take the central value; if the sample size is even, take the arithmetic mean of the two central values.
Determination of the mode from a discrete variation series: the 5th wage category has the highest frequency (60 people), therefore it is modal. Mo = 5.
To determine the median value of the attribute, the serial number of the median unit of the series (N_{Me}) is found using the formula $$ N_{Me} = \frac{n + 1}{2} $$, where n is the size of the population.
In our case: $$ N_{Me} = \frac{190 + 1}{2} = 95.5 $$.
The resulting fractional value, which always occurs when the number of units is even, indicates that the exact middle lies between the 95th and 96th workers. We need to determine which group the workers with these serial numbers belong to. This can be done by calculating the accumulated frequencies. Workers with these numbers are not in the first group, which has only 12 people, and not in the second group either (12 + 48 = 60). The 95th and 96th workers are in the third group (12 + 48 + 56 = 116), so the median is the 4th wage category.
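The search through accumulated frequencies can be sketched in Python as follows. The first four group sizes (12, 48, 56, 60) come from the text; the size of the last group (14 workers in the 6th category) is an assumption, chosen only so that the population has 190 units and the middle falls between the 95th and 96th workers. The function names are hypothetical.

```python
# (wage category, number of workers).
# The last count is an assumption made for illustration, see the note above.
groups = [(2, 12), (3, 48), (4, 56), (5, 60), (6, 14)]


def value_at(groups, position):
    """Attribute value of the unit with the given 1-based serial number."""
    cumulative = 0
    for value, freq in groups:
        cumulative += freq
        if position <= cumulative:
            return value
    raise ValueError("position exceeds the population size")


def grouped_median(groups):
    """Median of a discrete series given as (value, frequency) pairs."""
    n = sum(freq for _, freq in groups)
    # For odd n both positions coincide; for even n they are the two central units.
    left, right = (n + 1) // 2, n // 2 + 1
    return (value_at(groups, left) + value_at(groups, right)) / 2


def grouped_mode(groups):
    """Value with the highest frequency."""
    return max(groups, key=lambda pair: pair[1])[0]


print(grouped_mode(groups))    # 5
print(grouped_median(groups))  # 4.0
```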

Calculation of the mode and median in the interval series

In contrast to discrete variation series, determining the mode and median of an interval series requires calculations based on the following formulas:
$$ Mo = x_0 + i \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})} \qquad (6) $$
where x_0 is the lower limit of the modal interval (the interval with the highest frequency is called modal);
i is the width of the modal interval;
f_{Mo} is the frequency of the modal interval;
f_{Mo-1} is the frequency of the interval preceding the modal one;
f_{Mo+1} is the frequency of the interval following the modal one.
$$ Me = x_0 + i \cdot \frac{\frac{1}{2}\sum f - S_{Me-1}}{f_{Me}} \qquad (7) $$
where x_0 is the lower boundary of the median interval (the median interval is the first interval whose accumulated frequency exceeds half of the total sum of frequencies);
i is the width of the median interval;
S_{Me-1} is the accumulated frequency of the interval preceding the median one;
f_{Me} is the frequency of the median interval.
Let us illustrate the application of these formulas using the data in Table 3.
The interval with boundaries 60 - 80 is modal in this distribution, since it has the highest frequency. Using formula (6), we find the mode:
$$ Mo = 60 + 20 \cdot \frac{12.7 - 11.9}{(12.7 - 11.9) + (12.7 - 11.7)} \approx 68.9 $$ thousand rubles.

To establish the median interval, the accumulated frequency of each successive interval is calculated until it exceeds half of the total sum of frequencies (in our case, 50%) (Table 11).
The median interval turns out to be the one with boundaries 100 - 120 thousand rubles. Let us now determine the median:
$$ Me = 100 + 20 \cdot \frac{50 - 45.2}{10.0} = 109.6 $$ thousand rubles.

Table 3 - Distribution of the population of the Russian Federation by the level of average per capita nominal money income in March 1994.
Groups by level of per capita monthly income, thousand rubles | Share of population, %
Up to 20 | 1.4
20 – 40 | 7.5
40 – 60 | 11.9
60 – 80 | 12.7
80 – 100 | 11.7
100 – 120 | 10.0
120 – 140 | 8.3
140 – 160 | 6.8
160 – 180 | 5.5
180 – 200 | 4.4
200 – 220 | 3.5
220 – 240 | 2.9
240 – 260 | 2.3
260 – 280 | 1.9
280 – 300 | 1.5
Over 300 | 7.7
Total | 100.0
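As a sketch of how the mode and median of this distribution can be computed from the table, the Python code below applies the standard interval formulas (modal interval = highest share, median interval = first interval where the accumulated share reaches half of the total). The function names are hypothetical. The open-ended groups are given nominal boundaries (0 - 20 and 300 - 320); this is an assumption that does not influence the result here, because neither group is the modal or the median interval.

```python
WIDTH = 20  # every closed interval in Table 3 is 20 thousand rubles wide

# (lower boundary in thousand rubles, share of population in %)
table3 = [
    (0, 1.4), (20, 7.5), (40, 11.9), (60, 12.7), (80, 11.7), (100, 10.0),
    (120, 8.3), (140, 6.8), (160, 5.5), (180, 4.4), (200, 3.5), (220, 2.9),
    (240, 2.3), (260, 1.9), (280, 1.5), (300, 7.7),
]


def interval_mode(table, width):
    """Mode of an interval series with equal-width intervals."""
    k = max(range(len(table)), key=lambda j: table[j][1])
    x0, f_mo = table[k]
    f_prev = table[k - 1][1] if k > 0 else 0.0
    f_next = table[k + 1][1] if k + 1 < len(table) else 0.0
    return x0 + width * (f_mo - f_prev) / ((f_mo - f_prev) + (f_mo - f_next))


def interval_median(table, width):
    """Median of an interval series with equal-width intervals."""
    half = sum(f for _, f in table) / 2
    cumulative = 0.0
    for x0, f in table:
        if cumulative + f >= half:
            return x0 + width * (half - cumulative) / f
        cumulative += f
    raise ValueError("empty table")


print(round(interval_mode(table3, WIDTH), 1))    # 68.9
print(round(interval_median(table3, WIDTH), 1))  # 109.6
```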

Table 4 - Determination of the median interval
Thus, the arithmetic mean, mode and median can all be used as a generalized characteristic of the values of an attribute in the units of a ranked population.
The main characteristic of the center of a distribution is the arithmetic mean, which is distinguished by the fact that all deviations from it (positive and negative) sum to zero. The median is characterized by the fact that the sum of the absolute deviations from it is minimal, and the mode is the value of the attribute that occurs most often.
The relationship between the mode, median and arithmetic mean indicates the nature of the distribution of an attribute in the population and allows us to assess its asymmetry. In symmetric distributions, all three characteristics coincide. The greater the discrepancy between the mode and the arithmetic mean, the more asymmetrical the series is. For moderately asymmetric series, the difference between the mode and the arithmetic mean is about three times the difference between the median and the mean, i.e.:
$$ |Mo - \bar{x}| \approx 3\,|Me - \bar{x}| $$

Determination of mode and median by graphical method

The mode and median of an interval series can be determined graphically. The mode is determined from the distribution histogram. To do this, select the tallest rectangle, which is the modal one. Then connect the upper right vertex of the modal rectangle to the upper right corner of the preceding rectangle, and the upper left vertex of the modal rectangle to the upper left corner of the following rectangle. From the point where these lines intersect, drop a perpendicular to the abscissa axis. The abscissa of the intersection point is the mode of the distribution (Fig. 3).


Fig. 3. Graphical determination of the mode from the histogram.


Fig. 4. Graphical determination of the median from the cumulative curve.
To determine the median, a straight line parallel to the abscissa axis is drawn from the point on the scale of accumulated frequencies (shares) corresponding to 50% until it intersects the cumulative curve. Then a perpendicular is dropped from the intersection point onto the abscissa axis. The abscissa of the intersection point is the median.
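Reading the median off the cumulative curve is the same as linear interpolation between two of its points. The sketch below illustrates this with the first part of the income distribution from Table 3; the accumulated shares are obtained by summing the shares in the table, the starting point (0, 0) assumes that the first group begins at zero, and the function name `interpolate_x` is hypothetical.

```python
def interpolate_x(points, y_target):
    """Find x where a piecewise-linear cumulative curve reaches y_target.

    points is a list of (x, accumulated share) pairs in increasing order.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= y_target <= y1:
            return x0 + (x1 - x0) * (y_target - y0) / (y1 - y0)
    raise ValueError("target lies outside the curve")


# Upper interval boundaries (thousand rubles) and accumulated shares (%).
cumulative_curve = [
    (0, 0.0), (20, 1.4), (40, 8.9), (60, 20.8),
    (80, 33.5), (100, 45.2), (120, 55.2),
]

median_from_curve = interpolate_x(cumulative_curve, 50.0)
print(round(median_from_curve, 1))  # 109.6
```

The horizontal line at 50% meets the curve between the points (100, 45.2) and (120, 55.2), and the perpendicular dropped from that intersection lands at about 109.6 thousand rubles.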

Quartiles, deciles, percentiles

In the same way as the median, one can find the value of the attribute for any unit of the ranked variation series. For example, one can find the values of the attribute at the units that divide the series into four equal parts, into 10 parts or into 100 parts. These values are called quartiles, deciles and percentiles.
Quartiles are the values of the attribute that divide a ranked population into 4 equal parts.
A distinction is made between the lower quartile (Q_1), which separates the quarter of the population with the smallest values of the attribute, and the upper quartile (Q_3), which cuts off the quarter with the largest values. This means that 25% of the units of the population are smaller than Q_1; 25% lie between Q_1 and Q_2; 25% lie between Q_2 and Q_3; and the remaining 25% exceed Q_3. The middle quartile Q_2 is the median.
To calculate quartiles for an interval variation series, the following formulas are used:
$$ Q_1 = x_{Q_1} + i \cdot \frac{\frac{1}{4}\sum f - S_{Q_1-1}}{f_{Q_1}}, \qquad Q_3 = x_{Q_3} + i \cdot \frac{\frac{3}{4}\sum f - S_{Q_3-1}}{f_{Q_3}}, $$
where x_{Q_1} is the lower boundary of the interval containing the lower quartile (the interval is the first one whose accumulated frequency exceeds 25%);
x_{Q_3} is the lower boundary of the interval containing the upper quartile (the interval is the first one whose accumulated frequency exceeds 75%);
i is the width of the interval;
S_{Q_1-1} is the accumulated frequency of the interval preceding the one containing the lower quartile;
S_{Q_3-1} is the accumulated frequency of the interval preceding the one containing the upper quartile;
f_{Q_1} is the frequency of the interval containing the lower quartile;
f_{Q_3} is the frequency of the interval containing the upper quartile.
Consider the calculation of the lower and upper quartiles from the data in Table 10. The lower quartile lies in the 60 - 80 interval, whose accumulated frequency is 33.5%. The upper quartile lies in the 160 - 180 interval, with an accumulated frequency of 75.8%. With this in mind, we get:
$$ Q_1 = 60 + 20 \cdot \frac{25 - 20.8}{12.7} \approx 66.6, $$
$$ Q_3 = 160 + 20 \cdot \frac{75 - 70.3}{5.5} \approx 177.1. $$
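The quartile calculation generalizes to any share p of the population. The following Python sketch applies the standard interval formula to the income distribution by 20-thousand-ruble groups (shares in %); the function name `interval_quantile` is hypothetical, and the open-ended groups are given nominal boundaries (0 - 20 and 300 - 320), an assumption that does not affect the quartiles.

```python
WIDTH = 20  # width of each closed interval, thousand rubles

# (lower boundary in thousand rubles, share of population in %)
income_shares = [
    (0, 1.4), (20, 7.5), (40, 11.9), (60, 12.7), (80, 11.7), (100, 10.0),
    (120, 8.3), (140, 6.8), (160, 5.5), (180, 4.4), (200, 3.5), (220, 2.9),
    (240, 2.3), (260, 1.9), (280, 1.5), (300, 7.7),
]


def interval_quantile(table, width, p):
    """Value below which a share p (0 < p < 1) of the population lies."""
    target = p * sum(f for _, f in table)
    cumulative = 0.0
    for x0, f in table:
        if cumulative + f >= target:
            # Linear interpolation inside the interval containing the quantile.
            return x0 + width * (target - cumulative) / f
        cumulative += f
    raise ValueError("p must be between 0 and 1")


q1 = interval_quantile(income_shares, WIDTH, 0.25)
q3 = interval_quantile(income_shares, WIDTH, 0.75)
print(round(q1, 2), round(q3, 2))  # 66.61 177.09
```

Calling the same function with p = 0.5 gives the median, and with p = 0.1, 0.2, and so on it gives the deciles.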
In addition to quartiles, deciles can be determined in variational distribution series: values that divide the ranked series into ten equal parts. The first decile (d_1) divides the population in a ratio of 1/10 to 9/10, the second decile (d_2) in a ratio of 2/10 to 8/10, and so on.
They are calculated by the formulas:
$$ d_1 = x_{d_1} + i \cdot \frac{\frac{1}{10}\sum f - S_{d_1-1}}{f_{d_1}}, \qquad d_2 = x_{d_2} + i \cdot \frac{\frac{2}{10}\sum f - S_{d_2-1}}{f_{d_2}}. $$
Attribute values that divide a series into one hundred parts are called percentiles. The relationships between the median, quartiles, deciles and percentiles are shown in Fig. 5.

The central tendency of data can be viewed not only as the value with zero total deviation (the arithmetic mean) or with maximum frequency (the mode), but also as a mark (a value in the population) that divides the ranked data (sorted in ascending or descending order) into two equal parts. Half of the original data is less than this mark, and half is greater. This mark is the median.

So, the median in statistics is the level of the indicator that divides the data set into two equal halves. In one half the values are lower than the median, and in the other they are higher. As an example, let's look at a set of random numbers.

Obviously, with a symmetric distribution, the middle that divides the population in half lies in the very center, in the same place as the arithmetic mean (and the mode). This is, so to speak, the ideal situation, when the mode, median and arithmetic mean coincide and all their properties meet at one point: maximum frequency, division in half, zero sum of deviations. However, life is not as symmetrical as the normal distribution.

Suppose we are dealing with technical measurements of deviations from an expected value (content of elements, distance, level, mass, etc.). If everything is in order, the deviations will most likely be distributed according to a law close to normal, approximately as in the figure above. But if an important and uncontrollable factor is present in the process, anomalous values may appear that significantly affect the arithmetic mean while hardly affecting the median.

The sample median is an alternative to the arithmetic mean, because it is resistant to abnormal deviations (outliers).

A mathematical property of the median is that the sum of the absolute deviations of the data from the median is the minimum possible, compared with deviations from any other value. It is even smaller than for the arithmetic mean! This fact is applied, for example, in transport problems, when a facility (a stop, gas station, warehouse, etc.) has to be sited along a road so that the total length of trips to it from different places is minimal.
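This property is easy to verify numerically. The sketch below reuses the salary figures from the earlier example (nine salaries of 1,000 and one of 10,000) and compares the total absolute deviation from the median, from the mean, and from a grid of other candidate values. The function name `total_abs_deviation` and the candidate grid are illustrative choices.

```python
from statistics import mean, median


def total_abs_deviation(data, center):
    """Sum of absolute deviations of the data from a chosen center."""
    return sum(abs(x - center) for x in data)


salaries = [1000] * 9 + [10000]

at_median = total_abs_deviation(salaries, median(salaries))
at_mean = total_abs_deviation(salaries, mean(salaries))
print(at_median, at_mean)

# Brute-force check over a grid of candidate centers from 0 to 10,000.
candidates = range(0, 10001, 100)
best = min(candidates, key=lambda c: total_abs_deviation(salaries, c))
print(best)  # 1000
```

The total deviation from the median is 9,000, against 16,200 from the mean, and no candidate on the grid does better than the median.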

The median formula in statistics for discrete data is somewhat reminiscent of the formula for the mode, in that there is no formula as such. The median is selected from the available data, and only if this is not possible is a simple calculation carried out.

First of all, the data is ranked (sorted in descending order). Then there are two options. If the number of values is odd, the median corresponds to the central value of the series, whose number can be determined by the formula:

$$ N_{Me} = \frac{N + 1}{2} $$

where N_{Me} is the number of the value corresponding to the median,

N is the number of values in the data set.

Then the median is denoted as

$$ Me = x_{N_{Me}} $$

This is the first option, where there is one central value in the data. The second option occurs when the number of values is even, that is, instead of one central value there are two. The way out is simple: the arithmetic mean of the two central values is taken:

$$ Me = \frac{x_{N/2} + x_{N/2+1}}{2} $$
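The two cases can be sketched as a short Python function. The name `sample_median` is hypothetical; in practice `statistics.median` from the standard library does the same job.

```python
def sample_median(values):
    """Median of discrete data: central value, or mean of the two central values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the value number (n + 1) / 2, counting from 1.
        return ordered[mid]
    # Even count: average of the values number n / 2 and n / 2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([7, 1, 5]))      # 5
print(sample_median([7, 1, 5, 3]))   # 4.0
```

The result does not depend on whether the data is sorted in ascending or descending order, since the central positions are the same either way.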

In interval data it is not possible to select a specific value. The median is calculated according to a certain rule.

To begin with (after ranking the data), find the median interval. This is the interval that contains the desired median value. It is determined using the accumulated share of the ranked intervals: the median interval is the one where the accumulated share first exceeds 50% of all values.

I do not know who came up with the formula for the median, but its authors clearly proceeded from the assumption that the data within the median interval is distributed uniformly (i.e. 30% of the interval width holds 30% of its values, 80% of the width holds 80% of the values, etc.). Hence, knowing the number of values from the beginning of the median interval up to 50% of all values of the population (the difference between half of the total number of values and the accumulated frequency of the pre-median interval), we can find what share of the median interval they occupy. This share is then applied to the width of the median interval, which gives a specific value: the median.

Let's turn to a pictorial diagram.

It turned out a little cumbersome, but now, I hope, everything is clear. In order not to draw such a diagram every time, you can use a ready-made formula. The formula for the median is as follows:

$$ Me = x_{Me} + i_{Me} \cdot \frac{\sum f / 2 - S_{Me-1}}{f_{Me}} $$

where x_{Me} is the lower boundary of the median interval;

i_{Me} is the width of the median interval;

∑f / 2 is the number of all values divided by 2;

S_{Me-1} is the total number of observations accumulated before the beginning of the median interval, i.e. the accumulated frequency of the pre-median interval;

f_{Me} is the number of observations in the median interval.

As is easy to see, the median formula consists of two terms: the first is the value at the beginning of the median interval, and the second is the part of the interval width proportional to the accumulated share still missing up to 50%.

For example, let's calculate the median from the following data.

It is required to find the median price, that is, the price such that half of the goods are cheaper and half are more expensive. To begin with, let's make auxiliary calculations of the accumulated frequency, the accumulated share, and the total number of goods.

According to the last column “Accumulated share”, we determine the median interval - 300-400 rubles (the accumulated share for the first time is more than 50%). The width of the interval is 100 rubles. Now all that remains is to plug the data into the above formula and calculate the median.

That is, for one half of the goods the price is lower than 350 rubles, for the other half - higher. It's simple. The arithmetic mean, calculated from the same data, is 355 rubles. The difference is not significant, but it is there.

Calculating the median in Excel

The median of numeric data is easy to find in Excel with the function named, fittingly, MEDIAN. Interval data is another matter: Excel has no corresponding function, so the formula above has to be used. This is not much of a problem, though, since calculating the median from interval data is a rare case. You can work it out once on a calculator.

Finally, here is a problem. Given the dataset 15, 5, 20, 5, 10, what is the average? Four options:

The mode, median, and mean of a sample are different ways to determine the central tendency in a sample.
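For the dataset from the problem, all three measures can be computed with Python's standard `statistics` module, as a quick sketch:

```python
from statistics import mean, median, mode

data = [15, 5, 20, 5, 10]

print(mean(data))    # 11
print(median(data))  # 10
print(mode(data))    # 5
```

Each of the three values answers the question "what is the average?" in its own sense, which is why it matters to say which measure is meant.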
