Difference Between Median And Average

tl;dr
The median is the middle value of a dataset when its observations are arranged from smallest to largest, while the mean is the sum of all observations divided by the number of observations.

When analyzing data, two measures come up again and again: the median and the average. Although the two terms may seem interchangeable, they differ significantly. In this article, we discuss the difference between the median and the average.

Mean vs. Median

The term ‘Average’ usually refers to the arithmetic mean, which is calculated by adding all of the observations in a dataset and dividing the sum by the number of observations. The mean is the most widely used measure of central tendency because it takes every observation into account. For instance, to calculate the average height of the people in a room, we add the heights of all individuals and divide by the total number of people.
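The height example can be sketched in a few lines of Python. The heights below are made-up values chosen only for illustration; the calculation itself is the sum-and-divide rule described above, checked against `statistics.mean` from the standard library.

```python
from statistics import mean

# Hypothetical heights (in centimetres) of five people in a room.
heights_cm = [160, 172, 168, 181, 175]

# Mean = sum of all observations / number of observations.
manual_mean = sum(heights_cm) / len(heights_cm)

# The standard library computes the same quantity.
library_mean = mean(heights_cm)

print(manual_mean, library_mean)  # 171.2 171.2
```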

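The median, on the other hand, is the central value in the dataset. It is the number that lies in the middle of the data when it is arranged in order from smallest to largest. So, if we have a dataset of 5, 7, 9, 14, and 21, then the median is 9, as two values lie below it and two above it. It is important to note that the median is not necessarily a value that appears in the dataset. When the number of observations is even, there is no single middle value, and the median is conventionally taken as the mean of the two middle values. Strictly speaking, any value between those two would split the data in half, so the median is not always unique.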
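A minimal sketch of the median calculation, assuming the common convention of averaging the two middle values when the number of observations is even. The function name `median` is our own here; Python's standard library offers `statistics.median`, which follows the same convention.

```python
def median(values):
    """Return the middle value of the data, averaging the two middle
    values when the number of observations is even."""
    ordered = sorted(values)  # arrange from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle observation exists.
        return ordered[mid]
    # Even count: take the mean of the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 7, 9, 14, 21]))  # 9
print(median([5, 7, 9, 14]))      # 8.0, a value that is not in the dataset
```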

Mean and median both measure central tendency, that is, the typical value in a dataset, but they arrive at it differently. The mean is determined by adding all the observations and dividing by the number of observations, so every value contributes to it. The median is the middle value of the ordered dataset, so it depends only on the order of the observations.

When to Use Median

The median is often used when the dataset contains extreme outliers that may significantly affect the value of the mean. For example, consider a dataset of the incomes of 10 individuals in a small town. If we add the incomes of all ten individuals and divide by ten, we get the mean income of people in the town. However, if that dataset contains an extremely wealthy business owner whose income is far higher than the others, the mean is skewed upwards and no longer represents the typical income of the town’s residents. In such cases, we use the median instead of the mean.
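The effect of a single outlier is easy to see numerically. The incomes below are invented for illustration: nine residents with ordinary incomes and one business owner earning far more.

```python
from statistics import mean, median

# Hypothetical annual incomes of ten residents; the last value is the outlier.
incomes = [32_000, 35_000, 38_000, 40_000, 41_000,
           43_000, 45_000, 47_000, 50_000, 2_000_000]

mean_income = mean(incomes)      # pulled upwards by the single large income
median_income = median(incomes)  # stays near what most residents earn

print(mean_income)    # 237100
print(median_income)  # 42000.0
```

In this made-up dataset the mean is higher than the income of nine of the ten residents, while the median sits in the middle of the ordinary incomes.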

Additionally, the median is used for datasets that have a skewed distribution. For example, consider the time people take to complete a year's worth of work. If most people take 1 year but a few take 2 or 3 years, those few long durations pull the mean above 1 year, so it no longer reflects how long the work typically takes. The median would be more representative of the typical time in this scenario.
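As a rough illustration of the skewed case, suppose ten people are observed: seven finish in 1 year, two in 2 years, and one in 3 years. These counts are assumptions made up for the sketch, not data from any study.

```python
from statistics import mean, median

# Hypothetical completion times in years: a right-skewed dataset.
years = [1] * 7 + [2, 2, 3]

mean_years = mean(years)      # raised by the few long durations
median_years = median(years)  # matches what most people experienced

print(mean_years)    # 1.4
print(median_years)  # 1.0
```

A mean noticeably above the median, as here, is a common sign that the distribution has a long tail to the right.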

When to Use Mean

The mean is often used for datasets that have a roughly symmetrical distribution. For instance, if we have a dataset of numeric grades for a class with no outliers, the mean can be used to find the typical grade that students are earning. Additionally, when a dataset is complete, with no missing values, the mean is the most commonly used measure of central tendency.

Another factor to consider when choosing between the mean and the median is the level of measurement of the data. The mean is most appropriate when the data is measured on an interval or ratio scale, as with continuous quantities such as temperature, weight, and height. The median, on the other hand, is more suitable when the data is measured on an ordinal scale, which means the values have an order but not necessarily equal distances between them. Examples of ordinal data include ranks and letter grades.
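For ordinal data, one possible approach is to map each category to its position in the ordering and take a median that always returns an observed value. The sketch below uses `statistics.median_low`, which returns the lower of the two middle values when the count is even, so no averaging of categories is needed. The grade scale and the list of grades are hypothetical.

```python
from statistics import median_low

# Hypothetical ordinal scale, listed from lowest to highest.
GRADE_ORDER = ["F", "D", "C", "B", "A"]

# Hypothetical letter grades for eight students.
grades = ["B", "A", "C", "B", "D", "A", "B", "C"]

# Replace each grade with its rank on the scale, so the data can be ordered.
ranks = [GRADE_ORDER.index(g) for g in grades]

# median_low picks an actual rank from the data, never a value in between.
median_grade = GRADE_ORDER[median_low(ranks)]

print(median_grade)  # B
```

Computing a mean of these ranks would treat the gap between "D" and "C" as equal to the gap between "B" and "A", which an ordinal scale does not guarantee.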

Conclusion

In conclusion, the median and the mean are two commonly used measures of central tendency for finding the typical value in a dataset. While the mean is more commonly used, the median is more appropriate when the dataset is skewed or contains extreme outliers. Which measure to use depends on the attributes of the dataset, the level of measurement, and what is being analyzed. Hence, it is important to understand the difference between these two terms and their applicability in data analysis.