Mean and median are two basic statistical measures commonly used to summarize a dataset. Both are measures of central tendency, but they differ in how they locate the center of the data.
Mean:
The mean, also known as the average, is the sum of all the values in a dataset divided by the number of values. It is the most commonly used measure of central tendency and is often used to represent the typical value of a dataset. However, the mean is strongly influenced by extreme values, also known as outliers, so it may not represent the typical value well when outliers are present.
For example, if a dataset contains the values 3, 5, 6, 8, and 12, their sum is 34. Dividing 34 by the total number of values, which is 5, gives us the mean of 6.8.
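The calculation above can be sketched in a few lines of Python. The function name `arithmetic_mean` is a hypothetical helper chosen for illustration; it simply divides the sum by the count and rejects an empty dataset, for which the mean is undefined.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


data = [3, 5, 6, 8, 12]
print(sum(data))              # 34
print(arithmetic_mean(data))  # 6.8
```

Python's standard library also offers `statistics.mean`, which gives the same result for this data.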
Median:
The median is the middle value of a dataset once it has been sorted in ascending or descending order. It splits the dataset into two halves: half of the values lie at or above the median, and half lie at or below it. The median is largely unaffected by outliers, because it depends only on the middle value(s) of the sorted data, not on how extreme the largest or smallest values are.
For example, the median of the dataset {2, 5, 8, 10, 12} is 8. If the dataset contains an even number of values, there is no single middle value, so the median is the mean of the two middle values. For example, the median of the dataset {2, 5, 7, 8, 10, 12} is (7 + 8)/2 = 7.5.
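The odd and even cases described above can be sketched as follows. The function name `median` is a hypothetical helper for illustration; it sorts a copy of the data and then either picks the single middle value or averages the two middle values.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)  # ascending order; the input list is left unchanged
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ordered[mid]
    # Even count: average the two values on either side of the center.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 5, 8, 10, 12]))     # 8
print(median([2, 5, 7, 8, 10, 12]))  # 7.5
```

Python's `statistics.median` follows the same convention of averaging the two middle values when the count is even.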
Differences between mean and median:
The main differences between the mean and the median lie in how they are calculated and how sensitive they are to outliers.
1. Calculation:
The mean is calculated by summing all the values in a dataset and dividing the total by the number of values. The median is calculated by sorting the values in ascending or descending order and taking the middle value, or the mean of the two middle values when the number of values is even.
2. Sensitivity to outliers:
The mean is strongly influenced by outliers, because every value contributes to the sum and a single extreme value can shift it substantially. As a result, if a dataset has a few extreme values, the mean may not represent the typical value well. The median, on the other hand, is largely unaffected by outliers, because it depends only on the middle of the sorted data. This makes it a more robust measure of central tendency.
For example, imagine a dataset containing the following eight annual salaries: $40,000, $50,000, $60,000, $70,000, $80,000, $90,000, $100,000, and $1,100,000. The mean salary is ($40,000 + $50,000 + $60,000 + $70,000 + $80,000 + $90,000 + $100,000 + $1,100,000)/8 = $1,590,000/8 = $198,750. This mean is greatly inflated by the single outlier of $1,100,000: seven of the eight salaries are $100,000 or less. The median salary, by contrast, is the mean of the two middle values, ($70,000 + $80,000)/2 = $75,000, which better represents the typical salary in this dataset.
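The salary arithmetic can be checked with Python's standard `statistics` module. The sketch below also drops the outlier to show how differently the two measures react: the mean falls by more than $128,000, while the median moves by only $5,000.

```python
from statistics import mean, median

# The eight annual salaries from the example above.
salaries = [40_000, 50_000, 60_000, 70_000, 80_000, 90_000, 100_000, 1_100_000]

print(len(salaries))                     # 8
print(f"mean:   ${mean(salaries):,.0f}")    # mean:   $198,750
print(f"median: ${median(salaries):,.0f}")  # median: $75,000

# Remove the $1,100,000 outlier and recompute both measures.
without_outlier = salaries[:-1]
print(f"mean:   ${mean(without_outlier):,.0f}")    # mean:   $70,000
print(f"median: ${median(without_outlier):,.0f}")  # median: $70,000
```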
When to use mean and median:
The choice between the mean and the median depends on the dataset and on what you are trying to analyze. If the dataset contains extreme values or outliers, the median is usually the better choice, because it is less susceptible to their influence. If the dataset has no significant outliers, the mean is a good representation of its center.
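The text does not say how to decide whether a dataset "contains outliers". One common convention, assumed here and not taken from the text, is Tukey's fences: flag any value more than 1.5 interquartile ranges below the first quartile or above the third quartile. The helpers `tukey_outliers` and `suggest_center` are hypothetical; the quartiles come from `statistics.quantiles` with its default `"exclusive"` method, and other quartile methods can give slightly different fences.

```python
from statistics import mean, median, quantiles


def tukey_outliers(values, k=1.5):
    """Hypothetical helper: values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default method="exclusive"
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def suggest_center(values):
    """Hypothetical rule of thumb: use the median if Tukey flags any outlier, else the mean."""
    if tukey_outliers(values):
        return "median", median(values)
    return "mean", mean(values)


salaries = [40_000, 50_000, 60_000, 70_000, 80_000, 90_000, 100_000, 1_100_000]
print(tukey_outliers(salaries))          # [1100000]
print(suggest_center(salaries))          # ('median', 75000.0)
print(suggest_center([3, 5, 6, 8, 12]))  # ('mean', 6.8)
```

This is only a heuristic: whether an extreme value is a genuine outlier or an important part of the data is a judgment the analysis itself has to make.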
The mean is more suitable for datasets with a symmetric distribution, while the median is more suitable for datasets with skewed distributions. A symmetric distribution looks the same on either side of its center, so its mean and median coincide; a skewed distribution has a longer tail on one side than the other, and that tail pulls the mean toward it.
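A small comparison makes the effect of skew visible. The two datasets below are made up for illustration: one is symmetric around 3, and the other has a single long tail toward large values.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]            # mirror image around 3
right_skewed = [1, 2, 2, 3, 3, 4, 15]  # long tail toward large values

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(f"{name:>12}: mean={mean(data):.2f}, median={median(data)}")
```

For the symmetric data, the mean and median are both 3. For the right-skewed data, the median stays at 3 while the tail pulls the mean up to about 4.29, which is typical of right-skewed data.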
Conclusion:
In summary, the mean and the median are two measures of central tendency commonly used in statistics to summarize a dataset. The mean is the sum of all values divided by the number of values, while the median is the middle value of the dataset once it has been sorted in ascending or descending order. The main difference between them is their sensitivity to outliers: the mean is strongly affected by outliers, while the median is largely unaffected. The choice between them depends on the dataset and what you are trying to analyze.