Class 11 Mathematics Notes, Chapter 15 – Exemplar Problems (English) Book
Detailed notes and MCQs for Chapter 15, Statistics, of the NCERT Exemplar. This chapter matters not only for your Class 11 exams but also as the foundation for many quantitative sections in government exams. We'll study dispersion, which measures how spread out the data are, going beyond measures of central tendency (mean, median, mode).
Chapter 15: Statistics - Detailed Notes for Exam Preparation
1. Introduction to Measures of Dispersion
- Why Dispersion? Measures of central tendency (mean, median, mode) give us a single value representing the center of the data, but they don't tell us about the data's spread or variability. Two datasets can have the same mean but vastly different distributions.
- Example: Scores of two batsmen:
- Batsman A: 45, 50, 55 (Mean = 50)
- Batsman B: 0, 50, 100 (Mean = 50)
Both have the same mean, but Batsman A is more consistent, while Batsman B's scores are more spread out (higher dispersion).
- Dispersion: It measures the extent to which the values in a distribution differ from the average or central value. Key measures include Range, Quartile Deviation, Mean Deviation, and Standard Deviation (along with Variance).
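A quick way to see the difference between the two batsmen is to subtract the common mean from every score. The Python sketch below does this; the helper name `deviations_from_mean` is our own, chosen only for illustration. It also prints the sum of the deviations, which is always zero about the mean, so it is the size of the deviations, not their sum, that reflects the spread.

```python
from statistics import mean

# Scores from the batsman example: both lists have mean 50
batsman_a = [45, 50, 55]
batsman_b = [0, 50, 100]

def deviations_from_mean(scores):
    """Return each score minus the mean of all the scores."""
    m = mean(scores)
    return [x - m for x in scores]

for name, scores in [("A", batsman_a), ("B", batsman_b)]:
    devs = deviations_from_mean(scores)
    # Deviations about the mean always sum to zero; their size shows the spread
    print(f"Batsman {name}: mean = {mean(scores)}, deviations = {devs}, sum = {sum(devs)}")
```

Batsman A's deviations are -5, 0, 5 while Batsman B's are -50, 0, 50: the means agree, but B's scores lie ten times farther from the centre.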
2. Range
- Definition: The simplest measure of dispersion. It's the difference between the largest (L) and smallest (S) observation in the dataset.
- Formula: Range = L - S
- Merits: Easy to understand and calculate.
- Demerits:
- Highly affected by extreme values (outliers).
- Doesn't consider the distribution of values between the extremes.
- Cannot be calculated for open-ended frequency distributions.
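The sketch below computes Range = L - S for a small set of hypothetical marks and then adds a single extreme value, illustrating the first demerit above. Both the data and the helper `data_range` are illustrative assumptions, not taken from the Exemplar.

```python
def data_range(values):
    """Range = L - S: largest observation minus smallest observation."""
    return max(values) - min(values)

# Hypothetical marks of five students
marks = [42, 45, 47, 50, 52]
# The same marks with one extreme value (outlier) added
marks_with_outlier = marks + [98]

print(data_range(marks))               # 10
print(data_range(marks_with_outlier))  # 56
```

One outlier raises the range from 10 to 56, even though the other five marks are unchanged.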
3. Quartile Deviation (Q.D.) or Semi-Interquartile Range
- Concept: Based on the upper quartile (Q3) and the lower quartile (Q1). Quartiles divide the data into four equal parts. Q1 is the value below which 25% of data lies, and Q3 is the value below which 75% of data lies. The difference (Q3 - Q1) is the Interquartile Range.
- Formula: Quartile Deviation (Q.D.) = (\frac{Q_3 - Q_1}{2})
- Calculation of Quartiles:
- Ungrouped Data: Arrange data in ascending order.
- Q1 = Value of ((\frac{n+1}{4})^{th}) item.
- Q3 = Value of ((\frac{3(n+1)}{4})^{th}) item.
- Grouped Data (Continuous):
- Find the class containing Q1 (where cumulative frequency just exceeds N/4) and Q3 (where cumulative frequency just exceeds 3N/4).
- Use the formula: (Q_k = l + \frac{(\frac{kN}{4} - C)}{f} \times h)
- k = 1 for Q1, k = 3 for Q3
- l = lower limit of the quartile class
- N = Total frequency ((\sum f))
- C = Cumulative frequency of the class preceding the quartile class
- f = Frequency of the quartile class
- h = Class width
- Coefficient of Quartile Deviation: (\frac{Q_3 - Q_1}{Q_3 + Q_1}) (Used for comparing dispersion in different datasets).
- Merits:
- Not affected by extreme values.
- Better than range as it uses the middle 50% of data.
- Can be calculated for open-ended distributions.
- Demerits:
- Ignores 50% of the data (the lowest 25% and the highest 25%).
- Not based on all observations.
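The quartile rules above can be sketched in Python as follows. For ungrouped data we assume that a fractional position such as the 2.5th item is resolved by linear interpolation between neighbouring items; textbooks differ on this detail, so treat it as one convention rather than the official rule. The grouped-data function applies (Q_k = l + \frac{(\frac{kN}{4} - C)}{f} \times h) directly. All function names and data sets here are hypothetical.

```python
import math

def quartile_ungrouped(data, k):
    """Q_k as the (k(n+1)/4)-th item of the sorted data, for k = 1 or 3.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring items.
    """
    xs = sorted(data)
    n = len(xs)
    pos = k * (n + 1) / 4          # 1-based position of the quartile
    whole = math.floor(pos)
    frac = pos - whole
    if whole < 1:                  # position falls before the first item
        return xs[0]
    if whole >= n:                 # position falls at or beyond the last item
        return xs[-1]
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])

def quartile_grouped(lower_limits, freqs, h, k):
    """Q_k = l + ((kN/4 - C) / f) * h for continuous classes of equal width h."""
    N = sum(freqs)
    target = k * N / 4
    C = 0                          # cumulative frequency before the current class
    for l, f in zip(lower_limits, freqs):
        if C + f >= target:        # first class whose cumulative frequency reaches kN/4
            return l + (target - C) / f * h
        C += f
    raise ValueError("kN/4 exceeds the total frequency")

# Hypothetical ungrouped data (n = 7): Q1 is the 2nd item, Q3 the 6th
data = [3, 5, 7, 8, 10, 12, 15]
q1, q3 = quartile_ungrouped(data, 1), quartile_ungrouped(data, 3)
print(q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1))  # Q1, Q3, Q.D., coefficient of Q.D.

# Hypothetical grouped data: classes 0-10, 10-20, ..., 40-50
lowers = [0, 10, 20, 30, 40]
freqs = [5, 8, 15, 7, 5]
g1 = quartile_grouped(lowers, freqs, 10, 1)
g3 = quartile_grouped(lowers, freqs, 10, 3)
print(g1, g3, (g3 - g1) / 2)  # Q1, Q3, Q.D. for the grouped data
```

For the grouped data, N = 40, so Q1 lies in the 10-20 class (C = 5, f = 8) and Q3 in the 30-40 class (C = 28, f = 7), giving Q1 = 16.25 and Q3 ≈ 32.86.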
4. Mean Deviation (M.D.)
- Concept: The arithmetic mean of the absolute deviations of the observations from a measure of central tendency (mean, median, or mode). It tells us, on average, how far the observations are from the center.
- Formulas:
- Ungrouped Data:
- M.D. about Mean ((\bar{x})): (MD(\bar{x}) = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|)
- M.D. about Median (M): (MD(M) = \frac{1}{n} \sum_{i=1}^{n} |x_i - M|)
- Grouped Data (Discrete/Continuous): (N = (\sum f_i))
- M.D. about Mean ((\bar{x})): (MD(\bar{x}) = \frac{1}{N} \sum_{i=1}^{k} f_i |x_i - \bar{x}|) (where (x_i) are observations or mid-points of classes)
- M.D. about Median (M): (MD(M) = \frac{1}{N} \sum_{i=1}^{k} f_i |x_i - M|)
- Important Note: Mean deviation is minimum when calculated about the Median.
- Coefficient of Mean Deviation:
- About Mean: (\frac{MD(\bar{x})}{\bar{x}})
- About Median: (\frac{MD(M)}{M})
- Merits:
- Based on all observations.
- Less affected by extreme values compared to Standard Deviation.
- Demerits:
- Ignoring the signs of deviations ((|x_i - \bar{x}|)) makes it mathematically inconvenient for further algebraic treatment.
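A minimal Python sketch of the mean deviation formulas, using hypothetical data. It also checks the note that M.D. is smallest about the median: for the ungrouped set below, M.D. about the median is 6.4 while M.D. about the mean is 8. The function names are our own.

```python
from statistics import mean, median

def mean_deviation(values, center):
    """M.D. = (1/n) * sum |x_i - center| for ungrouped data."""
    return sum(abs(x - center) for x in values) / len(values)

def mean_deviation_grouped(xs, fs, center):
    """M.D. = (1/N) * sum f_i |x_i - center|; x_i are values or class mid-points."""
    return sum(f * abs(x - center) for x, f in zip(xs, fs)) / sum(fs)

# Hypothetical ungrouped data with one large value
data = [2, 4, 6, 8, 30]
m, med = mean(data), median(data)          # mean 10, median 6
md_mean, md_median = mean_deviation(data, m), mean_deviation(data, med)
print(md_mean, md_median)                  # 8.0 6.4 (smaller about the median)
print(md_mean / m, md_median / med)        # coefficients of M.D.

# Hypothetical discrete frequency distribution
xs = [2, 5, 6, 8, 10, 12]
fs = [2, 8, 10, 7, 8, 5]
grouped_mean = sum(f * x for x, f in zip(xs, fs)) / sum(fs)   # 7.5
print(mean_deviation_grouped(xs, fs, grouped_mean))           # 2.3
```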
5. Variance ((\sigma^2)) and Standard Deviation ((\sigma))
- Concept: The most widely used measures of dispersion. Variance is the mean of the squared deviations from the arithmetic mean, and Standard Deviation is the positive square root of the variance. Squaring removes the signs of the deviations (which, taken about the mean, always sum to zero) without resorting to absolute values, and gives more weight to larger deviations.
- Formulas:
- Ungrouped Data:
- Variance ((\sigma^2)): (\frac{1}{n} \sum (x_i - \bar{x})^2 = \frac{1}{n} \sum x_i^2 - (\bar{x})^2 = \frac{\sum x_i^2}{n} - (\frac{\sum x_i}{n})^2)
- S.D. ((\sigma)): (\sqrt{\text{Variance}})
- Grouped Data (Discrete/Continuous): (N = (\sum f_i))
- Variance ((\sigma^2)): (\frac{1}{N} \sum f_i (x_i - \bar{x})^2 = \frac{1}{N} \sum f_i x_i^2 - (\bar{x})^2 = \frac{\sum f_i x_i^2}{N} - (\frac{\sum f_i x_i}{N})^2)
- S.D. ((\sigma)): (\sqrt{\text{Variance}})
- Shortcut/Step-Deviation Method for Variance (Very useful for calculations):
- Let (d_i = x_i - A) (A = assumed mean) or (u_i = \frac{x_i - A}{h}) (h = class width, if applicable)
- Variance using deviations ((d_i)): (\sigma^2 = \frac{\sum f_i d_i^2}{N} - (\frac{\sum f_i d_i}{N})^2)
- Variance using step-deviations ((u_i)): (\sigma^2 = h^2 \left[ \frac{\sum f_i u_i^2}{N} - (\frac{\sum f_i u_i}{N})^2 \right])
- Remember: (\sigma = \sqrt{\sigma^2})
- Properties of Variance and Standard Deviation:
- S.D. is always non-negative ((\sigma \ge 0)).
- If all observations are equal, S.D. = 0.
- Change of Origin: Variance and S.D. are independent of change of origin. If (y_i = x_i + a), then (\sigma_y^2 = \sigma_x^2) and (\sigma_y = \sigma_x). Adding/subtracting a constant doesn't change the spread.
- Change of Scale: Variance and S.D. depend on change of scale. If (y_i = b x_i), then (\sigma_y^2 = b^2 \sigma_x^2) and (\sigma_y = |b| \sigma_x). Multiplying/dividing by a constant scales the spread.
- If (y_i = a + b x_i), then (\sigma_y = |b| \sigma_x) and (\sigma_y^2 = b^2 \sigma_x^2).
- Merits:
- Based on all observations.
- Mathematically tractable and used extensively in statistics.
- Least affected by sampling fluctuations (compared to other measures).
- Demerits:
- Affected by extreme values (due to squaring).
- More complex to calculate than Range or Q.D.
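The sketch below computes the variance of a hypothetical grouped distribution in two ways: directly from (\frac{1}{N} \sum f_i (x_i - \bar{x})^2) and by the step-deviation formula with assumed mean A = 25 and h = 10. The two agree because (u_i = \frac{x_i - A}{h}) is a change of origin (ignored by variance) followed by a change of scale (which divides the variance by (h^2)), so multiplying by (h^2) restores it. The data and function names are illustrative assumptions.

```python
import math

def variance_direct(xs, fs):
    """sigma^2 = (1/N) * sum f_i (x_i - mean)^2."""
    N = sum(fs)
    m = sum(f * x for x, f in zip(xs, fs)) / N
    return sum(f * (x - m) ** 2 for x, f in zip(xs, fs)) / N

def variance_step_deviation(xs, fs, A, h):
    """sigma^2 = h^2 * [sum f_i u_i^2 / N - (sum f_i u_i / N)^2], u_i = (x_i - A) / h."""
    N = sum(fs)
    us = [(x - A) / h for x in xs]
    mean_u = sum(f * u for u, f in zip(us, fs)) / N
    mean_u2 = sum(f * u * u for u, f in zip(us, fs)) / N
    return h ** 2 * (mean_u2 - mean_u ** 2)

# Hypothetical grouped data: class mid-points 5, 15, ..., 45 (width h = 10)
mids = [5, 15, 25, 35, 45]
freqs = [5, 8, 15, 7, 5]

v1 = variance_direct(mids, freqs)
v2 = variance_step_deviation(mids, freqs, A=25, h=10)   # assumed mean A = 25
print(v1, v2, math.sqrt(v1))  # both about 137.4375 (up to rounding); S.D. about 11.72

# Change of origin and scale: y = 3 + 2x multiplies the variance by 2^2 = 4
ys = [3 + 2 * x for x in mids]
print(variance_direct(ys, freqs) / v1)  # 4.0
```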
6. Analysis of Frequency Distributions - Coefficient of Variation (C.V.)
- Concept: Standard Deviation is an absolute measure of dispersion (has the same units as the data). To compare the variability or consistency of two or more datasets with different units or different means, we need a relative measure. C.V. is such a measure.
- Formula: (C.V. = \frac{\sigma}{\bar{x}} \times 100) (where (\bar{x}) is the mean and (\sigma) is the standard deviation). It's expressed as a percentage.
- Interpretation:
- A distribution with a smaller C.V. is considered more consistent or stable (less variable).
- A distribution with a larger C.V. is considered less consistent or more variable.
- Use Case: Comparing the consistency of batsmen, stability of prices, variability in yields, etc.
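As an illustration, the sketch below applies C.V. to the two batsmen from the introduction. It uses Python's `statistics.pstdev`, the population standard deviation (dividing by n), which matches the (\frac{1}{n}) formulas in these notes; the helper name `coefficient_of_variation` is our own.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V. = (sigma / mean) * 100, with sigma the population S.D. (divides by n)."""
    return pstdev(values) / mean(values) * 100

batsman_a = [45, 50, 55]
batsman_b = [0, 50, 100]
print(round(coefficient_of_variation(batsman_a), 2))  # 8.16
print(round(coefficient_of_variation(batsman_b), 2))  # 81.65
```

Batsman A's much smaller C.V. confirms that A is the more consistent player.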
Key Takeaways for Government Exams:
- Formulas are vital: Memorize formulas for Mean, Median, Mode, Q1, Q3, MD, Variance, SD, and CV for both ungrouped and grouped data. Pay special attention to the shortcut methods for Variance/SD.
- Properties: Understand the effect of change of origin and scale on Mean, Variance, and SD. This is a frequent source of MCQs.
- Comparison: Know when to use which measure. CV is specifically for comparing variability between different datasets.
- Calculations: Practice quick calculations, especially for grouped data using step-deviation.
- Conceptual Clarity: Understand why we measure dispersion and the strengths/weaknesses of each measure.
Multiple Choice Questions (MCQs)
Here are 10 MCQs based on the concepts discussed. Try to solve them yourself first.
- Which measure of dispersion is most affected by extreme values?
(a) Range
(b) Quartile Deviation
(c) Mean Deviation
(d) Standard Deviation
- If the variance of a dataset (x_1, x_2, ..., x_n) is (\sigma^2), what is the variance of the dataset (2x_1+3, 2x_2+3, ..., 2x_n+3)?
(a) (\sigma^2)
(b) (2\sigma^2)
(c) (4\sigma^2)
(d) (4\sigma^2 + 3)
- The measure of dispersion which is independent of the units of measurement of the observations is:
(a) Range
(b) Standard Deviation
(c) Variance
(d) Coefficient of Variation
- For a set of observations, the mean deviation is first calculated about the median. If it is instead calculated about the mean, its value will be:
(a) Always smaller
(b) Always greater or equal
(c) Always equal
(d) Cannot be determined
- What is the standard deviation of the first 5 natural numbers (1, 2, 3, 4, 5)?
(a) 2
(b) (\sqrt{2})
(c) 3
(d) (\sqrt{3})
- If the standard deviation of a set of observations is 4, what is its variance?
(a) 2
(b) 4
(c) 8
(d) 16
- Two factories A and B have the following details regarding wages (in Rs.) of workers:
Factory A: mean wage 5000, standard deviation 100
Factory B: mean wage 6000, standard deviation 150
Which factory has greater variability in wages?
(a) Factory A
(b) Factory B
(c) Both have equal variability
(d) Cannot be determined
- Quartile Deviation is based on:
(a) All observations
(b) The lowest 50% of observations
(c) The highest 50% of observations
(d) The middle 50% of observations
- The formula (\sigma^2 = h^2 \left[ \frac{\sum f_i u_i^2}{N} - (\frac{\sum f_i u_i}{N})^2 \right]) is used to calculate variance using:
(a) Direct method
(b) Short-cut method (using assumed mean)
(c) Step-deviation method
(d) Mean deviation method
- If each observation in a dataset is decreased by 5, the standard deviation of the new dataset:
(a) Decreases by 5
(b) Increases by 5
(c) Becomes 1/5th of the original
(d) Remains unchanged
Answer Key for MCQs:
- (a) Range (It directly uses the maximum and minimum values)
- (c) (4\sigma^2) (Variance is unaffected by the change of origin (+3) but is multiplied by the square of the scale factor, (2^2 = 4), so (\sigma_{new}^2 = 4\sigma^2))
- (d) Coefficient of Variation (It's a ratio, making it unitless)
- (b) Always greater or equal (Mean deviation is minimum when calculated from the median)
- (b) (\sqrt{2}) (Mean = 3. Variance = (\frac{1}{5}[(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2] = \frac{1}{5}[4+1+0+1+4] = \frac{10}{5} = 2). SD = (\sqrt{2}))
- (d) 16 (Variance = ((\text{Standard Deviation})^2 = 4^2 = 16))
- (b) Factory B (Calculate C.V. for both: CV(A) = (100/5000)*100 = 2%. CV(B) = (150/6000)*100 = 2.5%. Since CV(B) > CV(A), Factory B has greater variability)
- (d) The middle 50% of observations (It uses Q3 and Q1, which define the boundaries of the middle 50%)
- (c) Step-deviation method (The use of (u_i = (x_i - A)/h) and multiplying by (h^2) is characteristic of the step-deviation method)
- (d) Remains unchanged (Standard deviation is independent of change of origin, i.e., adding or subtracting a constant)
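If you want to check some of these answers numerically, the short Python sketch below verifies Q5, Q2, Q10 and Q7 using the standard-library `statistics.pvariance` and `statistics.pstdev` (population formulas, dividing by n, as in these notes). The helper `cv` is our own.

```python
import math
from statistics import pstdev, pvariance

x = [1, 2, 3, 4, 5]

# Q5: population variance is 2, so the S.D. is sqrt(2)
print(pvariance(x), math.isclose(pstdev(x), math.sqrt(2)))

# Q2: y = 2x + 3 multiplies the variance by 2^2 = 4 (the +3 has no effect)
y = [2 * v + 3 for v in x]
print(pvariance(y) == 4 * pvariance(x))

# Q10: subtracting 5 from every observation leaves the S.D. unchanged
z = [v - 5 for v in x]
print(pstdev(z) == pstdev(x))

# Q7: compare C.V. = sigma / mean * 100 for the two factories
def cv(sd, mean_value):
    return sd / mean_value * 100

cv_a, cv_b = cv(100, 5000), cv(150, 6000)
print(cv_a, cv_b, "Factory B" if cv_b > cv_a else "Factory A")
```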
Study these notes thoroughly, practice the problems from the Exemplar, and pay close attention to the properties and formulas. Good luck with your preparation!