Class 11 Statistics Notes Chapter 3 (Organisation of data) – Statistics For Economics Book

These detailed notes, with MCQs, cover Chapter 3: Organisation of Data from the Statistics for Economics book. The chapter matters because raw data, as collected, is often chaotic and difficult to interpret. Organising it properly is the first step towards meaningful analysis, and the topic is frequently tested in government exams.

Chapter 3: Organisation of Data - Detailed Notes

1. Introduction: Why Organise Data?

  • Data collected in its original form is called Raw Data.
  • Raw data is usually large, disorganized, and difficult to comprehend, compare, or analyze.
  • Organisation of Data refers to the systematic arrangement of collected data, making it more meaningful, concise, and easier to understand and interpret.
  • It facilitates comparison and highlights the main characteristics of the data.

2. Raw Data

  • Data collected directly from the source without any statistical treatment.
  • Example: Marks obtained by 50 students in a test listed randomly as they were collected.
  • Limitations: Difficult to grasp trends, find highest/lowest values quickly, or understand the distribution.

3. Classification of Data

  • Definition: The process of grouping data into different categories or classes based on some common characteristics. It brings order to raw data.
  • Objectives/Purpose of Classification:
    • Simplification & Brevity: Reduces complexity and presents data concisely.
    • Utility: Enhances the usefulness of data by making it understandable.
    • Distinctiveness: Clearly separates data based on characteristics.
    • Comparability: Facilitates comparison between different groups or datasets.
    • Scientific Arrangement: Arranges data logically and systematically.
    • Basis for Tabulation & Analysis: Forms the foundation for further statistical treatment like tabulation, presentation, and analysis.
  • Characteristics of a Good Classification:
    • Comprehensiveness: Every item of the data must belong to one of the classes. No item should be left out.
    • Clarity & Unambiguity: The definition of each class must be clear and precise, avoiding any confusion.
    • Mutual Exclusivity: Classes should not overlap; an item should belong to only one class.
    • Homogeneity: Items within a particular class should be as similar (homogeneous) as possible with respect to the characteristic being used for classification.
    • Suitability: The classification should suit the objectives of the inquiry.
    • Stability: The basis of classification, once decided, should generally remain the same throughout the analysis.
    • Flexibility/Elasticity: Should be capable of accommodating new situations or data points without altering the basic structure drastically.

4. Basis of Classification

Data can be classified based on different criteria:

  • (a) Geographical Classification (Spatial Classification):
    • Data is classified based on geographical location or region (e.g., country, state, district, city, rural/urban).
    • Example: Production of wheat in different states of India (Punjab, Haryana, UP, etc.).
  • (b) Chronological Classification (Temporal Classification):
    • Data is classified based on time (e.g., years, months, weeks, days, hours). Data is arranged in ascending or descending order of time.
    • Example: Population of India from 1951 to 2011; monthly sales of a company.
  • (c) Qualitative Classification (Classification by Attributes):
    • Data is classified based on characteristics or attributes that cannot be measured numerically (e.g., sex, religion, literacy, honesty, beauty, occupation).
    • Types:
      • Simple Classification (Dichotomous): Classification based on the presence or absence of one attribute. Data is divided into two groups. Example: Population classified as Male/Female or Literate/Illiterate.
      • Manifold Classification: Classification based on more than one attribute simultaneously. Example: Population classified by sex (Male/Female) and literacy (Literate/Illiterate), resulting in four groups (Male Literate, Male Illiterate, Female Literate, Female Illiterate).
  • (d) Quantitative Classification (Numerical Classification):
    • Data is classified based on characteristics that can be measured numerically (e.g., height, weight, age, income, marks, production).
    • Data is grouped into classes or ranges.
    • Example: Marks obtained by students grouped into classes like 0-10, 10-20, 20-30, etc. This forms the basis for Frequency Distributions.
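
The difference between simple and manifold classification can be sketched in a few lines of Python. The records below are hypothetical and exist only to illustrate the grouping; `Counter` from the standard library does the counting.

```python
from collections import Counter

# Hypothetical records (assumption, not from the textbook): (sex, literacy)
people = [
    ("Male", "Literate"), ("Female", "Literate"), ("Male", "Illiterate"),
    ("Female", "Illiterate"), ("Male", "Literate"), ("Female", "Literate"),
    ("Female", "Literate"), ("Male", "Illiterate"),
]

# Simple (dichotomous) classification: one attribute, two groups.
by_sex = Counter(sex for sex, _ in people)

# Manifold classification: two attributes at once, four groups.
by_sex_and_literacy = Counter(people)

for group, count in by_sex.items():
    print(group, count)
for group, count in by_sex_and_literacy.items():
    print(group, count)
```

With two attributes of two categories each, the manifold classification yields the four groups named in the example above.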

5. Variables: Discrete and Continuous

  • Variable: A characteristic or phenomenon which is capable of being measured and changes its value over time or across individuals/objects.
  • Types of Variables:
    • Discrete Variable: A variable that can take only specific, exact values (usually integers) and no intermediate values. There are distinct gaps between its possible values.
      • Examples: Number of students in a class, number of cars in a parking lot, number of printing mistakes per page. (You can't have 2.5 students).
    • Continuous Variable: A variable that can take any value within a given range, including fractional or decimal values.
      • Examples: Height, weight, temperature, time, income. (Height can be 165.5 cm, weight can be 60.7 kg).
      • Note: Even if measured in discrete units (like age in completed years), if the underlying characteristic is continuous, it's often treated as continuous for statistical purposes.

6. Frequency Distribution

  • Definition: A systematic presentation of data that summarizes the frequency (number of times) each value or class of values occurs in a dataset. It's a common way to organize quantitative data.
  • Key Terms:
    • Frequency: The number of times a particular value or an item within a specific class interval repeats itself in the dataset.
    • Class: A range or group into which quantitative data is divided (e.g., 10-20, 20-30).
    • Class Limits: The two endpoints of a class interval.
      • Lower Class Limit (L1): The smallest value in a class.
      • Upper Class Limit (L2): The highest value in a class.
    • Class Interval (i or h or c): The difference between the Upper Class Limit and the Lower Class Limit (L2 - L1). Also called class magnitude or class width.
    • Class Mid-point or Class Mark: The central value of a class interval. Calculated as: (Upper Class Limit + Lower Class Limit) / 2 or (L1 + L2) / 2.
    • Range: The difference between the highest and lowest observed values in the entire dataset. (Range = Highest Value - Lowest Value). Helps decide the number and size of classes.
    • Tally Marks: A method used during manual classification to count the frequency of observations falling into each class. Each observation is recorded as a vertical bar (|); every fifth observation is drawn as a diagonal stroke across the previous four bars, so the tallies can be counted in bundles of five.
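
The formulas for class interval, class mark, and range can be checked with a short Python sketch. The function names and the list of marks are illustrative assumptions, not part of the textbook.

```python
def class_interval(lower_limit, upper_limit):
    """Class interval (width) = L2 - L1."""
    return upper_limit - lower_limit


def class_mark(lower_limit, upper_limit):
    """Class mark (mid-point) = (L1 + L2) / 2."""
    return (lower_limit + upper_limit) / 2


# Hypothetical marks of ten students (assumption for illustration).
marks = [12, 45, 7, 33, 28, 19, 41, 5, 36, 22]

# Range = highest value - lowest value.
data_range = max(marks) - min(marks)

print(class_interval(10, 20))  # width of the class 10-20
print(class_mark(10, 20))      # mid-point of the class 10-20
print(data_range)
```

For the class 10-20 this gives an interval of 10 and a class mark of 15; for the sample marks the range is 45 - 5 = 40.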
  • Constructing a Frequency Distribution:
    1. Determine the Range of the data.
    2. Decide the number of classes (usually between 5 and 15). Too few or too many classes can obscure the pattern. (Sturges' rule can be a guide, but isn't strictly necessary for this level).
    3. Determine the Class Interval (Approximate Interval = Range / Number of Classes). Adjust for convenient values.
    4. Decide the starting point (Lower limit of the first class).
    5. Determine the class limits for all classes. Ensure they are continuous if using exclusive series.
    6. Distribute the data into appropriate classes using Tally Marks.
    7. Count the tally marks to find the frequency for each class.
    8. Sum the frequencies; it should equal the total number of observations.
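
The eight steps above can be sketched in Python for an exclusive series. This is a minimal illustration under stated assumptions: the marks are hypothetical, the function name `frequency_distribution` is made up for this example, and the counting loop stands in for tally marks.

```python
def frequency_distribution(values, start, width, n_classes):
    """Count values into exclusive classes: lower limit included, upper limit excluded."""
    classes = [(start + i * width, start + (i + 1) * width) for i in range(n_classes)]
    freq = {c: 0 for c in classes}
    for v in values:
        index = int((v - start) // width)  # which class the value falls into
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} lies outside the chosen classes")
        freq[classes[index]] += 1          # one tally mark
    return freq


# Hypothetical marks of 18 students (assumption for illustration).
marks = [12, 45, 7, 33, 28, 19, 41, 5, 36, 22, 10, 30, 49, 0, 20, 25, 27, 38]

data_range = max(marks) - min(marks)      # step 1: 49 - 0 = 49
n_classes = 5                             # step 2: chosen number of classes
approx_width = data_range / n_classes     # step 3: 9.8, rounded up to a convenient 10
freq = frequency_distribution(marks, start=0, width=10, n_classes=n_classes)  # steps 4-7

for (lower, upper), f in freq.items():
    print(f"{lower}-{upper}: {f}")

# Step 8: the frequencies must add up to the number of observations.
assert sum(freq.values()) == len(marks)
```

Note that a mark of exactly 10 is counted in the class 10-20, not 0-10, which is the exclusive-series convention.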
  • Types of Frequency Distributions (Series):
    • Exclusive Series: The upper limit of one class is the lower limit of the next class (e.g., 0-10, 10-20, 20-30). An observation exactly equal to the upper limit is included in the next class (e.g., 10 falls in the 10-20 class, not 0-10). This is preferred for continuous variables.
    • Inclusive Series: The upper limit of one class is not equal to the lower limit of the next class; there's a gap (e.g., 0-9, 10-19, 20-29). An observation equal to either limit belongs to that class itself. Used often for discrete data, but can be used for continuous too.
      • Conversion from Inclusive to Exclusive: Find the difference between the upper limit of one class and the lower limit of the next class. Divide this difference by 2 (this is the 'correction factor'). Subtract the correction factor from all lower limits and add it to all upper limits. (e.g., for 0-9, 10-19, 20-29: Difference = 10 - 9 = 1. Correction factor = 1/2 = 0.5. New series: -0.5 to 9.5, 9.5 to 19.5, 19.5 to 29.5, ...).
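
The correction-factor procedure can be written as a small Python function. The name `inclusive_to_exclusive` is a hypothetical helper, and the sketch assumes the classes are given in ascending order with the same gap between every pair of neighbouring classes.

```python
def inclusive_to_exclusive(classes):
    """Convert inclusive class limits to exclusive ones using the correction factor.

    classes: list of (lower, upper) limits in ascending order,
    assumed to have the same gap between consecutive classes.
    """
    gap = classes[1][0] - classes[0][1]   # lower limit of next class - upper limit of this one
    correction = gap / 2                  # the correction factor
    return [(lower - correction, upper + correction) for lower, upper in classes]


inclusive = [(0, 9), (10, 19), (20, 29)]
exclusive = inclusive_to_exclusive(inclusive)
print(exclusive)
```

For the classes 0-9, 10-19, 20-29 the gap is 1, the correction factor is 0.5, and the result matches the converted series in the example above.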
    • Open-End Classes: A distribution where the lower limit of the first class and/or the upper limit of the last class are not specified (e.g., "Below 10", "50 and Above"). This makes calculating range or mid-points for these classes difficult.
    • Cumulative Frequency Series: Shows the cumulative frequency (sum of frequencies) up to a certain point.
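      • 'Less than' Cumulative Frequency: The sum of the frequencies of a particular class and all classes before it, i.e. the number of observations below that class's upper limit. It is accumulated from the first class downwards, so the last class shows the total frequency.
      • 'More than' Cumulative Frequency: The sum of the frequencies of a particular class and all classes after it, i.e. the number of observations at or above that class's lower limit. It is accumulated from the last class upwards, so the first class shows the total frequency.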
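
Both cumulative series can be derived from the same list of class frequencies. The sketch below uses hypothetical frequencies for the classes 0-10 to 40-50 and `itertools.accumulate` from the standard library.

```python
from itertools import accumulate

# Hypothetical exclusive classes and their frequencies (assumption for illustration).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [3, 3, 5, 4, 3]

# 'Less than' series: running total from the first class downwards.
less_than = list(accumulate(frequencies))

# 'More than' series: running total from the last class upwards.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for (lower, upper), lt, mt in zip(classes, less_than, more_than):
    print(f"Less than {upper}: {lt}    More than {lower}: {mt}")
```

The last 'less than' value and the first 'more than' value both equal the total frequency (18 here), which is a quick check that the series were built correctly.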
    • Frequency Array: A simple series showing individual discrete values and their corresponding frequencies (used for discrete variables). Example: Number of children per family (0, 1, 2, 3...) and the number of families for each.
    • Bivariate Frequency Distribution: A distribution showing the frequencies of two variables simultaneously (e.g., classifying students based on both height and weight groups). Presented in a two-way table.
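
A frequency array and a bivariate frequency distribution can both be built by simple counting. The data below are hypothetical: ten families for the frequency array, and six students already assigned to height and weight classes for the two-way table.

```python
from collections import Counter

# Frequency array: a discrete variable and how often each value occurs.
children_per_family = [2, 0, 1, 2, 3, 1, 2, 0, 1, 2]  # hypothetical data
frequency_array = dict(sorted(Counter(children_per_family).items()))
print(frequency_array)

# Bivariate frequency distribution: count each (height class, weight class) pair.
students = [  # hypothetical (height in cm, weight in kg) classes
    ("150-160", "40-50"), ("150-160", "50-60"), ("160-170", "50-60"),
    ("160-170", "50-60"), ("160-170", "60-70"), ("150-160", "40-50"),
]
bivariate = Counter(students)

# Print as a two-way table: rows are height classes, columns are weight classes.
heights = sorted({h for h, _ in students})
weights = sorted({w for _, w in students})
print("height \\ weight", *weights)
for h in heights:
    print(h, *(bivariate[(h, w)] for w in weights))
```

Each cell of the two-way table is the number of students who fall into that height class and that weight class together.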

7. Loss of Information

  • Classification and summarization into frequency distributions inevitably lead to a loss of information.
  • We gain a clear picture of the overall pattern, but the individual values within each class are lost. We only know how many observations fall within a range, not their exact values.
  • This is a necessary trade-off for simplification and analysis.
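
One way to see the loss of information is to compare an average computed from the raw values with one estimated from the grouped data, where every observation is represented by its class mid-point. The marks below are hypothetical, and the class width of 10 is an assumption chosen for the illustration.

```python
from collections import Counter

# Hypothetical raw marks (assumption for illustration).
marks = [12, 45, 7, 33, 28, 19, 41, 5, 36, 22, 10, 30, 49, 0, 20, 25, 27, 38]
width = 10

# Group into exclusive classes 0-10, 10-20, ...: lower limit -> frequency.
class_counts = Counter((v // width) * width for v in marks)

# Mean from the raw data uses every individual value.
raw_mean = sum(marks) / len(marks)

# Mean from the grouped data only knows the class mid-points and frequencies.
grouped_mean = sum((lower + width / 2) * f for lower, f in class_counts.items()) / len(marks)

print(round(raw_mean, 2), round(grouped_mean, 2))
```

The two means are close but not equal, because the exact values inside each class are no longer available once the data are grouped.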

Multiple Choice Questions (MCQs)

Here are 10 MCQs based on Chapter 3 for your practice:

  1. The primary purpose of classifying data is to:
    a) Collect data accurately
    b) Eliminate errors in data
    c) Arrange data systematically for analysis
    d) Increase the volume of data

  2. Raw data refers to:
    a) Data presented in graphs
    b) Data arranged in ascending order
    c) Data in its original, unorganized form
    d) Data collected only from primary sources

  3. Classification of data based on location (like states or cities) is called:
    a) Chronological Classification
    b) Qualitative Classification
    c) Quantitative Classification
    d) Geographical Classification

  4. A variable that can take any value within a given range, including decimals, is known as a:
    a) Discrete variable
    b) Continuous variable
    c) Qualitative variable
    d) Dependent variable

  5. In an exclusive type frequency distribution, the upper limit of a class interval is:
    a) Included in the same class interval
    b) Excluded from the same class interval
    c) Always equal to the lower limit
    d) Included only if it's an integer

  6. The difference between the upper and lower limit of a class is called:
    a) Class Mark
    b) Class Frequency
    c) Class Interval (Magnitude)
    d) Range

  7. Classification based on attributes like 'literacy' or 'gender' is an example of:
    a) Quantitative Classification
    b) Chronological Classification
    c) Qualitative Classification
    d) Geographical Classification

  8. The value that lies exactly halfway between the lower and upper limits of a class is the:
    a) Frequency
    b) Class Interval
    c) Class Mark (Mid-point)
    d) Cumulative Frequency

  9. A frequency distribution where the lower limit of the first class or the upper limit of the last class is not defined is called:
    a) Exclusive Series
    b) Inclusive Series
    c) Open-End Class Distribution
    d) Bivariate Distribution

  10. Which of the following represents a loss incurred during the process of classifying data into a frequency distribution?
    a) Loss of time
    b) Loss of individual observation values
    c) Loss of total frequency
    d) Loss of data reliability


Answer Key for MCQs:

  1. c
  2. c
  3. d
  4. b
  5. b
  6. c
  7. c
  8. c
  9. c
  10. b

Study these notes thoroughly. Understanding the concepts of classification and frequency distribution is fundamental before moving on to tabulation, presentation, and analysis of data. Let me know if any part needs further clarification!
