Unit I: Descriptive Statistics

Meaning of Statistics

Statistics is a branch of mathematics that deals with the collection, organization, analysis, interpretation, and presentation of numerical data. It helps in understanding large volumes of data and making informed decisions based on patterns and trends.

In simple terms, statistics helps transform raw data into meaningful information to solve real-world problems and support decision-making in fields like business, economics, healthcare, and social sciences.

Scope of Statistics

The scope of statistics is broad, covering multiple fields and functions where data-driven decisions are essential. Below are the key areas:

Types of Statistics:

1. Descriptive Statistics

Descriptive statistics focuses on summarizing and presenting data in a meaningful way, without drawing conclusions beyond the data at hand.

Key Techniques:

Measures of Central Tendency: Mean, Median, Mode
Measures of Dispersion: Range, Variance, Standard Deviation
Data Visualization: Tables, Bar Graphs, Pie Charts, Histograms

Applications:

Analyzing customer demographics for a marketing campaign
Reporting average sales figures for a specific period

2. Inferential Statistics

Inferential statistics involves making predictions, inferences, or generalizations about a population based on sample data.

Key Techniques:

Hypothesis Testing: Assesses whether sample data provide enough evidence to reject a claim about a population parameter
Confidence Intervals: Estimate a range that is likely to contain a population parameter at a stated confidence level
Regression Analysis: Models the relationship between variables
Probability Distributions: Normal and Binomial distributions

Applications:

Forecasting market trends based on survey results
Determining the effectiveness of a new product using sample feedback

Functions of Statistics

Limitations of Statistics

Measures of Central Tendency – Mean, Median, Mode

1. Mean (Arithmetic Mean)

The mean is the average of a dataset, calculated by dividing the sum of all data values by the number of observations.

Formula:

$\text{Mean} = \frac{\sum X}{N}$

Where:

$X$ = Data values
$N$ = Number of observations

Types of Mean:

Simple Mean: For ungrouped data
Weighted Mean: When different values have different weights

Example:

Dataset: {12, 15, 18, 20, 25}

$\text{Mean} = \frac{12 + 15 + 18 + 20 + 25}{5} = \frac{90}{5} = 18$
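As a quick check of the arithmetic above, here is a minimal Python sketch using the standard library's `statistics` module. The weights in the weighted-mean part are hypothetical values chosen only for illustration; they do not come from these notes.

```python
from statistics import mean

data = [12, 15, 18, 20, 25]

# Simple mean: sum of the values divided by the number of observations.
simple_mean = mean(data)

# Weighted mean: sum(w * x) / sum(w).
# These weights are made up purely to illustrate the calculation.
weights = [1, 1, 2, 2, 4]
weighted_mean = sum(w * x for w, x in zip(weights, data)) / sum(weights)

print("Simple mean:", simple_mean)
print("Weighted mean:", weighted_mean)
```

Because the larger values carry larger weights in this made-up example, the weighted mean comes out above the simple mean.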

Advantages:

Easy to calculate
Considers all data points

Disadvantages:

Sensitive to extreme values (outliers)

2. Median

The median is the middle value when the data is arranged in ascending or descending order.

Calculation:

Arrange the data in order.
If the number of observations is odd, the median is the middle value.
If the number of observations is even, the median is the average of the two middle values.

Formula for Even Data:

$\text{Median} = \frac{X_{(N/2)} + X_{(N/2 + 1)}}{2}$

Example:

Dataset: {10, 15, 20, 25, 30}
Median = 20 (middle value)

Dataset with an even number of observations: {10, 15, 20, 25}
Median = $(15 + 20) / 2 = 17.5$
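The calculation steps above can be sketched in Python as follows. The helper name `median_by_hand` is an illustrative choice, not part of any library; the standard library's `statistics.median` is called alongside it for comparison.

```python
from statistics import median

def median_by_hand(values):
    """Sort the data, then take the middle value (odd N) or the
    average of the two middle values (even N)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [10, 15, 20, 25, 30]
even_data = [10, 15, 20, 25]

print(median_by_hand(odd_data), median(odd_data))
print(median_by_hand(even_data), median(even_data))
```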

Advantages:

Not affected by outliers
Suitable for skewed distributions

Disadvantages:

Ignores extreme data points
Less suitable for advanced statistical analysis

3. Mode

The mode is the most frequently occurring value in a dataset.

Calculation:

Count how often each value occurs and identify the value with the highest frequency.

Example:

Dataset: {12, 14, 14, 15, 18, 18, 18, 20}
Mode = 18 (appears 3 times)
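A small Python sketch of the same count, using `collections.Counter` for the frequencies and `statistics.multimode`, which returns every value tied for the highest frequency. The second dataset is a hypothetical one added only to show a bimodal case.

```python
from collections import Counter
from statistics import multimode

data = [12, 14, 14, 15, 18, 18, 18, 20]

# Frequency of each distinct value.
counts = Counter(data)

# multimode returns a list, so ties are reported rather than hidden.
modes = multimode(data)
print("Modes:", modes, "| frequency of 18:", counts[18])

# Hypothetical bimodal dataset: two values share the top frequency.
bimodal = [1, 1, 2, 2, 3]
print("Bimodal example:", multimode(bimodal))
```

Note that when no value repeats, `multimode` returns all the values, since each ties with a frequency of one; in that situation it is usually more sensible to say the data have no mode.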

Advantages:

Useful for categorical data
Easy to understand

Disadvantages:

May not exist if no value repeats
Can have multiple modes (bimodal, multimodal)

Quartiles, Measures of Dispersion

1. Quartiles (Q1, Q2, Q3)

Quartiles divide ordered data into four parts, each containing roughly a quarter of the observations.

First Quartile (Q1): 25% of data lies below this value.
Second Quartile (Q2 or Median): 50% of data lies below this value.
Third Quartile (Q3): 75% of data lies below this value.

Calculation Example:

Dataset: {12, 15, 18, 20, 25, 30, 35, 40}
Step 1: Arrange the data: Already sorted.
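Step 2: Q1: Position (n+1)/4 = (8+1)/4 = 2.25 → Interpolate between the 2nd (15) and 3rd (18) values → Q1 = 15 + 0.25 × (18 - 15) = 15.75
Step 3: Q2 (Median): Average of the two middle values, the 4th (20) and 5th (25) → Q2 = (20 + 25)/2 = 22.5
Step 4: Q3: Position 3(n+1)/4 = 6.75 → Interpolate between the 6th (30) and 7th (35) values → Q3 = 30 + 0.75 × (35 - 30) = 33.75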
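The position-and-interpolate procedure in these steps can be sketched in Python. The function name `quartile` and its clamping behaviour at the ends of the data are assumptions made for this illustration. Other textbooks and software use different quartile conventions, so results may differ slightly between tools.

```python
import math

def quartile(sorted_values, p):
    """Value at 1-based position p * (n + 1), using linear
    interpolation between the two neighbouring observations."""
    n = len(sorted_values)
    pos = p * (n + 1)
    lower = math.floor(pos)
    frac = pos - lower
    # Clamp positions that fall outside the data (an assumption of this sketch).
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + frac * (above - below)

data = sorted([12, 15, 18, 20, 25, 30, 35, 40])
q1, q2, q3 = (quartile(data, p) for p in (0.25, 0.50, 0.75))
print("Q1:", q1, "Q2:", q2, "Q3:", q3)
```

The standard library's `statistics.quantiles(data, n=4)` uses the same (n + 1)-based positions with its default `method="exclusive"`, so it can serve as a cross-check.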

2. Range

Measures the spread of data from the smallest to the largest value.

Formula:

$\text{Range} = \max(X) - \min(X)$

Example:

Dataset: {12, 18, 25, 35, 40}
Range = 40 - 12 = 28
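A short Python sketch of the same calculation. The extra value 400 is a hypothetical outlier, added only to show how a single extreme observation changes the range.

```python
data = [12, 18, 25, 35, 40]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)
print("Range:", data_range)

# Hypothetical outlier appended for illustration.
with_outlier = data + [400]
range_with_outlier = max(with_outlier) - min(with_outlier)
print("Range with outlier:", range_with_outlier)
```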

Advantages:

Simple and quick

Disadvantages:

Ignores data distribution and is sensitive to outliers

3. Interquartile Range (IQR)

The range of the middle 50% of the data.

Formula:

$\text{IQR} = Q_3 - Q_1$

Example:

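Using the dataset {12, 15, 18, 20, 25, 30, 35, 40} with quartile positions (n+1)/4 and 3(n+1)/4: Q3 = 33.75, Q1 = 15.75
IQR = 33.75 - 15.75 = 18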
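A minimal Python sketch of the IQR, assuming the quartiles are taken at (n + 1)-based positions on the eight-value dataset {12, 15, 18, 20, 25, 30, 35, 40}. It relies on `statistics.quantiles`, whose default `method="exclusive"` follows that convention; other quartile conventions would give a somewhat different IQR.

```python
from statistics import quantiles

data = [12, 15, 18, 20, 25, 30, 35, 40]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4)

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```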

Advantage:

Not influenced by extreme values

Disadvantage:

Requires sorting and calculation of quartiles

4. Mean Deviation (Absolute Mean Deviation)

Measures the average of the absolute deviations from the mean.

Formula:

$\text{Mean Deviation} = \frac{\sum |X_i - \text{Mean}|}{N}$

Example:

Dataset: {10, 20, 30}
Mean = (10 + 20 + 30) / 3 = 20
Mean Deviation = (|10 - 20| + |20 - 20| + |30 - 20|)/3 = (10 + 0 + 10)/3 ≈ 6.67
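The formula can be checked with a few lines of Python. The function name `mean_absolute_deviation` is an illustrative choice; the standard `statistics` module does not provide this measure directly.

```python
from statistics import fmean

def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    center = fmean(values)
    return sum(abs(x - center) for x in values) / len(values)

data = [10, 20, 30]
mean_dev = mean_absolute_deviation(data)
print("Mean deviation:", round(mean_dev, 2))
```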

Advantage:

Provides a simple, easily interpreted measure of spread

Disadvantage:

Less useful for advanced statistical models

5. Standard Deviation (SD)

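Measures how far data values typically lie from the mean. The formula below divides by N, which gives the population standard deviation; the sample version divides by N - 1 instead.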

Formula:

$\text{SD} = \sqrt{\frac{\sum (X_i - \text{Mean})^2}{N}}$

Example:

Dataset: {10, 20, 30}
Mean = 20
SD = $\sqrt{\frac{(10 - 20)^2 + (20 - 20)^2 + (30 - 20)^2}{3}} \approx 8.16$
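A short Python sketch of this calculation, done once by hand and once with `statistics.pstdev`, which divides by N as in the formula above. For contrast, `statistics.stdev` divides by N - 1 (the sample standard deviation) and therefore gives a larger value on the same data.

```python
import math
from statistics import pstdev, stdev

data = [10, 20, 30]

# By hand: square root of the average squared deviation from the mean.
mean_value = sum(data) / len(data)
sd_by_hand = math.sqrt(sum((x - mean_value) ** 2 for x in data) / len(data))

population_sd = pstdev(data)  # divides by N
sample_sd = stdev(data)       # divides by N - 1

print("By hand:", round(sd_by_hand, 2))
print("pstdev: ", round(population_sd, 2))
print("stdev:  ", round(sample_sd, 2))
```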

Advantage:

Widely used in data analysis

Disadvantage:

Sensitive to outliers

6. Variance

The average of the squared deviations from the mean; equivalently, the square of the standard deviation.

Formula:

$\text{Variance} = \frac{\sum (X_i - \text{Mean})^2}{N}$

Example:

For the dataset {10, 20, 30} with Mean = 20: Variance = $\frac{100 + 0 + 100}{3} \approx 66.67$
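A brief Python sketch using `statistics.pvariance`, which divides by N as in the formula above. It also checks numerically that the variance equals the square of the population standard deviation.

```python
from statistics import pstdev, pvariance

data = [10, 20, 30]

variance = pvariance(data)  # average squared deviation from the mean
sd = pstdev(data)           # population standard deviation

print("Variance:", round(variance, 2))
print("SD squared:", round(sd ** 2, 2))
```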

Advantage:

Useful for understanding variability

Disadvantage:

Hard to interpret directly due to squared units

7. Coefficient of Variation (CV)

Indicates relative variability compared to the mean.

Formula:

$\text{CV} = \left(\frac{\text{SD}}{\text{Mean}}\right) \times 100$

Example:

SD = 8.16, Mean = 20
CV = (8.16 / 20) × 100 ≈ 40.8%
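A minimal Python sketch of the CV, assuming the population standard deviation (dividing by N) as in the standard deviation formula, and using the dataset {10, 20, 30}, whose mean is 20 and SD is about 8.16. The function name and the zero-mean check are illustrative choices.

```python
from statistics import fmean, pstdev

def coefficient_of_variation(values):
    """SD as a percentage of the mean (population SD assumed)."""
    m = fmean(values)
    if m == 0:
        # The ratio SD / Mean is undefined for a zero mean.
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / m * 100

cv = coefficient_of_variation([10, 20, 30])
print("CV (%):", round(cv, 1))
```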

Advantage:

Useful for comparing variability across datasets

Disadvantage:

Not meaningful when the mean is zero or close to zero, or when the data include negative values

8. Skewness

Measures the asymmetry of the distribution.

Types:

Positive Skew: Right tail is longer
Negative Skew: Left tail is longer
Zero Skew: Symmetrical data

Formula:

$\text{Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{SD}}$

Example:

A dataset with Mean > Median gives a positive value in this formula, indicating positive skew.
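The mean-versus-median formula can be sketched in Python as below. Both datasets are hypothetical, and the population standard deviation (dividing by N) is assumed for SD.

```python
from statistics import fmean, median, pstdev

def pearson_median_skewness(values):
    """3 * (Mean - Median) / SD, with SD taken as the population SD."""
    return 3 * (fmean(values) - median(values)) / pstdev(values)

# Hypothetical data: one large value pulls the mean above the median.
right_skewed = [10, 12, 14, 16, 48]
# Hypothetical symmetric data: mean and median coincide.
symmetric = [10, 20, 30]

skew_right = pearson_median_skewness(right_skewed)
skew_symmetric = pearson_median_skewness(symmetric)

print("Right-skewed:", round(skew_right, 2))
print("Symmetric:", skew_symmetric)
```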

9. Kurtosis

Measures the "peakedness" or "flatness" of the distribution.

Types:

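Leptokurtic (Kurtosis > 3): Sharp peak with heavy tails
Mesokurtic (Kurtosis = 3): Normal distribution
Platykurtic (Kurtosis < 3): Flat peak with light tails

Formula:

$\text{Kurtosis} = \frac{\sum (X_i - \text{Mean})^4}{N \times \text{SD}^4}$

Subtracting 3 from this value gives the excess kurtosis, which is 0 for a normal distribution; on that scale the thresholds above become > 0, = 0, and < 0.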

Example:

If the dataset has many extreme values, it will be leptokurtic.
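A small Python sketch of the fourth-moment calculation. It computes the ratio of the average fourth-power deviation to SD⁴, and separately the value after subtracting 3, often called excess kurtosis. Both datasets are hypothetical, and the population standard deviation (dividing by N) is assumed.

```python
from statistics import fmean, pstdev

def kurtosis(values):
    """Average fourth-power deviation from the mean, divided by SD^4."""
    m = fmean(values)
    sd = pstdev(values)
    n = len(values)
    return sum((x - m) ** 4 for x in values) / (n * sd ** 4)

def excess_kurtosis(values):
    """Kurtosis minus 3, so a normal distribution scores 0."""
    return kurtosis(values) - 3

# Hypothetical flat dataset with no extreme values.
flat = [10, 20, 30]
# Hypothetical dataset: mostly identical values plus two extremes.
with_extremes = [20, 20, 20, 20, 20, 20, 20, 20, -30, 70]

for name, values in (("flat", flat), ("with extremes", with_extremes)):
    print(name, round(kurtosis(values), 2), round(excess_kurtosis(values), 2))
```

In this illustration the dataset with two extreme values lands above 3 (leptokurtic), while the evenly spread dataset lands below 3 (platykurtic).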
