Unit IV: Probability Theory & Distribution

Probability

Probability is the measure of the likelihood of an event occurring. It ranges from 0 to 1, where:

0 means the event will not happen.
1 means the event will certainly happen.

Formula for Probability

$P(E) = \frac{\text{Favorable Outcomes}}{\text{Total Outcomes}}$

where:

$P(E)$ = Probability of event $E$ .
Favorable Outcomes = Number of outcomes that satisfy the event condition.
Total Outcomes = Number of all possible outcomes in the sample space, each assumed equally likely.
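
As a quick illustration of the formula, the sketch below counts outcomes for one roll of a fair six-sided die. The die and the event "the roll is even" are assumptions chosen for illustration; they do not come from the notes.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative assumption).
sample_space = [1, 2, 3, 4, 5, 6]

# Event E: the roll is even.
favorable = [outcome for outcome in sample_space if outcome % 2 == 0]

# P(E) = favorable outcomes / total outcomes (outcomes are equally likely).
p_even = Fraction(len(favorable), len(sample_space))

print(p_even)         # 1/2
print(float(p_even))  # 0.5
```

Using `Fraction` keeps the result exact, which makes it easy to compare with a hand calculation.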

The Theory of Probability is a branch of mathematics that studies the likelihood of events occurring. It provides a framework for analyzing random phenomena and making predictions based on available data.

4. Axioms of Probability (Kolmogorov’s Axioms)

Non-negativity: $P(E) \geq 0$ for any event $E$
Probability of Sample Space: $P(S) = 1$
Addition Rule (for mutually exclusive events): $P (A \cup B) = P (A) + P (B)$

Addition and Multiplication Law

The Addition Law and Multiplication Law are fundamental rules in probability theory that help calculate the likelihood of combined events.

1. Addition Law of Probability

The Addition Law is used when we calculate the probability of the occurrence of at least one of two or more events.

Formula for Two Events

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
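
The formula can be checked by direct counting. The sketch below assumes a standard 52-card deck, with $A$ = "the card is a king" and $B$ = "the card is a heart"; the deck and both events are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def prob(event):
    """Fraction of cards in the deck for which event(card) is True."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))


def is_king(card):
    return card[0] == "K"


def is_heart(card):
    return card[1] == "hearts"


p_a = prob(is_king)                                              # 4/52
p_b = prob(is_heart)                                             # 13/52
p_a_and_b = prob(lambda card: is_king(card) and is_heart(card))  # 1/52

# Addition law: subtract the overlap so the king of hearts is not counted twice.
p_union_formula = p_a + p_b - p_a_and_b
# Direct count of cards that are a king or a heart.
p_union_counted = prob(lambda card: is_king(card) or is_heart(card))

print(p_union_formula)                     # 4/13
print(p_union_formula == p_union_counted)  # True
```

When the two events cannot happen together, the overlap term is zero and the formula reduces to $P(A) + P(B)$.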

Cases of Addition Law

2. Multiplication Law of Probability

The Multiplication Law is used when we calculate the probability of two or more events occurring together.

Formula for Two Events

$P(A \cap B) = P(A) \times P(B | A)$

where $P(B | A)$ is the conditional probability of B occurring given that A has already occurred.
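
A small sketch of the multiplication law, assuming two cards are drawn without replacement from a standard 52-card deck, with $A$ = "the first card is an ace" and $B$ = "the second card is an ace". The scenario is an illustrative assumption, not taken from the notes.

```python
from fractions import Fraction
from itertools import permutations

# 4 aces and 48 other cards; positions 0..51 identify the individual cards.
deck = ["ace"] * 4 + ["other"] * 48

# Multiplication law: P(A and B) = P(A) * P(B | A)
p_a = Fraction(4, 52)          # first card is an ace
p_b_given_a = Fraction(3, 51)  # second is an ace, given one ace is already gone
p_both_formula = p_a * p_b_given_a

# Cross-check by enumerating every ordered pair of distinct cards.
draws = list(permutations(range(len(deck)), 2))
both_aces = sum(1 for i, j in draws if deck[i] == "ace" and deck[j] == "ace")
p_both_counted = Fraction(both_aces, len(draws))

print(p_both_formula)                    # 1/221
print(p_both_formula == p_both_counted)  # True
```

If the first card were replaced before the second draw, the events would be independent and $P(B | A)$ would equal $P(B)$.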

Bayes’ Theorem

Bayes’ Theorem is a fundamental concept in probability theory that helps update the probability of an event based on new evidence. It is widely used in decision-making, machine learning, and risk assessment.

Formula for Bayes’ Theorem

$P(A | B) = \frac{P(B | A) \times P(A)}{P(B)}$

where:

$P(A | B)$ = Probability of event $A$ given that event $B$ has occurred (posterior probability).
$P(B | A)$ = Probability of event $B$ occurring given $A$ (likelihood).
$P(A)$ = Probability of event $A$ before $B$ is observed (prior probability).
$P(B)$ = Total probability of event $B$ occurring (marginal probability).

Example Problem

Problem: A factory produces 60% of its products from Machine A and 40% from Machine B. Machine A has a 5% defect rate, while Machine B has a 10% defect rate. If a product is randomly selected and found to be defective, what is the probability it was made by Machine A?

Solution using Bayes’ Theorem:

$P(A | D) = \frac{P(D | A) \times P(A)}{P(D)}$

where:

$P(A) = 0.6$ (prior probability of Machine A)
$P(B) = 0.4$ (prior probability of Machine B)
$P(D | A) = 0.05$ (defect rate of Machine A)
$P(D | B) = 0.10$ (defect rate of Machine B)
The total probability of a defect is $P(D) = P(D | A) P(A) + P(D | B) P(B)$
$P(D) = (0.05 \times 0.6) + (0.10 \times 0.4) = 0.03 + 0.04 = 0.07$
Now, $P(A | D) = \frac{0.05 \times 0.6}{0.07} = \frac{0.03}{0.07} \approx 0.4286$
So, the probability that a defective product came from Machine A is about 42.86%.
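
The same calculation can be sketched in Python. The function name `posterior` and its parameter names are hypothetical, chosen only for this illustration; the numbers are the ones given in the problem.

```python
def posterior(prior_a, likelihood_a, prior_b, likelihood_b):
    """Return P(A | D) for two sources A and B via Bayes' theorem."""
    # Total probability of the evidence: P(D) = P(D|A)P(A) + P(D|B)P(B)
    evidence = likelihood_a * prior_a + likelihood_b * prior_b
    return likelihood_a * prior_a / evidence


p_a_given_d = posterior(prior_a=0.6, likelihood_a=0.05,
                        prior_b=0.4, likelihood_b=0.10)
p_b_given_d = posterior(prior_a=0.4, likelihood_a=0.10,
                        prior_b=0.6, likelihood_b=0.05)

print(round(p_a_given_d, 4))  # 0.4286
print(round(p_b_given_d, 4))  # 0.5714
```

Although Machine A makes most of the products, a defective item is more likely to come from Machine B because its defect rate is twice as high. The two posteriors sum to 1, as they must.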

5. Key Insights

✅ Bayes’ Theorem helps update probabilities with new evidence.
✅ Widely used in business, healthcare, and artificial intelligence.
✅ Crucial for decision-making under uncertainty.

Probability Theoretical Distributions

Probability distributions describe how random variables behave and help in statistical analysis and decision-making. They are classified into discrete and continuous distributions.

4. Real-World Applications

✅ Business Analytics: Demand forecasting using normal distribution.
✅ Finance: Stock market returns modeled with normal distribution.
✅ Healthcare: Disease occurrence modeled with Poisson distribution.
✅ Operations Management: Customer arrival times analyzed using exponential distribution.

Binomial Distribution

1. Concept of Binomial Distribution

The Binomial Distribution is a discrete probability distribution that models the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes:
✅ Success (p)
❌ Failure (1 - p)

It is used when an experiment follows these conditions:

Fixed Number of Trials (n): The experiment is repeated a set number of times.
Binary Outcomes: Each trial results in success or failure.
Independent Trials: The outcome of one trial does not affect another.
Constant Probability (p): The probability of success remains the same for each trial.

2. Binomial Distribution Formula

$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$

where:

$P(X = k)$ = Probability of exactly $k$ successes.
$\binom{n}{k}$ = Combination formula = $\frac{n!}{k!(n-k)!}$ .
$n$ = Number of trials.
$k$ = Number of successes.
$p$ = Probability of success in a single trial.
$(1 - p)$ = Probability of failure.

4. Example Problem

Problem: A factory produces light bulbs, and each bulb has a 5% defect rate. If we randomly select 10 bulbs, what is the probability that exactly 2 bulbs are defective?

Solution using Binomial Formula:

$n = 10$ (total bulbs)
$k = 2$
$p = 0.05$
$(1 - p) = 0.95$

$P(X = 2) = \binom{10}{2} (0.05)^2 (0.95)^8$

$P(X = 2) = \frac{10!}{2! \, 8!} \times (0.05)^2 \times (0.95)^8$

$P(X = 2) \approx 45 \times 0.0025 \times 0.6634 \approx 0.0746$

So, the probability of exactly 2 defective bulbs in a sample of 10 is 7.46%.
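
A minimal sketch of the binomial formula in Python, using the standard-library `math.comb` for $\binom{n}{k}$. The helper name `binomial_pmf` is an assumption made for this example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Light bulb example: n = 10 bulbs, p = 0.05 defect rate, k = 2 defective.
p_two_defective = binomial_pmf(2, 10, 0.05)
print(round(p_two_defective, 4))  # 0.0746

# The probabilities over k = 0..n should add up to 1.
total = sum(binomial_pmf(k, 10, 0.05) for k in range(11))
print(round(total, 10))  # 1.0
```

Summing the same function over a range of $k$ values gives "at most" or "at least" probabilities, for example `sum(binomial_pmf(k, 10, 0.05) for k in range(3))` for at most 2 defective bulbs.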

5. Key Insights

✅ Binomial distribution helps in decision-making under uncertainty.

✅ Useful in business, finance, healthcare, and sports.

✅ Assumes independent trials with fixed probabilities.

Poisson and Normal Distributions

1. Poisson Distribution

The Poisson Distribution models the probability of a given number of events occurring in a fixed interval (time, space, or area), assuming:
✅ The events occur randomly and independently.
✅ The average rate of occurrence ( $\lambda$ ) is constant.
✅ Two events cannot occur at the exact same instant.

Formula:

$P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$

where:

$P(X = k)$ = Probability of exactly $k$ events occurring.
$\lambda$ = Average number of events per interval.
$k$ = Number of occurrences.
$e$ = Euler’s number (~2.718).

Example Problem:

A customer service center receives an average of 10 calls per hour. What is the probability that exactly 7 calls will be received in an hour?

Solution using Poisson Formula:

$P(X = 7) = \frac{e^{-10} \, 10^{7}}{7!}$
$P(X = 7) \approx \frac{0.0000454 \times 10{,}000{,}000}{5040}$
$P(X = 7) \approx 0.0901$ (or 9.01%)

So, the probability of receiving exactly 7 calls in an hour is about 9.01%.
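
The Poisson formula can be evaluated directly with the standard library. In this sketch the helper name `poisson_pmf` is an assumption; evaluated without intermediate rounding, the call center example gives approximately 0.0901.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) when events occur at an average rate of lam per interval."""
    return exp(-lam) * lam**k / factorial(k)


# Call center example: average of 10 calls per hour, exactly 7 calls.
p_seven_calls = poisson_pmf(7, 10)
print(round(p_seven_calls, 4))  # 0.0901

# Probability of at most 7 calls in the hour: sum the PMF for k = 0..7.
p_at_most_seven = sum(poisson_pmf(k, 10) for k in range(8))
print(round(p_at_most_seven, 4))
```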

2. Normal Distribution

The Normal Distribution is a continuous probability distribution that follows a bell-shaped curve. It is used when data is symmetrically distributed around the mean.

Key Properties:

✅ Symmetric around the mean ( $\mu$ ).
✅ The total area under the curve = 1 (100%).
✅ Follows the 68-95-99.7 rule:

About 68% of data falls within 1 standard deviation ( $\sigma$ ) of the mean.
About 95% falls within 2 standard deviations.
About 99.7% falls within 3 standard deviations.

Formula:

$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{(x - \mu)^{2}}{2 \sigma^{2}}}$

where:

$\mu$ = Mean (average).
$\sigma$ = Standard deviation.
$x$ = Value in the dataset.

Example Problem:

IQ scores in a population are normally distributed with a mean ( $\mu$ ) of 100 and a standard deviation ( $\sigma$ ) of 15. What percentage of people have an IQ between 85 and 115?

Using the 68-95-99.7 rule,

85 to 115 falls within 1 standard deviation ( $\pm 1\sigma$ ).
68% of values lie within this range.

So, about 68% of people have an IQ between 85 and 115.
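
The rule-of-thumb answer can be compared with the exact area under the curve. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper `normal_pdf` is a hypothetical name that implements the density formula given earlier.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density f(x) of a normal distribution with mean mu and std dev sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


# IQ example: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# P(85 <= X <= 115) is the difference of the cumulative probabilities.
p_within_one_sigma = iq.cdf(115) - iq.cdf(85)
print(round(p_within_one_sigma, 4))  # 0.6827

# The hand-written density agrees with the library at the mean.
print(abs(normal_pdf(100, 100, 15) - iq.pdf(100)) < 1e-12)  # True
```

The exact value, about 68.27%, is what the 68-95-99.7 rule rounds to 68%.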

Summary & Key Insights

✅ Poisson Distribution is used for counting rare events over time or space.
✅ Normal Distribution is used for continuous data that follows a natural variation.
✅ Both are widely used in business, healthcare, and analytics.

Introduction to Bivariate and Multivariate Data Analysis

Data analysis can be classified based on the number of variables involved:

Univariate Analysis – Analyzes one variable at a time (e.g., average salary of employees).
Bivariate Analysis – Examines the relationship between two variables (e.g., height vs. weight).
Multivariate Analysis – Examines the relationship among three or more variables (e.g., customer satisfaction based on price, quality, and service).

1. Bivariate Data Analysis

Bivariate analysis focuses on understanding how two variables are related.

Common Bivariate Techniques:

2. Multivariate Data Analysis

Multivariate analysis deals with three or more variables to identify patterns and relationships.

(A) Cluster Analysis

Concept:

A method to group data into clusters based on similar characteristics.
Objects in the same cluster are similar, while objects in different clusters are dissimilar.
Used in market segmentation, customer profiling, and image recognition.

Techniques in Cluster Analysis:

(B) Factor Analysis

Concept:

Reduces many variables into a few underlying factors.
Helps in simplifying data and identifying hidden patterns.
Used in psychology, marketing, and finance.

Types of Factor Analysis:

Summary & Conclusion

✅ Bivariate analysis helps understand relationships between two variables.
✅ Multivariate analysis handles complex data relationships involving multiple variables.
✅ Cluster Analysis is used for segmentation, while Factor Analysis is used for dimension reduction.
