Unit IV: Probability Theory & Distribution
Probability
Probability is the measure of the likelihood of an event occurring. It ranges from 0 to 1, where:
- 0 means the event will not happen.
- 1 means the event will certainly happen.
Formula for Probability
$$P(E) = \frac{\text{Number of Favorable Outcomes}}{\text{Total Number of Outcomes}}$$
where:
- $P(E)$ = Probability of event $E$.
- Favorable Outcomes = Outcomes that satisfy the event condition.
- Total Outcomes = All possible outcomes in the sample space.
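The ratio of favorable to total outcomes can be sketched in a few lines of Python. This is a minimal illustration that assumes all outcomes are equally likely; the die example and the helper name `classical_probability` are hypothetical and not part of the original notes.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Favorable outcomes divided by total outcomes (equally likely outcomes assumed)."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

# Hypothetical example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}

p_even = classical_probability(even, sample_space)
print(p_even, float(p_even))  # 1/2 0.5
```

Using `Fraction` keeps the result exact, which makes it easy to compare against hand calculations.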
4. Axioms of Probability (Kolmogorov’s Axioms)
- Non-negativity: $P(A) \geq 0$ for any event $A$.
- Probability of Sample Space: $P(S) = 1$.
- Addition Rule (for mutually exclusive events): $P(A \cup B) = P(A) + P(B)$.
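For a finite sample space, the three axioms can be checked mechanically. The sketch below is an assumption-laden illustration: it represents a distribution as a Python dictionary mapping outcomes to probabilities, and the names `is_valid_distribution` and `event_probability` are made up for this example.

```python
from math import isclose

def is_valid_distribution(probabilities):
    """Check non-negativity and that the whole sample space has probability 1."""
    values = list(probabilities.values())
    non_negative = all(p >= 0 for p in values)
    sums_to_one = isclose(sum(values), 1.0)
    return non_negative and sums_to_one

def event_probability(event, probabilities):
    """Probability of an event, given as a set of outcomes."""
    return sum(probabilities[outcome] for outcome in event)

# Hypothetical example: a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}
print(is_valid_distribution(die))  # True

# Additivity for the mutually exclusive events {1, 2} and {5}.
left = event_probability({1, 2, 5}, die)
right = event_probability({1, 2}, die) + event_probability({5}, die)
print(isclose(left, right))  # True
```

`isclose` is used instead of `==` because floating-point sums of values such as 1/6 may differ from the exact result in the last digit.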
Addition and Multiplication Law
The Addition Law and Multiplication Law are fundamental rules in probability theory that help calculate the likelihood of combined events.
1. Addition Law of Probability
The Addition Law is used when we calculate the probability of the occurrence of at least one of two or more events.
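Formula for Two Events
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
Cases of Addition Law
- Mutually exclusive events ($P(A \cap B) = 0$): $P(A \cup B) = P(A) + P(B)$.
- Non-mutually exclusive events: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.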
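The addition law for two events, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, can be verified by direct counting. The following sketch assumes a single roll of a fair die; the events and the helper `prob` are hypothetical choices for illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # assumed example: one roll of a fair die

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

# Overlapping events: "even" and "greater than 3" share the outcomes 4 and 6.
A = {2, 4, 6}
B = {4, 5, 6}
union_direct = prob(A | B)
union_by_law = prob(A) + prob(B) - prob(A & B)
print(union_direct, union_by_law)  # 2/3 2/3

# Mutually exclusive events: the intersection is empty, so the last term vanishes.
C = {1}
D = {2}
print(prob(C | D) == prob(C) + prob(D))  # True
```

Subtracting the intersection prevents the shared outcomes from being counted twice.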
2. Multiplication Law of Probability
The Multiplication Law is used when we calculate the probability of two or more events occurring together.
Formula for Two Events
$$P(A \cap B) = P(A) \cdot P(B \mid A)$$
where $P(B \mid A)$ is the conditional probability of B occurring given that A has already occurred.
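As an illustration of multiplying a probability by a conditional probability, consider drawing two aces in a row from a standard 52-card deck without replacement. The card example is an assumption added here, not taken from the notes; the sketch computes the product and cross-checks it by enumerating every ordered pair of cards.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication law: P(first ace and second ace) = P(first ace) * P(second ace | first ace).
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card are gone
p_both = p_first_ace * p_second_ace_given_first

# Cross-check by brute force over all ordered two-card draws (rank 1 stands for an ace).
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]
draws = list(permutations(deck, 2))
both_aces = sum(1 for first, second in draws if first[0] == 1 and second[0] == 1)
p_enumerated = Fraction(both_aces, len(draws))

print(p_both, p_enumerated)  # 1/221 1/221
```

When the two events are independent, the conditional probability equals the unconditional one and the law reduces to a plain product of the two probabilities.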
Bayes’ Theorem
Bayes’ Theorem is a fundamental concept in probability theory that helps update the probability of an event based on new evidence. It is widely used in decision-making, machine learning, and risk assessment.
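Formula for Bayes’ Theorem
$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$
where:
- $P(A \mid B)$ = Probability of event $A$ given that event $B$ has occurred (posterior probability).
- $P(B \mid A)$ = Probability of event $B$ occurring given $A$ (likelihood).
- $P(A)$ = Probability of $A$ occurring before the evidence is considered (prior probability).
- $P(B)$ = Total probability of event $B$ occurring (marginal probability).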
Example Problem
Problem: A factory produces 60% of its products from Machine A and 40% from Machine B. Machine A has a 5% defect rate, while Machine B has a 10% defect rate. If a product is randomly selected and found to be defective, what is the probability it was made by Machine A?
Solution using Bayes’ Theorem:
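Let $D$ denote the event that the selected product is defective. Then
$$P(A \mid D) = \frac{P(D \mid A) \, P(A)}{P(D)}$$
where:
- $P(A) = 0.60$ (prior probability of Machine A)
- $P(B) = 0.40$ (prior probability of Machine B)
- $P(D \mid A) = 0.05$ (defect rate of Machine A)
- $P(D \mid B) = 0.10$ (defect rate of Machine B)
- $P(D) = P(D \mid A) \, P(A) + P(D \mid B) \, P(B) = (0.05)(0.60) + (0.10)(0.40) = 0.07$ (total probability of a defect)
- Now, $P(A \mid D) = \frac{0.05 \times 0.60}{0.07} = \frac{0.03}{0.07} \approx 0.4286$
- So, the probability that a defective product came from Machine A is 42.86%.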
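The factory calculation can be reproduced in Python. This is a minimal sketch using the numbers from the problem statement; the function name `bayes_posterior` and the variable names are illustrative assumptions.

```python
def bayes_posterior(prior, likelihood, evidence):
    """Posterior = likelihood * prior / evidence."""
    return likelihood * prior / evidence

# Values from the problem statement.
p_a, p_b = 0.60, 0.40                   # share of output from Machines A and B
p_d_given_a, p_d_given_b = 0.05, 0.10   # defect rates of Machines A and B

# Law of total probability: overall chance that a product is defective.
p_d = p_d_given_a * p_a + p_d_given_b * p_b

p_a_given_d = bayes_posterior(p_a, p_d_given_a, p_d)
print(f"{p_d:.2f} {p_a_given_d:.4f}")  # 0.07 0.4286
```

Although Machine A makes most of the products, its lower defect rate means a defective item is more likely to have come from Machine B (about 57%) than from Machine A (about 43%).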
5. Key Insights
✅ Bayes’ Theorem helps update probabilities with new evidence.
✅ Widely used in business, healthcare, and artificial intelligence.
✅ Crucial for decision-making under uncertainty.
Probability Theoretical Distributions
Probability distributions describe how random variables behave and help in statistical analysis and decision-making. They are classified into discrete and continuous distributions.
4. Real-World Applications
✅ Business Analytics: Demand forecasting using normal distribution.
✅ Finance: Stock market returns modeled with normal distribution.
✅ Healthcare: Disease occurrence modeled with Poisson distribution.
✅ Operations Management: Customer arrival times analyzed using exponential distribution.
Binomial Distribution
1. Concept of Binomial Distribution
The Binomial Distribution is a discrete probability distribution that models the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes:
✅ Success (p)
❌ Failure (1 - p)
It is used when an experiment follows these conditions:
- Fixed Number of Trials (n): The experiment is repeated a set number of times.
- Binary Outcomes: Each trial results in success or failure.
- Independent Trials: The outcome of one trial does not affect another.
- Constant Probability (p): The probability of success remains the same for each trial.
2. Binomial Distribution Formula
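$$P(X = k) = \binom{n}{k} \, p^k (1 - p)^{n - k}$$
where:
- $P(X = k)$ = Probability of exactly $k$ successes.
- $\binom{n}{k}$ = Combination formula = $\frac{n!}{k! \, (n - k)!}$.
- $n$ = Number of trials.
- $k$ = Number of successes.
- $p$ = Probability of success in a single trial.
- $1 - p$ = Probability of failure.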
4. Example Problem
Problem: A factory produces light bulbs, and each bulb has a 5% defect rate. If we randomly select 10 bulbs, what is the probability that exactly 2 bulbs are defective?
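Solution using Binomial Formula:
With $n = 10$, $k = 2$, and $p = 0.05$:
$$P(X = 2) = \binom{10}{2} (0.05)^2 (0.95)^8 = 45 \times 0.0025 \times 0.6634 \approx 0.0746$$
So, the probability of exactly 2 defective bulbs in a sample of 10 is 7.46%.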
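The light-bulb result can be checked with Python's standard library. The sketch below assumes the usual binomial probability mass function; `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# 10 bulbs sampled, each defective with probability 0.05, exactly 2 defective.
p_two_defective = binomial_pmf(2, 10, 0.05)
print(f"{p_two_defective:.4f}")  # 0.0746

# Sanity check: probabilities over all possible counts add up to 1.
total = sum(binomial_pmf(k, 10, 0.05) for k in range(11))
```

Here a "success" means finding a defective bulb, so $p = 0.05$ is the defect rate.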
5. Key Insights
Poisson and Normal Distributions
1. Poisson Distribution
The Poisson Distribution models the probability of a given number of events occurring in a fixed interval (time, space, or area), assuming:
✅ The events occur randomly and independently.
✅ The average rate of occurrence ($\lambda$) is constant.
✅ Two events cannot occur at the exact same instant.
Formula:
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
where:
- $P(X = k)$ = Probability of exactly $k$ events occurring.
- $\lambda$ = Average number of events per unit.
- $k$ = Number of occurrences.
- $e$ = Euler’s number (~2.718).
Example Problem:
A customer service center receives an average of 10 calls per hour. What is the probability that exactly 7 calls will be received in an hour?
Solution using Poisson Formula:
With $\lambda = 10$ and $k = 7$:
$$P(X = 7) = \frac{10^7 e^{-10}}{7!} \approx 0.0901$$
So, the probability of receiving exactly 7 calls in an hour is about 9.01%.
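The call-center probability can be computed directly from the Poisson formula. This is a minimal sketch with a hypothetical helper `poisson_pmf`; it assumes an average rate of 10 calls per hour, as stated in the problem.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average number of events is lam."""
    return lam**k * exp(-lam) / factorial(k)

p_seven_calls = poisson_pmf(7, 10)
print(f"{p_seven_calls:.4f}")  # 0.0901
```

Evaluating the formula numerically is a quick way to catch rounding slips in hand calculations.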
2. Normal Distribution
The Normal Distribution is a continuous probability distribution that follows a bell-shaped curve. It is used when data is symmetrically distributed around the mean.
Key Properties:
✅ Symmetric around the mean ($\mu$).
✅ The total area under the curve = 1 (100%).
✅ Follows the 68-95-99.7 rule:
- 68% of data falls within 1 standard deviation ($\mu \pm \sigma$).
- 95% within 2 standard deviations.
- 99.7% within 3 standard deviations.
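The 68-95-99.7 percentages are rounded values. A short sketch using Python's `statistics.NormalDist` (available from Python 3.8) shows the more precise shares for a standard normal distribution; the variable names are illustrative.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# Share of values within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within {k} sd: {share:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```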
Formula:
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
where:
- $\mu$ = Mean (average).
- $\sigma$ = Standard deviation.
- $x$ = Value in the dataset.
Example Problem:
IQ scores in a population are normally distributed with a mean ($\mu$) of 100 and a standard deviation ($\sigma$) of 15. What percentage of people have an IQ between 85 and 115?
Using the 68-95-99.7 rule,
- 85 to 115 falls within 1 standard deviation ($\mu \pm \sigma = 100 \pm 15$).
- About 68% of values lie within this range.
So, about 68% of people have an IQ between 85 and 115.
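The same answer follows from standardizing the two IQ bounds and reading off the cumulative distribution. The sketch below assumes the stated mean of 100 and standard deviation of 15; the z-score step $z = (x - \mu) / \sigma$ is a standard technique added here for illustration.

```python
from statistics import NormalDist

mu, sigma = 100, 15

# Standardize the bounds: how many standard deviations each lies from the mean.
z_low = (85 - mu) / sigma
z_high = (115 - mu) / sigma

# Share of the population between the two bounds.
iq = NormalDist(mu=mu, sigma=sigma)
share = iq.cdf(115) - iq.cdf(85)
print(z_low, z_high, f"{share:.4f}")  # -1.0 1.0 0.6827
```

Because the bounds sit exactly one standard deviation either side of the mean, the computed share matches the 68% given by the rule of thumb.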
Summary & Key Insights
✅ Poisson Distribution is used for counting rare events over time or space.
✅ Normal Distribution is used for continuous data that follows a natural variation.
✅ Both are widely used in business, healthcare, and analytics.
Introduction to Bivariate and Multivariate Data Analysis
Data analysis can be classified based on the number of variables involved:
- Univariate Analysis – Analyzes one variable at a time (e.g., average salary of employees).
- Bivariate Analysis – Examines the relationship between two variables (e.g., height vs. weight).
- Multivariate Analysis – Examines the relationship among three or more variables (e.g., customer satisfaction based on price, quality, and service).
1. Bivariate Data Analysis
Bivariate analysis focuses on understanding how two variables are related.
Common Bivariate Techniques:
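One widely used bivariate technique is the Pearson correlation coefficient, which measures the strength of the linear relationship between two variables on a scale from -1 to 1. The sketch below is a plain-Python illustration; the height and weight values are made up, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)

# Made-up heights (cm) and weights (kg), for illustration only.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 67, 75, 84]

r = pearson_r(heights, weights)
print(f"{r:.3f}")  # 0.998
```

A value close to 1 indicates that taller people in this toy sample tend to be heavier; it does not by itself show that one variable causes the other.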
2. Multivariate Data Analysis
Multivariate analysis deals with three or more variables to identify patterns and relationships.
(A) Cluster Analysis
Concept:
- A method to group data into clusters based on similar characteristics.
- Objects in the same cluster are similar, while objects in different clusters are dissimilar.
- Used in market segmentation, customer profiling, and image recognition.
Techniques in Cluster Analysis:
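As one illustration, k-means groups points into k clusters by alternating two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below is a simplified plain-Python version under stated assumptions: 2-D points, starting centroids supplied by the caller, and made-up data. The function names are hypothetical.

```python
from math import dist

def assign(points, centroids):
    """Assignment step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
        clusters[nearest].append(point)
    return clusters

def k_means(points, centroids, iterations=10):
    """Simplified k-means: repeat assignment and update until centroids stop moving."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for centroid, cluster in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Made-up data: two visibly separate groups of customers.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(clusters[0])  # [(1, 1), (1, 2), (2, 1)]
print(clusters[1])  # [(8, 8), (9, 8), (8, 9)]
```

In practice the result depends on the starting centroids and the chosen number of clusters, so library implementations typically run several random initializations.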
(B) Factor Analysis
Concept:
- Reduces many variables into a few underlying factors.
- Helps in simplifying data and identifying hidden patterns.
- Used in psychology, marketing, and finance.
Types of Factor Analysis:
Summary & Conclusion
✅ Bivariate analysis helps understand relationships between two variables.
✅ Multivariate analysis handles complex data relationships involving multiple variables.
✅ Cluster Analysis is used for segmentation, while Factor Analysis is used for dimension reduction.