Unit III: Correlation & Regression Analysis
Correlation Analysis
Correlation analysis is a statistical technique used to measure and analyze the strength and direction of the relationship between two or more variables. It helps in understanding how changes in one variable are associated with changes in another.
Types of Correlation
Positive Correlation – When both variables move in the same direction. Example: Higher education levels tend to go with higher salaries.
Negative Correlation – When one variable increases, the other decreases. Example: More exercise tends to go with lower body weight.
No Correlation – When there is no relationship between the variables. Example: The number of books read and daily temperature.
Correlation Coefficient (r)
The correlation coefficient (denoted as "r") quantifies the strength and direction of a relationship between two variables:
- +1 → Perfect positive correlation
- 0 → No correlation
- -1 → Perfect negative correlation
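Formula for Pearson Correlation
\[ r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y} \]
where \(\operatorname{Cov}(X, Y)\) is the covariance of \(X\) and \(Y\), and \(\sigma_X\), \(\sigma_Y\) are their standard deviations.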
1. Rank Method (Spearman’s Rank Correlation Coefficient)
The Spearman Rank Correlation Coefficient (ρ or rₛ) measures the strength and direction of a monotonic relationship between two ranked variables. It is used when data is ordinal or when the relationship is not necessarily linear.
The formula for Spearman’s Rank Correlation
\[ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]
where:
- \(d_i\) = Difference between the ranks of corresponding values
- \(n\) = Number of observations
Steps to Calculate Spearman’s Rank Correlation
- Rank the values of both variables separately.
- Compute the differences (\(d_i\)) between the ranks.
- Square the differences (\(d_i^2\)) and sum them.
- Apply the formula.
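The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the helper names `ranks` and `spearman_rho` and the judge scores are hypothetical, and the code assumes there are no tied values (with ties, the simple formula is no longer exact and average ranks are normally used).

```python
def ranks(values):
    """Return 1-based ranks (smallest value gets rank 1). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)          # Step 1: rank each variable
    sum_d_squared = sum((rx - ry) ** 2           # Steps 2-3: d, d^2, and the sum
                        for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))  # Step 4: apply the formula


# Hypothetical scores given by two judges to five contestants.
judge_a = [86, 72, 91, 65, 78]
judge_b = [80, 70, 95, 60, 85]
rho = spearman_rho(judge_a, judge_b)
print(round(rho, 4))
```

Here the two judges disagree only on the ordering of two contestants, so the squared rank differences sum to 2 and the coefficient is close to +1.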
2. Karl Pearson’s Coefficient of Correlation (r)
Karl Pearson’s coefficient of correlation measures the strength and direction of a linear relationship between two continuous variables.
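The formula for Pearson’s Correlation
\[ r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[n \sum X^2 - \left(\sum X\right)^2\right]\left[n \sum Y^2 - \left(\sum Y\right)^2\right]}} \]
where \(n\) is the number of paired observations.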
Properties of Karl Pearson’s Correlation
- Range: \(-1 \le r \le +1\)
- Direction:
- \(r > 0\) → Positive correlation
- \(r < 0\) → Negative correlation
- \(r = 0\) → No correlation
- Linear Relationship: Measures only linear relationships.
- Unit-Free Measure: Correlation does not depend on measurement units.
- Symmetric: Correlation between \(X\) and \(Y\) is the same as between \(Y\) and \(X\).
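As a rough check of these properties, the sketch below computes Pearson's coefficient from raw sums and then confirms symmetry and unit-independence on a small sample. The function name `pearson_r` and the study-hours data are assumptions made for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed from raw sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return numerator / denominator


# Hypothetical data: hours studied and quiz scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
r_swapped = pearson_r(scores, hours)                     # symmetry: swap X and Y
r_rescaled = pearson_r([h * 60 for h in hours], scores)  # unit-free: hours -> minutes
print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))
```

All three values agree, which illustrates that swapping the variables or changing the measurement unit leaves the coefficient unchanged.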
Regression Analysis
Regression analysis is a statistical method used to study the relationship between a dependent variable (outcome) and one or more independent variables (predictors). It helps in predicting values, identifying trends, and quantifying how the outcome changes with each predictor. On its own it does not prove that a predictor causes the outcome.
3. Simple Linear Regression
Equation of a Straight Line
\[ Y = a + bX \]
where:
- \(Y\) = Dependent variable (output)
- \(X\) = Independent variable (input)
- \(a\) = Intercept (value of Y when X = 0)
- \(b\) = Slope (rate of change in Y for a unit change in X)
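Formula for Slope (b) and Intercept (a)
\[ b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2}, \qquad a = \bar{Y} - b\bar{X} \]
where \(n\) is the number of observations and \(\bar{X}\), \(\bar{Y}\) are the means of \(X\) and \(Y\).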
Example Calculation
Consider a dataset in which a company examines advertising spend (\(X\)) and the corresponding sales (\(Y\)).
Final regression equation:
\[ Y = 0.30 + 1.52X \]
This means that for every 1-unit increase in advertising spend, predicted sales increase by 1.52 units.
4. Multiple Linear Regression
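When there are two or more independent variables, the equation is:
\[ Y = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k \]
where:
- \(X_1, X_2, \dots, X_k\) are independent variables
- \(b_1, b_2, \dots, b_k\) are coefficients
- \(a\) is the intercept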
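To show how such an equation is evaluated, here is a small Python sketch. The function `predict` and every number in it (intercept 5, coefficients 2 and 3, and the two predictor values) are hypothetical values chosen for illustration; they are not taken from any fitted model in these notes.

```python
def predict(intercept, coefficients, values):
    """Evaluate Y = a + b1*X1 + b2*X2 + ... + bk*Xk for one observation."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical model with two predictors: advertising (X1) and discount (X2).
intercept = 5.0             # a (assumed)
coefficients = [2.0, 3.0]   # b1, b2 (assumed)

predicted_sales = predict(intercept, coefficients, [10, 2])
print(predicted_sales)  # 5 + 2*10 + 3*2 = 31.0
```

Each coefficient is read as the change in predicted \(Y\) for a one-unit change in its own variable while the other variables are held fixed.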
Example:
If advertising = 10 and discount = 2, then
5. Importance of Regression Analysis
- Prediction & Forecasting (e.g., predicting revenue based on marketing spend)
- Understanding Relationships (e.g., how customer satisfaction affects sales)
- Decision-Making (e.g., determining optimal pricing strategies)
- Identifying Key Influencers (e.g., which factors impact employee productivity most)
1. Fitting a Regression Line
Fitting a regression line involves finding the equation that best describes the relationship between an independent variable (\(X\)) and a dependent variable (\(Y\)). The equation of a simple linear regression line is:
\[ Y = a + bX \]
where:
- \(Y\) = Dependent variable (predicted value)
- \(X\) = Independent variable (input)
- \(a\) = Intercept (value of \(Y\) when \(X = 0\))
- \(b\) = Slope (rate of change in \(Y\) per unit change in \(X\))
Steps to Fit a Regression Line
Step 1: Collect Data
Example dataset (Advertising Spend vs. Sales):
Step 2: Calculate Required Values
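- \(n\) (number of observations)
- \(\sum X\), \(\sum Y\), \(\sum XY\), and \(\sum X^2\) (the sums needed for the slope and intercept)
Step 3: Compute Slope (\(b\)) and Intercept (\(a\))
\[ b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2}, \qquad a = \bar{Y} - b\bar{X} \]
Step 4: Regression Equation
\[ Y = 0.30 + 1.52X \]
This means that for every 1-unit increase in advertising spend, predicted sales increase by 1.52 units.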
Example Prediction
If a company spends $6 on advertising, expected sales are:
\[ Y = 0.30 + 1.52(6) = 9.42 \]
So, predicted sales are 9.42 units when advertising spend is $6.
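The fitting steps can also be sketched in Python. The function `fit_line` and the five data points are hypothetical and are not the dataset used in the example above, so the fitted slope and intercept differ from 1.52 and 0.30.

```python
def fit_line(x, y):
    """Least-squares fit of Y = a + bX; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    # Step 2: the required sums.
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Step 3: slope and intercept.
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Step 1: hypothetical data (advertising spend, sales).
spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

a, b = fit_line(spend, sales)   # Step 4: Y = a + bX
predicted = a + b * 6           # prediction for a spend of 6
print(round(a, 2), round(b, 2), round(predicted, 2))
```

For this sample the sums are \(\sum X = 15\), \(\sum Y = 20\), \(\sum XY = 66\), and \(\sum X^2 = 55\), giving \(b = 30/50 = 0.6\) and \(a = 4 - 0.6 \times 3 = 2.2\).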
3. Business Applications of Regression Analysis
✅ Marketing – Predicting sales based on advertising spend.
✅ HR Management – Estimating employee performance based on training hours.
✅ Finance – Forecasting stock prices using past trends.
✅ Operations – Predicting production output based on machine hours.
Properties of Regression Coefficients
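Regression coefficients (\(b_{yx}\) and \(b_{xy}\)) are the slopes of the regression of \(Y\) on \(X\) and of \(X\) on \(Y\), respectively.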
1. Properties of Regression Coefficients
Relationship between Regression and Correlation
1. Differences between Regression and Correlation
2. Relationship Between Regression Coefficient and Correlation Coefficient
- The correlation coefficient (\(r\)) is related to the regression coefficients as \( r = \pm\sqrt{b_{yx} \cdot b_{xy}} \), where \(r\) takes the common sign of the two coefficients.
- The sign of the regression coefficients determines the sign of the correlation coefficient.
- If both regression coefficients are positive, then \(r\) is positive; if both are negative, then \(r\) is negative.
- When \(r = 0\), the regression coefficients are also zero, indicating no linear relationship.
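A quick numerical check of this relationship, written as a hedged Python sketch: `slope` and `r_from_slopes` are hypothetical helper names, and the data are made up for illustration.

```python
import math


def slope(x, y):
    """Slope of the least-squares regression of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def r_from_slopes(b_yx, b_xy):
    """r = +/- sqrt(b_yx * b_xy), carrying the common sign of the two slopes."""
    if b_yx * b_xy < 0:
        raise ValueError("regression coefficients must have the same sign")
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b_yx = slope(x, y)   # regression of Y on X
b_xy = slope(y, x)   # regression of X on Y
r = r_from_slopes(b_yx, b_xy)
print(round(b_yx, 4), round(b_xy, 4), round(r, 4))
```

For this sample \(b_{yx} = 0.6\) and \(b_{xy} = 1.0\), so \(r = \sqrt{0.6} \approx 0.775\), the same value obtained by computing Pearson's coefficient directly from the data.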
Key Insights
✅ Regression helps in prediction, while correlation measures relationship strength.
✅ Regression coefficients depend on units, but correlation is unit-free.
✅ A high correlation does not imply causation. Regression models how the dependent variable changes with the predictors, but on its own it does not establish causation either.