Unit III: Correlation & Regression Analysis


Correlation Analysis

Correlation analysis is a statistical technique used to measure and analyze the strength and direction of the relationship between two or more variables. It helps in understanding how changes in one variable are associated with changes in another.

Types of Correlation

  1. Positive Correlation – Both variables tend to move in the same direction. Example: Higher education levels are associated with higher salaries.

  2. Negative Correlation – As one variable increases, the other tends to decrease. Example: More exercise is associated with lower body weight.

  3. No Correlation – There is no systematic relationship between the variables. Example: The number of books read and the daily temperature.

Correlation Coefficient (r)

The correlation coefficient (denoted as "r") quantifies the strength and direction of a relationship between two variables:

  • +1 → Perfect positive correlation
  • 0 → No linear correlation
  • -1 → Perfect negative correlation

Formula for Pearson Correlation

r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}

1. Rank Method (Spearman’s Rank Correlation Coefficient)

The Spearman Rank Correlation Coefficient (ρ or rₛ) measures the strength and direction of a monotonic relationship between two ranked variables. It is used when data is ordinal or when the relationship is not necessarily linear.

The formula for Spearman’s Rank Correlation

r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}

where:

  • d = Difference between the ranks of corresponding values
  • n = Number of observations

Steps to Calculate Spearman’s Rank Correlation

  1. Rank the values of both variables separately.
  2. Compute the differences (d) between the ranks.
  3. Square the differences (d^2) and sum them.
  4. Apply the formula.

A value of r_s close to +1 indicates a strong positive correlation, and a value close to -1 indicates a strong negative correlation.
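
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the sample data are hypothetical, and the shortcut formula is only valid when neither variable contains tied values, so the sketch rejects ties rather than handling them.

```python
def ranks(values):
    """Assign ranks 1..n, smallest value first (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman's r_s via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("shortcut formula assumes no tied values")
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)            # Step 1
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]  # Step 2
    sum_d2 = sum(di ** 2 for di in d)              # Step 3
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))     # Step 4


# Hypothetical illustrative data, not taken from the notes.
x = [10, 20, 30, 40, 50]
y = [12, 25, 22, 41, 48]
r_s = spearman_no_ties(x, y)
print(round(r_s, 2))  # 0.9
```

Here the ranks of y are 1, 3, 2, 4, 5, so only two observations differ in rank by one position, giving a sum of squared differences of 2.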

2. Karl Pearson’s Coefficient of Correlation (r)

Karl Pearson’s coefficient of correlation measures the strength and direction of a linear relationship between two continuous variables.

The formula for Pearson’s Correlation

r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}
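
As a rough illustration, the formula can be computed directly from the raw sums in Python. The function name pearson_r is hypothetical, and the sample values are the advertising spend and sales figures used in the regression example later in this unit.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from raw sums (hypothetical helper name)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


spend = [2, 3, 5, 7, 9]
sales = [4, 5, 7, 10, 15]
r = pearson_r(spend, sales)
print(round(r, 2))  # 0.98
```

A value of about 0.98 indicates a strong positive linear relationship between the two series.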

Properties of Karl Pearson’s Correlation

  1. Range: -1 ≤ r ≤ +1
  2. Direction:
    • r > 0 → Positive correlation
    • r < 0 → Negative correlation
    • r = 0 → No linear correlation
  3. Linear Relationship: Measures only linear relationships.
  4. Unit-Free Measure: Correlation does not depend on measurement units.
  5. Symmetric: Correlation between X and Y is the same as between Y and X.
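
Properties 3 to 5 can be checked numerically. The sketch below uses hypothetical data and a hypothetical helper, pearson_r, built from the raw-sum formula. It swaps the arguments, rescales one variable by a factor of 100 (as in a change of units), and evaluates a perfectly quadratic relationship, for which r is zero even though Y is fully determined by X.

```python
from math import isclose, sqrt


def pearson_r(x, y):
    """Pearson's r from raw sums (hypothetical helper name)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )


# Hypothetical illustrative data.
x = [1, 2, 4, 5, 8]
y = [3, 4, 4, 7, 9]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                          # symmetric
r_scaled = pearson_r([100 * v for v in x], y)   # unit-free

# Y = X^2 on a symmetric range: a strong relationship, but not a linear one.
r_quadratic = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(isclose(r_xy, r_yx), isclose(r_xy, r_scaled), r_quadratic)
```

The quadratic case shows why r = 0 should be read as "no linear correlation" rather than "no relationship".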

Regression Analysis

Regression analysis is a statistical method used to study the relationship between a dependent variable (outcome) and one or more independent variables (predictors). It helps in predicting values, identifying trends, and, given an appropriate study design, investigating causal relationships.

3. Simple Linear Regression

Equation of a Straight Line

Y=a+bX

where:

  • Y = Dependent variable (output)
  • X = Independent variable (input)
  • a = Intercept (value of Y when X = 0)
  • b = Slope (rate of change in Y for a unit change in X)

Formula for Slope (b) and Intercept (a)

b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}

a = \frac{\sum Y - b\sum X}{n}
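
The two formulas translate almost line for line into code. The following Python sketch is one possible rendering. The function name fit_line and the sample data are hypothetical, and the data are chosen to lie exactly on Y = 1 + 2X so that the result is easy to verify by eye.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all X values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = (sum_y - b * sum_x) / n                     # intercept
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

The slope must be computed first because the intercept formula uses it.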

Example Calculation

Consider a dataset in which a company records advertising spend (X) and the corresponding sales (Y). The data and the step-by-step calculation are given under "Steps to Fit a Regression Line".

Final regression equation:

Y=0.3+1.52X

This means that for every 1-unit increase in advertising spend, predicted sales increase by about 1.52 units.

4. Multiple Linear Regression

When there are two or more independent variables, the equation is:

Y = a + b_1X_1 + b_2X_2 + \dots + b_nX_n

where:

  • X_1, X_2, ..., X_n are independent variables
  • b_1, b_2, ..., b_n are coefficients

Example:

Sales=5+1.2(Advertising)+0.8(Discount)

If advertising = 10 and discount = 2, then

Sales=5+(1.2)(10)+(0.8)(2)=18.6
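
The same substitution can be written as a small Python function. This is only a sketch: the name predict is hypothetical, and the intercept and coefficients are the ones from the example equation above, not values estimated from data.

```python
def predict(intercept, coefficients, values):
    """Evaluate Y = a + b_1*X_1 + ... + b_n*X_n for one observation."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Sales = 5 + 1.2(Advertising) + 0.8(Discount), with advertising = 10, discount = 2
sales = predict(5, [1.2, 0.8], [10, 2])
print(round(sales, 1))  # 18.6
```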

5. Importance of Regression Analysis

  • Prediction & Forecasting (e.g., predicting revenue based on marketing spend)
  • Understanding Relationships (e.g., how customer satisfaction affects sales)
  • Decision-Making (e.g., determining optimal pricing strategies)
  • Identifying Key Influencers (e.g., which factors impact employee productivity most)

1. Fitting a Regression Line

Fitting a regression line involves finding the equation that best describes the relationship between an independent variable (X) and a dependent variable (Y). The equation of a simple linear regression line is:

Y=a+bX

where:

  • Y = Dependent variable (predicted value)
  • X = Independent variable (input)
  • a = Intercept (value of Y when X = 0)
  • b = Slope (rate of change in Y per unit change in X)

Steps to Fit a Regression Line

Step 1: Collect Data

Example dataset (Advertising Spend vs. Sales):

  Advertising Spend (X)    Sales (Y)
  2                        4
  3                        5
  5                        7
  7                        10
  9                        15

Step 2: Calculate Required Values

  • n = 5 (number of observations)
  • \sum X = 2 + 3 + 5 + 7 + 9 = 26
  • \sum Y = 4 + 5 + 7 + 10 + 15 = 41
  • \sum X^2 = 4 + 9 + 25 + 49 + 81 = 168
  • \sum XY = 8 + 15 + 35 + 70 + 135 = 263

Step 3: Compute Slope (b) and Intercept (a)

b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}

b = \frac{(5)(263) - (26)(41)}{(5)(168) - (26)^2} = \frac{1315 - 1066}{840 - 676} = \frac{249}{164} \approx 1.52

a = \frac{\sum Y - b\sum X}{n} = \frac{41 - (1.52)(26)}{5} = \frac{41 - 39.52}{5} = \frac{1.48}{5} \approx 0.3

Step 4: Regression Equation

Y=0.3+1.52X

This means that for every 1-unit increase in advertising spend, predicted sales increase by about 1.52 units.

Example Prediction

If a company spends $6 on advertising, expected sales are:

Y=0.3+1.52(6)=9.42

So the predicted sales are 9.42 units when advertising spend is $6.
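
The hand calculation can be reproduced in a few lines of Python, which also shows the effect of rounding. The variable names below are illustrative, and the data values are those summed in Step 2. Carrying more decimal places gives b ≈ 1.5183 and a ≈ 0.3049, so the unrounded prediction at X = 6 is about 9.41 rather than 9.42; the gap comes only from rounding a and b before substituting.

```python
spend = [2, 3, 5, 7, 9]    # X values summed in Step 2
sales = [4, 5, 7, 10, 15]  # Y values summed in Step 2

n = len(spend)
sum_x, sum_y = sum(spend), sum(sales)
sum_xy = sum(x * y for x, y in zip(spend, sales))
sum_x2 = sum(x * x for x in spend)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 249 / 164
a = (sum_y - b * sum_x) / n

exact_prediction = a + b * 6          # unrounded coefficients
rounded_prediction = 0.3 + 1.52 * 6   # coefficients rounded as in the text

print(round(b, 4), round(a, 4))       # 1.5183 0.3049
print(round(exact_prediction, 2))     # 9.41
print(round(rounded_prediction, 2))   # 9.42
```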

3. Business Applications of Regression Analysis

  • Marketing – Predicting sales based on advertising spend.
  • HR Management – Estimating employee performance based on training hours.
  • Finance – Forecasting stock prices using past trends.
  • Operations – Predicting production output based on machine hours.

Properties of Regression Coefficients

Regression coefficients (b) represent the relationship between the dependent variable (Y) and independent variable (X) in a regression equation.

1. Properties of Regression Coefficients

Relationship between Regression and Correlation

1. Differences between Regression and Correlation

2. Relationship Between Regression Coefficient and Correlation Coefficient

  • The correlation coefficient (r) is the geometric mean of the two regression coefficients, b_{yx} (regression of Y on X) and b_{xy} (regression of X on Y):
  •   r = \pm\sqrt{b_{yx} \cdot b_{xy}}
  • The two regression coefficients always have the same sign, and r takes that sign.
  • If both regression coefficients are positive, then r is positive; if both are negative, then r is negative.
  • When r = 0, both regression coefficients are also zero, indicating no linear relationship.
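
The relationship can be verified numerically. The sketch below reuses the advertising spend and sales figures from the regression example, with illustrative variable names. It computes both regression coefficients and compares the signed square root of their product with r computed directly.

```python
from math import copysign, isclose, sqrt

x = [2, 3, 5, 7, 9]    # advertising spend
y = [4, 5, 7, 10, 15]  # sales
n = len(x)

sum_x, sum_y = sum(x), sum(y)
s_xy = n * sum(a * b for a, b in zip(x, y)) - sum_x * sum_y  # 249
s_xx = n * sum(a * a for a in x) - sum_x ** 2                # 164
s_yy = n * sum(b * b for b in y) - sum_y ** 2                # 394

b_yx = s_xy / s_xx  # slope of the regression of Y on X
b_xy = s_xy / s_yy  # slope of the regression of X on Y

r_direct = s_xy / sqrt(s_xx * s_yy)
# The square root alone loses the sign, so take it from the coefficients.
r_from_slopes = copysign(sqrt(b_yx * b_xy), b_yx)

print(round(r_from_slopes, 4), isclose(r_direct, r_from_slopes))  # 0.9796 True
```

Because both coefficients share the numerator s_xy, their product is never negative, which is why the square root is always defined.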

Key Insights

  • Regression helps in prediction, while correlation measures relationship strength.
  • Regression coefficients depend on units, but correlation is unit-free.
  • A high correlation does not imply causation. Regression is often used in causal analysis, but a fitted equation alone does not establish causation either.