Factor Analysis with Python: Calculating Beta from Real Data

I introduced the Barra model in the article linked below, but put off explaining how to calculate beta from actual data.

What Is the "Barra Model" for Evaluating Investment Risk? How Multi-Factor Models Work and How to Calculate Them

blog.otama-playground.com

This time, I will calculate it in Python using actual data.

Calculation Formula

For simplicity, we will calculate a single-factor model this time.

Let's estimate the ETF's beta to TOPIX (treated here as a factor that affects the entire Japanese stock market) from the returns of an index ETF and of TOPIX.

The model is given by the formula below, and we estimate it by regressing the ETF's returns on TOPIX's returns.

$$
R_{etf} = \alpha_{etf} + \beta_{etf,topix} F_{topix} + \epsilon_{etf}
$$

$$
\begin{aligned}
R_{etf} &: \text{Return of ETF} \\
\alpha_{etf} &: \text{Stock-specific return} \\
\beta_{etf,topix} &: \text{Exposure to TOPIX (Beta)} \\
F_{topix} &: \text{Return of TOPIX} \\
\epsilon_{etf} &: \text{Specific risk component (Random walk part, unexplained part)}
\end{aligned}
$$
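For a single factor, the ordinary least squares estimates have a well-known closed form: beta is the covariance of the two return series divided by the variance of the factor return, and alpha is whatever is left of the mean ETF return. Here is a minimal sketch on synthetic data. The return series, the seed, and the `true_alpha` / `true_beta` values are made up for illustration and are not market data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: synthetic daily returns, not real market data
true_alpha, true_beta = 0.0002, 0.7
f_topix = rng.normal(0.0, 0.01, size=2000)   # factor return F_topix
eps = rng.normal(0.0, 0.005, size=2000)      # specific component epsilon_etf
r_etf = true_alpha + true_beta * f_topix + eps

# Closed-form OLS estimates for one explanatory variable
cov = np.cov(r_etf, f_topix, ddof=1)         # 2x2 sample covariance matrix
beta_hat = cov[0, 1] / cov[1, 1]             # Cov(R, F) / Var(F)
alpha_hat = r_etf.mean() - beta_hat * f_topix.mean()

print(f"beta_hat  = {beta_hat:.4f}")
print(f"alpha_hat = {alpha_hat:.6f}")
```

The estimates should land close to the values used to generate the data, and they match what a regression library returns for the same inputs.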

Writing It in Python

The calculation uses daily log returns.

import numpy as np
import pandas as pd
import yfinance as yf
from sklearn.linear_model import LinearRegression

# 1. Get data
etf_ticker = '1626.T'   # TOPIX-17 IT & Services, Others ETF
topix_ticker = '^TOPX'  # TOPIX Index

# Period to download
start_date = '2011-01-01'
end_date = '2024-08-16'

# auto_adjust=False keeps the 'Adj Close' column, which recent yfinance versions drop by default.
# squeeze() turns a one-column DataFrame (returned by some versions) into a Series.
etf_data = yf.download(etf_ticker, start=start_date, end=end_date, auto_adjust=False)['Adj Close'].squeeze()
topix_data = yf.download(topix_ticker, start=start_date, end=end_date, auto_adjust=False)['Adj Close'].squeeze()

# 2. Calculate daily log returns
etf_return = np.log(etf_data / etf_data.shift(1))
topix_return = np.log(topix_data / topix_data.shift(1))

# 3. Align the two series by date and drop rows with missing values
data = pd.DataFrame({
    'ETF_Return': etf_return,
    'TOPIX_Return': topix_return,
}).dropna()

# 4. Prepare the regression inputs
X = data[['TOPIX_Return']]     # Explanatory variable (TOPIX return)
y = data['ETF_Return'].values  # Target variable (ETF return)

# 5. Fit the regression
model = LinearRegression()
model.fit(X, y)

# 6. Display results
beta_etf_topix = model.coef_[0]  # Beta to TOPIX
alpha_etf = model.intercept_     # Alpha (intercept)
print(f"β (Beta of ETF to TOPIX): {beta_etf_topix}")
print(f"α (Stock-specific return): {alpha_etf}")

Execution Result

We found that the beta of the ETF to TOPIX is about 0.659, and that alpha (the intercept, i.e. the part of the daily log return not explained by TOPIX) is about 0.026% per day.

β (Beta of ETF to TOPIX): 0.6591968184312232
α (Stock-specific return): 0.00026399258025840264
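
To put the intercept in more familiar terms, the daily figure can be scaled to a year. Because log returns add across days, a rough annual figure is the daily alpha times the number of trading days. The sketch below assumes about 245 trading days per year, which is a round figure chosen for illustration rather than something derived from the data, and it ignores the estimation error in alpha. The +1% TOPIX move is likewise a hypothetical input.

```python
import math

# Regression outputs copied from the execution result above
alpha_daily_log = 0.00026399258025840264
beta = 0.6591968184312232

# Assumption: roughly 245 trading days per year (not derived from the data)
TRADING_DAYS = 245

# Log returns are additive over time, so scaling is a simple multiplication
alpha_annual_log = alpha_daily_log * TRADING_DAYS
# Convert the annual log return to a simple return
alpha_annual_simple = math.exp(alpha_annual_log) - 1

# Model-implied ETF log return for a hypothetical +1% TOPIX log return in one day
topix_move = 0.01
expected_etf_move = alpha_daily_log + beta * topix_move

print(f"annualized alpha (log):    {alpha_annual_log:.4%}")
print(f"annualized alpha (simple): {alpha_annual_simple:.4%}")
print(f"model-implied ETF return for a +1% TOPIX day: {expected_etf_move:.4%}")
```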

Conclusion

In practice, obtaining the data is the hardest part; once you have it, the calculation itself is easy with the help of libraries.

As a first exercise, it might be interesting to pick factors whose data is easy to obtain (the USD/JPY rate, US interest rates, other indices, etc.) and try this yourself.
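
Extending the regression to more than one factor mostly means adding columns to the explanatory matrix. The sketch below fits two factors at once with `numpy.linalg.lstsq`. Everything in it is synthetic: the second factor standing in for a USD/JPY return, the seed, and the `true_*` coefficients are hypothetical placeholders, so the output says nothing about real markets.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 3000

# Hypothetical factor returns (synthetic stand-ins for TOPIX and USD/JPY)
f_topix = rng.normal(0.0, 0.010, size=n)
f_usdjpy = rng.normal(0.0, 0.006, size=n)

# Hypothetical "true" parameters, used only to generate the sample data
true_alpha, true_b_topix, true_b_usdjpy = 0.0001, 0.8, 0.3
r_etf = (true_alpha + true_b_topix * f_topix + true_b_usdjpy * f_usdjpy
         + rng.normal(0.0, 0.004, size=n))

# Design matrix: a column of ones for alpha, then one column per factor
X = np.column_stack([np.ones(n), f_topix, f_usdjpy])
coef, *_ = np.linalg.lstsq(X, r_etf, rcond=None)
alpha_hat, b_topix_hat, b_usdjpy_hat = coef

print(f"alpha:          {alpha_hat:.6f}")
print(f"beta (TOPIX):   {b_topix_hat:.4f}")
print(f"beta (USD/JPY): {b_usdjpy_hat:.4f}")
```

With real data, the same idea should carry over to the scikit-learn script by passing more than one column as the explanatory variables, for example a DataFrame with `TOPIX_Return` plus a second return column (the name of that second column would be your own choice).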