I introduced the Barra model in an earlier article, but I put off explaining how to calculate beta from actual data.
This time, I would like to try calculating it with Python using actual data.
Calculation Formula
For simplicity, we will use a single-factor model this time.
We treat TOPIX as a factor that affects the entire Japanese stock market, and estimate an index ETF's beta to TOPIX from the ETF's return data and TOPIX price movements.
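The formula is as follows, and we fit it to the return data by regression analysis.
$$r_{\mathrm{ETF},t} = \alpha + \beta \, r_{\mathrm{TOPIX},t} + \varepsilon_t$$
Here $r_{\mathrm{ETF},t}$ and $r_{\mathrm{TOPIX},t}$ are the returns of the ETF and of TOPIX on day $t$, $\beta$ is the ETF's beta to TOPIX, $\alpha$ is the intercept, and $\varepsilon_t$ is the residual that TOPIX does not explain.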
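For a regression of ETF returns on TOPIX returns with an intercept, the least-squares slope can also be written in closed form as beta = Cov(r_ETF, r_TOPIX) / Var(r_TOPIX), with alpha = mean(r_ETF) - beta * mean(r_TOPIX). The following is a minimal sketch of that relationship. It uses synthetic returns generated from assumed values (not market data), and all variable names are illustrative.

```python
import numpy as np

# Synthetic daily returns, for illustration only (not real market data).
rng = np.random.default_rng(0)
true_alpha, true_beta = 0.0002, 0.7            # assumed values used to simulate
topix_return = rng.normal(0.0, 0.01, 2000)     # hypothetical factor returns
noise = rng.normal(0.0, 0.005, 2000)           # part not explained by the factor
etf_return = true_alpha + true_beta * topix_return + noise

# Closed-form least-squares estimates for a single explanatory variable
cov = np.cov(etf_return, topix_return, ddof=1)[0, 1]
var = np.var(topix_return, ddof=1)
beta_hat = cov / var
alpha_hat = etf_return.mean() - beta_hat * topix_return.mean()

# Cross-check against a degree-1 least-squares fit (returns slope, intercept)
slope, intercept = np.polyfit(topix_return, etf_return, 1)

print(f"beta  closed form: {beta_hat:.4f}  polyfit: {slope:.4f}")
print(f"alpha closed form: {alpha_hat:.6f}  polyfit: {intercept:.6f}")
```

The two pairs of estimates should agree up to floating-point error, which is why a regression library and a covariance calculation give the same single-factor beta.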
Actually Writing in Python
I’m calculating from log returns.
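As a short aside on log returns, here is a small sketch with made-up prices (hypothetical values, not market data). It shows that daily log returns add up to the log return over the whole period, and that they stay close to simple returns when daily moves are small.

```python
import numpy as np

# Hypothetical closing prices, for illustration only.
prices = np.array([100.0, 102.0, 101.0, 105.0])

log_return = np.log(prices[1:] / prices[:-1])      # ln(P_t / P_{t-1})
simple_return = prices[1:] / prices[:-1] - 1.0     # P_t / P_{t-1} - 1

# Daily log returns sum to the log return over the whole period.
total_log_return = np.log(prices[-1] / prices[0])
print(log_return.sum(), total_log_return)

# For small daily moves the two definitions differ only slightly.
max_gap = np.abs(log_return - simple_return).max()
print(max_gap)
```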
import yfinance as yf
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# 1. Get data
etf_ticker = '1626.T'    # TOPIX-17 IT & Services, Others ETF
topix_ticker = '^TOPX'   # TOPIX Index

# Specify the period
start_date = '2011-01-01'
end_date = '2024-08-16'

etf_data = yf.download(etf_ticker, start=start_date, end=end_date)['Adj Close']
topix_data = yf.download(topix_ticker, start=start_date, end=end_date)['Adj Close']

# 2. Calculate daily log returns
etf_return = np.log(etf_data / etf_data.shift(1))
topix_return = np.log(topix_data / topix_data.shift(1))

# 3. Combine the returns into one DataFrame and drop rows with missing values
data = pd.DataFrame({
    'ETF_Return': etf_return,
    'TOPIX_Return': topix_return,
}).dropna()

# 4. Prepare for regression analysis
X = data[['TOPIX_Return']]       # Explanatory variable (TOPIX return)
y = data['ETF_Return'].values    # Target variable (ETF return)

# 5. Perform regression analysis
model = LinearRegression()
model.fit(X, y)

# 6. Display results
beta_etf_topix = model.coef_[0]  # Beta to TOPIX
alpha_etf = model.intercept_     # Alpha value

print(f"β (Beta of ETF to TOPIX): {beta_etf_topix}")
print(f"α (Stock-specific return): {alpha_etf}")
Execution Result
We found that the beta of the ETF to TOPIX is 0.659, and that the alpha (the intercept, i.e. the part of the daily log return not explained by TOPIX) is about 0.026% per day.
β (Beta of ETF to TOPIX): 0.6591968184312232
α (Stock-specific return): 0.00026399258025840264
Conclusion
Basically, obtaining the data is the hardest part. Once you have it, the calculation itself is easy with the help of libraries.
As a first step, it might be interesting to pick factors whose data is easy to obtain (USD/JPY rate, US interest rates, other indices, etc.) and try the same calculation with them.
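As one way to try this, the same regression extends to several factors by adding one column per factor. The sketch below assumes a hypothetical second factor (a USD/JPY return) and uses synthetic data generated from assumed exposures, not market data. It uses `np.linalg.lstsq` rather than `LinearRegression` only to keep the example dependency-light. Fitting `LinearRegression` on the same two factor columns should give the same coefficients.

```python
import numpy as np

# Synthetic daily returns, for illustration only (not real market data).
rng = np.random.default_rng(42)
n = 1500
topix_return = rng.normal(0.0, 0.010, n)     # hypothetical factor 1
usdjpy_return = rng.normal(0.0, 0.006, n)    # hypothetical factor 2

# Assumed "true" exposures, used only to generate the sample
etf_return = (0.0001
              + 0.8 * topix_return
              + 0.3 * usdjpy_return
              + rng.normal(0.0, 0.004, n))

# Design matrix: a column of ones for alpha, then one column per factor
X = np.column_stack([np.ones(n), topix_return, usdjpy_return])
coef, *_ = np.linalg.lstsq(X, etf_return, rcond=None)
alpha_hat, beta_topix, beta_usdjpy = coef

print(f"alpha: {alpha_hat:.6f}")
print(f"beta to TOPIX: {beta_topix:.4f}")
print(f"beta to USD/JPY: {beta_usdjpy:.4f}")
```

With real data, the factor series would need to be aligned on the same dates before the regression, as the `dropna()` step does in the single-factor script.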









