Stock market portfolio optimization involves analyzing price trends, expected returns, and stock correlations for diversification. Techniques like Modern Portfolio Theory (MPT) help create an efficient portfolio, highlighting the optimal trade-off between risk and return on the efficient frontier.
The goal of stock market portfolio optimization is to identify the portfolio with the highest Sharpe ratio, indicating the best risk-adjusted return. This approach provides a clear allocation strategy for selected stocks, helping to achieve long-term investment objectives.
To get started with stock market portfolio optimization, we need to collect data about the stock market performance over time. I will collect real-time stock market data using the yfinance API. If you need historical data, you can download it from here.
import yfinance as yf
import pandas as pd
from datetime import date, timedelta, datetime
pip install yfinance
Requirement already satisfied: yfinance in /usr/local/lib/python3.10/dist-packages (0.2.48)
Requirement already satisfied: pandas>=1.3.0 in /usr/local/lib/python3.10/dist-packages (from yfinance) (2.2.2)
Requirement already satisfied: numpy>=1.16.5 in /usr/local/lib/python3.10/dist-packages (from yfinance) (1.26.4)
Requirement already satisfied: requests>=2.31 in /usr/local/lib/python3.10/dist-packages (from yfinance) (2.32.3)
Requirement already satisfied: multitasking>=0.0.7 in /usr/local/lib/python3.10/dist-packages (from yfinance) (0.0.11)
Requirement already satisfied: lxml>=4.9.1 in /usr/local/lib/python3.10/dist-packages (from yfinance) (5.3.0)
Requirement already satisfied: platformdirs>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from yfinance) (4.3.6)
Requirement already satisfied: pytz>=2022.5 in /usr/local/lib/python3.10/dist-packages (from yfinance) (2024.2)
Requirement already satisfied: frozendict>=2.3.4 in /usr/local/lib/python3.10/dist-packages (from yfinance) (2.4.6)
Requirement already satisfied: peewee>=3.16.2 in /usr/local/lib/python3.10/dist-packages (from yfinance) (3.17.7)
Requirement already satisfied: beautifulsoup4>=4.11.1 in /usr/local/lib/python3.10/dist-packages (from yfinance) (4.12.3)
Requirement already satisfied: html5lib>=1.1 in /usr/local/lib/python3.10/dist-packages (from yfinance) (1.1)
Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4>=4.11.1->yfinance) (2.6)
Requirement already satisfied: six>=1.9 in /usr/local/lib/python3.10/dist-packages (from html5lib>=1.1->yfinance) (1.16.0)
Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from html5lib>=1.1->yfinance) (0.5.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.3.0->yfinance) (2.8.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.3.0->yfinance) (2024.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31->yfinance) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31->yfinance) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31->yfinance) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31->yfinance) (2024.8.30)
[ ]
data = pd.read_csv('/content/stock_data.csv')
# Melt the DataFrame to make it long format where each row is a unique combination of Date, Ticker, and attributes
data_melted = data.melt(id_vars=['Date', 'Ticker'], var_name='Attribute', value_name='Value')
# Pivot the melted DataFrame to have the attributes (Open, High, Low, etc.) as columns
data_pivoted = data_melted.pivot_table(index=['Date', 'Ticker'], columns='Attribute', values='Value', aggfunc='first')
# Reset index to turn multi-index into columns
stock_data = data_pivoted.reset_index()
# Display the first few rows of the final DataFrame
print(stock_data.head())
Start Coding or generate with AI
# Define the time period for the data
end_date = date.today().strftime("%Y-%m-%d")
start_date = (date.today() - timedelta(days=365)).strftime("%Y-%m-%d")
# List of stock tickers to download
tickers = ['RELIANCE.NS', 'TCS.NS', 'INFY.NS', 'HDFCBANK.NS']
# Download stock data
data = yf.download(tickers, start=start_date, end=end_date, progress=False)
# Reset index to bring Date into the columns for the melt function
data = data.reset_index()
# Melt the DataFrame to make it long format where each row is a unique combination of Date, Ticker, and attributes
data_melted = data.melt(id_vars=['Date', 'Ticker'], var_name='Attribute', value_name='Value')
# Pivot the melted DataFrame to have the attributes (Open, High, Low, etc.) as columns
data_pivoted = data_melted.pivot_table(index=['Date', 'Ticker'], columns='Attribute', values='Value', aggfunc='first')
# Reset index to turn multi-index into columns
stock_data = data_pivoted.reset_index()
# Display the first few rows of the final DataFrame
print(stock_data.head())
# Display the first few rows of the final DataFrame
print(stock_data.head())
# define the time period for the data
end_date = date.today().strftime("%Y-%m-%d")
start_date = (date.today() - timedelta(days=365)).strftime("%Y-%m-%d")
# list of stock tickers to download
tickers = ['RELIANCE.NS', 'TCS.NS', 'INFY.NS', 'HDFCBANK.NS']
data = yf.download(tickers, start=start_date, end=end_date, progress=False)
# reset index to bring Date into the columns for the melt function
data = data.reset_index()
# melt the DataFrame to make it long format where each row is a unique combination of Date, Ticker, and attributes
data_melted = data.melt(id_vars=['Date'], var_name=['Attribute', 'Ticker'])
# pivot the melted DataFrame to have the attributes (Open, High, Low, etc.) as columns
data_pivoted = data_melted.pivot_table(index=['Date', 'Ticker'], columns='Attribute', values='value', aggfunc='first')
# reset index to turn multi-index into columns
stock_data = data_pivoted.reset_index()
print(stock_data.head())
Now, let’s have a look at the stock market performance of these companies in the stock market over time:
import matplotlib.pyplot as plt
import seaborn as sns
stock_data['Date'] = pd.to_datetime(stock_data['Date'])
stock_data.set_index('Date', inplace=True)
stock_data.reset_index(inplace=True)
plt.figure(figsize=(14, 7))
sns.set(style='whitegrid')
sns.lineplot(data=stock_data, x='Date', y='Adj Close', hue='Ticker', marker='o')
plt.title('Adjusted Close Price Over Time', fontsize=16)
plt.xlabel('Date', fontsize=14)
plt.ylabel('Adjusted Close Price', fontsize=14)
plt.legend(title='Ticker', title_fontsize='13', fontsize='11')
plt.grid(True)
plt.xticks(rotation=45)
plt.show()
!pip install mplfinance
import pandas as pd
import mplfinance as mpf
# Load and prepare your data
file_path = '/content/stock_data.csv'
stock_data = pd.read_csv(file_path)
# Convert 'Date' column to datetime format
stock_data['Date'] = pd.to_datetime(stock_data['Date'])
# Set 'Date' as index for mplfinance
stock_data.set_index('Date', inplace=True)
# Filter data by ticker if needed
ticker = 'RELIANCE.NS' # Specify the ticker you want to visualize, e.g., 'RELIANCE.NS'
ticker_data = stock_data[stock_data['Ticker'] == ticker]
# Select only the necessary columns for the candlestick chart
ticker_data = ticker_data[['Open', 'High', 'Low', 'Close', 'Volume']]
# Plot the candlestick chart
mpf.plot(ticker_data, type='candle', volume=True, title=f'Candlestick Chart for {ticker}',
style='charles', figsize=(14, 7))
The graph displays the adjusted close prices of four stocks (HDFCBANK.NS, INFY.NS, RELIANCE.NS, TCS.NS) over time from July 2023 to July 2024. It highlights that TCS has the highest adjusted close prices, followed by RELIANCE, INFY (Infosys), and HDFCBANK. The prices for RELIANCE and TCS show noticeable upward trends, which indicates strong performance, while HDFCBANK and INFY exhibit more stability with relatively lower price fluctuations.
Now, let’s try to compute the 50-day and 200-day moving averages and plot these along with the Adjusted Close price for each stock:
short_window = 50
long_window = 200
stock_data.set_index('Date', inplace=True)
unique_tickers = stock_data['Ticker'].unique()
for ticker in unique_tickers:
ticker_data = stock_data[stock_data['Ticker'] == ticker].copy()
ticker_data['50_MA'] = ticker_data['Adj Close'].rolling(window=short_window).mean()
ticker_data['200_MA'] = ticker_data['Adj Close'].rolling(window=long_window).mean()
plt.figure(figsize=(14, 7))
plt.plot(ticker_data.index, ticker_data['Adj Close'], label='Adj Close')
plt.plot(ticker_data.index, ticker_data['50_MA'], label='50-Day MA')
plt.plot(ticker_data.index, ticker_data['200_MA'], label='200-Day MA')
plt.title(f'{ticker} - Adjusted Close and Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
plt.figure(figsize=(14, 7))
plt.bar(ticker_data.index, ticker_data['Volume'], label='Volume', color='orange')
plt.title(f'{ticker} - Volume Traded')
plt.xlabel('Date')
plt.ylabel('Volume')
plt.legend()
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
For HDFCBANK and INFY, the prices initially decline but later show signs of recovery, as indicated by the moving averages. RELIANCE and TCS display a more consistent upward trend in their adjusted close prices. The volume traded graphs highlight significant trading activity at various points, with spikes indicating high trading volumes, particularly noticeable in HDFCBANK and RELIANCE around early 2024. These insights are crucial for understanding price movements and trading behaviours, which assist in making informed investment decisions.
Now, let’s have a look at the distribution of daily returns of these stocks:
stock_data['Daily Return'] = stock_data.groupby('Ticker')['Adj Close'].pct_change()
plt.figure(figsize=(14, 7))
sns.set(style='whitegrid')
for ticker in unique_tickers:
ticker_data = stock_data[stock_data['Ticker'] == ticker]
sns.histplot(ticker_data['Daily Return'].dropna(), bins=50, kde=True, label=ticker, alpha=0.5)
plt.title('Distribution of Daily Returns', fontsize=16)
plt.xlabel('Daily Return', fontsize=14)
plt.ylabel('Frequency', fontsize=14)
plt.legend(title='Ticker', title_fontsize='13', fontsize='11')
plt.grid(True)
plt.tight_layout()
plt.show()
The distributions are approximately normal, centred around zero, which indicates that most daily returns are close to the average return. However, there are tails on both sides, which reflect occasional significant gains or losses. INFY and RELIANCE appear to have slightly wider distributions, which suggests higher volatility compared to HDFCBANK and TCS.
Now, let’s see if there’s any correlation between all these stocks:
daily_returns = stock_data.pivot_table(index='Date', columns='Ticker', values='Daily Return')
correlation_matrix = daily_returns.corr()
plt.figure(figsize=(12, 10))
sns.set(style='whitegrid')
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=.5, fmt='.2f', annot_kws={"size": 10})
plt.title('Correlation Matrix of Daily Returns', fontsize=16)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()
INFY and TCS have a high positive correlation (0.71), which indicates that they tend to move in the same direction. HDFCBANK has a moderate positive correlation with RELIANCE (0.37) and a low correlation with INFY (0.17) and TCS (0.10). RELIANCE shows a low correlation with INFY (0.19) and TCS (0.13). These varying correlations suggest potential diversification benefits; combining stocks with lower correlations can reduce overall portfolio risk.
Now, using Modern Portfolio Theory, we can construct an efficient portfolio by balancing risk and return. We will:
1.Calculate the expected returns and volatility for each stock.
2.Generate a series of random portfolios to identify the efficient frontier.
3.Optimize the portfolio to maximize the Sharpe ratio, which is a measure of risk-adjusted return.
calculate the expected returns and volatility for each stock:
import numpy as np
expected_returns = daily_returns.mean() * 252 # annualize the returns
volatility = daily_returns.std() * np.sqrt(252) # annualize the volatility
stock_stats = pd.DataFrame({
'Expected Return': expected_returns,
'Volatility': volatility
})
stock_stats
RELIANCE has the highest expected return (29.73%) and moderate volatility (21.47%), which indicates a potentially high-reward investment with relatively higher risk. INFY and TCS also have high expected returns (21.38% and 22.09% respectively) with moderate volatility (23.23% and 19.69%). HDFCBANK has the lowest expected return (1.37%) and moderate volatility (20.69%), which makes it the least attractive in terms of risk-adjusted returns.
Next, we will:
Let’s generate the random portfolios and plot the efficient frontier:
# function to calculate portfolio performance
def portfolio_performance(weights, returns, cov_matrix):
portfolio_return = np.dot(weights, returns)
portfolio_volatility = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
return portfolio_return, portfolio_volatility
# number of portfolios to simulate
num_portfolios = 10000
# arrays to store the results
results = np.zeros((3, num_portfolios))
# annualized covariance matrix
cov_matrix = daily_returns.cov() * 252
np.random.seed(42)
for i in range(num_portfolios):
weights = np.random.random(len(unique_tickers))
weights /= np.sum(weights)
portfolio_return, portfolio_volatility = portfolio_performance(weights, expected_returns, cov_matrix)
results[0,i] = portfolio_return
results[1,i] = portfolio_volatility
results[2,i] = portfolio_return / portfolio_volatility # Sharpe Ratio
plt.figure(figsize=(10, 7))
plt.scatter(results[1,:], results[0,:], c=results[2,:], cmap='YlGnBu', marker='o')
plt.title('Efficient Frontier')
plt.xlabel('Volatility (Standard Deviation)')
plt.ylabel('Expected Return')
plt.colorbar(label='Sharpe Ratio')
plt.grid(True)
plt.show()
Each dot represents a portfolio, with the colour indicating the Sharpe ratio, a measure of risk-adjusted return. Portfolios on the leftmost edge of the frontier (closer to the y-axis) offer the highest expected returns for a given level of volatility, which represent optimal portfolios. The gradient shows that portfolios with higher Sharpe ratios (darker blue) provide better risk-adjusted returns.
Here’s how to identify the portfolio with the maximum Sharpe ratio:
max_sharpe_idx = np.argmax(results[2])
max_sharpe_return = results[0, max_sharpe_idx]
max_sharpe_volatility = results[1, max_sharpe_idx]
max_sharpe_ratio = results[2, max_sharpe_idx]
max_sharpe_return, max_sharpe_volatility, max_sharpe_ratio
The portfolio with the maximum Sharpe ratio has the following characteristics:
Expected Return: ~26.08%
Volatility: ~15.54%
Sharpe Ratio: ~1.68
Next, let’s identify the weights of the stocks in the portfolio that yield the maximum Sharpe ratio:
max_sharpe_weights = np.zeros(len(unique_tickers))
for i in range(num_portfolios):
weights = np.random.random(len(unique_tickers))
weights /= np.sum(weights)
portfolio_return, portfolio_volatility = portfolio_performance(weights, expected_returns, cov_matrix)
if results[2, i] == max_sharpe_ratio:
max_sharpe_weights = weights
break
portfolio_weights_df = pd.DataFrame({
'Ticker': unique_tickers,
'Weight': max_sharpe_weights
})
portfolio_weights_df
The output shows a diversified portfolio with the following allocations:
HDFCBANK (30.85%)
INFY (10.59%)
RELIANCE (18.02%)
and TCS (40.53%).
TCS has the highest allocation, which indicates its significant contribution to the portfolio’s performance, while INFY has the smallest allocation. This balanced allocation aims to maximize returns while minimizing risk by leveraging individual stock performances and their correlations.
So, this is how stock market portfolio optimization works. Stock market portfolio optimization involves analyzing price trends, calculating expected returns and volatilities, and determining the correlations between different stocks to achieve diversification.