Overview

This project was developed as the final project for the Programming Technologies degree. It builds an algorithmic trading strategy that uses unsupervised learning (K-Means) to analyze and select S&P500 assets, optimizes portfolios of the selected assets, and compares their performance against the index.

Project Context

Final project for the Programming Technician degree, focused on practical application of Machine Learning and Big Data in finance.

Objectives

The project aimed to accomplish the following goals:
1. Data Collection: Download and process historical price data from the S&P500
2. Technical Indicators: Calculate technical indicators and relevant features for each stock
3. Liquidity Selection: Select the 150 most liquid assets each month
4. Returns Calculation: Calculate monthly returns for different time horizons
5. Risk Factors: Download Fama-French factors and calculate rolling betas
6. Clustering & Optimization: Group similar assets using K-Means and build optimal portfolios (max Sharpe ratio)
7. Performance Comparison: Compare portfolio performance against the S&P500

Technologies & Libraries

The project leverages a comprehensive Python stack for financial analysis and machine learning:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Technology Stack

  • Python: Core programming language
  • Pandas & NumPy: Data manipulation and numerical computing
  • yfinance: Financial data acquisition
  • scikit-learn: Machine learning algorithms (K-Means)
  • PyPortfolioOpt: Portfolio optimization
  • pandas_ta: Technical analysis indicators
  • Matplotlib: Data visualization
  • Jupyter Notebook: Interactive development environment

Methodology

The project follows a systematic approach to algorithmic trading:

1. Data Download & Cleaning

Historical stock prices from the S&P500 were obtained and technical indicators were calculated:
  • Volatility: Measure of price variability
  • RSI (Relative Strength Index): Momentum oscillator
  • Bollinger Bands: Volatility bands
  • ATR (Average True Range): Volatility indicator
  • MACD (Moving Average Convergence Divergence): Trend-following momentum indicator
  • Volume in Dollars: Trading activity measure
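As a rough illustration of what several of these indicators compute, the sketch below implements RSI, Bollinger Bands, ATR and dollar volume in plain pandas. The project itself lists pandas_ta for this step, so the function names, window lengths (14 and 20) and smoothing choices here are assumptions for illustration and may differ in detail from the library's output. MACD and the volatility measure are left out.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, length: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder-style exponential smoothing."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)       # upward moves only
    loss = (-delta).clip(lower=0.0)    # downward moves as positive numbers
    avg_gain = gain.ewm(alpha=1.0 / length, min_periods=length, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / length, min_periods=length, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def bollinger_bands(close: pd.Series, length: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Rolling mean plus/minus num_std rolling (sample) standard deviations."""
    mid = close.rolling(length).mean()
    std = close.rolling(length).std()
    return pd.DataFrame(
        {"bb_low": mid - num_std * std, "bb_mid": mid, "bb_high": mid + num_std * std}
    )


def atr(high: pd.Series, low: pd.Series, close: pd.Series, length: int = 14) -> pd.Series:
    """Average True Range: smoothed maximum of the three true-range candidates."""
    prev_close = close.shift(1)
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    ).max(axis=1)
    return true_range.ewm(alpha=1.0 / length, min_periods=length, adjust=False).mean()


def dollar_volume(close: pd.Series, volume: pd.Series) -> pd.Series:
    """Traded value per bar, used later as the liquidity measure."""
    return close * volume
```

Each function takes the price columns of a single ticker, so on a multi-ticker frame it would be applied per ticker (for example through a groupby on the ticker level).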

2. Liquidity Filtering

The 150 most liquid assets per month were selected to ensure realistic trading operations and minimize slippage.
Liquidity filtering is crucial for algorithmic trading strategies to ensure that positions can be entered and exited without significant market impact.
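A minimal sketch of the monthly filter, assuming the data is a DataFrame indexed by (date, ticker) with one row per asset per month and a dollar_volume column. The function name, column name and index layout are hypothetical, and the project may have smoothed dollar volume before ranking.

```python
import pandas as pd


def top_liquid(monthly: pd.DataFrame, n: int = 150, col: str = "dollar_volume") -> pd.DataFrame:
    """Keep the n rows with the highest `col` within each month."""
    # Rank 1 = most liquid; ties are broken by order of appearance.
    rank = monthly.groupby(level="date")[col].rank(ascending=False, method="first")
    return monthly[rank <= n]


# Toy example with three tickers and n=2 instead of 150.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2023-01-31", "2023-02-28"]), ["AAA", "BBB", "CCC"]],
    names=["date", "ticker"],
)
toy = pd.DataFrame({"dollar_volume": [30.0, 10.0, 20.0, 5.0, 50.0, 40.0]}, index=idx)
selected = top_liquid(toy, n=2)
print(selected)
```

Because the ranking is redone for every date, the selected universe can change from month to month.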

3. Returns Calculation

Monthly returns were calculated for different time horizons, with outlier control to prevent skewed results.
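One way this step could look is sketched below for a single asset's month-end prices. The horizons (1, 2, 3, 6, 9 and 12 months), the 0.5% clipping quantile and the conversion to an average monthly rate are assumptions chosen for illustration; the source does not state which values were used.

```python
import numpy as np
import pandas as pd


def horizon_returns(prices: pd.Series, lags=(1, 2, 3, 6, 9, 12), cutoff: float = 0.005) -> pd.DataFrame:
    """Returns over several lookback horizons, clipped and expressed per month."""
    out = {}
    for lag in lags:
        raw = prices.pct_change(lag, fill_method=None)
        # Outlier control: cap values outside the [cutoff, 1 - cutoff] quantiles.
        clipped = raw.clip(lower=raw.quantile(cutoff), upper=raw.quantile(1 - cutoff))
        # Convert the lag-month return to an average (geometric) monthly return.
        out[f"return_{lag}m"] = (1 + clipped) ** (1 / lag) - 1
    return pd.DataFrame(out)


# A price that grows 1% per month should give about 0.01 on every horizon.
prices = pd.Series(100 * 1.01 ** np.arange(24))
print(horizon_returns(prices).tail(3))
```

Expressing every horizon as a monthly rate keeps the columns on a comparable scale, which matters later when they are used as clustering features.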

4. Risk Factors

Fama-French factors were downloaded and rolling betas were calculated for each asset to capture systematic risk exposure.
The Fama-French factors (Market, Size, Value) provide a multi-factor model for understanding stock returns beyond simple market beta.
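The rolling-beta idea can be sketched as an ordinary least squares regression of an asset's excess returns on the factor returns, refitted over a trailing window. The function name, the 24-month window and the use of numpy's least-squares solver are assumptions for illustration, not necessarily how the project computed its betas.

```python
import numpy as np
import pandas as pd


def rolling_betas(excess_ret: pd.Series, factors: pd.DataFrame, window: int = 24) -> pd.DataFrame:
    """OLS factor betas over a trailing window; rows before the first full window stay NaN."""
    betas = pd.DataFrame(np.nan, index=excess_ret.index, columns=factors.columns)
    # Design matrix with an intercept column so that alpha is not forced to zero.
    X_all = np.column_stack([np.ones(len(factors)), factors.to_numpy()])
    y_all = excess_ret.to_numpy()
    for end in range(window, len(y_all) + 1):
        X = X_all[end - window:end]
        y = y_all[end - window:end]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        betas.iloc[end - 1] = coef[1:]  # drop the intercept, keep the factor loadings
    return betas


# Synthetic check: returns built from known loadings should recover them.
rng = np.random.default_rng(0)
factors = pd.DataFrame(rng.normal(0, 0.04, size=(60, 3)), columns=["Mkt-RF", "SMB", "HML"])
asset = 0.001 + factors @ np.array([1.2, 0.5, -0.3])
print(rolling_betas(asset, factors).tail(1))
```

The asset series and the factor table are assumed to share the same monthly index, with the risk-free rate already subtracted from the asset returns.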

5. Clustering & Optimization

K-Means clustering was applied to group similar assets based on their characteristics, and optimal portfolios were built using the efficient frontier to maximize the Sharpe ratio.
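Key Steps
from sklearn.cluster import KMeans
from pypfopt.efficient_frontier import EfficientFrontier

# Cluster assets with similar characteristics
# (k and scaled_features are assumed to be defined earlier)
kmeans = KMeans(n_clusters=k)
clusters = kmeans.fit_predict(scaled_features)

# Build optimal portfolio (max Sharpe ratio)
# max_sharpe() returns a mapping of ticker -> weight
ef = EfficientFrontier(expected_returns, cov_matrix)
weights = ef.max_sharpe()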
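To make the max-Sharpe objective more concrete, the sketch below computes the textbook tangency portfolio directly with NumPy. It is a simplified stand-in rather than what PyPortfolioOpt does internally: it assumes short positions are allowed, whereas EfficientFrontier defaults to long-only weight bounds, and the numbers are made up for illustration.

```python
import numpy as np


def tangency_weights(mu, cov, risk_free: float = 0.0) -> np.ndarray:
    """Unconstrained max-Sharpe weights: proportional to inv(cov) @ (mu - rf), scaled to sum to 1."""
    excess = np.asarray(mu, dtype=float) - risk_free
    raw = np.linalg.solve(np.asarray(cov, dtype=float), excess)
    return raw / raw.sum()


def sharpe_ratio(weights, mu, cov, risk_free: float = 0.0) -> float:
    """Expected excess return of the portfolio divided by its volatility."""
    weights = np.asarray(weights, dtype=float)
    ret = weights @ np.asarray(mu, dtype=float)
    vol = np.sqrt(weights @ np.asarray(cov, dtype=float) @ weights)
    return float((ret - risk_free) / vol)


# Hypothetical annualized inputs for three assets.
mu = np.array([0.10, 0.12, 0.08])
cov = np.diag([0.04, 0.09, 0.01])

w = tangency_weights(mu, cov)
equal = np.full(3, 1 / 3)
print("tangency weights:", w.round(3))
print("Sharpe (tangency):", round(sharpe_ratio(w, mu, cov), 3))
print("Sharpe (equal weight):", round(sharpe_ratio(equal, mu, cov), 3))
```

The normalization step only makes sense when the raw weights sum to a positive number, which holds for the inputs above.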

6. Results Comparison

The optimized portfolio performance was compared against the S&P500 index to evaluate the strategy’s effectiveness.
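A comparison of this kind usually comes down to a few summary statistics computed on the two return series. The sketch below assumes a DataFrame with one column of periodic returns per series, a zero risk-free rate and 252 periods per year. The function name, column names and sample numbers are hypothetical and are not the project's results.

```python
import numpy as np
import pandas as pd


def performance_summary(returns: pd.DataFrame, periods_per_year: int = 252) -> pd.DataFrame:
    """Total return, annualized return, annualized volatility and Sharpe ratio per column."""
    total = (1 + returns).prod() - 1
    years = len(returns) / periods_per_year
    annual_return = (1 + total) ** (1 / years) - 1
    annual_vol = returns.std() * np.sqrt(periods_per_year)
    # Sharpe ratio with a zero risk-free rate (an assumption of this sketch).
    sharpe = returns.mean() * periods_per_year / annual_vol
    return pd.DataFrame(
        {
            "total_return": total,
            "annual_return": annual_return,
            "annual_vol": annual_vol,
            "sharpe": sharpe,
        }
    )


# Made-up returns, only to show the shape of the output.
toy = pd.DataFrame(
    {"portfolio": [0.01, -0.005, 0.02], "benchmark": [0.005, 0.0, 0.01]}
)
print(performance_summary(toy))
```

Plotting the cumulative product of (1 + returns) for both columns gives the usual growth-of-capital chart for the same comparison.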

Results

Figure: Portfolio performance vs S&P500
The optimized portfolio achieved risk-adjusted returns competitive with the S&P500, which suggests that combining Machine Learning techniques with traditional financial analysis is useful for selecting and weighting assets.

Project Resources

Access the complete project materials:

  • Jupyter Notebook: Interactive notebook with full analysis
  • GitHub Repository: Complete project report and code

Key Takeaways

  • Unsupervised learning techniques like K-Means can effectively identify patterns in financial data and group assets with similar characteristics.
  • Modern portfolio theory combined with algorithmic selection can produce portfolios with attractive risk-adjusted returns.
  • The project demonstrates a complete pipeline from data acquisition to strategy evaluation, providing a realistic framework for algorithmic trading.
This project showcases how Machine Learning and Big Data techniques can be practically applied to financial markets, providing a foundation for more sophisticated trading strategies.
