Overview
This project was developed as the final project for the Programming Technologies degree. It builds an algorithmic trading strategy that uses unsupervised learning (K-Means) to analyze and select S&P500 assets, optimize portfolios, and compare their performance against the index.
Project Context
Final project for the Programming Technician degree, focused on the practical application of Machine Learning and Big Data in finance.
Objectives
The project aimed to accomplish the following goals:
Clustering & Optimization
Group similar assets using K-Means and build optimal portfolios (max Sharpe ratio)
Technologies & Libraries
The project uses a Python stack for financial analysis and machine learning:
Technology Stack
Python
Core programming language
Pandas & NumPy
Data manipulation and numerical computing
yfinance
Financial data acquisition
scikit-learn
Machine learning algorithms (K-Means)
PyPortfolioOpt
Portfolio optimization
pandas_ta
Technical analysis indicators
Matplotlib
Data visualization
Jupyter Notebook
Interactive development environment
Methodology
The project follows a systematic approach to algorithmic trading:
1. Data Download & Cleaning
Historical stock prices for S&P500 constituents were downloaded and the following technical indicators were calculated:
Technical Indicators
- Volatility: Measure of price variability
- RSI (Relative Strength Index): Momentum oscillator
- Bollinger Bands: Volatility bands
- ATR (Average True Range): Volatility indicator
- MACD (Moving Average Convergence Divergence): Trend-following momentum indicator
- Volume in Dollars: Trading activity measure
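Three of the indicators above can be sketched with pandas alone. This is a minimal illustration on synthetic prices, not the project's own code (the stack lists pandas_ta for this step). The function names, window lengths (14 and 20), and the 2-standard-deviation band width are assumptions chosen because they are common defaults.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, length: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder-style smoothing (alpha = 1 / length)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)       # upward moves only
    loss = (-delta).clip(lower=0.0)    # downward moves, as positive numbers
    avg_gain = gain.ewm(alpha=1.0 / length, min_periods=length, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / length, min_periods=length, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def bollinger_bands(close: pd.Series, length: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Rolling mean plus/minus num_std rolling sample standard deviations."""
    mid = close.rolling(length).mean()
    std = close.rolling(length).std()
    return pd.DataFrame(
        {"bb_low": mid - num_std * std, "bb_mid": mid, "bb_high": mid + num_std * std}
    )


# Synthetic daily data standing in for one downloaded ticker (assumption).
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 250))), name="close")
volume = pd.Series(rng.integers(1_000_000, 5_000_000, 250), name="volume")

features = pd.concat(
    [
        rsi(close).rename("rsi"),
        bollinger_bands(close),
        (close * volume / 1e6).rename("dollar_volume_m"),  # dollar volume in millions
    ],
    axis=1,
)
print(features.dropna().tail())
```

Libraries differ on details such as the smoothing method for RSI and the degrees of freedom used for the band width, so values may not match another implementation exactly.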
2. Liquidity Filtering
The 150 most liquid assets per month were selected to ensure realistic trading operations and minimize slippage.
Liquidity filtering is crucial for algorithmic trading strategies to ensure that positions can be entered and exited without significant market impact.
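A per-month liquidity filter of this kind can be sketched as a grouped rank. The example assumes the data has already been aggregated to one row per (date, ticker) with a `dollar_volume` column; the function name `top_liquid`, the column name, and the index layout are illustrative assumptions, and the project may have measured liquidity differently (for example with a rolling average).

```python
import pandas as pd


def top_liquid(monthly: pd.DataFrame, n: int = 150, col: str = "dollar_volume") -> pd.DataFrame:
    """Keep the n tickers with the highest value of `col` within each month."""
    rank = monthly.groupby(level="date")[col].rank(ascending=False, method="first")
    return monthly[rank <= n]


# Tiny made-up example: three tickers, two months, keep the top two per month.
dates = pd.to_datetime(["2023-01-31", "2023-02-28"])
index = pd.MultiIndex.from_product([dates, ["AAA", "BBB", "CCC"]], names=["date", "ticker"])
monthly = pd.DataFrame({"dollar_volume": [50.0, 10.0, 30.0, 5.0, 40.0, 20.0]}, index=index)

selected = top_liquid(monthly, n=2)
print(selected)
```

Because the rank is recomputed for every month, the selected universe can change from one month to the next.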
3. Returns Calculation
Monthly returns were calculated for different time horizons, with outlier control to prevent skewed results.
4. Risk Factors
Fama-French factors were downloaded and rolling betas were calculated for each asset to capture systematic risk exposure.
The Fama-French factors (Market, Size, Value) provide a multi-factor model for understanding stock returns beyond simple market beta.
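Steps 3 and 4 can be sketched together: first horizon returns with outliers clipped, then trailing-window regressions of an asset's returns on the factors. Everything here is an assumption for illustration: the helper names, the clipping quantile, the 24-month window, the factor column labels, and the synthetic factor data. The regression uses plain least squares from NumPy rather than any particular rolling-regression library.

```python
import numpy as np
import pandas as pd


def clipped_returns(prices: pd.Series, lag: int = 1, q: float = 0.005) -> pd.Series:
    """Return over `lag` months, clipped at the q and 1-q quantiles, expressed per month."""
    raw = prices.pct_change(lag)
    clipped = raw.clip(lower=raw.quantile(q), upper=raw.quantile(1 - q))
    return (1 + clipped) ** (1 / lag) - 1


def rolling_betas(asset_ret: pd.Series, factors: pd.DataFrame, window: int = 24) -> pd.DataFrame:
    """OLS slopes of asset_ret on the factor columns over each trailing window."""
    data = pd.concat([asset_ret.rename("ret"), factors], axis=1).dropna()
    betas = pd.DataFrame(np.nan, index=data.index, columns=factors.columns)
    for end in range(window, len(data) + 1):
        chunk = data.iloc[end - window:end]
        X = np.column_stack([np.ones(window), chunk[factors.columns].to_numpy()])
        coef, *_ = np.linalg.lstsq(X, chunk["ret"].to_numpy(), rcond=None)
        betas.iloc[end - 1] = coef[1:]  # drop the intercept, keep the factor slopes
    return betas


# Synthetic monthly factors and an asset built from known exposures (assumption).
rng = np.random.default_rng(0)
months = pd.date_range("2018-01-01", periods=60, freq="MS")
factors = pd.DataFrame(rng.normal(0, 0.03, (60, 3)), index=months, columns=["Mkt-RF", "SMB", "HML"])
asset = 1.2 * factors["Mkt-RF"] + 0.5 * factors["SMB"] - 0.3 * factors["HML"]

betas = rolling_betas(asset, factors, window=24)
print(betas.dropna().tail(3))
```

In practice the asset returns would normally be taken in excess of the risk-free rate before regressing, and the estimated betas are only as stable as the window is long.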
5. Clustering & Optimization
K-Means clustering was applied to group similar assets based on their characteristics, and optimal portfolios were built using the efficient frontier to maximize the Sharpe ratio.
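The two halves of this step can be sketched as follows. The clustering uses scikit-learn's `KMeans` on standardized features. For the optimization, the sketch swaps in the closed-form unconstrained tangency portfolio (weights proportional to the inverse covariance times excess expected returns) instead of a constrained optimizer such as PyPortfolioOpt's `EfficientFrontier`, so it permits short positions and only makes sense when the raw weights sum to a positive number. Function names, the feature choice, and all numbers are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_assets(features: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Standardize each feature column, then assign every asset to one of k clusters."""
    scaled = StandardScaler().fit_transform(features)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scaled)


def max_sharpe_weights(mu: np.ndarray, cov: np.ndarray, rf: float = 0.0) -> np.ndarray:
    """Unconstrained tangency portfolio: w proportional to inv(cov) @ (mu - rf)."""
    raw = np.linalg.solve(cov, mu - rf)
    return raw / raw.sum()  # normalize so the weights sum to 1


# Six made-up assets described by (volatility, RSI); two obvious groups.
features = np.array(
    [[0.10, 70.0], [0.12, 68.0], [0.11, 72.0], [0.40, 30.0], [0.42, 28.0], [0.38, 33.0]]
)
labels = cluster_assets(features, k=2)
print("cluster labels:", labels)

# Made-up expected returns and covariance for two assets from one cluster.
mu = np.array([0.10, 0.05])
cov = np.diag([0.04, 0.04])
weights = max_sharpe_weights(mu, cov)
print("weights:", weights)
```

Cluster label numbers are arbitrary, so only the grouping is meaningful. Scaling the features first keeps a wide-ranging column such as RSI from dominating the distance calculation.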
6. Results Comparison
The optimized portfolio's performance was compared against the S&P500 index to evaluate the strategy's effectiveness.
Results

The strategy successfully identified optimal asset combinations that provided risk-adjusted returns competitive with the benchmark index.
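A comparison of this kind usually comes down to a few summary statistics computed on both return series. The sketch below shows total return and an annualized Sharpe ratio. The `summarize` helper, the 252-trading-day convention, and the zero risk-free rate are assumptions, and the random series are placeholders, not the project's results.

```python
import numpy as np
import pandas as pd


def summarize(returns: pd.Series, periods_per_year: int = 252, rf: float = 0.0) -> dict:
    """Total compounded return and annualized Sharpe ratio of a periodic return series."""
    excess = returns - rf / periods_per_year
    total = (1 + returns).prod() - 1
    sharpe = np.sqrt(periods_per_year) * excess.mean() / excess.std()
    return {"total_return": total, "sharpe": sharpe}


# Placeholder daily returns; real inputs would be the portfolio and index series.
rng = np.random.default_rng(1)
days = pd.bdate_range("2023-01-02", periods=252)
strategy = pd.Series(rng.normal(0.0005, 0.01, 252), index=days)
benchmark = pd.Series(rng.normal(0.0004, 0.01, 252), index=days)

report = pd.DataFrame({"strategy": summarize(strategy), "benchmark": summarize(benchmark)})
print(report)
```

Both series should cover the same dates and use the same return frequency, otherwise the two columns are not comparable.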
Project Resources
Access the complete project materials:
View Jupyter Notebook
Interactive notebook with full analysis
GitHub Repository
Complete project report and code
Key Takeaways
Machine Learning in Finance
Unsupervised learning techniques like K-Means can effectively identify patterns in financial data and group assets with similar characteristics.
Portfolio Optimization
Modern portfolio theory combined with algorithmic selection can produce portfolios with attractive risk-adjusted returns.
Practical Implementation
The project demonstrates a complete pipeline from data acquisition to strategy evaluation, providing a realistic framework for algorithmic trading.