Consistency is a first-class concern in this repository. Every one of the 100+ projects uses the same top-level directory layout, the same naming conventions for source files, and the same separation between raw data, trained artifacts, and application logic. Once you understand the structure of one project, you can immediately navigate any other project in the repo and know exactly where to find the dataset, the trained model, the preprocessing code, and the entry point for the API.

Documentation Index
Fetch the complete documentation index at: https://mintlify.com/dronabopche/100-ML-AI-Project/llms.txt
Use this file to discover all available pages before exploring further.
Standard directory layout
Every project follows this exact structure:

Directory and file reference
| Directory / File | Description | Contents |
|---|---|---|
| Dataset/ | Stores raw data for training and evaluation | CSV files, image datasets, structured tabular data |
| Models/ | Contains trained and serialized models | .pkl, .joblib, .h5, saved pipelines |
| Resources/ | Supporting assets for the project | Diagrams, visualization images, documentation files |
| SRC/ | Core application logic | Complete ML pipeline implementation |
| SRC/App.py | Entry point of the application | Handles input, preprocessing, model loading, inference, and prediction output |
| SRC/Processing/ | Data preprocessing module | Missing-value handling, encoding, feature engineering, scaling, transformations |
| SRC/Output/ | Output handling module | Prediction results, probability scores, confidence levels, timestamps |
| Project_Notebook.ipynb | Jupyter notebook for experimentation | EDA, model training, evaluation, and result visualization |
| requirements.txt | Python dependency list | All packages needed to run the project |
| README.md | Project documentation | Objectives, dataset details, model summary, deployment notes |
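The entry-point flow described in the table (input, preprocessing, model loading, inference, prediction output) can be sketched in a few lines. This is a hypothetical illustration, not code from any specific project: the feature names, the preprocessing rules, and the DummyModel stand-in (which substitutes for a model deserialized from Models/) are all invented for the example.

```python
class DummyModel:
    """Stand-in for a trained model loaded from Models/ (e.g. a .pkl file)."""
    def predict(self, rows):
        # A real model would apply learned weights; this just sums features.
        return [sum(row) for row in rows]

def preprocess(raw: dict) -> list:
    """Stand-in for SRC/Processing/: impute a missing value and scale."""
    area = raw.get("area") or 0.0          # missing-value handling
    return [area / 1000.0,                  # crude scaling
            float(raw.get("bedrooms", 0))]  # type coercion

def predict(model, raw: dict) -> float:
    """Stand-in for the SRC/App.py flow: preprocess, infer, return output."""
    features = preprocess(raw)
    return model.predict([features])[0]
```

A caller would wire these together the same way App.py does: deserialize the model once at startup, then run each incoming request through `preprocess` and `predict`.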
Real example: House Price Prediction (Project 01)
The House Price Prediction project illustrates how the standard structure maps to a working ML system. This project trains three regression models (Linear, Ridge, Lasso) on a Kaggle housing dataset and exposes predictions through a Flask API backed by Gemini for natural-language input parsing.

The src/app.py file is the Flask entry point. It receives a natural-language prompt from the client, passes it to the preprocessing pipeline (which calls the Gemini API to extract structured feature values), and then runs inference across all three models before averaging the predictions into a final price.
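The averaging step described above can be sketched as follows. This is a hedged illustration, not the project's actual code: the real app loads serialized Linear, Ridge, and Lasso models from Models/, while the ConstantModel class here is an invented stand-in so the example runs without scikit-learn.

```python
class ConstantModel:
    """Toy stand-in for a fitted regressor that always predicts one value."""
    def __init__(self, value):
        self.value = value
    def predict(self, rows):
        return [self.value] * len(rows)

def ensemble_price(models, features):
    """Run one feature vector through each model and average the outputs."""
    predictions = [m.predict([features])[0] for m in models]
    return sum(predictions) / len(predictions)

# Three models standing in for Linear, Ridge, and Lasso:
models = [ConstantModel(310_000.0), ConstantModel(295_000.0), ConstantModel(300_000.0)]
price = ensemble_price(models, [2.0, 3.0])
```

Simple averaging treats the three regressors equally; a project could just as easily weight them by validation error, but the repo's description implies a plain mean.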
ML pipeline
The diagram below shows the standard learning workflow that each project notebook documents, from raw data through to the trained model stored in Models/.
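The notebook-to-Models/ handoff at the end of that workflow can be sketched with standard-library pickling. This is a minimal illustration under assumptions: the MeanModel class is a toy invented for the example, and real projects may serialize with joblib or save .h5 files depending on the framework, as the directory table notes.

```python
import io
import pickle

class MeanModel:
    """Toy regressor: predicts the mean of the training targets."""
    def fit(self, targets):
        self.mean_ = sum(targets) / len(targets)
        return self

    def predict(self, n_samples):
        return [self.mean_] * n_samples

# Train in the notebook...
model = MeanModel().fit([100.0, 200.0, 300.0])

# ...then serialize; a project would write to Models/ instead of a buffer:
buffer = io.BytesIO()
pickle.dump(model, buffer)
buffer.seek(0)

# Later, SRC/App.py deserializes the artifact and serves predictions:
restored = pickle.load(buffer)
```

The key point the diagram makes is this separation of concerns: the notebook owns training and evaluation, while the application only ever loads the frozen artifact.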