The 100-ML-AI-Project repository spans supervised regression, classification, NLP, computer vision, reinforcement learning, and generative AI. Each project ships with the dataset it was trained on, stored in a Dataset/ subdirectory alongside the model and source code. Datasets are sourced from three platforms, each suited to a different problem domain. Understanding where datasets come from and how to access them is the first step toward reproducing any project from scratch.
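One way to see which data files ship with each project is to walk a local clone and collect the contents of every dataset folder. The sketch below is only illustrative: `list_project_datasets` is a hypothetical helper, not part of the repository, and it matches the folder name case-insensitively because this page writes it both as Dataset/ and dataset/.

```python
from pathlib import Path


def list_project_datasets(repo_root):
    """Map each project folder to the files in its dataset subdirectory.

    Assumption: a project's data lives in a folder named "Dataset" (any casing)
    directly inside the project folder.
    """
    found = {}
    for folder in sorted(Path(repo_root).rglob("*")):
        if folder.is_dir() and folder.name.lower() == "dataset":
            files = sorted(p.name for p in folder.iterdir() if p.is_file())
            found[folder.parent.name] = files
    return found
```

Calling `list_project_datasets(".")` from the root of a clone would return a dictionary keyed by project folder name, which makes it easy to spot projects whose data still needs to be downloaded.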

Data source platforms

| Platform | Use Case | Example Datasets | Access Method |
| --- | --- | --- | --- |
| Kaggle | Tabular regression and classification (structured data) | Housing prices, employee churn, Titanic survival, medical costs | Kaggle API (kaggle datasets download) or browser download |
| Hugging Face Datasets | NLP tasks: text classification, sentiment, emotion detection | dair-ai/emotion, sentiment analysis corpora | datasets Python library (load_dataset) |
| Public ML Repositories | Image benchmarks and vision tasks | CIFAR-10, MNIST, food image collections | Direct download URLs, torchvision.datasets, tensorflow.keras.datasets |

Most Kaggle datasets require a free Kaggle account and acceptance of the dataset’s individual license terms before download. Some competition datasets additionally require joining the specific competition. Check each dataset’s license before using it in any published work. Hugging Face datasets are generally available under open licenses (Apache 2.0, CC BY 4.0) but vary by dataset. Always review the dataset card.

Datasets by project category

Supervised Learning (projects 01–29)

Supervised projects use tabular CSV datasets from Kaggle. Each file contains labeled rows suitable for regression or binary/multi-class classification.

| Project | Dataset | Problem type |
| --- | --- | --- |
| 01 House Price Prediction | HousePricePrediction.csv (Kaggle) | Regression (SalePrice) |
| 02 Employee Retention Prediction | Employee churn dataset (Kaggle) | Classification |
| 04 Medical Cost Prediction | Medical insurance costs (Kaggle) | Regression |
| 05 Titanic Survival Prediction | Titanic passenger manifest (Kaggle) | Binary classification |
| 06 Email Spam Classification | Email corpus with spam labels (Kaggle) | Binary classification |
| 10 Used Car Price Prediction | Used car listings (Kaggle) | Regression |
| 11 Mobile Price Range Prediction | Mobile phone specs (Kaggle) | Multi-class classification |
| 13 Hotel Booking Cancellation | Hotel booking records (Kaggle) | Binary classification |
| 14 Crop Yield Prediction | Agricultural yield data (Kaggle) | Regression |
| 26 Credit Loan Approval | Loan application records (Kaggle) | Binary classification |
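Because every supervised dataset is a labeled CSV, the first step in each project is separating the target column from the features. The following is a dependency-free sketch of that step using only the standard library; `split_features_target` is a hypothetical helper, and the projects themselves may well use pandas instead. Values are returned as strings, so numeric columns still need converting before training.

```python
import csv


def split_features_target(csv_path, target):
    """Read a labeled CSV and separate the target column from the features.

    Returns (features, targets): a list of row dicts without the target
    column, and a parallel list of target values (as strings).
    """
    features, targets = [], []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if row.get(target) in (None, ""):
                continue  # skip rows that carry no label
            targets.append(row.pop(target))
            features.append(row)
    return features, targets
```

For Project 01 this might be called as `split_features_target("HousePricePrediction.csv", "SalePrice")`, assuming the file sits in the working directory.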

Natural Language Processing (projects 16, 40–41)

NLP projects load datasets from Hugging Face, which provides versioned, pre-split datasets through the datasets library. Public datasets such as dair-ai/emotion can be loaded without an account.

| Project | Dataset | Source |
| --- | --- | --- |
| 16 Text Emotion Detection | dair-ai/emotion (6-class emotion labels) | Hugging Face |
| 40 Resume Keyword Extractor | Resume text corpus | Hugging Face / custom |
| 41 Sentiment Analysis | Sentiment-labeled review dataset | Hugging Face |

Computer Vision & Deep Learning (projects 12, 30–34)

Vision projects use standard image benchmark datasets. Many are available directly through deep learning framework dataset utilities without a manual download step.

| Project | Dataset | Source |
| --- | --- | --- |
| 30 Binary Image Classification | Custom binary image dataset | Public repository |
| 31 Food Image Classification | Food-101 or similar food image dataset | Public repository |
| 32 CIFAR-10 Classification | CIFAR-10 (60,000 32×32 colour images, 10 classes) | tensorflow.keras.datasets / torchvision |
| 33 MNIST Digit Classification | MNIST (70,000 28×28 greyscale digit images) | tensorflow.keras.datasets / torchvision |
| 12 Date Fruit Classification | Date fruit image dataset | Public repository |

Downloading from Kaggle API

Install the Kaggle CLI and place your kaggle.json API token in ~/.kaggle/ before running any download command.
# Install the Kaggle CLI
pip install kaggle

# Place your API token (download from kaggle.com → Account → API)
mkdir -p ~/.kaggle
cp kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json

# Download the House Prices competition data
# (competition files use "competitions download -c"; join the competition on kaggle.com first)
kaggle competitions download -c house-prices-advanced-regression-techniques

# Unzip into the project's Dataset directory
unzip house-prices-advanced-regression-techniques.zip \
  -d ML_To_Train/01_House_Price_Predict/dataset/
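On systems without the `unzip` command (Windows, for example), the same extraction step can be done from Python. This is a minimal sketch using the standard-library `zipfile` module; `extract_dataset` is a hypothetical helper name, and the destination path is whatever dataset directory the project expects.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Extract a downloaded archive and return the CSV files it contained."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the dataset folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # Report only the CSV members so the caller knows what can be loaded
    return sorted(name for name in names if name.lower().endswith(".csv"))
```

The returned list is a quick check that the archive held the files the project's training script expects.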

Loading from Hugging Face

from datasets import load_dataset

# Load the emotion dataset used in Project 16
dataset = load_dataset("dair-ai/emotion")

# Access splits
train_data = dataset["train"]
test_data  = dataset["test"]

print(train_data.features)
# {'text': Value('string'), 'label': ClassLabel(names=['sadness', 'joy', ...])}
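The label column is stored as integer ids, with the class names kept in the ClassLabel feature. A common first check after loading is the class balance, reported by name. The sketch below shows one way to do that; both function names are hypothetical, and `emotion_train_distribution` assumes the `datasets` package is installed and the Hub is reachable.

```python
from collections import Counter


def label_distribution(label_ids, names):
    """Count examples per class, keyed by class name instead of integer id."""
    counts = Counter(label_ids)
    return {name: counts.get(index, 0) for index, name in enumerate(names)}


def emotion_train_distribution():
    """Hypothetical helper: class balance of the dair-ai/emotion train split."""
    # Imported lazily so label_distribution works without the datasets package
    from datasets import load_dataset

    train = load_dataset("dair-ai/emotion", split="train")
    return label_distribution(train["label"], train.features["label"].names)
```

A heavily skewed result would suggest stratified splitting or class weighting before training a classifier.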

Loading vision datasets via framework utilities

# MNIST via TensorFlow / Keras
import tensorflow as tf

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

# CIFAR-10 via PyTorch
import torchvision
import torchvision.transforms as transforms

trainset = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor()
)
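The Keras loader returns raw uint8 arrays (MNIST training images are shaped (60000, 28, 28)), whereas `transforms.ToTensor()` in the PyTorch path already scales pixels to the range [0, 1]. A minimal sketch of the equivalent preparation for the Keras arrays follows, assuming NumPy is available; `prepare_images` is a hypothetical helper, and the projects may normalize differently.

```python
import numpy as np


def prepare_images(images):
    """Scale uint8 pixels to [0, 1] floats and add a channel axis if missing."""
    scaled = np.asarray(images, dtype=np.float32) / 255.0
    if scaled.ndim == 3:  # greyscale batch shaped (N, H, W), as MNIST is loaded
        scaled = scaled[..., np.newaxis]  # becomes (N, H, W, 1)
    return scaled
```

Colour batches such as CIFAR-10, which already carry a channel axis, pass through with only the scaling applied.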
