

Generative AI projects move beyond classification into producing text. This section covers four projects that form a natural progression: you start by predicting the next character or word, scale up to full sequence generation with recurrent networks, build a deterministic autocomplete engine using a prefix tree, and finally combine retrieval with generation in a RAG pipeline. Each project teaches a distinct modeling paradigm and, together, they cover the core ideas behind modern language model applications.
What is RAG and why does it matter?
Retrieval-Augmented Generation (RAG) solves a fundamental limitation of generative models: their knowledge is frozen at training time. A RAG pipeline retrieves relevant passages from an external document store at inference time and injects them into the prompt before the model generates a response. This means the model can answer questions about documents it never saw during training — without any fine-tuning. The vector store (usually backed by embeddings + approximate nearest-neighbor search) does the heavy lifting of finding semantically relevant context, and the language model focuses on synthesizing a coherent answer from that context.

Projects at a glance

Project | Paradigm | Core technique | Key artifact
Next Token Prediction (50) | Statistical / neural LM | N-gram, character-level RNN | Probability distribution over vocabulary
Text Generator (51) | Sequence-to-sequence | LSTM with teacher forcing | Generated text sequences
Prefix Tree Autocomplete (52) | Deterministic data structure | Trie + frequency ranking | Sorted completion candidates
RAG Injection Research Pipeline (45) | Retrieval + generation | Embeddings + vector DB + LLM | Grounded natural language answers
Next Token Prediction (50)

Goal: Given a sequence of characters or words, predict the most likely next token. This is the foundational training objective behind all autoregressive language models.

How it works: A character-level or word-level recurrent model is trained with a sliding window over the input corpus. At each step, the model receives the previous seq_len tokens and predicts a probability distribution over the vocabulary. Cross-entropy loss drives the model to assign high probability to the true next token.

Character-level model (Keras):
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding

# Build character vocabulary
text = open("corpus.txt").read().lower()
chars = sorted(set(text))
char2idx = {c: i for i, c in enumerate(chars)}
idx2char = {i: c for c, i in char2idx.items()}
VOCAB_SIZE = len(chars)

# Create sliding-window sequences
SEQ_LEN = 40
step = 3
sequences, next_chars = [], []
for i in range(0, len(text) - SEQ_LEN, step):
    sequences.append([char2idx[c] for c in text[i:i + SEQ_LEN]])
    next_chars.append(char2idx[text[i + SEQ_LEN]])

X = np.array(sequences)             # (n_seq, SEQ_LEN)
y = np.array(next_chars)            # (n_seq,)

# Model
model = Sequential([
    Embedding(VOCAB_SIZE, 64, input_length=SEQ_LEN),
    LSTM(256, return_sequences=False),
    Dense(VOCAB_SIZE, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, batch_size=128, epochs=30)
Sampling with temperature:
def sample(preds: np.ndarray, temperature: float = 1.0) -> int:
    preds = np.log(preds + 1e-8) / temperature
    preds = np.exp(preds) / np.sum(np.exp(preds))
    return np.random.choice(len(preds), p=preds)
Lower temperature (0.2–0.5) produces more conservative, repetitive output. Higher temperature (0.8–1.2) yields more creative but less coherent text.
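The sample() helper can be dropped into a simple generation loop that feeds the model's own output back in. This is a minimal sketch that reuses model, char2idx, idx2char, SEQ_LEN, and sample from the snippets above; the generate_text name and the seed string are illustrative, not part of the original project code.

def generate_text(seed: str, n_chars: int = 200, temperature: float = 0.5) -> str:
    generated = seed
    for _ in range(n_chars):
        # Use the last SEQ_LEN characters as context, left-padded with spaces for short seeds
        window = generated[-SEQ_LEN:].rjust(SEQ_LEN)
        x = np.array([[char2idx.get(c, 0) for c in window]])
        preds = model.predict(x, verbose=0)[0]        # probability distribution, shape (VOCAB_SIZE,)
        next_idx = sample(preds, temperature)
        generated += idx2char[next_idx]
    return generated

print(generate_text("the model ", n_chars=120, temperature=0.5))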
Text Generator (51)

Goal: Generate coherent multi-sentence text by training an LSTM (Long Short-Term Memory) network on a domain corpus using teacher forcing.

How it works: Unlike next-token prediction, which predicts one token at a time during evaluation, the Text Generator unrolls multiple generation steps, maintaining hidden state across them. Teacher forcing — passing the ground-truth token at each training step rather than the model's own prediction — stabilizes training with LSTMs.

Word-level LSTM generator:
import torch
import torch.nn as nn

class TextGeneratorLSTM(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int, n_layers: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, n_layers,
                            batch_first=True, dropout=0.3)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        emb = self.embedding(x)                   # (batch, seq, embed_dim)
        out, hidden = self.lstm(emb, hidden)      # (batch, seq, hidden_dim)
        logits = self.fc(out)                     # (batch, seq, vocab_size)
        return logits, hidden

# Generation loop
def generate(model, seed_tokens, idx2word, word2idx, n_words=50, temperature=0.8):
    model.eval()
    tokens = torch.tensor([seed_tokens])
    hidden = None
    generated = list(seed_tokens)

    with torch.no_grad():
        for _ in range(n_words):
            logits, hidden = model(tokens, hidden)
            probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
            next_token = torch.multinomial(probs, 1).item()
            generated.append(next_token)
            tokens = torch.tensor([[next_token]])

    return " ".join(idx2word[t] for t in generated)
Training tip: Use gradient clipping (torch.nn.utils.clip_grad_norm_) with a max norm of 5.0 to prevent exploding gradients, which are common in deep LSTMs on long sequences.
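To make the teacher-forcing objective concrete, the sketch below trains TextGeneratorLSTM on shifted input/target sequences with the gradient clipping recommended above. The vocabulary size, hyperparameters, and the stand-in train_loader are illustrative assumptions rather than the project's actual training code.

import torch.optim as optim

VOCAB_SIZE = 10_000                                   # illustrative; use the real vocabulary size
model = TextGeneratorLSTM(VOCAB_SIZE, embed_dim=128, hidden_dim=256, n_layers=2)
criterion = nn.CrossEntropyLoss(ignore_index=0)       # ignore the padding index used by the embedding
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Stand-in loader: batches of random token IDs with shape (batch, seq_len + 1)
train_loader = [torch.randint(1, VOCAB_SIZE, (32, 41)) for _ in range(100)]

for epoch in range(10):
    model.train()
    for batch in train_loader:
        inputs, targets = batch[:, :-1], batch[:, 1:] # teacher forcing: target is the input shifted by one
        logits, _ = model(inputs)                     # (batch, seq_len, vocab_size)
        loss = criterion(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
        optimizer.step()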
Prefix Tree Autocomplete (52)

Goal: Given a partial string prefix, return a ranked list of completion candidates in sub-millisecond time — without any neural network.

How it works: A trie (prefix tree) stores all known words or phrases. Each node represents one character. Insertion is O(k) where k is the word length; prefix lookup is also O(k) and returns all completions reachable from the prefix node. Completions are ranked by insertion frequency so the most common completions surface first.
from dataclasses import dataclass, field

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    is_end: bool = False
    frequency: int = 0

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str, frequency: int = 1) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True
        node.frequency += frequency

    def _collect(self, node: TrieNode, prefix: str, results: list) -> None:
        if node.is_end:
            results.append((prefix, node.frequency))
        for char, child in node.children.items():
            self._collect(child, prefix + char, results)

    def autocomplete(self, prefix: str, top_n: int = 5) -> list[str]:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        results = []
        self._collect(node, prefix, results)
        results.sort(key=lambda x: -x[1])        # sort by frequency descending
        return [word for word, _ in results[:top_n]]

# Example usage
trie = Trie()
for word, freq in [("python", 120), ("pytorch", 95), ("pandas", 88), ("pickle", 40)]:
    trie.insert(word, freq)

print(trie.autocomplete("py"))   # ['python', 'pytorch']
print(trie.autocomplete("pa"))   # ['pandas']
When to use a trie over a neural autocomplete: Tries are deterministic, explainable, and extremely fast. They are the right choice when you have a fixed vocabulary (e.g., product names, command completions) and need guaranteed latency. Neural models are better when the completion space is open-ended and semantic similarity matters more than exact prefix matching.
RAG Injection Research Pipeline (45)

Goal: Answer factual questions about a document corpus by retrieving relevant passages at query time and injecting them as context into a language model prompt.

How it works: The pipeline has two phases:
  1. Ingestion — documents are split into overlapping chunks, embedded with a sentence transformer, and stored in a vector database (ChromaDB in this project, as evidenced by the vector_store/chroma.sqlite3 artifact).
  2. Query — the user’s question is embedded with the same model, the top-k most similar chunks are retrieved from the vector store, and they are concatenated into the prompt before the LLM generates an answer.
from sentence_transformers import SentenceTransformer
import chromadb

EMBED_MODEL = "all-MiniLM-L6-v2"
embedder = SentenceTransformer(EMBED_MODEL)

# --- Ingestion ---
client = chromadb.PersistentClient(path="vector_store/")
collection = client.get_or_create_collection("research_docs")

def ingest_documents(docs: list[dict]) -> None:
    """docs: list of {"id": str, "text": str, "source": str}"""
    texts = [d["text"] for d in docs]
    embeddings = embedder.encode(texts, normalize_embeddings=True).tolist()
    collection.add(
        ids=[d["id"] for d in docs],
        embeddings=embeddings,
        documents=texts,
        metadatas=[{"source": d["source"]} for d in docs],
    )

# --- Retrieval ---
def retrieve(query: str, top_k: int = 4) -> list[str]:
    q_emb = embedder.encode([query], normalize_embeddings=True).tolist()
    results = collection.query(query_embeddings=q_emb, n_results=top_k)
    return results["documents"][0]   # list of passage strings

# --- Generation (inject context into prompt) ---
def rag_answer(query: str, llm_generate_fn) -> str:
    passages = retrieve(query)
    context = "\n\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    prompt = (
        f"Use the following research passages to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\nAnswer:"
    )
    return llm_generate_fn(prompt)
Vector store: This project persists embeddings in ChromaDB (vector_store/chroma.sqlite3). The collection is reloaded across sessions with chromadb.PersistentClient, so ingestion only needs to happen once.

Chunk size matters: Chunks that are too small miss context; chunks that are too large dilute relevance. A chunk size of 256–512 tokens with a 50-token overlap is a good starting point for research papers.
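The ingestion code above assumes documents have already been split into chunks. A minimal word-based chunker is sketched below; it approximates token counts with whitespace-separated words, and the chunk_text helper, the ID scheme, and the example file path are illustrative assumptions rather than part of the project.

def chunk_text(text: str, source: str, chunk_size: int = 300, overlap: int = 50) -> list[dict]:
    """Split a document into overlapping chunks (word counts approximate token counts)."""
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        piece = " ".join(words[start:start + chunk_size])
        chunks.append({"id": f"{source}-{len(chunks)}", "text": piece, "source": source})
        start += chunk_size - overlap                 # slide forward, keeping `overlap` words of context
    return chunks

# Example: chunk one document, then ingest the pieces with ingest_documents() from above
paper_text = open("papers/example_paper.txt").read()  # hypothetical path
ingest_documents(chunk_text(paper_text, source="example_paper.txt"))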
