Generative AI projects move beyond classification into producing text. This section covers four projects that form a natural progression: you start by predicting the next character or word, scale up to full sequence generation with recurrent networks, build a deterministic autocomplete engine using a prefix tree, and finally combine retrieval with generation in a RAG pipeline. Each project teaches a distinct modeling paradigm and, together, they cover the core ideas behind modern language model applications.
What is RAG and why does it matter?
Retrieval-Augmented Generation (RAG) solves a fundamental limitation of generative models: their knowledge is frozen at training time. A RAG pipeline retrieves relevant passages from an external document store at inference time and injects them into the prompt before the model generates a response. This means the model can answer questions about documents it never saw during training, without any fine-tuning. The vector store (usually backed by embeddings plus approximate nearest-neighbor search) does the heavy lifting of finding semantically relevant context, and the language model focuses on synthesizing a coherent answer from that context.
Projects at a glance
| Project | Paradigm | Core technique | Key artifact |
|---|---|---|---|
| Next Token Prediction (50) | Statistical / neural LM | N-gram, character-level RNN | Probability distribution over vocabulary |
| Text Generator (51) | Sequence-to-sequence | LSTM with teacher forcing | Generated text sequences |
| Prefix Tree Autocomplete (52) | Deterministic data structure | Trie + frequency ranking | Sorted completion candidates |
| RAG Injection Research Pipeline (45) | Retrieval + generation | Embeddings + vector DB + LLM | Grounded natural language answers |
Next Token Prediction (Project 50)
Goal: Given a sequence of characters or words, predict the most likely next token. This is the foundational training objective behind all autoregressive language models.
How it works: A character-level or word-level recurrent model is trained with a sliding window over the input corpus. At each step, the model receives the previous `seq_len` tokens and predicts a probability distribution over the vocabulary. Cross-entropy loss drives the model to assign high probability to the true next token.
Sampling with temperature: Lower temperature (0.2–0.5) produces more conservative, repetitive output. Higher temperature (0.8–1.2) yields more creative but less coherent text.
Character-level model (Keras):
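A minimal sketch of the character-level model and temperature sampling. The corpus path (`corpus.txt`) and the hyperparameters are illustrative assumptions, not the project's exact settings:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed corpus location and hyperparameters -- adjust to the project's data.
text = open("corpus.txt").read().lower()
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
seq_len = 40

# Sliding window: each window of seq_len characters predicts the next character.
X = np.array([[char_to_idx[c] for c in text[i:i + seq_len]]
              for i in range(len(text) - seq_len)])
y = np.array([char_to_idx[text[i + seq_len]] for i in range(len(text) - seq_len)])

model = keras.Sequential([
    layers.Embedding(len(chars), 64),
    layers.LSTM(128),
    layers.Dense(len(chars), activation="softmax"),  # distribution over the vocabulary
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, batch_size=128, epochs=10)

def sample(preds, temperature=0.8):
    # Rescale the distribution: T < 1 sharpens it, T > 1 flattens it.
    logits = np.log(preds + 1e-9) / temperature
    probs = np.exp(logits - logits.max())
    return np.random.choice(len(preds), p=probs / probs.sum())
```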
Text Generator (Project 51)
Goal: Generate coherent multi-sentence text by training an LSTM (Long Short-Term Memory) network on a domain corpus using teacher forcing.
How it works: Unlike next-token prediction, which predicts one token at a time during evaluation, the Text Generator unrolls multiple generation steps, maintaining hidden state across them. Teacher forcing, i.e. feeding the ground-truth token at each training step rather than the model's own prediction, stabilizes LSTM training.
Training tip: Use gradient clipping (`torch.nn.utils.clip_grad_norm_`) with a max norm of 5.0 to prevent exploding gradients, which are common in deep LSTMs on long sequences.
Word-level LSTM generator:
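A sketch of one teacher-forced training step in PyTorch. The vocabulary size, layer widths, and random demo batch are stand-in assumptions:

```python
import torch
import torch.nn as nn

class WordLSTM(nn.Module):
    # Embedding -> multi-layer LSTM -> per-step logits over the vocabulary.
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        out, hidden = self.lstm(self.embed(x), hidden)  # hidden state carries across steps
        return self.fc(out), hidden

vocab_size = 10_000                              # assumed vocabulary size
model = WordLSTM(vocab_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

batch = torch.randint(0, vocab_size, (32, 41))   # stand-in for real token ids, shape (B, T+1)
inputs, targets = batch[:, :-1], batch[:, 1:]    # teacher forcing: inputs are ground truth,
                                                 # targets are the same sequence shifted by one
optimizer.zero_grad()
logits, _ = model(inputs)
loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # the training tip above
optimizer.step()
```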
Prefix Tree Autocomplete Engine (Project 52)
Goal: Given a partial string prefix, return a ranked list of completion candidates in sub-millisecond time, without any neural network.
How it works: A trie (prefix tree) stores all known words or phrases. Each node represents one character. Insertion is O(k) where k is the word length; prefix lookup is also O(k) and returns all completions reachable from the prefix node. Completions are ranked by insertion frequency so the most common completions surface first, as in the sketch below.
When to use a trie over a neural autocomplete: Tries are deterministic, explainable, and extremely fast. They are the right choice when you have a fixed vocabulary (e.g., product names, command completions) and need guaranteed latency. Neural models are better when the completion space is open-ended and semantic similarity matters more than exact prefix matching.
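A minimal frequency-ranked trie; the class names and tiny example vocabulary are illustrative, not the project's exact code:

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.freq = 0       # > 0 marks the end of a word; value is insertion count

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):                  # O(k) for a word of length k
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.freq += 1

    def complete(self, prefix, limit=5):     # O(k) walk to the prefix node
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        def dfs(n, suffix):                  # collect every completion under the prefix
            if n.freq:
                results.append((n.freq, prefix + suffix))
            for ch, child in n.children.items():
                dfs(child, suffix + ch)
        dfs(node, "")
        results.sort(key=lambda t: -t[0])    # most frequent completions first
        return [word for _, word in results[:limit]]

trie = Trie()
for w in ["apply", "apple", "apple", "apt"]:
    trie.insert(w)
print(trie.complete("ap"))  # ['apple', 'apply', 'apt']
```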
RAG Injection Research Pipeline (Project 45)
Goal: Answer factual questions about a document corpus by retrieving relevant passages at query time and injecting them as context into a language model prompt.
How it works: The pipeline has two phases:
- Ingestion: documents are split into overlapping chunks, embedded with a sentence transformer, and stored in a vector database (ChromaDB in this project, as evidenced by the `vector_store/chroma.sqlite3` artifact).
- Query: the user's question is embedded with the same model, the top-k most similar chunks are retrieved from the vector store, and they are concatenated into the prompt before the LLM generates an answer.
Vector store: This project persists embeddings in ChromaDB (`vector_store/chroma.sqlite3`). The collection is reloaded across sessions with `chromadb.PersistentClient`, so ingestion only needs to happen once.
Chunk size matters: Chunks that are too small miss context; chunks that are too large dilute relevance. A chunk size of 256–512 tokens with a 50-token overlap is a good starting point for research papers.
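A condensed sketch of both phases using chromadb and sentence-transformers. The embedding model, collection name, word-based chunker, source file, and sample question are all assumptions standing in for the project's real settings:

```python
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")      # assumed embedding model
client = chromadb.PersistentClient(path="vector_store") # backs chroma.sqlite3 on disk
collection = client.get_or_create_collection("papers")  # assumed collection name

def chunk(text, size=400, overlap=50):
    # Naive word-based splitter approximating the 256-512 token guideline above.
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

# Phase 1 -- Ingestion (run once; the collection persists across sessions).
chunks = chunk(open("paper.txt").read())                # assumed source document
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)

# Phase 2 -- Query: embed the question, retrieve top-k chunks, build the prompt.
question = "What dataset does the paper evaluate on?"
hits = collection.query(query_embeddings=embedder.encode([question]).tolist(),
                        n_results=3)
context = "\n\n".join(hits["documents"][0])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` is then sent to whichever LLM the pipeline wraps.
```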