C++ systems programming begins with a tight grip on the language’s lowest-level primitives. Before writing a single CUDA kernel or allocating a GPU buffer, you need to understand how C++ models memory through its type system, how it controls I/O, how arrays and loops work at the hardware level, and how to split code across compilation units using header files. This page walks through each of those fundamentals using the small, focused programs from the 1. cpp core module of this project: the Week 1–2 foundation work that every GPU and transformer concept later in the course builds on.
I/O: cin, cout, and getline
C++ I/O uses stream objects from <iostream>. cout writes to stdout, cin >> reads a single whitespace-delimited token, and getline (declared in <string>) reads an entire line, including spaces.
getline is required whenever the input may contain spaces. Using cin >> my_fav_color would stop reading at the first space character.
Integer Types and Sizes
C++ guarantees minimum bit widths for its primitive types, but the exact size is platform-dependent. Use sizeof (multiplied by 8, or more precisely by CHAR_BIT from <climits>) to inspect the actual bit width on your machine. This matters in GPU kernel code, where tensor element widths (float16, int8, int32) must match exactly.
Fixed-Width Types
Prefer <cstdint> types like int8_t, int32_t, and uint64_t when the bit width must be exact, which is essential for matching CUDA tensor element types.
Platform Variation
long int is 32 bits on Windows (MSVC/MinGW) but 64 bits on 64-bit Linux and macOS. long long int is guaranteed to be at least 64 bits and is exactly 64 bits on all mainstream platforms.
Arrays and Iteration
A C-style array declared inside a function occupies a contiguous block of memory on the stack. It is the conceptual ancestor of a GPU device buffer: a flat sequence of elements at a known base address. Elements of a local array declared without an initializer hold indeterminate (garbage) values.
Looping Over Arrays
C++ provides three loop forms for arrays. The range-based for loop (C++11 and later) is the most readable for sequential access.
Functions
Functions in C++ must be declared before they are called; a forward declaration is enough. A void function returns nothing; for any other return type, the return expression must be convertible to the declared type. Parameters are passed by value by default, so the function receives a copy.
Declare or define before use
Place function definitions above main, or use a forward declaration (int addNums(int a, int b);) at the top of the file.
Match return type
The return type in the signature must match what the function actually returns. Flowing off the end of a non-void function (other than main) without returning a value is undefined behavior.
Header Files and Include Guards
When a project grows beyond one file, declarations move into header files (.h) and definitions stay in .cpp files. The #ifndef / #define / #endif pattern, known as the include guard, prevents the header's contents from being processed more than once within a single translation unit, for example when two other headers both include it.
- adder.h
- header.cpp
Modern C++ projects often use #pragma once as a simpler alternative to manual include guards. Both prevent double-inclusion; #pragma once is a compiler extension supported by GCC, Clang, and MSVC.
Structs and const Fields
A struct groups related data under a single type. C++ distinguishes between const int (the integer value is immutable) and const char* (the pointer points to immutable data, but the pointer itself can be reassigned). This distinction appears frequently in ML code when labeling or describing tensor metadata.
const int uId
The integer value itself cannot be changed after construction. Attempting vraj.uId = 999 will not compile.
const char* name
The characters the pointer references are const, but name can be pointed at a different string literal. Use char* const name to make the pointer itself immutable, or const char* const name to keep the characters const as well.
Compiling Single Files
Each source file in the 1. cpp core module is a standalone program. Compile and run any single file directly with g++: