Memory management is one of the most consequential skills in systems and ML code. A GPU inference engine allocates large float buffers for weights and activations, passes them between functions, and must free them at the right moment: freeing too early causes a use-after-free, and never freeing causes a memory leak. C++ gives you two layers of control. Raw pointers address memory directly with no overhead. Smart pointers wrap raw pointers in RAII objects that release memory automatically when the owning object goes out of scope. This page covers both layers, plus struct padding and alignment, which determine how the compiler lays out struct fields in memory. That detail matters when writing memory-efficient tensor descriptors.

Raw Pointers and References

A pointer stores the memory address of another variable. The & operator takes the address of a variable; the * operator dereferences a pointer to read or write the value it points to. %p is the printf format specifier for printing addresses; it expects a void*, so other pointer types should be cast before printing.
#include <cstdio>
using namespace std;

int main(){
    int score = 100;
    int *mypointer = &score;   // mypointer holds the address of score

    printf("value of score is %d \n", score);
    // %p expects a void*, so cast the int* explicitly
    printf("Value of pointer is %p \n", static_cast<void*>(mypointer));

    int original = score;       // keep the value from before the write
    int &another_score = score; // reference: an alias for score
    another_score = 200;        // modifies score directly

    printf("first value of score is %d and new value after reference another_score is %d\n",
           original, score);
    return 0;
}

Pointers vs. References

Pointer (*)

Stores a memory address. Can be null, can be reassigned to point elsewhere, and can be used with pointer arithmetic. Requires explicit * to dereference.

Reference (&)

An alias for an existing variable. Cannot be null, cannot be reseated after initialization, and is accessed with the same syntax as the original variable — no * required.
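
The differences above show up most clearly in function signatures. The following is a minimal sketch, not code from the project: sum_buffer and scale_in_place are hypothetical helper names chosen for illustration. The pointer version can accept "no buffer" and walks memory with pointer arithmetic; the reference version always has a valid object to work on.

```cpp
#include <cstddef>

// Hypothetical helper: sums n floats by walking the buffer with pointer arithmetic.
// A null buffer is treated as empty, which a reference parameter could not express.
float sum_buffer(const float* data, std::size_t n) {
    if (data == nullptr) return 0.0f;          // pointers can be null, so check first
    float total = 0.0f;
    for (const float* p = data; p != data + n; ++p) {  // p is reseated every iteration
        total += *p;                           // explicit dereference
    }
    return total;
}

// Hypothetical helper: scales a value in place through a reference.
// There is no null state to check, and the syntax matches a plain variable.
void scale_in_place(float& value, float factor) {
    value *= factor;
}
```

A caller would write sum_buffer(buf, 3) for a float buf[3], and scale_in_place(x, 2.0f) for a float x; passing nullptr is only possible with the pointer version.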

Struct Padding and Memory Alignment

The compiler inserts invisible padding bytes between struct fields to satisfy alignment requirements. alignof(T) returns the alignment boundary (in bytes) that type T requires. On most platforms an int is 4 bytes and must start at an address that is a multiple of 4; if the preceding field ends at an offset that is not a multiple of 4, the compiler inserts padding up to the next aligned boundary. The total size of a struct is also rounded up to a multiple of its strictest member alignment, which can add trailing padding.
#include <iostream>

struct A { char c; int i; };   // char (1 byte) + 3 bytes padding + int (4 bytes) = 8
struct B { int i; char c; };   // int (4 bytes) + char (1 byte) + 3 bytes padding = 8

int main() {
    std::cout << sizeof(A) << " " << sizeof(B) << "\n";       // 8  8
    std::cout << alignof(int) << " " << alignof(char) << "\n"; // 4  1
}
Both struct A and struct B are 8 bytes here, but the padding lands in different positions: between the fields in A, and at the end in B. In a struct with more varied field types, ordering fields from largest to smallest alignment removes the padding between fields and often reduces total struct size. Only trailing padding, which rounds the size up to a multiple of the strictest alignment, can remain.
When defining tensor descriptor structs (shape, stride, dtype, device), sort fields by decreasing alignment: int64_t (8), then int32_t (4), then int16_t (2), then int8_t / bool (1). This removes hidden padding between fields and keeps the struct as compact as the alignment rules allow.
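
As a rough illustration of the ordering advice, the sketch below defines the same five fields twice. Both struct names and all field names are hypothetical, not taken from the project. On a typical 64-bit platform the unsorted layout is likely 32 bytes and the sorted one 16, but the exact numbers are implementation-defined, so the code only asserts the relationship between them.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical descriptor with fields in an arbitrary order.
struct TensorDescUnsorted {
    bool         contiguous;   // 1 byte, followed by padding before the 8-byte field
    std::int64_t numel;        // 8 bytes
    std::int8_t  dtype;        // 1 byte, followed by padding before the 4-byte field
    std::int32_t device_id;    // 4 bytes
    std::int16_t rank;         // 2 bytes, followed by trailing padding
};

// Same fields, sorted by decreasing alignment: no padding between fields.
struct TensorDescSorted {
    std::int64_t numel;
    std::int32_t device_id;
    std::int16_t rank;
    std::int8_t  dtype;
    bool         contiguous;
};

// Checked at compile time: sorting never makes the struct larger.
static_assert(sizeof(TensorDescSorted) <= sizeof(TensorDescUnsorted),
              "sorting by alignment should not grow the struct");
```

Printing sizeof for both structs, or offsetof for individual fields, is a quick way to see where padding lands on a given compiler.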

Smart Pointers

Smart pointers from <memory> wrap a raw pointer and tie its lifetime to an owning object. When the owner goes out of scope, the destructor is called automatically and the heap allocation is freed. This pattern is called RAII (Resource Acquisition Is Initialization).

unique_ptr — Exclusive Ownership

A unique_ptr is the sole owner of its allocation. Ownership can be transferred with std::move, but not copied. When the unique_ptr goes out of scope, the memory is freed immediately.
#include <iostream>
#include <memory>
using namespace std;

int main() {
    unique_ptr<int> unPtr1 = make_unique<int>(25);

    unique_ptr<int> unPtr2 = move(unPtr1); // transfer ownership

    cout << unPtr2.get() << endl;  // valid address
    cout << unPtr1.get() << endl;  // null (often printed as 0): unPtr1 no longer owns anything
}
Scope-based deallocation in action with a class:
#include <iostream>
#include <memory>
using namespace std;

class Myclass{
public:
    Myclass()  { cout << "-----------constructor invoked-----------" << endl; }
    ~Myclass() { cout << "-----------Destructor invoked-----------"  << endl; }
};

int main() {
    unique_ptr<Myclass> unPtr1 = make_unique<Myclass>();
    {
        unique_ptr<Myclass> unPtr2 = make_unique<Myclass>();
    } // unPtr2's destructor fires here, before main() returns
} // unPtr1's destructor fires here, when main() returns
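
For buffer-style data, exclusive ownership is commonly expressed in function signatures. The sketch below is one possible shape, with hypothetical function names make_filled and consume_sum: a factory returns a unique_ptr<float[]>, and a "sink" function takes the pointer by value so the caller has to hand ownership over with std::move.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical factory: the returned unique_ptr is the sole owner of n floats.
std::unique_ptr<float[]> make_filled(std::size_t n, float fill) {
    auto data = std::make_unique<float[]>(n);   // elements start at 0.0f
    for (std::size_t i = 0; i < n; ++i) data[i] = fill;
    return data;                                // ownership moves to the caller
}

// Hypothetical sink: taking the pointer by value forces callers to std::move it in.
// The array is freed when `data` goes out of scope at the end of this function.
float consume_sum(std::unique_ptr<float[]> data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
```

A call site would look like auto w = make_filled(4, 0.5f); followed by consume_sum(std::move(w), 4);, after which w holds nullptr. Passing w without std::move does not compile, which is how the type system enforces single ownership.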

shared_ptr — Shared Ownership with Reference Counting

Multiple shared_ptr instances can share ownership of the same object. An internal reference counter tracks how many owners exist; the object is destroyed when the counter reaches zero. use_count() lets you inspect the current count.
shared_ptr<Myclass> shPtr1 = make_shared<Myclass>();
cout << "Shared count : " << shPtr1.use_count() << endl;  // 1

shared_ptr<Myclass> shPtr2 = shPtr1;
cout << "Shared count : " << shPtr2.use_count() << endl;  // 2

{
    shared_ptr<Myclass> shPtr3 = shPtr1;
    cout << "Shared count : " << shPtr3.use_count() << endl;  // 3
}  // shPtr3 goes out of scope, count drops to 2

cout << "Shared count : " << shPtr1.use_count() << endl;  // 2
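
Shared ownership fits cases where several objects need the same buffer and none of them is the obvious owner, for example two layers that reuse one weight matrix. The sketch below assumes a hypothetical Layer struct and helper functions invented for illustration; it is not the project's layer type.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical layer that can share its weight storage with other layers.
struct Layer {
    std::shared_ptr<std::vector<float>> weights;
};

// Builds two layers over one buffer and reports the owner count while both are alive.
long tied_layer_count(std::size_t n) {
    auto weights = std::make_shared<std::vector<float>>(n, 0.0f);
    Layer embed{weights};
    Layer output{weights};
    return weights.use_count();   // owners: weights, embed.weights, output.weights
}

// The buffer outlives the local handle because the returned layer still owns it.
Layer make_layer(std::size_t n) {
    auto weights = std::make_shared<std::vector<float>>(n, 1.0f);
    return Layer{weights};        // local `weights` is released on return
}
```

The vector is freed only when the last Layer holding it is destroyed, so neither layer needs to know whether the other is still in use.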

weak_ptr — Non-Owning Observer

A weak_ptr observes a shared_ptr-managed object without contributing to the reference count. When the last shared_ptr releases ownership, the object is destroyed even if weak_ptr instances still exist. Those weak_ptr instances do not dangle: expired() returns true, and lock() returns an empty shared_ptr, so a stale observer can be detected before any access happens.
weak_ptr<int> wePtr1;
{
    shared_ptr<int> shPtr1 = make_shared<int>(10);
    wePtr1 = shPtr1;
}
// shPtr1 has gone out of scope; the int is already destroyed
// wePtr1.expired() == true here
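
To actually read through a weak_ptr, the usual pattern is lock(), which returns a shared_ptr that is empty if the object is gone. The helper below is a small sketch; the name read_or and its fallback parameter are assumptions made for this example.

```cpp
#include <memory>

// Hypothetical helper: reads the observed int, or returns `fallback` if it was destroyed.
int read_or(const std::weak_ptr<int>& observer, int fallback) {
    // lock() yields a shared_ptr that keeps the object alive for the duration of the read.
    if (std::shared_ptr<int> owner = observer.lock()) {
        return *owner;
    }
    return fallback;   // the last shared_ptr is gone; the weak_ptr has expired
}
```

Checking expired() and then dereferencing separately is avoided here because the object could be destroyed between the two steps in multithreaded code; lock() performs the check and the acquisition together.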
Why smart pointers matter for ML code. A neural network layer may allocate hundreds of megabytes for weight matrices. With raw pointers, an early return or a thrown exception can skip the delete call, leaking the entire allocation. Smart pointers make cleanup automatic: the destructor runs however the function exits, whether by normal return or during exception unwinding. This is RAII: tying resource lifetime to object lifetime so that cleanup happens without an explicit delete on every exit path.
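
The exception case can be made observable with a small counter. In the sketch below, Weights, forward, and forward_leaks_on_throw are hypothetical names, and the live counter exists only so the example can show that the destructor ran; it is not a suggested production pattern.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical weight holder that counts live instances so cleanup is visible.
struct Weights {
    static int live;
    std::unique_ptr<float[]> data;
    explicit Weights(std::size_t n) : data(std::make_unique<float[]>(n)) { ++live; }
    ~Weights() { --live; }
};
int Weights::live = 0;

// Hypothetical forward pass that may fail partway through.
void forward(bool fail) {
    auto w = std::make_unique<Weights>(1024);
    if (fail) throw std::runtime_error("shape mismatch");  // no manual delete needed
    // ... use w->data here ...
}   // w is destroyed here on normal return, and during unwinding when forward throws

// Returns true if a Weights object survived the failed call.
bool forward_leaks_on_throw() {
    try {
        forward(true);
    } catch (const std::runtime_error&) {
        // the exception is handled; w was already destroyed during unwinding
    }
    return Weights::live != 0;
}
```

With a raw new in forward, the throw would bypass any delete placed at the end of the function, and the counter would stay at 1.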

Quick Reference

// Create
auto ptr = make_unique<T>(args...);

// Access
*ptr;         // dereference
ptr.get();    // raw address (still owned by unique_ptr)

// Transfer ownership
auto ptr2 = move(ptr);  // ptr is now null
