String Obfuscation

Why String Obfuscation?

Shellcode often needs to hide its intent from static analysis tools. Storing API names and module names in plaintext makes your shellcode easily detectable:

// ❌ Visible in any hex editor or strings tool
LoadLibraryA("suspicious.dll");
GetProcAddress(handle, "CreateRemoteThread");

Signature-based antivirus and EDR products scan for these patterns. String obfuscation addresses this in four ways:

Removing plaintext: No literal “LoadLibraryA” in the binary
Hash-based lookup: Numeric hashes stand in for strings
Compile-time computation: Hashing a literal adds no runtime overhead
Smaller footprint: A 4-byte hash replaces each full string

Compile-Time FNV1a Hashing

Stardust uses the FNV1a (Fowler-Noll-Vo) hash algorithm, computed entirely at compile time using C++20 consteval.

The Hash Function

From include/constexpr.h:20-39:

template <typename T = char>
consteval auto hash_string(
    const T* string
) -> uint32_t {
    uint32_t hash = 0x811c9dc5;  // FNV offset basis
    uint8_t  byte = 0;

    while ( * string ) {
        byte = static_cast<uint8_t>( * string++ );

        if ( byte >= 'a' ) {
            byte -= 0x20;  // Convert to uppercase
        }

        hash ^= byte;
        hash *= 0x01000193;  // FNV prime
    }

    return hash;
}

Algorithm Breakdown

FNV1a Algorithm:

Start with offset basis: 0x811c9dc5
For each byte:
- XOR hash with byte
- Multiply hash by FNV prime 0x01000193
Return 32-bit hash

Case Insensitive:

if ( byte >= 'a' ) {
    byte -= 0x20;  // 'a' -> 'A'
}

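This ensures "LoadLibraryA" and "loadlibrarya" hash to the same value. That matters most for module names, which Windows compares case-insensitively and which may appear in the PEB as kernel32.dll or KERNEL32.DLL. Export names are case-sensitive, so folding them is a convenience rather than a requirement. Note that the check has no upper bound: every byte at or above 'a' (0x61) is shifted down by 0x20, including '{' and non-ASCII bytes. This is harmless for typical API and module names.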

consteval Keyword

The consteval specifier (C++20) forces compile-time evaluation:

consteval auto hash_string(...) -> uint32_t

Key properties:

Must be evaluated at compile time (not runtime)
Produces a constant expression
If it can’t be evaluated at compile time, compilation fails

This guarantees that hashing a literal costs nothing at runtime: the hash is already computed and embedded as a constant.

Template for String Types

The template supports both narrow and wide strings:

template <typename T = char>  // T can be char or wchar_t

Usage:

// Narrow strings (char)
auto hash1  = expr::hash_string<char>("LoadLibraryA");
auto hash1b = expr::hash_string("LoadLibraryA");  // same value; char is the default

// Wide strings (wchar_t) for module names
auto hash2 = expr::hash_string<wchar_t>(L"kernel32.dll");

Wide strings are needed for module resolution because the PEB stores module names as UNICODE_STRING structures.

How String Hashing Avoids Detection

Before Obfuscation

The traditional approach stores strings in plaintext:

LOADLIBRARY pLoadLibrary = GetProcAddress(kernel32, "LoadLibraryA");

In the binary:

.rdata:00001000 4C 6F 61 64 4C 69 62 72  LoadLibr
.rdata:00001008 61 72 79 41 00           aryA.

A simple strings command reveals:

$ strings shellcode.bin
LoadLibraryA
GetProcAddress
CreateRemoteThread
VirtualAllocEx

After Obfuscation

Stardust’s approach using hashes:

auto api = resolve::api<LOADLIBRARY>(
    kernel32, 
    expr::hash_string("LoadLibraryA")  // Computed at compile time
);

In the binary:

.text:00001050  B8 9C 7A 14 E0  mov eax, 0xE0147A9C  ; Hash of "LoadLibraryA"

No string visible! Only a 32-bit hash value. Running strings finds nothing:

$ strings shellcode.bin
(no API names found)

Static Analysis Perspective

Without obfuscation:

YARA rules can match: "LoadLibraryA" and "GetProcAddress"
Signature detection is trivial
Intent is obvious

With obfuscation:

No API-name strings for signatures to match
Detection shifts to the hash constants, the hashing loop, or runtime behavior
Static rules are harder to write and more brittle

Practical Usage

Module Resolution

Resolving module handles from the PEB (src/main.cc:26-28):

if ( ! (( ntdll.handle = resolve::module( 
    expr::hash_string<wchar_t>( L"ntdll.dll" ) ) )) ) {
    return;
}

What happens:

Compiler evaluates expr::hash_string<wchar_t>(L"ntdll.dll") at compile time
Produces hash value: 0x1EDAB0ED
Embeds 0x1EDAB0ED as a constant in the binary
At runtime, resolve::module() walks the PEB comparing hashes

API Resolution

Two methods for resolving function pointers:

Method 1: RESOLVE_API Macro

From the README example (src/main.cc:59):

decltype( MessageBoxA ) * msgbox = RESOLVE_API( 
    reinterpret_cast<uintptr_t>( user32 ), 
    MessageBoxA 
);

The macro (include/resolve.h:9):

#define RESOLVE_API( m, s ) resolve::api<decltype(s)>( m, expr::hash_string( # s ) )

Expansion:

resolve::api<decltype(MessageBoxA)>( 
    reinterpret_cast<uintptr_t>(user32), 
    expr::hash_string( "MessageBoxA" )  // Compile-time hash
);

Benefits:

Automatically stringifies the function name with # s
Automatically infers the function type with decltype(s)
Clean, readable syntax

Method 2: Direct Call

Manual resolution:

auto msgbox = resolve::api<decltype(MessageBoxA)>(
    user32_handle,
    expr::hash_string("MessageBoxA")
);

Same result, more explicit.

Automatic Import Resolution

The RESOLVE_IMPORT macro automates resolving all APIs in a module structure. From include/macros.h:8-12:

#define RESOLVE_IMPORT( m ) { \
    for ( int i = 1; i < expr::struct_count<decltype( instance::m )>(); i++ ) { \
        reinterpret_cast<uintptr_t*>( &m )[ i ] = \
            resolve::_api( m.handle, reinterpret_cast<uintptr_t*>( &m )[ i ] ); \
    } \
}

Usage in constructor (src/main.cc:38-39):

RESOLVE_IMPORT( ntdll );
RESOLVE_IMPORT( kernel32 );

How it works:

The instance structure stores hashes in place of function pointers initially:

struct {
    uintptr_t handle;
    struct {
        decltype(LoadLibraryA)* LoadLibraryA;    // Initially contains hash
        decltype(GetProcAddress)* GetProcAddress; // Initially contains hash
    };
} kernel32 = {
    RESOLVE_TYPE( LoadLibraryA ),     // Expands to hash
    RESOLVE_TYPE( GetProcAddress )    // Expands to hash
};

RESOLVE_IMPORT iterates through each field (skipping the handle)
Treats the stored value as a hash
Calls resolve::_api() to find the real function address
Replaces the hash with the actual function pointer

After resolution:

kernel32.LoadLibraryA    // Now points to real LoadLibraryA in memory
kernel32.GetProcAddress  // Now points to real GetProcAddress
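The slot-overwriting idea can be modeled without any Windows types. The following is a simplified sketch under stated assumptions: Export, lookup, and resolve_slots are hypothetical names, the hashes and addresses are made up, and slots are plain integers rather than the reinterpreted struct fields the macro uses.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one export: its name hash and its address.
struct Export {
    uint32_t  hash;
    uintptr_t address;
};

// Stand-in for resolve::_api(): return the address whose hash matches, or 0.
template <std::size_t N>
constexpr auto lookup(const std::array<Export, N>& exports, uintptr_t hash) -> uintptr_t {
    for (const auto& e : exports) {
        if (e.hash == hash) return e.address;
    }
    return 0;
}

// Slot 0 models the module handle and is skipped. Every later slot starts
// as a hash and is overwritten with the resolved address, mirroring the
// macro's loop that begins at i = 1.
template <std::size_t S, std::size_t N>
constexpr auto resolve_slots(std::array<uintptr_t, S> slots,
                             const std::array<Export, N>& exports)
    -> std::array<uintptr_t, S> {
    for (std::size_t i = 1; i < S; ++i) {
        slots[i] = lookup(exports, slots[i]);
    }
    return slots;
}

// Made-up hashes and addresses, for illustration only.
constexpr std::array<Export, 2> fake_exports{{{0x1111u, 0x7000u}, {0x2222u, 0x7100u}}};
constexpr auto resolved =
    resolve_slots(std::array<uintptr_t, 3>{0x5000u, 0x2222u, 0x1111u}, fake_exports);

static_assert(resolved[0] == 0x5000u);  // handle left untouched
static_assert(resolved[1] == 0x7100u);  // hash replaced by its address
static_assert(resolved[2] == 0x7000u);
```

The design relies on a pointer-sized slot being wide enough to hold a 32-bit hash, which is why the same storage can serve both roles.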

Hash Collision Considerations

Collision Probability

FNV1a produces 32-bit hashes, giving 2^32 (4,294,967,296) possible values. By the birthday bound, the chance of at least one collision reaches about 50% at roughly 77,000 distinct strings (at 65,536 strings it is about 39%). However, typical shellcode resolves:

~10-20 module names
~50-100 function names

At this scale, the collision probability is extremely low: for 120 names it is about 0.0002%, far below 0.01%.
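The estimate follows from the union bound P(collision) <= n(n-1)/2 / 2^32. A small sketch, with collision_bound as a hypothetical helper name, makes the arithmetic checkable:

```cpp
#include <cstdint>

// Upper bound on the probability that any two of n uniformly distributed
// 32-bit hashes collide: (number of pairs) / (number of hash values).
constexpr auto collision_bound(uint64_t n) -> double {
    constexpr double space = 4294967296.0;  // 2^32
    return static_cast<double>(n * (n - 1) / 2) / space;
}

// 120 names: 7140 pairs / 2^32, roughly 1.7e-6 (about 0.0002%).
static_assert(collision_bound(120) > 1.6e-6 && collision_bound(120) < 1.7e-6);

// At 65,536 names the bound is just under 0.5. The true probability is
// lower (about 0.39), because the bound overcounts as it grows.
static_assert(collision_bound(65536) < 0.5);
```

The bound assumes hashes behave like uniform random values, which is only an approximation for FNV1a over real API names.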

Handling Collisions

If you encounter a collision:

Option 1: Verify Manually

auto api_ptr = resolve::_api(module, hash);
if (api_ptr) {
    // Found something. MY_FUNC and test_args are placeholders for the
    // expected signature and a call that confirms the behavior.
    auto result = reinterpret_cast<MY_FUNC*>(api_ptr)(test_args);
}

Option 2: Use Full String Match

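Fall back to a runtime string comparison:

// get_export_name() is a hypothetical helper that maps an address back to its export name.
// Note: passing `name` as a literal puts the plaintext string back into the binary.
auto verify_api(uintptr_t module, uint32_t hash, const char* name) -> uintptr_t {
    auto addr = resolve::_api(module, hash);

    // If a collision is suspected, verify with the actual name
    if (addr) {
        auto actual_name = get_export_name(module, addr);
        if (!actual_name || strcmp(actual_name, name) != 0) {
            return 0;  // Collision detected
        }
    }

    return addr;
}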

Option 3: Different Hash Algorithm

Implement an alternative hash (e.g., CRC32, xxHash) for colliding strings.

Real-World Risk

In practice, collisions are rare because:

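Export names within a module are unique, and each lookup searches only one module's export table
The set of modules loaded in a process is small, so module-name hashes rarely clash
FNV1a distributes well for short ASCII strings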

You’re more likely to encounter bugs than collisions.

Runtime vs Compile-Time Hashing

Compile-Time (consteval)

From include/constexpr.h:20-39:

consteval auto hash_string(const T* string) -> uint32_t

Advantages:

✅ Zero runtime cost
✅ Hash computed during compilation
✅ Binary contains only the hash value
✅ Smaller code size

Usage: When the string is known at compile time (API names, module names).

Runtime Hashing

Stardust also provides a runtime version (include/common.h:82-100):

template<typename T = char>
inline auto declfn hash_string(
    _In_ const T* string
) -> uint32_t {
    uint32_t hash = 0x811c9dc5;
    uint8_t  byte = 0;

    while ( * string ) {
        byte = static_cast<uint8_t>( * string++ );

        if ( byte >= 'a' ) {
            byte -= 0x20;
        }

        hash ^= byte;
        hash *= 0x01000193;
    }

    return hash;
}

Same algorithm, but declared inline instead of consteval, so it can run on strings that are only known at runtime:

// At runtime, hash a dynamically constructed string
// (illustrative: position-independent shellcode has no CRT unless it resolves one)
char buffer[256];
snprintf(buffer, sizeof(buffer), "api_%d", some_value);
auto hash = stardust::hash_string(buffer);

When to Use Each

Scenario	Use
Literal strings known at compile time	`expr::hash_string<T>("...")`
Dynamic strings constructed at runtime	`stardust::hash_string<T>(buffer)`
Target module names for PEB walking	Compile-time (`expr::`)
Export table name parsing	Runtime (`stardust::`)
Comparing user input	Runtime (`stardust::`)

Export Table Resolution

The runtime hasher is used when walking export tables (src/resolve.cc:76-80):

for ( int i = 0; i < export_dir->NumberOfNames; i++ ) {
    symbol_name = reinterpret_cast<PSTR>( module_base + export_names[ i ] );

    if ( stardust::hash_string( symbol_name ) != symbol_hash ) {
        continue;
    }
    
    // Found matching hash
    address = module_base + export_addrs[ export_ordns[ i ] ];
    break;
}

Why runtime here? The export table contains actual string names. We hash them at runtime to compare against our compile-time hash.

Flow:

Compile time: expr::hash_string("LoadLibraryA") → 0xE0147A9C
Runtime: Find “LoadLibraryA” in export table
Runtime: stardust::hash_string("LoadLibraryA") → 0xE0147A9C
Compare: 0xE0147A9C == 0xE0147A9C ✅ Match!
Return function address

Benefits Summary

Stealth

✅ No plaintext API names: Static analysis tools can’t trivially identify intent
✅ Signature evasion: YARA rules based on API-name strings won’t match
✅ Obfuscated imports: Function resolution doesn’t expose plaintext names

Performance

✅ Compile-time computation: Zero runtime cost for hashing literals
✅ Small constants: 4-byte hashes instead of variable-length strings
✅ Cache friendly: Integer comparisons are fast

Code Quality

✅ Type-safe: Template-based, compiler-enforced correctness
✅ Readable: expr::hash_string("API") is clear and concise
✅ Maintainable: Change an API name in one place, hash updates automatically

Example: Adding Obfuscated API

Let’s add VirtualProtect to the shellcode:

1. Update Instance Structure

struct {
    uintptr_t handle;
    struct {
        D_API( LoadLibraryA )
        D_API( GetProcAddress )
        D_API( VirtualProtect )  // Add this
    };
} kernel32 = {
    RESOLVE_TYPE( LoadLibraryA ),
    RESOLVE_TYPE( GetProcAddress ),
    RESOLVE_TYPE( VirtualProtect )   // Add this
};

2. Automatic Resolution

The constructor automatically resolves it:

RESOLVE_IMPORT( kernel32 );  // Resolves all APIs including VirtualProtect

3. Use It

DWORD old_protect;
kernel32.VirtualProtect(
    address, 
    size, 
    PAGE_EXECUTE_READWRITE, 
    &old_protect
);

What Gets Compiled

Compiler output:

; Hash of "VirtualProtect" embedded as constant
mov eax, 0x36F8C45B    ; expr::hash_string("VirtualProtect")

No string “VirtualProtect” exists in the binary - only the hash 0x36F8C45B.

Advanced: Custom Hash Functions

You can implement alternative hash algorithms for specific use cases:

namespace expr {
    // DJB2 hash (another popular choice)
    template <typename T = char>
    consteval auto djb2_hash(
        const T* string
    ) -> uint32_t {
        uint32_t hash = 5381;
        
        while (*string) {
            uint8_t byte = static_cast<uint8_t>(*string++);
            if (byte >= 'a') byte -= 0x20;
            hash = ((hash << 5) + hash) + byte;  // hash * 33 + byte
        }
        
        return hash;
    }
}

Use this if FNV1a produces collisions in your specific set of strings.
