Audio engine: sequences, sounds, and banks in SM64
How SM64 drives music, sound effects, and ambient audio through the N64 RSP, covering sequence players, sound banks, the public API, and version differences.
SM64’s audio engine is a custom system that offloads sample synthesis to the Nintendo 64’s Reality Signal Processor (RSP). Game code on the CPU submits high-level requests (play a sequence, trigger a sound effect, fade out the music), and the audio subsystem, which also runs on the CPU, translates those into RSP microcode tasks that synthesize PCM samples into the Audio Interface (AI) buffers each frame.
Three global SequencePlayer slots exist in JP/US (four in EU/Shindou):
// from src/audio/load.h
// gSequencePlayers[0] is level background music
// gSequencePlayers[1] is misc music, like the puzzle jingle
// gSequencePlayers[2] is sound
extern struct SequencePlayer gSequencePlayers[SEQUENCE_PLAYERS];
Player indices are exposed as constants in external.h:
#define SEQ_PLAYER_LEVEL 0 // Level background music
#define SEQ_PLAYER_ENV   1 // Misc music like the puzzle jingle
#define SEQ_PLAYER_SFX   2 // Sound effects
1. audio_signal_game_loop_tick() is called once per game frame to advance internal counters.
2. Audio frame task creation: create_next_audio_frame_task() runs the sequence players, processes queued sound effects, invokes the synthesis pipeline, and builds an RSP task (SPTask).
3. RSP execution: The task is dispatched to the RSP, which runs the audio microcode found in sound/rsp/. Sample data is DMA’d from ROM on demand by dma_sample_data().
4. AI DMA: Synthesized PCM is written to one of three rotating AI buffers (gAiBuffers[NUMAIBUFFERS]) and handed to the Audio Interface hardware.
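The buffer rotation in the last step can be pictured with a small sketch. It assumes three buffers, as the text states for NUMAIBUFFERS; the buffer length, the index variable, and the helper name sketch_next_ai_buffer are hypothetical, and the engine's real bookkeeping may differ.

```c
#include <stdint.h>

#define SKETCH_NUM_AI_BUFFERS 3 /* mirrors NUMAIBUFFERS from the text */
#define SKETCH_AI_BUFFER_LEN  8 /* hypothetical, tiny length for illustration */

static int16_t sketch_ai_buffers[SKETCH_NUM_AI_BUFFERS][SKETCH_AI_BUFFER_LEN];
static int sketch_current = 0;  /* hypothetical: index of the buffer to fill next */

/* Return the buffer to synthesize into, then advance to the next slot. */
int16_t *sketch_next_ai_buffer(void) {
    int16_t *buf = sketch_ai_buffers[sketch_current];
    sketch_current = (sketch_current + 1) % SKETCH_NUM_AI_BUFFERS;
    return buf;
}
```

Rotating through several buffers lets the hardware play one buffer while the next is being filled; after three calls the sketch wraps back to the first buffer.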
“Audio bank” refers to the instrument/sample ROM blobs loaded by the CTL/TBL system — distinct from “sound banks” (the categorical groupings in sounds.h).
Music sequences are binary scripts interpreted by the sequence player each audio frame. The format is MIDI-inspired: commands encode note events, tempo changes, channel routing, and volume/transpose control. The script state for each active player or channel is tracked in M64ScriptState:
// from src/audio/internal.h
struct M64ScriptState {
    u8 *pc;             // program counter into sequence data
    u8 *stack[4];       // call stack (supports 4 levels of subroutine)
    u8 remLoopIters[4];
    u8 depth;
}; // size = 0x1C
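To illustrate how a four-entry stack and a depth counter support subroutine calls, here is a minimal sketch. The struct fields match the listing above; the helpers sketch_script_call and sketch_script_return are hypothetical and are not the engine's own interpreter code.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint8_t u8;

/* Same fields as the struct shown above. */
struct M64ScriptState {
    u8 *pc;
    u8 *stack[4];
    u8 remLoopIters[4];
    u8 depth;
};

/* Hypothetical helper: jump to a subroutine, saving the return address.
   Returns 0 on success, -1 if all 4 stack levels are already in use. */
int sketch_script_call(struct M64ScriptState *st, u8 *target) {
    if (st->depth >= 4) {
        return -1;
    }
    st->stack[st->depth++] = st->pc;
    st->pc = target;
    return 0;
}

/* Hypothetical helper: return to the most recently saved address.
   Returns 0 on success, -1 if the stack is empty. */
int sketch_script_return(struct M64ScriptState *st) {
    if (st->depth == 0) {
        return -1;
    }
    st->pc = st->stack[--st->depth];
    return 0;
}
```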
Sequences are preloaded with preload_sequence() and activated via load_sequence().
A sequence ID may be OR’d with SEQ_VARIATION (0x80) to load the same sequence script but set a variation bit that the script can read to alter its behaviour at runtime.
Sequences with the SEQ_LEVEL_ prefix are level background tracks. SEQ_EVENT_ sequences are one-shot jingles. SEQ_MENU_ sequences play during menu screens. SEQ_MENU_GAME_OVER is a special alias defined as SEQ_MENU_TITLE_SCREEN | SEQ_VARIATION.
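The variation bit can be separated from the base sequence ID with simple masking. The sketch below uses the SEQ_VARIATION value given in the text; the helper names and the example ID are hypothetical.

```c
#include <stdint.h>

#define SEQ_VARIATION 0x80 /* value given in the text */

/* Hypothetical helper: strip the variation bit to get the base sequence ID. */
static inline uint8_t sketch_seq_base_id(uint8_t seqId) {
    return (uint8_t)(seqId & ~SEQ_VARIATION);
}

/* Hypothetical helper: report whether the variation bit is set. */
static inline int sketch_seq_has_variation(uint8_t seqId) {
    return (seqId & SEQ_VARIATION) != 0;
}
```

Under this scheme an alias such as SEQ_MENU_GAME_OVER shares its base ID with SEQ_MENU_TITLE_SCREEN and differs only in the top bit.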
The upper nibble (bits 28-31) of the packed 32-bit sound word selects the sound bank, a categorical grouping that controls which channel the sound competes on and how it behaves when muted.
Two groups of bitflags further control behaviour. Upper bitflags (bits 24-27):
#define SOUND_NO_VOLUME_LOSS     0x1000000 // No volume loss with distance
#define SOUND_VIBRATO            0x2000000 // Randomly alter frequency each audio frame
#define SOUND_NO_PRIORITY_LOSS   0x4000000 // Do not prioritize closer sounds
#define SOUND_CONSTANT_FREQUENCY 0x8000000 // Frequency unaffected by distance/speed
Lower bitflags (bits 4-7):
#define SOUND_LOWER_BACKGROUND_MUSIC 0x10 // Lower BGM volume while playing
#define SOUND_NO_ECHO                0x20 // Disable level reverb (not in JP)
#define SOUND_DISCRETE               0x80 // Restart sound on every play_sound call
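As a rough illustration of how these fields can be read back out of a sound word, the sketch below extracts the bank from the upper nibble and tests individual flags. The flag values are copied from the defines above; the helper names are hypothetical and the example bank number is arbitrary.

```c
#include <stdint.h>

/* Flag values copied from the defines above. */
#define SOUND_NO_VOLUME_LOSS 0x1000000
#define SOUND_VIBRATO        0x2000000
#define SOUND_NO_ECHO        0x20
#define SOUND_DISCRETE       0x80

/* Hypothetical helper: the bank lives in the upper nibble (bits 28-31). */
static inline unsigned sketch_sound_bank(uint32_t soundBits) {
    return soundBits >> 28;
}

/* Hypothetical helper: test a single upper or lower bitflag. */
static inline int sketch_sound_has_flag(uint32_t soundBits, uint32_t flag) {
    return (soundBits & flag) != 0;
}
```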
Sound constants follow a SOUND_<BANK>_<DESCRIPTION> pattern. Terrain-dependent sounds use offset IDs; the actual terrain type (0–7) is added to the base sound ID at runtime:
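A minimal sketch of the terrain offset idea follows. It assumes the sound ID field starts at bit 16 of the sound word, which this page does not state; the shift constant, the helper name, and the example base value are all hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: the sound ID field starts at bit 16 of the sound word. */
#define SKETCH_SOUND_ID_SHIFT 16

/* Hypothetical helper: offset a base sound word by a terrain type (0-7). */
static inline uint32_t sketch_terrain_sound(uint32_t baseSound, unsigned terrainType) {
    return baseSound + ((uint32_t)(terrainType & 7u) << SKETCH_SOUND_ID_SHIFT);
}
```

For this to work, each terrain-dependent base sound needs eight consecutive IDs reserved after it, one per terrain type.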
// Play a sequence on a player slot. seqArgs is built with SEQUENCE_ARGS().
void play_music(u8 player, u16 seqArgs, u16 fadeTimer);

// Stop the background music sequence identified by seqId.
void stop_background_music(u16 seqId);

// Fade out background music over fadeOut frames, then silence.
void fadeout_background_music(u16 arg0, u16 fadeOut);

// Discard the next item from the background music queue.
void drop_queued_background_music(void);

// Return the seqId currently playing on SEQ_PLAYER_LEVEL.
u16 get_current_background_music(void);

// Start a secondary music layer (e.g. for Koopa race).
void play_secondary_music(u8 seqId, u8 bgMusicVolume, u8 volume, u16 fadeTimer);
SEQUENCE_ARGS packs priority and sequence ID into the seqArgs word:
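One plausible layout is sketched below, assuming the priority sits in the high byte of the 16-bit word and the sequence ID in the low byte. That layout is an assumption made for illustration, and SKETCH_SEQUENCE_ARGS and the unpacking helpers are hypothetical names, not the engine's macro.

```c
#include <stdint.h>

/* ASSUMPTION: priority in the high byte, sequence ID in the low byte. */
#define SKETCH_SEQUENCE_ARGS(priority, seqId) \
    ((uint16_t)((((priority) & 0xFF) << 8) | ((seqId) & 0xFF)))

/* Hypothetical helpers that unpack the two fields again. */
static inline uint8_t sketch_args_priority(uint16_t seqArgs) {
    return (uint8_t)(seqArgs >> 8);
}

static inline uint8_t sketch_args_seq_id(uint16_t seqArgs) {
    return (uint8_t)(seqArgs & 0xFF);
}
```

With the real macro, the packed value is what gets passed as the seqArgs parameter of play_music(), alongside a player index such as SEQ_PLAYER_LEVEL and a fadeTimer.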
// Trigger a sound effect. soundBits is a packed SOUND_ARG_LOAD word;
// pos is a pointer to a 3-component f32 world-space position.
void play_sound(s32 soundBits, f32 *pos);

// Stop a specific sound at the given source position.
void stop_sound(u32 soundBits, f32 *pos);

// Stop all sounds emitting from a position.
void stop_sounds_from_source(f32 *pos);

// Stop all sounds in banks flagged as continuous.
void stop_sounds_in_continuous_banks(void);

// Enable or disable entire sound banks for a player.
void sound_banks_disable(u8 player, u16 bankMask);
void sound_banks_enable(u8 player, u16 bankMask);

// Set the movement speed used for SOUND_BANK_MOVING frequency scaling.
void set_sound_moving_speed(u8 bank, u8 speed);
Object behaviour code usually goes through the helper functions in src/game/spawn_sound.h rather than calling play_sound() directly.
// from src/game/spawn_sound.h
struct SoundState {
    s16 playSound;  // 1 = use this entry, 0 = skip
    s8 animFrame1;  // animation frame at which to trigger (left foot)
    s8 animFrame2;  // animation frame at which to trigger (right foot)
    s32 soundMagic; // SOUND_ARG_LOAD packed value
};

// Play soundMagic from the current object's position (one-shot style).
void cur_obj_play_sound_1(s32 soundMagic);

// Play soundMagic from the current object's position (continuous style).
void cur_obj_play_sound_2(s32 soundMagic);

// Spawn a dedicated sound-spawner object at the current object's position.
void create_sound_spawner(s32 soundMagic);

// Walk a SoundState table and fire sounds on matching animation frames.
void exec_anim_sound_state(struct SoundState *soundStates);
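exec_anim_sound_state is the standard way to synchronise footstep and impact sounds to animation frames. Pass it a table of SoundState entries; the entry for the object's current sound state is checked each frame, and its sound fires when the animation reaches animFrame1 or animFrame2. An array of structs has no NULL terminator, so the table must cover every state index the object can use.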
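The frame-matching logic for a single SoundState entry can be sketched as follows. The struct matches the listing above; sketch_play is a hypothetical stand-in for the engine's playback call, sketch_check_sound_state is a hypothetical name, and treating a negative frame value as "no trigger" is an assumption.

```c
#include <stdint.h>

typedef int16_t s16;
typedef int8_t s8;
typedef int32_t s32;

/* Same fields as the struct shown above. */
struct SoundState {
    s16 playSound;
    s8 animFrame1;
    s8 animFrame2;
    s32 soundMagic;
};

/* Hypothetical stand-in for playback: records the last sound requested. */
static s32 sketch_last_sound = 0;
static void sketch_play(s32 soundMagic) {
    sketch_last_sound = soundMagic;
}

/* Fire the entry's sound if curFrame matches either trigger frame.
   ASSUMPTION: a negative frame value means "no trigger".
   Returns 1 if a sound was fired, 0 otherwise. */
int sketch_check_sound_state(const struct SoundState *entry, int curFrame) {
    if (!entry->playSound) {
        return 0; /* entry disabled */
    }
    if ((entry->animFrame1 >= 0 && curFrame == entry->animFrame1) ||
        (entry->animFrame2 >= 0 && curFrame == entry->animFrame2)) {
        sketch_play(entry->soundMagic);
        return 1;
    }
    return 0;
}
```

Keeping two trigger frames per entry covers the common walk-cycle case, where the left and right foot land at different points in the same animation.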
// from external.h
#define SOUND_MODE_STEREO  0
#define SOUND_MODE_MONO    3
#define SOUND_MODE_HEADSET 1
The headset mode activates per-channel headset pan effects tracked in stereoHeadsetEffects flags on both SequenceChannel and Note. Headset pan volumes are stored in gHeadsetPanVolume[128] and quantised through gHeadsetPanQuantization.
The actual synthesis work is done by N64 RSP microcode stored in sound/rsp/. Each audio frame the CPU builds an RSP command list (a sequence of 64-bit words) through the synthesis pipeline (synthesis_execute()), then hands the task to the RSP via the OS task system. The entry point on the CPU side is create_next_audio_frame_task(), which builds the SPTask.
JP
JP uses double-precision floating point in several places: the US_FLOAT macro wraps literals that gain an f suffix (single precision) on US and are left as double-precision literals on JP. JP also has a different dialog voice for DIALOG_037. Tempo in JP is expressed in beats per minute; US and later versions use tatums per minute (TEMPO_SCALE = TATUMS_PER_BEAT = 48).
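The relationship between the two tempo units is a fixed scale factor. The sketch below uses the TATUMS_PER_BEAT value given in the text; the helper name is hypothetical and the function only illustrates the arithmetic, not how the sequence player stores tempo.

```c
#include <stdint.h>

#define TATUMS_PER_BEAT 48 /* value given in the text */

/* Hypothetical helper: convert beats per minute to tatums per minute. */
static inline uint32_t sketch_bpm_to_tatums_per_minute(uint32_t bpm) {
    return bpm * TATUMS_PER_BEAT;
}
```

For example, a tempo of 120 beats per minute corresponds to 120 × 48 = 5760 tatums per minute.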
EU
EU adds a fourth sequence player slot (SEQUENCE_PLAYERS = 4), increases channel and layer pool sizes, introduces gAudioBufferParameters for PAL/NTSC timing, and adds up to four independent reverb units (gSynthesisReverbs[4]). The EuAudioCmd struct routes audio parameter changes through a command queue rather than direct writes.
Shindou (SH)
The Shindou revision adds src/audio/synthesis_sh.c and src/audio/load_sh.c as replacements for the standard synthesis and load modules. It introduces an async DMA pipeline for sample data (PendingDmaSample, UnkStruct80343D00), additional per-note synthesisVolume and filter fields, and expanded reverb pan controls (panRight/panLeft in SynthesisReverb). Shindou also uses a larger AudioSessionSettingsEU struct and an extended AI buffer length (AIBUFFER_LEN = 0xb00).