Overview
The SPU atomic operations API provides lock-free synchronization primitives for safe concurrent access to shared memory locations in main memory. These operations use the Cell Broadband Engine's reservation-based atomic mechanism (load-and-reserve / store-conditional, issued through the MFC's GETLLAR and PUTLLC commands).
Key Features
- Lock-free: No OS-level locks required
- 128-byte granularity: Operations work on cache-line sized blocks
- Automatic retry: Built-in retry logic on contention
- Multiple SPU safe: Coordinate between multiple SPUs
- PPU compatible: Can synchronize with PPU threads
Atomic Integer Operations
All atomic operations take a local buffer (128-byte aligned) and an effective address (also 128-byte aligned). They return the previous value from memory.
spu_atomic_incr32
Atomically increment a 32-bit value in main memory.
uint32_t spu_atomic_incr32(uint32_t *ls, uint64_t ea)
- ls: Local store buffer (128-byte aligned)
- ea: Effective address in main memory (128-byte aligned)
- Returns: The value before incrementing
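The "return the previous value" contract matches C11's atomic_fetch_add. A host-side sketch of the semantics (not SPU code; the local buffer and effective address are replaced by a plain atomic variable purely for illustration):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Host-side analogy of spu_atomic_incr32's semantics: the operation
 * returns the value that was in memory *before* the increment, just
 * like C11's atomic_fetch_add. */
static uint32_t incr32_semantics(_Atomic uint32_t *counter) {
    return atomic_fetch_add(counter, 1); /* returns the previous value */
}
```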
spu_atomic_incr64
Atomically increment a 64-bit value.
uint64_t spu_atomic_incr64(uint64_t *ls, uint64_t ea)
spu_atomic_decr32
Atomically decrement a 32-bit value.
uint32_t spu_atomic_decr32(uint32_t *ls, uint64_t ea)
spu_atomic_decr64
Atomically decrement a 64-bit value.
uint64_t spu_atomic_decr64(uint64_t *ls, uint64_t ea)
spu_atomic_test_and_decr32
Atomically decrement a 32-bit value, but only if it is greater than zero.
uint32_t spu_atomic_test_and_decr32(uint32_t *ls, uint64_t ea)
- Returns: Previous value (the decrement happens only if it was > 0)
spu_atomic_test_and_decr64
Atomically test and decrement a 64-bit value.
uint64_t spu_atomic_test_and_decr64(uint64_t *ls, uint64_t ea)
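The test-and-decrement contract (decrement only while the value is positive, return the previous value either way) can be expressed portably as a compare-and-swap loop. A host-side C11 sketch of the semantics, not SPU code (the SPU implementation uses the reservation mechanism instead):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of spu_atomic_test_and_decr32's semantics: decrement only
 * if the current value is greater than zero, and return the previous
 * value in either case. */
static uint32_t test_and_decr32_semantics(_Atomic uint32_t *v) {
    uint32_t old = atomic_load(v);
    while (old > 0 &&
           !atomic_compare_exchange_weak(v, &old, old - 1)) {
        /* 'old' was refreshed by the failed CAS; retry */
    }
    return old;
}
```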
Atomic Arithmetic
spu_atomic_add32
Atomically add a value to a 32-bit integer.
uint32_t spu_atomic_add32(uint32_t *ls, uint64_t ea, uint32_t value)
- Returns: Previous value before the addition
spu_atomic_add64
Atomically add to a 64-bit integer.
uint64_t spu_atomic_add64(uint64_t *ls, uint64_t ea, uint64_t value)
spu_atomic_sub32
Atomically subtract from a 32-bit integer.
uint32_t spu_atomic_sub32(uint32_t *ls, uint64_t ea, uint32_t value)
spu_atomic_sub64
Atomically subtract from a 64-bit integer.
uint64_t spu_atomic_sub64(uint64_t *ls, uint64_t ea, uint64_t value)
Atomic Bitwise Operations
spu_atomic_or32
Atomically OR a value with a 32-bit integer.
uint32_t spu_atomic_or32(uint32_t *ls, uint64_t ea, uint32_t value)
- Returns: Previous value before the OR operation
spu_atomic_or64
Atomically OR with a 64-bit integer.
uint64_t spu_atomic_or64(uint64_t *ls, uint64_t ea, uint64_t value)
spu_atomic_and32
Atomically AND a value with a 32-bit integer.
uint32_t spu_atomic_and32(uint32_t *ls, uint64_t ea, uint32_t value)
spu_atomic_and64
Atomically AND with a 64-bit integer.
uint64_t spu_atomic_and64(uint64_t *ls, uint64_t ea, uint64_t value)
Atomic Store and Swap
spu_atomic_store32
Atomically store a new value.
uint32_t spu_atomic_store32(uint32_t *ls, uint64_t ea, uint32_t value)
- Returns: Previous value (effectively an atomic exchange)
spu_atomic_store64
Atomically store a 64-bit value.
uint64_t spu_atomic_store64(uint64_t *ls, uint64_t ea, uint64_t value)
Compare and Swap
spu_atomic_compare_and_swap32
Atomically compare and swap a 32-bit value.
uint32_t spu_atomic_compare_and_swap32(uint32_t *ls, uint64_t ea, uint32_t compare, uint32_t value)
- compare: Value expected to be in memory
- value: New value to store if the comparison succeeds
- Returns: Actual value found in memory (equal to compare if and only if the swap succeeded)
spu_atomic_compare_and_swap64
Atomically compare and swap a 64-bit value.
uint64_t spu_atomic_compare_and_swap64(uint64_t *ls, uint64_t ea, uint64_t compare, uint64_t value)
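Because the call returns the value actually found in memory rather than a success flag, the caller detects success by comparing the result with compare. A host-side C11 sketch of this contract and the caller-side check (not SPU code; try_set is an illustrative helper, not part of the API):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the compare-and-swap contract: return the value actually
 * found in memory; the swap took effect exactly when that value
 * equals 'compare'. */
static uint32_t cas32_semantics(_Atomic uint32_t *v,
                                uint32_t compare, uint32_t value) {
    uint32_t found = compare;
    atomic_compare_exchange_strong(v, &found, value);
    return found; /* equals 'compare' iff the store happened */
}

/* Caller-side success check, as one would do with the SPU API */
static bool try_set(_Atomic uint32_t *v, uint32_t expect, uint32_t nv) {
    return cas32_semantics(v, expect, nv) == expect;
}
```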
Manual Atomic Operations
spu_atomic_lock_line32
Load a cache line with reservation for manual atomic operation.
uint32_t spu_atomic_lock_line32(uint32_t *ls, uint64_t ea)
- ls: Local store buffer (128 bytes, 128-byte aligned)
- ea: Effective address (will be aligned down to a 128-byte boundary)
- Returns: Current value at the specified address
spu_atomic_lock_line64
Load a cache line for 64-bit atomic operation.
uint64_t spu_atomic_lock_line64(uint64_t *ls, uint64_t ea)
spu_atomic_store_conditional32
Attempt to store with reservation check.
int spu_atomic_store_conditional32(uint32_t *ls, uint64_t ea, uint32_t value)
- ls: Local store buffer (the same buffer used in the lock_line call)
- value: Value to store at the aligned offset
- Returns: Nonzero if the store succeeded, zero if the reservation was lost
spu_atomic_store_conditional64
Conditional store for 64-bit values.
int spu_atomic_store_conditional64(uint64_t *ls, uint64_t ea, uint64_t value)
No-Op Operation
spu_atomic_nop32
Atomic no-op (reads and writes back unchanged).
uint32_t spu_atomic_nop32(uint32_t *ls, uint64_t ea)
Useful for testing atomic mechanism or forcing a cache line load.
spu_atomic_nop64
64-bit variant of the atomic no-op.
uint64_t spu_atomic_nop64(uint64_t *ls, uint64_t ea)
Example Usage
Simple Atomic Counter
#include <sys/spu_atomic.h>
// Shared counter in main memory (128-byte aligned)
uint64_t shared_counter_addr = 0x20000000;
// Local buffer (must be 128-byte aligned)
uint32_t local_buf[32] __attribute__((aligned(128)));
// Atomically increment counter
uint32_t old_value = spu_atomic_incr32(local_buf, shared_counter_addr);
printf("Counter was %u, now %u\n", old_value, old_value + 1);
Atomic Flags (Bitwise Operations)
// Set multiple status bits atomically
uint32_t flags_addr = 0x20000080; // Must be 128-byte aligned
uint32_t local_buf[32] __attribute__((aligned(128)));
#define FLAG_READY 0x01
#define FLAG_BUSY 0x02
#define FLAG_COMPLETE 0x04
// Set BUSY flag
spu_atomic_or32(local_buf, flags_addr, FLAG_BUSY);
// Clear BUSY, set COMPLETE
spu_atomic_and32(local_buf, flags_addr, ~FLAG_BUSY);
spu_atomic_or32(local_buf, flags_addr, FLAG_COMPLETE);
Spinlock Implementation
typedef struct {
    uint32_t locked;
    uint32_t padding[31]; // Pad to 128 bytes
} __attribute__((aligned(128))) spinlock_t;

uint64_t lock_addr = 0x20000000;
uint32_t local_buf[32] __attribute__((aligned(128)));

void spinlock_acquire() {
    uint32_t old;
    do {
        // Try to swap 0 (unlocked) with 1 (locked)
        old = spu_atomic_compare_and_swap32(local_buf, lock_addr, 0, 1);
    } while (old != 0); // Retry if lock was already held
}

void spinlock_release() {
    // Store 0 (unlocked)
    spu_atomic_store32(local_buf, lock_addr, 0);
}
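The acquire loop above performs an atomic operation on every iteration, which keeps hammering the lock's cache line while another SPU holds it. A common refinement is test-and-test-and-set with backoff: spin on a plain read first and only attempt the swap when the lock looks free. A portable C11 sketch of that pattern (not SPU code; an SPU version would substitute the spu_atomic_* calls):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Test-and-test-and-set acquire with exponential backoff: spin on a
 * plain load while the lock is held, and attempt the CAS only when
 * the lock appears free. */
static void spinlock_acquire_tts(_Atomic uint32_t *lock) {
    unsigned delay = 1;
    for (;;) {
        while (atomic_load(lock) != 0) {
            for (volatile unsigned i = 0; i < delay; i++)
                ;                       /* back off while lock is held */
            if (delay < 1024) delay *= 2;
        }
        uint32_t expect = 0;
        if (atomic_compare_exchange_strong(lock, &expect, 1))
            return;                     /* acquired */
    }
}

static void spinlock_release_tts(_Atomic uint32_t *lock) {
    atomic_store(lock, 0);              /* store 0 (unlocked) */
}
```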
Semaphore (Test and Decrement)
// Semaphore in main memory
uint64_t sem_addr = 0x20000000;
uint32_t local_buf[32] __attribute__((aligned(128)));
void semaphore_wait() {
    uint32_t old;
    do {
        // Decrement only if > 0
        old = spu_atomic_test_and_decr32(local_buf, sem_addr);
        if (old == 0) {
            // Semaphore is zero, wait a bit
            for (volatile int i = 0; i < 1000; i++);
        }
    } while (old == 0);
}

void semaphore_post() {
    spu_atomic_incr32(local_buf, sem_addr);
}
Manual Atomic Operation
// Custom atomic operation: multiply by 2
uint64_t value_addr = 0x20000000;
uint32_t local_buf[32] __attribute__((aligned(128)));
int success;
do {
    // Load with reservation
    uint32_t current = spu_atomic_lock_line32(local_buf, value_addr);
    // Compute new value
    uint32_t new_value = current * 2;
    // Try to store
    success = spu_atomic_store_conditional32(local_buf, value_addr, new_value);
} while (!success); // Retry if another SPU modified the value
Lock-Free Stack (ABA-safe)
typedef struct node {
    uint64_t next;        // Address of next node
    uint32_t data;
    uint32_t aba_counter; // Prevent ABA problem
} node_t;

typedef struct {
    uint64_t head;        // Address of head node
    uint32_t aba_counter;
    uint32_t padding[29]; // Pad to 128 bytes
} __attribute__((aligned(128))) stack_t;
uint64_t stack_addr = 0x20000000;
uint64_t local_buf[16] __attribute__((aligned(128)));
void push(uint64_t node_addr) {
    uint64_t old_head;
    do {
        // Read the current head with a reservation
        old_head = spu_atomic_lock_line64(local_buf, stack_addr);
        // The new node's next field must point at the current head
        // *in main memory* before the node is published; updating a
        // local node_t copy alone is not enough. write_node_next()
        // stands in for a small DMA put of the node structure.
        write_node_next(node_addr, old_head); // hypothetical helper
        // Make this node the new head; fails if another SPU touched
        // the stack line since the lock_line above (this reservation
        // loss is also what makes the operation ABA-safe)
    } while (!spu_atomic_store_conditional64(local_buf, stack_addr, node_addr));
}
Performance Considerations
- Alignment: All atomic operations require 128-byte alignment
- Contention: High contention causes retry loops; consider alternatives
- Cache effects: Each atomic op loads 128 bytes even for small values
- False sharing: Separate frequently-updated atomics by 128 bytes
- Alternative patterns: Sometimes message passing is more efficient than atomics
Alignment Requirements
// Correct: 128-byte aligned
uint32_t atomic_value[32] __attribute__((aligned(128)));
uint64_t addr = 0x20000000; // Must be 128-byte aligned
// The actual value can be anywhere within the 128-byte block
// The API handles the offset automatically
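The line arithmetic behind "the API handles the offset automatically" is presumably the usual power-of-two masking: the line address is the effective address with its low 7 bits cleared, and the value's slot inside the local buffer comes from the remaining byte offset. A small C illustration of that arithmetic (how the library computes this internally is an assumption):

```c
#include <stddef.h>
#include <stdint.h>

/* 128-byte line arithmetic: clear the low 7 bits of the effective
 * address to get the line base, and divide the byte offset within
 * the line by the element size to get the index into the local
 * store buffer. */
static uint64_t line_base(uint64_t ea) {
    return ea & ~(uint64_t)127;
}

static size_t word_index32(uint64_t ea) {
    return (size_t)(ea & 127) / sizeof(uint32_t);
}
```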
Memory Ordering
All atomic operations include:
- spu_dsync() before the store conditional (ensures local changes are visible)
- Implicit memory barriers from the atomic mechanism itself
This ensures proper ordering for most use cases. For complex memory ordering requirements, additional synchronization may be needed.