The cleanup command scans your MDX documentation files to identify which audio files are currently referenced, then removes orphaned files from S3 storage that no page references anymore. This keeps your storage clean and reduces costs.
Usage
Arguments
Directory containing MDX files to scan for audio references. Defaults to current directory.
Options
S3 Storage Configuration
S3 bucket name where audio files are stored.
Alternative: Set S3_BUCKET environment variable.
AWS region for your S3 bucket.
Alternative: Set S3_REGION environment variable.
Custom S3 endpoint URL. Required for S3-compatible services like Cloudflare R2, MinIO, or DigitalOcean Spaces.
Example: --s3-endpoint "https://account-id.r2.cloudflarestorage.com"
Alternative: Set S3_ENDPOINT environment variable.
S3 access key ID for authentication.
Alternative: Set S3_ACCESS_KEY_ID environment variable.
S3 secret access key for authentication.
Alternative: Set S3_SECRET_ACCESS_KEY environment variable.
Public CDN URL used to access audio files. This must match the URL used during generation.
Example: --s3-public-url "https://cdn.example.com"
Alternative: Set S3_PUBLIC_URL environment variable.
Directory prefix where audio files are organized in S3. Must match the prefix used during generation.
Example: audio (files stored as audio/page-slug/voice-id.mp3)
Component Configuration
Name of the audio player component to search for in MDX files. Must match the component name used during generation.
Example: --component-name "AudioPlayer"
File Selection
Glob pattern for selecting MDX files to scan.
Examples:
- **/*.mdx - All MDX files recursively
- docs/**/*.mdx - Only files in docs directory
- guides/*.mdx - Only top-level guides
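To make the three example patterns concrete, here is a minimal Python sketch that applies them to a small throwaway directory tree. It uses Python's pathlib globbing as a stand-in; the cleanup command may use a different glob library whose edge cases differ. The function names and the sample file layout are hypothetical.

```python
from pathlib import Path
import tempfile


def select_mdx(root: Path, pattern: str) -> list[str]:
    """Return sorted POSIX-style paths (relative to root) that match the pattern."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


def demo() -> dict[str, list[str]]:
    """Build a sample tree (hypothetical layout) and apply each example pattern."""
    sample_files = [
        "index.mdx",
        "docs/intro.mdx",
        "docs/api/auth.mdx",
        "docs/notes.md",  # not an MDX file, so no pattern should select it
        "guides/start.mdx",
        "guides/advanced/tuning.mdx",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in sample_files:
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("")
        patterns = ("**/*.mdx", "docs/**/*.mdx", "guides/*.mdx")
        return {pattern: select_mdx(root, pattern) for pattern in patterns}


if __name__ == "__main__":
    for pattern, matches in demo().items():
        print(pattern, "->", matches)
```

In this sketch, guides/*.mdx skips guides/advanced/tuning.mdx because a single * does not cross directory boundaries, while ** does.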
Execution Options
Preview orphaned files without deleting them. Shows which files would be removed.
Useful for:
- Verifying cleanup targets before deletion
- Auditing storage usage
- Testing configuration
Show detailed information about the cleanup process.
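Each S3 option above lists an environment variable as an alternative. The sketch below shows one way such a fallback could be resolved. It assumes that an explicitly passed value wins over the environment variable, which this page does not state, and the dictionary keys and function name are hypothetical. Only the environment variable names come from the option descriptions.

```python
import os
from typing import Mapping, Optional

# Environment variable names as documented above; the dictionary keys are
# hypothetical labels chosen for this sketch.
ENV_FALLBACKS = {
    "bucket": "S3_BUCKET",
    "region": "S3_REGION",
    "endpoint": "S3_ENDPOINT",
    "access_key_id": "S3_ACCESS_KEY_ID",
    "secret_access_key": "S3_SECRET_ACCESS_KEY",
    "public_url": "S3_PUBLIC_URL",
}


def resolve_s3_config(
    flags: Mapping[str, Optional[str]],
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Merge explicit values with environment fallbacks.

    Assumption: an explicit value takes precedence over the environment.
    Settings with no value from either source are left out of the result.
    """
    env = os.environ if env is None else env
    resolved: dict[str, str] = {}
    for key, env_name in ENV_FALLBACKS.items():
        value = flags.get(key) or env.get(env_name)
        if value:
            resolved[key] = value
    return resolved
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.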
Examples
Basic Usage
Clean up orphaned files in current directory:
Specify Directory
Clean up based on MDX files in a specific directory:
Dry Run Preview
Preview orphaned files without deleting:
Complete Configuration
Full command with all options:
Using Cloudflare R2
Specific File Pattern
Clean up based on files in a specific subdirectory:
How It Works
- File Discovery: Scans for MDX files matching the specified pattern
- Reference Extraction: Parses each MDX file to find audio player components
- URL Collection: Extracts audio file URLs from component props
- S3 Listing: Lists all audio files in the S3 bucket with the specified prefix
- Comparison: Identifies files in S3 that aren’t referenced in any MDX file
- Deletion: Removes orphaned files (unless --dry-run is specified)
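The extraction and comparison steps above amount to a set difference between the keys listed in S3 and the keys referenced from MDX. The following Python sketch illustrates that idea only; it is not the tool's implementation. It assumes audio URLs appear as quoted .mp3 strings inside the component's opening tag and that a key is the URL with the public URL stripped. All function names are hypothetical, and the S3 listing is passed in as a plain list rather than fetched.

```python
import re
from typing import Iterable, Optional


def extract_audio_urls(mdx_text: str, component_name: str = "AudioPlayer") -> set[str]:
    """Collect quoted .mp3 URLs found inside opening tags of the component.

    Assumption: props do not contain a literal '>' character; a real MDX
    parser would be more robust than this regex.
    """
    tag_re = re.compile(r"<" + re.escape(component_name) + r"\b(.*?)/?>", re.DOTALL)
    url_re = re.compile(r"""["'](https?://[^"']+\.mp3)["']""")
    urls: set[str] = set()
    for tag in tag_re.finditer(mdx_text):
        urls.update(url_re.findall(tag.group(1)))
    return urls


def url_to_key(url: str, public_url: str) -> Optional[str]:
    """Map a public URL back to its object key, or None if the base differs."""
    base = public_url.rstrip("/") + "/"
    return url[len(base):] if url.startswith(base) else None


def find_orphans(
    mdx_texts: Iterable[str],
    s3_keys: Iterable[str],
    public_url: str,
    prefix: str = "audio",
    component_name: str = "AudioPlayer",
) -> list[str]:
    """Return keys under the prefix that no MDX file references."""
    referenced: set[str] = set()
    for text in mdx_texts:
        for url in extract_audio_urls(text, component_name):
            key = url_to_key(url, public_url)
            if key is not None:
                referenced.add(key)
    key_prefix = prefix.rstrip("/") + "/"
    listed = {key for key in s3_keys if key.startswith(key_prefix)}
    return sorted(listed - referenced)
```

The sketch also shows why the public URL, prefix, and component name must match the values used during generation: a mismatch in any of them leaves the referenced set empty or wrong, so files still in use would look like orphans.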
When to Use Cleanup
Run cleanup when:
- Pages are deleted: Audio files remain in S3 after removing documentation pages
- Pages are renamed: Old audio files under the previous slug are orphaned
- Voice configuration changes: Removing voices leaves old audio files
- Content is restructured: Moving or reorganizing pages creates orphans
- Regular maintenance: Periodic cleanup to optimize storage costs
Output
Successful cleanup shows detailed progress:
Safety Features
- Dry run recommended: Always test with --dry-run first
- Explicit deletion: Files are only deleted when --dry-run is false
- Detailed preview: See exactly which files will be removed
- Pattern matching: Only processes MDX files matching your pattern
- Component-aware: Only removes files not referenced in audio components
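The dry-run behavior described above can be pictured as a gate in front of the delete call. This is a minimal sketch under stated assumptions, not the command's code: remove_orphans and delete_object are hypothetical names, and the sketch defaults to dry_run=True for safety, whereas the command itself deletes unless --dry-run is specified.

```python
from typing import Callable, Iterable


def remove_orphans(
    orphans: Iterable[str],
    delete_object: Callable[[str], None],
    dry_run: bool = True,
) -> list[str]:
    """Delete each orphaned key, or only report it when dry_run is True.

    delete_object is a caller-supplied function (hypothetical here) that
    removes a single key from storage.
    """
    report: list[str] = []
    for key in orphans:
        if dry_run:
            report.append(f"would delete {key}")
        else:
            delete_object(key)
            report.append(f"deleted {key}")
    return report
```

Injecting the delete function keeps the gate testable without network access or real credentials.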
Best Practices
1. Always Test First
2. Match Generation Config
Ensure cleanup uses the same configuration as generation:
3. Regular Maintenance
Schedule periodic cleanup:
4. Use Environment Variables
Set common S3 config once:
Error Handling
Common errors and solutions:
- S3 authentication failed: Verify access key ID and secret access key
- Bucket not found: Check bucket name and region
- No files found: Verify directory path and glob pattern
- Permission denied: Ensure S3 credentials have delete permissions
- Endpoint error: Verify S3 endpoint URL for R2/MinIO
Cost Optimization
Cleanup helps reduce costs by:
- Removing unused storage: Delete orphaned audio files
- Reducing bandwidth: Fewer files to sync and back up
- Optimizing listings: Faster S3 list operations with fewer objects
- Preventing bloat: Keep storage lean as documentation evolves
