For networks with no internet access — government, healthcare, financial, or other security-sensitive environments — BentoPDF can be deployed entirely offline.

How it works

BentoPDF’s WASM module URLs are baked into the compiled JavaScript at build time. When a user opens the app, their browser fetches those WASM files at runtime from whatever URL was configured during the build. Docker does not download WASM files during the image build. For an air-gapped deployment you need to:
  1. On a machine with internet: download the WASM packages, build a Docker image configured to point at your internal server, and bundle everything together.
  2. Transfer the bundle into the isolated network via USB drive, internal artifact repo, or other approved method.
  3. On the air-gapped side: extract the WASM files to your internal web server, load the Docker image, and run the container.
Same-origin requirement. WASM files must be served from the same origin as the BentoPDF app. Web Workers use importScripts(), which cannot load scripts cross-origin. If BentoPDF runs at https://internal.example.com, the WASM base URL must use the same scheme, host, and port, for example https://internal.example.com/wasm, and not a different host or port.
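A quick pre-build check can catch a cross-origin WASM base URL before it is baked into the image. The sketch below is not part of prepare-airgap.sh: url_origin and same_origin are hypothetical helper names, and the check treats an origin as scheme, host, and port, ignoring the default ports 443 and 80.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers, not part of prepare-airgap.sh): check that the
# WASM base URL shares the app's origin, i.e. the same scheme, host, and port.

# Print the origin (scheme://host[:port]) of an absolute URL, lower-cased,
# with the default port (443 for https, 80 for http) dropped.
url_origin() {
  local url=$1 scheme rest hostport
  [[ $url == *://* ]] || return 1
  scheme=$(printf '%s' "${url%%://*}" | tr '[:upper:]' '[:lower:]')
  rest=${url#*://}
  hostport=$(printf '%s' "${rest%%/*}" | tr '[:upper:]' '[:lower:]')
  case "$scheme:$hostport" in
    https:*:443) hostport=${hostport%:443} ;;
    http:*:80) hostport=${hostport%:80} ;;
  esac
  printf '%s://%s\n' "$scheme" "$hostport"
}

# Succeed only if both URLs have the same origin; return 2 if either URL
# is not absolute.
same_origin() {
  local a b
  a=$(url_origin "$1") || return 2
  b=$(url_origin "$2") || return 2
  [[ $a == "$b" ]]
}
```

For example, same_origin https://internal.example.com https://internal.example.com/wasm succeeds, while a base URL on another host or on port 8443 fails.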
The included prepare-airgap.sh script automates the entire preparation process: it downloads all WASM packages, builds the Docker image with your internal URLs, exports everything, and produces a self-contained bundle with a setup script.
git clone https://github.com/alam00000/bentopdf.git
cd bentopdf

# List supported OCR language codes
bash scripts/prepare-airgap.sh --list-ocr-languages

# Search OCR language codes by name
bash scripts/prepare-airgap.sh --search-ocr-language german

# Interactive mode — prompts for all options
bash scripts/prepare-airgap.sh

# Fully automated
bash scripts/prepare-airgap.sh --wasm-base-url https://internal.example.com/wasm

Script options

| Flag | Description | Default |
|---|---|---|
| --wasm-base-url <url> | Where WASM files will be hosted internally. This sets the base for all WASM and OCR URLs. | (required; prompted if missing) |
| --image-name <name> | Docker image tag | bentopdf |
| --output-dir <path> | Output bundle directory | ./bentopdf-airgap-bundle |
| --simple-mode | Enable Simple Mode | off |
| --base-url <path> | Subdirectory base URL (e.g. /pdf/) | / |
| --language <code> | Default UI language (e.g. fr, de) | (none) |
| --brand-name <name> | Custom brand name | (none) |
| --brand-logo <path> | Logo path relative to public/ | (none) |
| --footer-text <text> | Custom footer text | (none) |
| --ocr-languages <list> | Comma-separated OCR language codes to bundle (e.g. eng,deu,fra) | eng |
| --list-ocr-languages | Print all supported OCR codes and names, then exit | off |
| --search-ocr-language <term> | Search OCR codes by name or abbreviation (e.g. search german) | off |
| --dockerfile <path> | Dockerfile to use | Dockerfile |
| --skip-docker | Skip the Docker build and export step | off |
| --skip-wasm | Skip WASM download (reuse existing .tgz files) | off |
At the interactive prompt, type list to print all Tesseract language codes or search <term> to find a match — for example search chi for Chinese variants.
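If you pass --ocr-languages from another script or a config file, a small normaliser can strip stray spaces and reject obviously malformed entries before prepare-airgap.sh runs. This is a hypothetical helper, and the accepted pattern (lowercase letters with optional underscore-separated parts, as in chi_sim) is an assumption; --list-ocr-languages remains the authoritative list of codes.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): turn "eng, deu ,fra" into "eng,deu,fra".
# ASSUMPTION: valid codes look like eng or chi_sim (lowercase letters with
# optional "_suffix" parts); this does not check against Tesseract's list.
normalize_ocr_languages() {
  local IFS=','            # split on commas and join the result with commas
  local -a parts out=()
  local code
  read -ra parts <<< "$1"
  for code in "${parts[@]}"; do
    code=${code//[[:space:]]/}   # drop all whitespace around/inside the code
    [[ -z $code ]] && continue   # tolerate empty entries such as "eng,,deu"
    if [[ ! $code =~ ^[a-z]+(_[a-z]+)*$ ]]; then
      echo "invalid OCR language code: '$code'" >&2
      return 1
    fi
    out+=("$code")
  done
  (( ${#out[@]} > 0 )) || { echo "no OCR languages given" >&2; return 1; }
  printf '%s\n' "${out[*]}"
}
```

A typical use would be bash scripts/prepare-airgap.sh --wasm-base-url https://internal.example.com/wasm --ocr-languages "$(normalize_ocr_languages "$LANGS")", where LANGS is whatever list your own tooling supplies.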

Output bundle structure

bentopdf-airgap-bundle/
  bentopdf.tar              # Docker image (docker save output)
  *.tgz                     # WASM packages: PyMuPDF, Ghostscript, CoherentPDF, Tesseract
  tesseract-langdata/       # OCR traineddata files (.traineddata.gz)
  ocr-fonts/                # OCR text-layer font files (NotoSans)
  setup.sh                  # Setup script to run on the air-gapped side
  README.md                 # Step-by-step instructions
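Before transferring the bundle, or after copying it across, you can sanity-check that nothing was lost. The sketch below checks for the entries in the layout above; check_bundle is a hypothetical helper, and bentopdf.tar will legitimately be missing if you ran the script with --skip-docker.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): verify an air-gap bundle has the expected layout.
# Returns 0 if everything is present, 1 otherwise (problems go to stderr).
check_bundle() {
  local dir=${1:-./bentopdf-airgap-bundle} missing=0 f
  for f in bentopdf.tar setup.sh README.md; do
    [[ -f $dir/$f ]] || { echo "missing file: $f" >&2; missing=1; }
  done
  for f in tesseract-langdata ocr-fonts; do
    [[ -d $dir/$f ]] || { echo "missing directory: $f/" >&2; missing=1; }
  done
  # At least one WASM package archive and one OCR language file must exist.
  compgen -G "$dir/*.tgz" > /dev/null \
    || { echo "no *.tgz WASM packages found" >&2; missing=1; }
  compgen -G "$dir/tesseract-langdata/*.traineddata.gz" > /dev/null \
    || { echo "no .traineddata.gz files in tesseract-langdata/" >&2; missing=1; }
  return "$missing"
}
```

For example, check_bundle ./bentopdf-airgap-bundle prints nothing and exits 0 when the bundle looks complete.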

Transfer and set up

1

Transfer the bundle

Copy the entire bentopdf-airgap-bundle/ directory into the air-gapped network via USB drive, internal artifact repository, or another approved transfer method.
2

Run the setup script

On the air-gapped machine:
cd bentopdf-airgap-bundle
bash setup.sh
The setup script:
  • Loads the Docker image from bentopdf.tar
  • Extracts all WASM packages to the directory your web server will serve
  • Optionally starts the container
3

Verify

Open the app URL in a browser and run a PDF operation that requires a WASM module (e.g., PDF/A conversion). If the WASM files are not accessible, the browser console will show a failed fetch for the configured URL.
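Alongside the in-browser test, a command-line probe from a machine inside the network can confirm that the web server returns the OCR files. This sketch assumes the directory layout used in the manual steps (ocr/worker.min.js, ocr/lang-data/, ocr/fonts/NotoSans-Regular.ttf); the PyMuPDF, Ghostscript, and CoherentPDF file names are not listed on this page, so those directories are not probed. probe_wasm is a hypothetical helper, and a passing probe does not prove the same-origin requirement is met, so still run the browser check.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): fetch a few known OCR files from the WASM base URL.
# Usage: probe_wasm <wasm-base-url> [ocr-language-code]
# ASSUMPTION: paths follow the manual-steps layout under the base URL.
probe_wasm() {
  local base=${1%/} lang=${2:-eng} url failed=0
  for url in \
    "$base/ocr/worker.min.js" \
    "$base/ocr/lang-data/$lang.traineddata.gz" \
    "$base/ocr/fonts/NotoSans-Regular.ttf"; do
    # -f: treat HTTP errors as failures; -sS: quiet, but still show errors
    if curl -fsS -o /dev/null "$url"; then
      echo "ok   $url"
    else
      echo "FAIL $url" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, probe_wasm https://internal.example.com/wasm prints one line per file and exits non-zero if any fetch failed.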

Manual steps

1

Download the packages

On a machine with internet access:
npm pack @bentopdf/pymupdf-wasm@<version>
npm pack @bentopdf/gs-wasm
npm pack coherentpdf
npm pack tesseract.js@7.0.0
npm pack tesseract.js-core@7.0.0

mkdir -p tesseract-langdata
curl -fsSL https://cdn.jsdelivr.net/npm/@tesseract.js-data/eng/4.0.0_best_int/eng.traineddata.gz \
  -o tesseract-langdata/eng.traineddata.gz

mkdir -p ocr-fonts
curl -fsSL https://raw.githack.com/googlefonts/noto-fonts/main/hinted/ttf/NotoSans/NotoSans-Regular.ttf \
  -o ocr-fonts/NotoSans-Regular.ttf
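To bundle more OCR languages by hand, repeat the traineddata download once per code. This sketch assumes every language follows the same jsDelivr path pattern as the eng example above; langdata_url and download_langdata are hypothetical helpers. You will likely also want VITE_TESSERACT_AVAILABLE_LANGUAGES to list the same codes.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): download traineddata for several languages.
# ASSUMPTION: each language uses the same path pattern as the eng example:
#   @tesseract.js-data/<code>/4.0.0_best_int/<code>.traineddata.gz

# Print the download URL for one Tesseract language code.
langdata_url() {
  printf 'https://cdn.jsdelivr.net/npm/@tesseract.js-data/%s/4.0.0_best_int/%s.traineddata.gz\n' "$1" "$1"
}

# Usage: download_langdata <dest-dir> <code>...
#   e.g. download_langdata tesseract-langdata eng deu fra
download_langdata() {
  local dest=$1 code
  shift
  mkdir -p "$dest" || return 1
  for code in "$@"; do
    echo "fetching $code" >&2
    curl -fsSL "$(langdata_url "$code")" -o "$dest/$code.traineddata.gz" || return 1
  done
}
```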

2

Build the Docker image

git clone https://github.com/alam00000/bentopdf.git
cd bentopdf

docker build \
  --build-arg VITE_WASM_PYMUPDF_URL=https://internal.example.com/wasm/pymupdf/ \
  --build-arg VITE_WASM_GS_URL=https://internal.example.com/wasm/gs/ \
  --build-arg VITE_WASM_CPDF_URL=https://internal.example.com/wasm/cpdf/ \
  --build-arg VITE_TESSERACT_WORKER_URL=https://internal.example.com/wasm/ocr/worker.min.js \
  --build-arg VITE_TESSERACT_CORE_URL=https://internal.example.com/wasm/ocr/core \
  --build-arg VITE_TESSERACT_LANG_URL=https://internal.example.com/wasm/ocr/lang-data \
  --build-arg VITE_TESSERACT_AVAILABLE_LANGUAGES=eng \
  --build-arg VITE_OCR_FONT_BASE_URL=https://internal.example.com/wasm/ocr/fonts \
  -t bentopdf .
docker save bentopdf -o bentopdf.tar
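Because every WASM and OCR URL in the build command shares one base, you can generate the --build-arg list instead of typing it. This is an illustrative sketch that uses the same sub-paths as the command above, not the logic inside prepare-airgap.sh; wasm_build_args is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print docker build arguments, one per line,
# for a single WASM base URL, mirroring the sub-paths in the command above.
# Usage: wasm_build_args <base-url> [ocr-languages]
wasm_build_args() {
  local base=${1%/} langs=${2:-eng} kv   # strip one trailing slash from base
  for kv in \
    "VITE_WASM_PYMUPDF_URL=$base/pymupdf/" \
    "VITE_WASM_GS_URL=$base/gs/" \
    "VITE_WASM_CPDF_URL=$base/cpdf/" \
    "VITE_TESSERACT_WORKER_URL=$base/ocr/worker.min.js" \
    "VITE_TESSERACT_CORE_URL=$base/ocr/core" \
    "VITE_TESSERACT_LANG_URL=$base/ocr/lang-data" \
    "VITE_TESSERACT_AVAILABLE_LANGUAGES=$langs" \
    "VITE_OCR_FONT_BASE_URL=$base/ocr/fonts"; do
    # One "--build-arg" line followed by one KEY=VALUE line.
    printf '%s\n%s\n' --build-arg "$kv"
  done
}

# Example (mapfile needs bash 4+):
#   mapfile -t args < <(wasm_build_args https://internal.example.com/wasm)
#   docker build "${args[@]}" -t bentopdf .
```

Printing one argument per line and reading it back with mapfile keeps each URL a single argument even if it contains characters the shell would otherwise split on.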
3

Transfer the files

Copy these files into the air-gapped network via USB drive, internal artifact repository, or another approved method:
  • bentopdf.tar — Docker image
  • bentopdf-pymupdf-wasm-*.tgz — PyMuPDF WASM package
  • bentopdf-gs-wasm-*.tgz — Ghostscript WASM package
  • coherentpdf-*.tgz — CoherentPDF WASM package
  • tesseract.js-7.0.0.tgz — Tesseract worker package
  • tesseract.js-core-7.0.0.tgz — Tesseract core runtime package
  • tesseract-langdata/ — OCR traineddata files
  • ocr-fonts/ — OCR text-layer font files

4

Set up on the air-gapped side

# Load the Docker image
docker load -i bentopdf.tar

# Extract WASM packages to your internal web server's document root
mkdir -p /var/www/wasm/pymupdf /var/www/wasm/gs /var/www/wasm/cpdf \
         /var/www/wasm/ocr/core /var/www/wasm/ocr/lang-data /var/www/wasm/ocr/fonts

tar xzf bentopdf-pymupdf-wasm-*.tgz    -C /var/www/wasm/pymupdf --strip-components=1
tar xzf bentopdf-gs-wasm-*.tgz         -C /var/www/wasm/gs      --strip-components=1
tar xzf coherentpdf-*.tgz              -C /var/www/wasm/cpdf    --strip-components=1

TEMP_TESS=$(mktemp -d)
tar xzf tesseract.js-7.0.0.tgz -C "$TEMP_TESS"
cp "$TEMP_TESS/package/dist/worker.min.js" /var/www/wasm/ocr/worker.min.js
rm -rf "$TEMP_TESS"

tar xzf tesseract.js-core-7.0.0.tgz -C /var/www/wasm/ocr/core --strip-components=1

cp ./tesseract-langdata/*.traineddata.gz /var/www/wasm/ocr/lang-data/
cp ./ocr-fonts/*                         /var/www/wasm/ocr/fonts/

# Run BentoPDF
docker run -d -p 3000:8080 --restart unless-stopped bentopdf
Make sure your web server exposes the extracted files at the URLs you configured in Step 2, on the same origin as the BentoPDF app.
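To confirm the extraction worked before pointing the web server at it, check the on-disk layout. The sketch assumes the /var/www/wasm paths used above; check_wasm_tree is a hypothetical helper, and it only checks that files exist, not that the server exposes them at the configured URLs.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): verify the extracted WASM/OCR tree on disk.
# Usage: check_wasm_tree [root]   (default root: /var/www/wasm)
check_wasm_tree() {
  local root=${1:-/var/www/wasm} rc=0 d
  for d in pymupdf gs cpdf ocr/core; do
    # Each package directory should be non-empty after extraction.
    if [[ -z $(ls -A "$root/$d" 2>/dev/null) ]]; then
      echo "empty or missing: $root/$d" >&2
      rc=1
    fi
  done
  [[ -f $root/ocr/worker.min.js ]] \
    || { echo "missing: $root/ocr/worker.min.js" >&2; rc=1; }
  compgen -G "$root/ocr/lang-data/*.traineddata.gz" > /dev/null \
    || { echo "no traineddata in $root/ocr/lang-data" >&2; rc=1; }
  compgen -G "$root/ocr/fonts/*.ttf" > /dev/null \
    || { echo "no fonts in $root/ocr/fonts" >&2; rc=1; }
  return "$rc"
}
```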

Building from source instead of Docker

If you prefer to build a static dist/ folder and serve it without Docker, set the WASM variables in .env.production before building:
# .env.production
VITE_WASM_PYMUPDF_URL=https://internal.example.com/wasm/pymupdf/
VITE_WASM_GS_URL=https://internal.example.com/wasm/gs/
VITE_WASM_CPDF_URL=https://internal.example.com/wasm/cpdf/
VITE_TESSERACT_WORKER_URL=https://internal.example.com/wasm/ocr/worker.min.js
VITE_TESSERACT_CORE_URL=https://internal.example.com/wasm/ocr/core
VITE_TESSERACT_LANG_URL=https://internal.example.com/wasm/ocr/lang-data
VITE_OCR_FONT_BASE_URL=https://internal.example.com/wasm/ocr/fonts
Then run:
npm install
npm run build
# Serve dist/ with your internal web server
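A missing VITE_OCR_FONT_BASE_URL makes the app fall back to public Google Fonts CDN URLs, and other missing variables may similarly leave public defaults in place, so it can help to fail fast before npm run build. This sketch checks that each variable from the .env.production example above is present and non-empty; check_env_file and REQUIRED_VITE_VARS are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): fail if any WASM/OCR variable is missing or
# empty in an env file. The list mirrors the .env.production example above.
REQUIRED_VITE_VARS=(
  VITE_WASM_PYMUPDF_URL VITE_WASM_GS_URL VITE_WASM_CPDF_URL
  VITE_TESSERACT_WORKER_URL VITE_TESSERACT_CORE_URL VITE_TESSERACT_LANG_URL
  VITE_OCR_FONT_BASE_URL
)

check_env_file() {
  local file=${1:-.env.production} var rc=0
  [[ -r $file ]] || { echo "cannot read $file" >&2; return 1; }
  for var in "${REQUIRED_VITE_VARS[@]}"; do
    # Require a non-empty VAR=value line; commented-out lines do not count.
    if ! grep -Eq "^[[:space:]]*${var}=[^[:space:]]+" "$file"; then
      echo "missing or empty: $var" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_env_file .env.production && npm run build
```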

OCR fonts note

For fully offline searchable PDF output (OCR with a text layer), BentoPDF needs to load NotoSans font files to embed into the PDF. Without VITE_OCR_FONT_BASE_URL, the app will try to fetch them from public Google Fonts CDN URLs. Set VITE_OCR_FONT_BASE_URL to the internal directory serving the bundled ocr-fonts/ contents:
--build-arg VITE_OCR_FONT_BASE_URL=https://internal.example.com/wasm/ocr/fonts
The ocr-fonts/ directory is included in the automated script output bundle and contains the required NotoSans-Regular.ttf file.
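After building, a text search of the output can flag references to public hosts that will not resolve inside the isolated network. The default host list in this sketch is an assumption drawn from the public URLs mentioned on this page (jsDelivr, githack, and the Google Fonts hosts); find_public_urls is a hypothetical helper. Matches in license comments or unused code paths are possible, so treat a hit as something to investigate rather than proof of a runtime fetch.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): list files in a build output that reference
# public hosts. Exits 0 if any file matches, 1 if none do.
# Usage: find_public_urls [dir] [host...]   (default dir: dist)
# ASSUMPTION: the default host list below; pass your own hosts to override it.
find_public_urls() {
  local dir=${1:-dist}
  local -a hosts=("${@:2}")
  local pattern
  if (( ${#hosts[@]} == 0 )); then
    hosts=(cdn.jsdelivr.net raw.githack.com fonts.googleapis.com fonts.gstatic.com)
  fi
  # Join hosts with "|" and escape dots so they match literally.
  pattern=$(IFS='|'; printf '%s' "${hosts[*]}" | sed 's/\./\\./g')
  # -r: recurse, -l: print matching file names only, -E: extended regex
  grep -rlE "https?://(${pattern})" "$dir"
}
```

For example, find_public_urls dist lists any matching files under dist/ and exits 0 if there are any.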
