Deployment

Document Processing is delivered as a Docker image. This page covers the prerequisites, an evaluation quick-start, production deployment patterns, and reference benchmarks.

System requirements

Hardware

NVIDIA GPU with a CUDA 12.6-compatible driver. The service has been benchmarked on an RTX 4080 Super (16 GB VRAM); smaller GPUs will work but require lower WORKERS counts. See Sizing and benchmarks for the reference configuration.
Shared memory of at least 2 GB (--shm-size=2g).
CPU and RAM scale with the number of workers. The reference benchmark used a 12-core CPU and 32 GB of RAM; adjust for your archive size and target throughput.

Software

Docker with the NVIDIA Container Toolkit installed and configured.
A PostgreSQL instance (if using the db target), or an S3 bucket (if using the s3 source/target).

Network

Document Processing does not require outbound internet access at runtime. All models are bundled into the image, and processing happens entirely inside your network. Outbound access to S3 is needed only if S3 is configured as the source or target.
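One way to check the offline claim during evaluation is to run the container with Docker's networking disabled. The sketch below reuses the local source and target settings described under Quick start; the helper name run_offline is illustrative only, and the expectation that the run completes without a network follows from the statement above rather than from a separate test.

```shell
# Run one evaluation pass with no network interfaces except loopback.
# --network none is a standard Docker option; the -e and -v settings mirror
# the local-folder evaluation run. Pass the image reference you received.
run_offline() {
  image=$1
  docker run --rm \
    --network none \
    --gpus all \
    --shm-size=2g \
    -e DATA_SOURCE=local \
    -e LOCAL_DIRECTORY_TO_PROCESS=/app/input \
    -e DATA_TARGET=local \
    -e LOCAL_DIRECTORY_TO_SAVE=/app/output \
    -e WORKERS=1 \
    -v "$(pwd)/input:/app/input" \
    -v "$(pwd)/output:/app/output" \
    "$image"
}

# Usage: run_offline <image>
```

This check only applies to local sources and targets; an S3 source or target needs outbound access to S3.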

Quick start (evaluation)

The simplest way to validate the deployment is to run the image you received against a local folder of sample documents and write results to a local CSV file. No database or object storage is required for this first run.

Confirm the system requirements are met and the Docker host can see the GPU:

docker run --rm --gpus all \
  nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi

Pull the image from the private registry that Mobai granted you access to:
```
docker pull <image>
```
Place a small set of sample documents (any of .pdf, .tif, .tiff, .png, .jpg) in a local directory — for example ./input.

Run the container against that directory:

docker run --rm \
  --gpus all \
  --shm-size=2g \
  -e DATA_SOURCE=local \
  -e LOCAL_DIRECTORY_TO_PROCESS=/app/input \
  -e DATA_TARGET=local \
  -e LOCAL_DIRECTORY_TO_SAVE=/app/output \
  -e WORKERS=1 \
  -e LOG_LEVEL=INFO \
  -v "$(pwd)/input:/app/input" \
  -v "$(pwd)/output:/app/output" \
  <image>

When processing completes, inspect ./output/records.csv for the extracted records. Structured JSON logs are written to stdout while the container runs.
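For a quick sanity check of the output file, a small helper along these lines can print the column names and count the records. It is a sketch that assumes the first line of records.csv is a header row and that no field contains an embedded newline; neither is confirmed here, so treat the count as approximate. The name summarise_csv is illustrative.

```shell
# Print the first line (assumed to be the header) and count the non-empty
# lines that follow it. awk also counts a final line without a trailing newline.
summarise_csv() {
  file=$1
  printf 'columns: %s\n' "$(head -n 1 "$file")"
  printf 'records: %s\n' \
    "$(awk 'NR > 1 && length($0) > 0 { n++ } END { print n + 0 }' "$file")"
}

# Usage: summarise_csv ./output/records.csv
```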

Replace <image> with the reference for the image you received from Mobai. See Configuration for the full list of environment variables; the production setup below covers database-backed deployments with S3 sources.

Production deployment

For production, run the container against your own PostgreSQL and (optionally) S3.

Environment file

Create a production.env file with the configuration appropriate to your environment. A minimal S3-source / DB-target setup looks like:

# Source
DATA_SOURCE=s3
ACCESS_KEY_ID=...
SECRET_ACCESS_KEY=...
REGION_NAME=eu-north-1
BUCKET_NAME=<your-archive-bucket>
S3_DIRECTORY_TO_PROCESS=<archive-prefix>

# Target
DATA_TARGET=db
DB_HOST=<your-pg-host>
DB_PORT=5432
DB_NAME=<your-db-name>
DB_USER=<your-db-user>
DB_PASSWORD=<your-db-password>
DB_REGION=eu-north-1

# Behaviour
SKIP_PROCESSED_FILES=true
WORKERS=4
LOG_LEVEL=INFO

See Configuration for the full reference, including IAM-based RDS authentication for db targets.
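Before starting the container, it can help to confirm that the env file has no missing or empty values. The following is a minimal sketch: the key list simply mirrors the S3-source / DB-target example above and is not an authoritative list of required variables (IAM-based authentication, for instance, may not need DB_PASSWORD). The function name check_env_file is illustrative.

```shell
# Report keys that are absent or have an empty value in an env file.
# Returns 0 when every listed key is present, 1 otherwise.
check_env_file() {
  env_file=$1
  missing=0
  for key in DATA_SOURCE ACCESS_KEY_ID SECRET_ACCESS_KEY REGION_NAME \
             BUCKET_NAME S3_DIRECTORY_TO_PROCESS DATA_TARGET DB_HOST \
             DB_PORT DB_NAME DB_USER DB_PASSWORD; do
    # Match "KEY=<at least one character>" at the start of a line.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "missing or empty: ${key}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_env_file production.env && echo "env file looks complete"
```

Because the file holds credentials, restricting its permissions (for example with chmod 600 production.env) is a sensible precaution.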

Run command

docker run -d \
  --name document-processing \
  --env-file production.env \
  --gpus all \
  --shm-size=2g \
  --cpus="8" \
  --memory="32g" \
  --restart=unless-stopped \
  <image>

Replace <image> with the reference for the image you received from Mobai.

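The --gpus all flag is required for GPU acceleration. The container exits cleanly when the work queue is drained. Note that Docker's unless-stopped policy restarts a container after any exit, including a clean one, so with the command above the service starts a new pass as soon as the previous one finishes. To re-run on a cadence instead, omit --restart (or use --restart=on-failure so that only crashes are retried) and start the container from a job scheduler (cron, Kubernetes Job, Nomad batch).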
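If you start runs from a scheduler, a wrapper like the one below can prevent overlapping runs. It is a sketch rather than a supported script: the function name run_batch is illustrative, and it assumes production.env is in the working directory and that the container name document-processing is not used for anything else on the host.

```shell
# Start one batch run unless a container with this name is already running.
# Uses --rm instead of a restart policy so each scheduled run starts fresh.
run_batch() {
  image=$1
  name=document-processing
  # docker ps -q prints the IDs of running containers; the name filter
  # matches containers whose name contains the given string.
  if [ -n "$(docker ps -q --filter "name=${name}")" ]; then
    echo "previous run still active; skipping"
    return 0
  fi
  docker run --rm --name "$name" \
    --env-file production.env \
    --gpus all \
    --shm-size=2g \
    --cpus="8" \
    --memory="32g" \
    "$image"
}

# Usage: run_batch <image>
```

Saved as a script (the path is up to you), this could be called from a crontab entry such as one that fires nightly, with the scheduler controlling the cadence and the wrapper skipping a run when the previous one has not finished.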

Sizing and benchmarks

Throughput is dominated by per-document model inference and typically increases with additional WORKERS until the GPU's compute or memory capacity becomes the bottleneck.

Reference benchmark

Hardware:

CPU: AMD Ryzen 9 7900 (12 cores)
GPU: NVIDIA RTX 4080 Super (16 GB)
RAM: 32 GB

Workload: 500 multi-page TIFF scans + 500 Keesing-style PDF reports.

Workers	Wall-clock	Peak RAM	Peak GPU
4	~83 minutes (4980 s)	12 GB	60%

For larger archives, increase WORKERS to use more of the GPU's capacity. Monitor utilisation with nvidia-smi and reduce the count if you encounter GPU out-of-memory errors.
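The reference run processed 1,000 documents (500 TIFF scans plus 500 PDF reports) in 4,980 s, which works out to roughly 5 s of wall-clock time per document at WORKERS=4. As a rough planning aid, that rate can be extrapolated linearly to a larger archive. The sketch below assumes comparable hardware and a similar document mix, and it ignores cold-start cost; the function name estimate_hours is illustrative.

```shell
# Linear extrapolation from the reference benchmark:
# 4980 s for 1000 documents, converted to hours for a given document count.
estimate_hours() {
  docs=$1
  LC_ALL=C awk -v n="$docs" 'BEGIN { printf "%.1f\n", n * 4980 / 1000 / 3600 }'
}

# Usage: estimate_hours 100000   # prints 138.3
```

Treat the result as an order-of-magnitude estimate; measuring a sample of your own archive gives a better figure.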

Tuning guidance

Monitor nvidia-smi while increasing WORKERS to find a sustainable level for your hardware.
CSV targets are single-worker only. CSV append is not concurrency-safe; use db or s3 for parallel runs.
Cold-start cost. Each worker loads the full model set at startup. This overhead is non-trivial for short batch runs but amortised across archive-scale runs.
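The monitoring step above can be sketched with nvidia-smi's CSV query mode piped into a small filter. The filter name watch_gpu_memory and the 90% threshold are illustrative assumptions, not recommended values; pick a limit that suits your hardware.

```shell
# Read lines of "utilisation %, memory used MiB, memory total MiB" on stdin
# and print one summary per sample, marking samples above the memory limit.
watch_gpu_memory() {
  limit=${1:-90}
  LC_ALL=C awk -F', *' -v limit="$limit" '{
    pct = 100 * $2 / $3
    printf "util=%s%% mem=%.0f%%%s\n", $1, pct, (pct > limit ? " HIGH" : "")
  }'
}

# Usage while the container is running (samples every 5 seconds):
# nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total \
#   --format=csv,noheader,nounits -l 5 | watch_gpu_memory 90
```

If samples are repeatedly marked HIGH after raising WORKERS, that is a signal to step the count back down before out-of-memory errors appear.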

Verifying the deployment

After bringing up the service, you should see in the logs:

[INFO] Processing document
[INFO] Processed <filename>.pdf in 4.2s

If you are using the database target, new rows should also appear in the configured output tables. Log lines like these, together with newly inserted records, indicate that the pipeline is healthy. See Operations for monitoring and troubleshooting guidance.
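To check progress from the command line, the completion lines can be counted from the container logs. This sketch assumes the container was started with --name document-processing as in the run command, and that each completed document produces a line containing the text "Processed " as in the sample above; if your log format differs, adjust the pattern. The function name count_processed is illustrative.

```shell
# Count log lines on stdin that contain "Processed ".
# grep -c exits non-zero when the count is 0, so the status is ignored.
count_processed() {
  grep -c 'Processed ' || true
}

# Usage: docker logs document-processing 2>&1 | count_processed
```

A count that keeps rising between two invocations suggests that documents are still being processed.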
