Operations

This page covers what to expect at runtime: log formats, monitoring queries, restart behaviour, and how to diagnose common issues.

Logging

Document Processing emits structured JSON logs to standard output. Each line is a single JSON object, which makes the output trivially ingestible by Datadog, Splunk, ELK, CloudWatch Logs, or any log-aggregation tool.

Log format

{
"timestamp": "2025-04-04 14:13:52,613",
"app_name": "reference-face-image-extractor",
"app_version": "1.0.0",
"log_type": "application",
"log_level": "INFO",
"payload": {
"message": "Processed file.pdf in 2.34s"
}
}

Common payload fields

Field | Description
message | Human-readable description of what happened. Always present.

Lifecycle log lines

A successful document run produces one INFO line when the file finishes:

{ ..., "log_level": "INFO", "payload": { "message": "Processed file.pdf in 2.34s" } }

A failure produces an ERROR line with the exception message and a full Python traceback in a top-level exception field:

{
...,
"log_level": "ERROR",
"payload": { "message": "Error processing file /data/archive/file.pdf: <exception message>" },
"exception": "<full traceback>"
}
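
Because every line is a standalone JSON object, completions and failures can be tallied with a short script. The sketch below is illustrative only: the summarise_logs helper and the regular expression for the "Processed <file> in <n>s" message are assumptions derived from the examples above, not part of the service.

```python
import json
import re

# Assumed message shape, taken from the example log line:
# "Processed <name> in <seconds>s".
PROCESSED_RE = re.compile(r"^Processed (?P<name>.+) in (?P<secs>\d+(?:\.\d+)?)s$")


def summarise_logs(lines):
    """Count completed and failed documents in an iterable of JSON log lines."""
    summary = {"processed": 0, "errors": 0, "total_seconds": 0.0}
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(record, dict):
            continue
        payload = record.get("payload")
        message = payload.get("message", "") if isinstance(payload, dict) else ""
        match = PROCESSED_RE.match(message)
        if match:
            summary["processed"] += 1
            summary["total_seconds"] += float(match.group("secs"))
        elif record.get("log_level") == "ERROR":
            summary["errors"] += 1
    return summary
```

Passing sys.stdin as the lines argument allows the helper to consume the output of docker logs piped into the script.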

Monitoring

Health checks

# Container is running; shows live CPU and memory usage (docker stats does not report GPU usage, use nvidia-smi for that)
docker stats document-processing

# Recent log lines
docker logs --tail=200 document-processing

# Pull only processing-completion lines
docker logs document-processing 2>&1 | grep '"Processed ' | tail -20

Database schema

When DATA_TARGET=db, the service creates a mobai schema on first run and writes to three tables.

mobai.records

One row per successfully processed document — the primary output table.

Column | Type | Description
id | SERIAL PRIMARY KEY | Auto-incrementing row ID.
file_name | TEXT | Source file name.
id_number | TEXT | National identity number. Indexed.
first_name | TEXT | Given name(s).
last_name | TEXT | Surname.
birth_date | TEXT | YYYY-MM-DD.
gender | TEXT | M / F / X.
nationality | TEXT | ISO 3166-1 alpha-3.
document_type | TEXT | Normalised document type.
document_number | TEXT | Document number as printed.
issuing_country | TEXT | ISO 3166-1 alpha-3.
issue_date | TEXT | YYYY-MM-DD.
expiry_date | TEXT | YYYY-MM-DD.
face_image | TEXT | Base64-encoded PNG.
face_image_quality_score | FLOAT | Face image quality score (higher is better).
created_at | TIMESTAMPTZ | Row creation time. Defaults to CURRENT_TIMESTAMP.
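
Since face_image is stored as Base64 text, a consumer has to decode it before use. The following is a minimal sketch of that step; decode_face_image is a hypothetical helper, and it assumes the column holds plain Base64 without a data-URI prefix.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_face_image(face_image_b64):
    """Decode a Base64 face_image value and check that it looks like a PNG."""
    raw = base64.b64decode(face_image_b64)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return raw
```

The returned bytes can be written straight to a .png file or handed to an image library.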

mobai.best_face_images

One row per person (id_number), holding the single best-ranked reference face image. This is the table to query when building a biometric reference database.

Column | Type | Description
id | SERIAL PRIMARY KEY | Auto-incrementing row ID.
file_name | TEXT | Source file the best image came from.
id_number | TEXT | National identity number. Indexed.
document_type | TEXT | Document type the image came from.
issuing_country | TEXT | ISO 3166-1 alpha-3.
source | TEXT | Source label for the extraction path.
image_taken | TEXT | Date the image was taken / document issued.
score | FLOAT | Face image quality score. Indexed.
content | TEXT | Base64-encoded PNG.
created_at | TIMESTAMPTZ | Row creation time.
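
The idea of keeping one best image per person can be sketched as follows. This is an illustration, not the service's actual selection logic: it assumes that "best-ranked" simply means the highest score, and pick_best_per_person is a hypothetical helper operating on rows represented as dictionaries.

```python
def pick_best_per_person(face_rows):
    """Return {id_number: row}, keeping the highest-scoring row for each person.

    Assumption: ranking is by score alone; on a tie the first row seen wins.
    """
    best = {}
    for row in face_rows:
        current = best.get(row["id_number"])
        if current is None or row["score"] > current["score"]:
            best[row["id_number"]] = row
    return best
```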

mobai.face_images

One row per detected face across the archive — useful for auditing which submissions a person's reference image was sourced from.

Column | Type | Description
id | SERIAL PRIMARY KEY | Auto-incrementing row ID.
file_name | TEXT | Source file name.
id_number | TEXT | National identity number. Indexed.
face_width | FLOAT | Width in pixels.
face_height | FLOAT | Height in pixels.
ofiq_score | FLOAT | Face image quality score. Indexed.
face_image | TEXT | Base64-encoded PNG.
created_at | TIMESTAMPTZ | Row creation time.

Database monitoring

When using the database target, common things worth tracking include:

-- Overall throughput: records added today
SELECT COUNT(*) AS records_today
FROM mobai.records
WHERE created_at >= CURRENT_DATE;

-- Distribution of extracted document types
SELECT document_type, COUNT(*) AS n
FROM mobai.records
GROUP BY document_type
ORDER BY n DESC;

-- Counts of records with and without a face image
SELECT
COUNT(*) FILTER (WHERE face_image IS NOT NULL) AS with_face,
COUNT(*) FILTER (WHERE face_image IS NULL) AS without_face
FROM mobai.records;

-- Coverage: distinct persons with a selected best reference image
SELECT COUNT(DISTINCT id_number) AS persons_covered
FROM mobai.best_face_images;

Resumable runs

Long batch runs can be safely interrupted. When the service is restarted with SKIP_PROCESSED_FILES=true, it:

  1. Queries the configured target (DB / S3 CSV / local CSV) for the list of file_name values already present.
  2. Subtracts those from the candidate file list.
  3. Processes only the remainder.
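
The three steps above amount to a set difference over file names. A minimal sketch, in which remaining_files is a hypothetical name and exact string comparison of file_name values is an assumption (the service may normalise paths differently):

```python
def remaining_files(candidates, already_processed):
    """Return the candidate files not yet present in the target, in original order."""
    done = set(already_processed)  # file_name values read back from the target
    return [name for name in candidates if name not in done]
```

For example, with candidates a.pdf, b.pdf and c.tif and a target that already holds a.pdf, only b.pdf and c.tif are left to process.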

This makes it safe to:

  • Stop and restart the container at any time — even mid-document — without re-processing completed work.
  • Add new documents to the archive and re-run; only the new files will be processed.
  • Switch worker counts mid-run by stopping, adjusting WORKERS, and restarting.

To force a full re-process, set SKIP_PROCESSED_FILES=false. Note that this does not clear the existing target — duplicate file_name entries will be inserted. If you want a clean reprocess, truncate the target tables (or rotate the CSV) first.

Troubleshooting

The container exits immediately

Typically caused by misconfigured environment variables. Run with LOG_LEVEL=DEBUG and look for ValueError lines naming the missing variable. Common culprits:

  • DATA_SOURCE or DATA_TARGET not set, or set to an unsupported value.
  • S3_DIRECTORY_TO_PROCESS missing when DATA_SOURCE=s3.
  • DB_HOST / DB_USER missing when DATA_TARGET=db.
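
A pre-flight check along these lines can catch the culprits above before the container is started. It is a sketch under assumptions: validate_config is a hypothetical helper, it covers only the variables named in this section, and it does not check for unsupported values because the full list of accepted values is not given here.

```python
import os


def validate_config(env=None):
    """Raise ValueError naming any required variable from this section that is unset."""
    env = os.environ if env is None else env
    missing = [name for name in ("DATA_SOURCE", "DATA_TARGET") if not env.get(name)]
    # Source-specific requirement mentioned above.
    if env.get("DATA_SOURCE") == "s3" and not env.get("S3_DIRECTORY_TO_PROCESS"):
        missing.append("S3_DIRECTORY_TO_PROCESS")
    # Target-specific requirements mentioned above.
    if env.get("DATA_TARGET") == "db":
        missing.extend(name for name in ("DB_HOST", "DB_USER") if not env.get(name))
    if missing:
        raise ValueError("Missing required environment variables: " + ", ".join(missing))
```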

No documents are processed

Check that the source actually contains files with supported extensions (.pdf, .tif, .tiff). The pipeline only processes these extensions and silently skips others.
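
To see which files in a listing would be skipped, the extension filter can be reproduced locally. This is a sketch: split_by_support is a hypothetical helper, and treating extensions case-insensitively is an assumption, since this page does not say whether the pipeline accepts names such as SCAN.PDF.

```python
from pathlib import PurePosixPath

SUPPORTED_EXTENSIONS = {".pdf", ".tif", ".tiff"}


def split_by_support(names):
    """Split file names into (supported, skipped) lists based on their extension."""
    supported, skipped = [], []
    for name in names:
        # Assumption: comparison is case-insensitive.
        suffix = PurePosixPath(name).suffix.lower()
        (supported if suffix in SUPPORTED_EXTENSIONS else skipped).append(name)
    return supported, skipped
```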

If SKIP_PROCESSED_FILES=true and the target already contains entries for all files in the source, the work list will be empty and the service will exit cleanly.

GPU not detected

# Verify the host sees the GPU
nvidia-smi

# Verify Docker can pass the GPU through
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi

If nvidia-smi works on the host but not inside the test container, the NVIDIA Container Toolkit is not installed correctly. See the Toolkit installation guide.

Out-of-memory errors on the GPU

Lower WORKERS. Each worker loads its own copy of the models onto the GPU, so the combined memory footprint must fit within the GPU's available VRAM.
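
The constraint is simple arithmetic: the number of workers times the per-worker footprint must not exceed the available VRAM. A minimal sketch, where max_workers is a hypothetical helper and the per-worker footprint is something you would have to measure yourself (for example by watching nvidia-smi during a run with WORKERS=1):

```python
def max_workers(available_vram_mib, per_worker_mib, headroom_mib=0):
    """Largest worker count whose combined model footprint fits in VRAM."""
    if per_worker_mib <= 0:
        raise ValueError("per_worker_mib must be positive")
    usable = available_vram_mib - headroom_mib  # leave room for other GPU users
    return max(usable // per_worker_mib, 0)
```

With purely illustrative figures of 24576 MiB available, 5000 MiB per worker and 1024 MiB of headroom, max_workers(24576, 5000, 1024) evaluates to 4.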

psycopg2.OperationalError on startup

Database connection failure. Check:

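  • DB_HOST is reachable from inside the container (test with docker exec ... ping <DB_HOST>). A successful ping shows only that the host responds, not that the database port accepts connections.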
  • Credentials are correct.
  • If using IAM auth, the container's IAM role has rds-db:connect and the database user is an IAM-enabled role.
  • sslmode=require is enforced for IAM-authenticated connections — your RDS instance must accept SSL.
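
To separate network problems from credential problems, it can help to test whether a plain TCP connection to the database port succeeds. The sketch below uses only the standard library; can_connect is a hypothetical helper, and port 5432 is the PostgreSQL default, assumed here because this page does not state the configured port.

```python
import socket


def can_connect(host, port=5432, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False
```

If this returns True from inside the container but psycopg2 still fails, the cause is more likely credentials, IAM configuration or SSL settings than network reachability.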

Pre-signed S3 URLs expiring mid-download

The pipeline automatically renews pre-signed URLs on download failure (up to 2 retries per file). If you see persistent 403 Forbidden errors in logs, verify that:

  • The credentials have not been rotated since the run started.
  • The S3 bucket policy allows access from the container's network.
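
The renew-on-failure behaviour described above can be pictured roughly as follows. This is not the pipeline's code: download_with_renewal, get_url and fetch are hypothetical names, and catching OSError is an assumption chosen because the standard library's urllib HTTP errors derive from it.

```python
def download_with_renewal(get_url, fetch, max_retries=2):
    """Fetch a file, requesting a fresh pre-signed URL for every attempt.

    get_url: callable returning a newly signed URL (hypothetical).
    fetch:   callable that downloads the URL and raises OSError on failure (hypothetical).
    """
    last_error = None
    for _ in range(max_retries + 1):  # one initial attempt plus the retries
        url = get_url()  # renew the URL rather than reusing an expired one
        try:
            return fetch(url)
        except OSError as exc:
            last_error = exc
    raise last_error
```

Renewal only helps when the underlying credentials are still valid, which is why rotated credentials or a restrictive bucket policy show up as 403 errors that persist across all attempts.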