ACH Server Media Import

This repository contains a script that imports media files from an S3-compatible bucket into a PostgreSQL database. It supports both local execution (Python virtual environment) and Docker deployment via docker-compose.


Overview

Asset hierarchy

  • Conservation Copy (Master): High-quality source (e.g., .mov, .wav). This is the primary record in the database.
  • Streaming Copy (Derivative): Transcoded versions (.mp4, .mp3) linked to the master.
  • Sidecar Metadata (.json): Contains technical metadata (mediainfo / ffprobe) used for validation and to determine the correct MIME type.
  • Sidecar QC (.pdf, .md5): Quality control and checksum files.

Important: all files belonging to the same asset must share the same 12-character inventory code (e.g., VO-UMT-14387).
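
As an illustration of the shared-code rule, the sketch below groups object keys into assets by inventory code. It assumes the code is the first 12 characters of the file name, which the README does not state explicitly; the function name and the sample keys are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

CODE_LENGTH = 12  # inventory code length stated in the README


def group_by_inventory_code(keys):
    """Group S3 object keys by the leading 12 characters of the file name.

    Assumption: the inventory code is the file-name prefix.
    """
    assets = defaultdict(list)
    for key in keys:
        filename = PurePosixPath(key).name
        assets[filename[:CODE_LENGTH]].append(key)
    return dict(assets)


# Hypothetical keys: one streaming copy plus its sidecars.
example = group_by_inventory_code([
    "FOLDER/VO-UMT-14387.mp4",
    "FOLDER/VO-UMT-14387.json",
    "FOLDER/VO-UMT-14387.md5",
])
```

Here all three keys end up under the single asset `VO-UMT-14387`.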


Process Phases

The importer runs in three clearly separated phases (each phase is logged in detail):

Phase 1: S3 discovery + initial validation

  • List objects in the configured S3 bucket.
  • Keep only allowed extensions: .mp4, .mp3, .json, .pdf, .md5.
  • Exclude configured folders (e.g., TEST-FOLDER-DEV/, DOCUMENTAZIONE_FOTOGRAFICA/, UMT/).
  • Validate the inventory code format and ensure the folder prefix matches the type encoded in the inventory code.
  • Files failing validation are rejected before any database interaction.
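
The extension and folder filters of Phase 1 could look roughly like the sketch below. It works on a plain list of object keys, so no S3 client is needed. Treating excluded folders as top-level key prefixes and comparing extensions case-insensitively are assumptions, and the function names are hypothetical.

```python
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {".mp4", ".mp3", ".json", ".pdf", ".md5"}
# Example values from the README; the real list comes from configuration.
EXCLUDED_FOLDERS = ("TEST-FOLDER-DEV/", "DOCUMENTAZIONE_FOTOGRAFICA/", "UMT/")


def is_candidate(key):
    """Return True if the key passes the extension and folder filters."""
    # Assumption: excluded folders are matched as top-level key prefixes.
    if key.startswith(EXCLUDED_FOLDERS):
        return False
    # Assumption: extensions are compared case-insensitively.
    return PurePosixPath(key).suffix.lower() in ALLOWED_EXTENSIONS


def discover(keys):
    """Keep only the keys that are worth validating further."""
    return [key for key in keys if is_candidate(key)]
```

Inventory code and folder prefix validation would run on the keys returned by `discover`.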

Phase 2: Database cross-reference + filtering

  • Load existing filenames from the database.
  • Skip files already represented in the DB, including sidecar records.
  • Build the final list of S3 objects to parse.
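
A minimal sketch of the cross-reference step, assuming the database stores bare file names rather than full S3 keys (the README does not say which). The function name is hypothetical.

```python
from pathlib import PurePosixPath


def filter_new_objects(candidate_keys, existing_filenames):
    """Drop every key whose file name is already known to the database."""
    known = set(existing_filenames)  # set lookup keeps the filter O(n)
    return [
        key for key in candidate_keys
        if PurePosixPath(key).name not in known
    ]
```

Because sidecar records are also stored, passing their file names in `existing_filenames` skips them in the same way as media files.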

Phase 3: Parse & insert

  • Read and validate sidecars (.json, .md5, .pdf) alongside the media file.
  • Use metadata (from mediainfo / ffprobe) to derive the MIME type of the master file and enforce container rules.
  • Insert new records into the database (unless ACH_DRY_RUN=true).
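
The dry-run behaviour can be pictured as a guard around the insert call. In this sketch `insert_one` stands in for whatever database helper the project uses, and the record layout is hypothetical.

```python
import logging

logger = logging.getLogger("importer")


def insert_records(records, insert_one, dry_run):
    """Insert each record unless dry_run is set; return the number inserted.

    insert_one is a placeholder for the real database insert function.
    """
    inserted = 0
    for record in records:
        if dry_run:
            # Log what would happen, but leave the database untouched.
            logger.info("DRY RUN: would insert %s", record["filename"])
            continue
        insert_one(record)
        inserted += 1
    return inserted
```

With `dry_run=True` the function returns 0 and never calls `insert_one`, so a full run can be rehearsed against production data.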

Validation Policy

The import pipeline enforces strict validation to prevent bad data from entering the database.

Inventory Code & Folder Prefix

  • Expected inventory code format: ^[VA][OC]-[A-Z0-9]{3}-\d{5}$.
  • The folder prefix (e.g., BRD/, DVD/, FILE/) must match the code type.
  • If the prefix does not match the inventory code, the file is rejected in Phase 1.
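
The two checks above can be sketched as follows. The regular expression is the one given in this section. Everything about the folder mapping is an assumption: the sketch treats the three-character middle segment as the "type" and takes a caller-supplied `type_to_prefix` dictionary, because the README does not document how types map to folders.

```python
import re

INVENTORY_CODE = re.compile(r"^[VA][OC]-[A-Z0-9]{3}-\d{5}$")


def validate_key(key, type_to_prefix):
    """Return (ok, reason) for an S3 key such as 'BRD/VO-BRD-00012.mp4'.

    type_to_prefix is a hypothetical mapping from the middle segment of
    the inventory code to the expected top-level folder name.
    """
    folder = key.split("/", 1)[0]
    filename = key.rsplit("/", 1)[-1]
    code = filename[:12]  # assumption: the code is the file-name prefix
    if not INVENTORY_CODE.match(code):
        return False, "invalid inventory code"
    expected = type_to_prefix.get(code[3:6])
    if expected is None or folder != expected:
        return False, "folder prefix does not match inventory code"
    return True, "ok"
```

For example, with `{"BRD": "BRD"}` as the mapping, `BRD/VO-BRD-00012.mp4` is accepted while `DVD/VO-BRD-00012.mp4` is rejected for a prefix mismatch.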

Safe Run (ACH_SAFE_RUN)

  • When ACH_SAFE_RUN=true, any warning during Phase 3 causes an immediate abort.
  • This prevents partial inserts when the importer detects inconsistent or already-present data.
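
One way to picture the abort-on-warning rule is a small helper that logs the warning and raises when safe run is enabled. The exception and function names are hypothetical, not taken from the project.

```python
import logging

logger = logging.getLogger("importer")


class SafeRunAbort(RuntimeError):
    """Raised when a warning occurs while safe run is enabled."""


def handle_warning(message, safe_run):
    """Log a Phase 3 warning and abort the run if safe_run is set."""
    logger.warning(message)
    if safe_run:
        raise SafeRunAbort(message)
```

Whether records inserted before the abort are rolled back depends on how the importer scopes its database transactions, which the README does not state.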

MIME Type Determination

  • The MIME type for master files is derived from the JSON sidecar metadata (mediainfo / ffprobe), not from the streaming derivative extension.
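
As a rough sketch of this idea, the function below reads a mediainfo-style JSON sidecar and looks up the container of the master file. It assumes the sidecar follows MediaInfo's JSON layout (`media.track[]` with a `General` track carrying `Format` and `Format_Profile`); an ffprobe sidecar would need its own branch. The lookup table is illustrative only and is not the project's centralized MIME mapping.

```python
import json

# Illustrative table; the project's real MIME mapping may differ.
FORMAT_TO_MIME = {
    "quicktime": "video/quicktime",
    "mpeg-4": "video/mp4",
    "wave": "audio/wav",
}


def master_mime_from_mediainfo(sidecar_text):
    """Return a MIME type from a mediainfo JSON sidecar, or None if unknown."""
    data = json.loads(sidecar_text)
    tracks = (data.get("media") or {}).get("track", [])
    general = next((t for t in tracks if t.get("@type") == "General"), None)
    if general is None:
        return None
    # Check the profile first: a .mov master reports Format "MPEG-4"
    # with Format_Profile "QuickTime".
    for field in ("Format_Profile", "Format"):
        value = str(general.get(field, "")).lower()
        if value in FORMAT_TO_MIME:
            return FORMAT_TO_MIME[value]
    return None
```

The point of the sketch is that a `.mp4` streaming copy whose sidecar describes a QuickTime master still yields `video/quicktime` for the master record.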

Quick Start (Local)

Prerequisites

  • Python 3.8+
  • Virtual environment support (venv)

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run

python main.py

Docker (docker-compose)

The project includes a docker-compose.yml with an app service (container name ACH_server_media_importer). It reads environment variables from .env and mounts a logs volume.

Build & run

docker compose up -d --build

Logs

docker compose logs -f app

Run inside the container (from the host)

If you want to execute the importer manually inside the running container (for debugging or one-off runs), you can use either of the following:

# Using docker compose (recommended)
docker compose exec app python /app/main.py

# Or using docker exec with the container name
docker exec -it ACH_server_media_importer python /app/main.py

Stop

docker compose stop

Rebuild (clean)

docker compose down --volumes --rmi local
docker compose up -d --build

Configuration

Configuration is driven by .env and config.py. Key variables include:

  • AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, BUCKET_NAME
  • DB_HOST, DB_NAME, DB_USER, DB_PASSWORD, DB_PORT
  • ACH_DRY_RUN (true / false)
  • ACH_SAFE_RUN (true / false)
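
The two boolean switches could be read from the environment along these lines. This is a sketch, not the contents of config.py: the helper name is hypothetical, and treating a missing variable as false and comparing case-insensitively are assumptions.

```python
import os


def env_flag(name, default=False, environ=os.environ):
    """Read a true/false switch such as ACH_DRY_RUN from the environment."""
    raw = environ.get(name)
    if raw is None:
        return default  # assumption: unset means "off"
    return raw.strip().lower() == "true"


DRY_RUN = env_flag("ACH_DRY_RUN")
SAFE_RUN = env_flag("ACH_SAFE_RUN")
```

Passing a plain dictionary as `environ` makes the helper easy to exercise in tests without touching the real environment.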

Troubleshooting

  • If the container does not pick up code or configuration changes, rerun docker compose up -d --build from the repository root so the image is rebuilt.
  • Inspect runtime errors via docker compose logs -f app.