# ACH Server Media Import

This repository contains a script that imports media files from an S3-compatible bucket into a PostgreSQL database. It supports both local execution (Python virtual environment) and Docker deployment via `docker-compose`.

---

## Overview

### Asset hierarchy

- **Conservatory Copy (Master)**: High-quality source (e.g., `.mov`, `.wav`). This is the primary record in the database.
- **Streaming Copy (Derivative)**: Transcoded versions (`.mp4`, `.mp3`) linked to the master.
- **Sidecar Metadata (`.json`)**: Contains technical metadata (`mediainfo` / `ffprobe`) used for validation and to determine the correct MIME type.
- **Sidecar QC (`.pdf`, `.md5`)**: Quality control and checksum files.

> **Important:** All files belonging to the same asset must share the same 12-character inventory code (e.g., `VO-UMT-14387`).

---

## Process Phases

The importer runs in three clearly separated phases, each logged in detail:

### Phase 1 – S3 discovery + initial validation

- List objects in the configured S3 bucket.
- Keep only allowed extensions: `.mp4`, `.mp3`, `.json`, `.pdf`, `.md5`.
- Exclude configured folders (e.g., `TEST-FOLDER-DEV/`, `DOCUMENTAZIONE_FOTOGRAFICA/`, `UMT/`).
- Validate the inventory code format and ensure the folder prefix matches the type encoded in the inventory code.
- Files failing validation are rejected **before** any database interaction.

### Phase 2 – Database cross-reference + filtering

- Load existing filenames from the database.
- Skip files already represented in the DB, including sidecar records.
- Build the final list of S3 objects to parse.

### Phase 3 – Parse & insert

- Read and validate sidecars (`.json`, `.md5`, `.pdf`) alongside the media file.
- Use metadata (from `mediainfo` / `ffprobe`) to derive the **master MIME type** and enforce container rules.
- Insert new records into the database (unless `ACH_DRY_RUN=true`).
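The Phase 1 and Phase 2 filtering rules can be sketched in Python as below. This is a minimal illustration, not the importer's actual code: the function name `select_keys`, the assumption that each filename begins with its 12-character inventory code, and the comparison of bare filenames against the database are all hypothetical. The folder-prefix check is left out because the exact mapping from code type to folder is not spelled out here.

```python
import re
from pathlib import PurePosixPath

# Values copied from the phase description; the real ones live in the importer's configuration.
ALLOWED_EXTENSIONS = {".mp4", ".mp3", ".json", ".pdf", ".md5"}
EXCLUDED_FOLDERS = ("TEST-FOLDER-DEV/", "DOCUMENTAZIONE_FOTOGRAFICA/", "UMT/")
INVENTORY_CODE = re.compile(r"^[VA][OC]-[A-Z0-9]{3}-\d{5}$")


def select_keys(keys, existing_filenames):
    """Return the S3 keys that pass Phase 1 checks and are not yet in the DB (Phase 2).

    Hypothetical helper: assumes the inventory code is the first 12 characters
    of the filename and that the database stores bare filenames.
    """
    selected = []
    for key in keys:
        path = PurePosixPath(key)
        # Phase 1: extension allow-list (compared case-insensitively here, an assumption).
        if path.suffix.lower() not in ALLOWED_EXTENSIONS:
            continue
        # Phase 1: skip objects under excluded top-level folders.
        if key.startswith(EXCLUDED_FOLDERS):
            continue
        # Phase 1: inventory code format, taken from the filename's first 12 characters.
        if not INVENTORY_CODE.match(path.name[:12]):
            continue
        # Phase 2: skip files already represented in the database.
        if path.name in existing_filenames:
            continue
        selected.append(key)
    return selected


keys = [
    "BRD/VO-BRD-00001.mp4",
    "BRD/VO-BRD-00001.json",
    "UMT/VO-UMT-14387.mp4",
    "DVD/not-a-code.mp4",
    "FILE/VA-FIL-00002.mov",
]
print(select_keys(keys, existing_filenames={"VO-BRD-00001.json"}))
# ['BRD/VO-BRD-00001.mp4']
```

Keeping all rejections in a pure function like this makes it easy to unit-test Phase 1 and Phase 2 without touching S3 or PostgreSQL.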
---

## Validation Policy

The import pipeline enforces strict validation to prevent bad data from entering the database.

### Inventory Code & Folder Prefix

- Expected inventory code format: `^[VA][OC]-[A-Z0-9]{3}-\d{5}$`.
- The folder prefix (e.g., `BRD/`, `DVD/`, `FILE/`) must match the code type.
- If the prefix does not match the inventory code, the file is rejected in Phase 1.

### Safe Run (`ACH_SAFE_RUN`)

- When `ACH_SAFE_RUN=true`, **any warning during Phase 3 causes an immediate abort**.
- This prevents partial inserts when the importer detects inconsistent or already-present data.

### MIME Type Determination

- The MIME type for master files is derived from the JSON sidecar metadata (`mediainfo` / `ffprobe`), not from the streaming derivative's extension.

---

## Quick Start (Local)

### Prerequisites

- Python 3.8+
- Virtual environment support (`venv`)

### Setup

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### Run

```bash
python main.py
```

---

## Docker (docker-compose)

The project includes a `docker-compose.yml` with an `app` service (container name `ACH_server_media_importer`). It reads environment variables from `.env` and mounts a `logs` volume.

### Build & run

```bash
docker compose up -d --build
```

### Logs

```bash
docker compose logs -f app
```

### Run inside the container (from the host)

To execute the importer manually inside the running container (for debugging or one-off runs), use either of the following:

```bash
# Using docker compose (recommended)
docker compose exec app python /app/main.py

# Or using docker exec with the container name
docker exec -it ACH_server_media_importer python /app/main.py
```

### Stop

```bash
docker compose stop
```

### Rebuild (clean)

```bash
docker compose down --volumes --rmi local
docker compose up -d --build
```

---

## Configuration

Configuration is driven by `.env` and `config.py`.
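As a rough illustration of how `config.py` might turn these values into settings, the sketch below reads the two boolean flags. It is an assumption, not the project's code: the helper `env_flag`, treating anything other than `true` (case-insensitive) as false, and the `false` default are hypothetical, and the variables are assumed to already be in the process environment (for example, injected by `docker compose` from `.env`).

```python
import os


def env_flag(name, default=False, environ=None):
    """Hypothetical helper: read a true/false flag such as ACH_DRY_RUN.

    Only the value "true" (case-insensitive, surrounding spaces ignored) counts
    as true; an unset variable falls back to `default`.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


DRY_RUN = env_flag("ACH_DRY_RUN")    # true: Phase 3 parses but does not insert
SAFE_RUN = env_flag("ACH_SAFE_RUN")  # true: any Phase 3 warning aborts the run
```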
Key variables include:

- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, `BUCKET_NAME`
- `DB_HOST`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`, `DB_PORT`
- `ACH_DRY_RUN` (`true` / `false`)
- `ACH_SAFE_RUN` (`true` / `false`)

---

## Troubleshooting

- If Docker does not pick up changes, ensure `docker compose up -d --build` is run from the repo root.
- Inspect runtime errors via `docker compose logs -f app`.
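When the importer fails early with connection or credential errors, checking that every key variable is actually set can narrow the problem down. The following is a minimal sketch that assumes all of the S3 and database variables listed under Configuration are mandatory (the real importer may supply defaults for some of them, such as `DB_PORT`). `missing_variables` is a hypothetical helper, not part of the project.

```python
import os

# Key variables from the Configuration section; treating all of them as required is an assumption.
REQUIRED_VARIABLES = (
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION", "BUCKET_NAME",
    "DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD", "DB_PORT",
)


def missing_variables(environ=None):
    """Return the required variable names that are unset or blank, in declaration order."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not env.get(name, "").strip()]
```

Calling `missing_variables()` in the same environment as the importer returns an empty list when everything is set; otherwise it lists the names to add to `.env`.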