# ACH Server Media Import

This repository contains a script that imports media files from an S3-compatible bucket into a PostgreSQL database. It supports both local execution (Python virtual environment) and Docker deployment via `docker-compose`.

---

## Overview

### Asset hierarchy

- **Conservatory Copy (Master)**: High-quality source (e.g., `.mov`, `.wav`). This is the primary record in the database.
- **Streaming Copy (Derivative)**: Transcoded versions (`.mp4`, `.mp3`) linked to the master.
- **Sidecar Metadata (`.json`)**: Contains technical metadata (`mediainfo` / `ffprobe`) used for validation and to determine the correct MIME type.
- **Sidecar QC (`.pdf`, `.md5`)**: Quality control and checksum files.

> **Important:** all files belonging to the same asset must share the same 12-character inventory code (e.g., `VO-UMT-14387`).

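
As a rough illustration of how a shared inventory code ties an asset's files together, the sketch below groups object keys by the code found in each file name. It assumes the code appears somewhere in the file name; `group_by_inventory_code` and the sample keys used to try it are hypothetical and not taken from the importer itself.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Same shape as the documented inventory code, left unanchored so it can be
# searched for inside a file name (an assumption about the naming scheme).
INVENTORY_CODE = re.compile(r"[VA][OC]-[A-Z0-9]{3}-\d{5}")


def group_by_inventory_code(keys):
    """Group object keys by the inventory code found in their file name.

    Keys whose file name contains no inventory code are left out.
    """
    groups = defaultdict(list)
    for key in keys:
        name = PurePosixPath(key).name
        match = INVENTORY_CODE.search(name)
        if match:
            groups[match.group(0)].append(key)
    return dict(groups)
```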
---

## Process Phases

The importer runs in three separate phases, each logged in detail:

### Phase 1 – S3 discovery + initial validation

- List objects in the configured S3 bucket.
- Keep only allowed extensions: `.mp4`, `.mp3`, `.json`, `.pdf`, `.md5`.
- Exclude configured folders (e.g., `TEST-FOLDER-DEV/`, `DOCUMENTAZIONE_FOTOGRAFICA/`, `UMT/`).
- Validate the inventory code format and ensure the folder prefix matches the type encoded in the inventory code.
- Files failing validation are rejected **before** any database interaction.

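
The extension and folder filters in Phase 1 could be sketched roughly as follows. This is a minimal illustration that works on a plain list of object keys; the bucket listing itself is left out. The names `is_candidate` and `discover` are hypothetical, the excluded folders are just the examples listed above, and treating them as top-level prefixes and matching extensions case-insensitively are both assumptions.

```python
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {".mp4", ".mp3", ".json", ".pdf", ".md5"}

# Example values from this README; the real list comes from configuration.
EXCLUDED_FOLDERS = ("TEST-FOLDER-DEV/", "DOCUMENTAZIONE_FOTOGRAFICA/", "UMT/")


def is_candidate(key):
    """Return True if the key is outside excluded folders and has an allowed extension."""
    # str.startswith accepts a tuple and checks every prefix in it.
    if key.startswith(EXCLUDED_FOLDERS):
        return False
    # Lower-casing the suffix is an assumption, not documented behaviour.
    return PurePosixPath(key).suffix.lower() in ALLOWED_EXTENSIONS


def discover(keys):
    """Keep only the keys that pass the Phase 1 extension and folder filters."""
    return [key for key in keys if is_candidate(key)]
```

Inventory code and folder prefix validation would run on the keys returned by `discover`.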
### Phase 2 – Database cross-reference + filtering

- Load existing filenames from the database.
- Skip files already represented in the DB, including sidecar records.
- Build the final list of S3 objects to parse.

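
The cross-reference in Phase 2 amounts to a set-membership filter. The sketch below assumes the database stores bare file names rather than full object keys; `filter_new_objects` is a hypothetical name, and the database query that produces `existing_filenames` is not shown.

```python
from pathlib import PurePosixPath


def filter_new_objects(s3_keys, existing_filenames):
    """Return the keys whose file name is not yet known to the database.

    The original order of s3_keys is preserved.
    """
    known = set(existing_filenames)  # set lookup keeps the filter linear
    return [key for key in s3_keys if PurePosixPath(key).name not in known]
```

Building the set once up front avoids rescanning the list of known file names for every object.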
### Phase 3 – Parse & insert

- Read and validate sidecars (`.json`, `.md5`, `.pdf`) alongside the media file.
- Use metadata (from `mediainfo` / `ffprobe`) to derive the **master MIME type** and enforce container rules.
- Insert new records into the database (unless `ACH_DRY_RUN=true`).

---

## Validation Policy

The import pipeline enforces strict validation to prevent bad data from entering the database.

### Inventory Code & Folder Prefix

- Expected inventory code format: `^[VA][OC]-[A-Z0-9]{3}-\d{5}$`.
- The folder prefix (e.g., `BRD/`, `DVD/`, `FILE/`) must match the code type.
- If the prefix does not match the inventory code, the file is rejected in Phase 1.

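
A minimal sketch of these two checks is shown below. The regular expression is the one given above. Everything else is an assumption made for illustration: that the file name starts with the 12-character code, that the "code type" is the middle three-character segment, and that the folder-to-type mapping is supplied by the caller. `validate_key` and `folder_to_type` are hypothetical names.

```python
import re

INVENTORY_CODE = re.compile(r"^[VA][OC]-[A-Z0-9]{3}-\d{5}$")


def is_valid_inventory_code(code):
    """Return True if the string matches the documented inventory code format."""
    return INVENTORY_CODE.fullmatch(code) is not None


def validate_key(key, folder_to_type):
    """Check the inventory code and its folder prefix for one object key.

    folder_to_type maps a top-level folder name to the type segment it is
    allowed to contain (hypothetical; the real mapping is not documented here).
    """
    folder = key.split("/", 1)[0]
    filename = key.rsplit("/", 1)[-1]
    code = filename[:12]  # assumes the file name starts with the code
    if not is_valid_inventory_code(code):
        return False
    type_segment = code.split("-")[1]
    return folder_to_type.get(folder) == type_segment
```

For example, with a mapping of `{"BRD": "BRD"}`, the key `BRD/VO-BRD-00001.mp4` passes, while `DVD/VO-BRD-00001.mp4` is rejected because the folder is not mapped to the `BRD` type.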
### Safe Run (`ACH_SAFE_RUN`)

- When `ACH_SAFE_RUN=true`, **any warning during Phase 3 causes an immediate abort**.
- This prevents partial inserts when the importer detects inconsistent or already-present data.

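
One way to picture the safe-run behaviour is a warning helper that escalates to an exception when the flag is set. This is only a sketch of the idea, not the importer's actual code: `SafeRunAbort`, `report_warning`, and the logger name are all hypothetical.

```python
import logging

logger = logging.getLogger("importer")  # hypothetical logger name


class SafeRunAbort(RuntimeError):
    """Raised when a warning occurs while safe run is enabled."""


def report_warning(message, safe_run):
    """Log a warning; when safe_run is true, abort by raising SafeRunAbort."""
    logger.warning(message)
    if safe_run:
        raise SafeRunAbort(message)
```

A caller would let `SafeRunAbort` propagate out of the Phase 3 loop so that no further records are processed after the first warning.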
### MIME Type Determination

- The MIME type for master files is derived from the JSON sidecar metadata (`mediainfo` / `ffprobe`), not from the streaming derivative extension.

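
The following sketch shows one possible way to read a container format from a sidecar and map it to a MIME type. It assumes a `mediainfo`-style JSON layout (a `media.track` list whose `General` entry carries a `Format` field); the sidecar layout used by this project may differ. The function names are hypothetical, and the format-to-MIME table is passed in by the caller because the importer's real mapping is not documented here.

```python
import json


def general_format(sidecar_text):
    """Return the 'Format' value of the General track, or None if absent."""
    data = json.loads(sidecar_text)
    # Tolerate a missing or null "media" / "track" entry.
    tracks = (data.get("media") or {}).get("track") or []
    for track in tracks:
        if track.get("@type") == "General":
            return track.get("Format")
    return None


def master_mime_type(sidecar_text, format_to_mime):
    """Look up the master MIME type for the container format in the sidecar.

    Raises ValueError when the format is missing or not in the mapping,
    so that an unknown container is never silently accepted.
    """
    fmt = general_format(sidecar_text)
    if fmt is None or fmt not in format_to_mime:
        raise ValueError(f"cannot determine MIME type from format {fmt!r}")
    return format_to_mime[fmt]
```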
---

## Quick Start (Local)

### Prerequisites

- Python 3.8+
- Virtual environment support (`venv`)

### Setup

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### Run

```bash
python main.py
```

---

## Docker (docker-compose)

The project includes a `docker-compose.yml` with an `app` service (container name `ACH_server_media_importer`). It reads environment variables from `.env` and mounts a `logs` volume.

### Build & run

```bash
docker compose up -d --build
```

### Logs

```bash
docker compose logs -f app
```

### Run inside the container (from the host)

To run the importer manually inside the running container (for debugging or one-off runs), use either of the following:

```bash
# Using docker compose (recommended)
docker compose exec app python /app/main.py

# Or using docker exec with the container name
docker exec -it ACH_server_media_importer python /app/main.py
```

### Stop

```bash
docker compose stop
```

### Rebuild (clean)

```bash
docker compose down --volumes --rmi local
docker compose up -d --build
```

---

## Configuration

Configuration is driven by `.env` and `config.py`. Key variables include:

- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, `BUCKET_NAME`
- `DB_HOST`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`, `DB_PORT`
- `ACH_DRY_RUN` (`true` / `false`)
- `ACH_SAFE_RUN` (`true` / `false`)

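
For orientation, a settings loader built on these variables might look like the sketch below. It is not the contents of `config.py`: the `Settings` class, `parse_flag`, and `load_settings` are hypothetical, only a subset of the variables is shown, and the defaults (flags off when unset, port 5432 when `DB_PORT` is missing) are assumptions.

```python
import os
from dataclasses import dataclass


def parse_flag(value, default=False):
    """Interpret 'true' / 'false' strings; fall back to default when unset."""
    if value is None:
        return default
    return value.strip().lower() == "true"


@dataclass(frozen=True)
class Settings:
    bucket_name: str
    db_host: str
    db_port: int
    dry_run: bool
    safe_run: bool


def load_settings(env=None):
    """Build Settings from a mapping of variables (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        bucket_name=env["BUCKET_NAME"],  # required: KeyError if missing
        db_host=env["DB_HOST"],          # required: KeyError if missing
        db_port=int(env.get("DB_PORT", "5432")),  # assumed default
        dry_run=parse_flag(env.get("ACH_DRY_RUN")),
        safe_run=parse_flag(env.get("ACH_SAFE_RUN")),
    )
```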
---

## Troubleshooting

- If Docker does not pick up changes, ensure `docker compose up -d --build` is run from the repo root.
- Inspect runtime errors via `docker compose logs -f app`.