# Florence-2 Captioning Pipeline

High-throughput asynchronous captioning pipeline using **Florence-2 Base PromptGen**.

## Goals

- Download images from S3/HTTP concurrently
- Preprocess (resize/normalize)
- Run batched caption generation on GPU
- Persist captions back to a database (async)

## Project structure

- `src/`: implementation code
- `tests/`: unit/integration tests
- `todo.md`: tasks list
- `implementationPlanV2.md`: architecture + design notes

## Quickstart

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Configure environment variables (see `src/config.py` for expected vars).

3. Run the pipeline (example):

   ```bash
   python -m src.pipeline --dry-run
   ```

## Notes

This repo is intended as a foundation for building a fast, async dataset captioning tool.
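
The four stages listed under Goals could be wired together roughly as follows. This is a minimal sketch using only the standard library, not the code in `src/`: every function name here (`download`, `preprocess`, `caption_batch`, `persist`, `run_pipeline`) and both tuning parameters are hypothetical, and the stage bodies are stand-ins for real S3/HTTP fetches, image transforms, the Florence-2 model call, and database writes.

```python
import asyncio
from typing import Dict, Iterable, List

# NOTE: all names below are hypothetical placeholders, not this repo's API.


async def download(url: str) -> bytes:
    """Stand-in for an S3/HTTP fetch."""
    await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
    return url.encode()


def preprocess(raw: bytes) -> bytes:
    """Stand-in for resize/normalize."""
    return raw


def caption_batch(batch: List[bytes]) -> List[str]:
    """Stand-in for one batched model forward pass on the GPU."""
    return [f"caption for {item.decode()}" for item in batch]


async def persist(store: Dict[str, str], url: str, caption: str) -> None:
    """Stand-in for an async database write."""
    await asyncio.sleep(0)
    store[url] = caption


async def run_pipeline(
    urls: Iterable[str], batch_size: int = 4, max_concurrency: int = 8
) -> Dict[str, str]:
    # Cap the number of downloads in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch(url: str) -> bytes:
        async with semaphore:
            return preprocess(await download(url))

    store: Dict[str, str] = {}
    url_list = list(urls)
    for start in range(0, len(url_list), batch_size):
        chunk = url_list[start:start + batch_size]
        # Download and preprocess one batch concurrently.
        images = await asyncio.gather(*(fetch(u) for u in chunk))
        # Run the blocking model call in a worker thread so the loop stays free.
        captions = await asyncio.to_thread(caption_batch, list(images))
        # Write the batch's captions back concurrently.
        await asyncio.gather(
            *(persist(store, u, c) for u, c in zip(chunk, captions))
        )
    return store
```

Calling `asyncio.run(run_pipeline(["a.jpg", "b.jpg"]))` returns a dict mapping each input to its caption. This sketch processes batches one after another for simplicity; a real implementation would likely overlap the download of the next batch with inference on the current one, for example by connecting the stages with `asyncio.Queue`.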