Florence-2 Captioning Pipeline

High-throughput asynchronous captioning pipeline using Florence-2 Base PromptGen.

Goals

Download images from S3/HTTP concurrently
Preprocess (resize/normalize)
Run batched caption generation on GPU
Persist captions back to a database (async)
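
The four goals above can be sketched as a small asyncio pipeline. This is a minimal illustration and not the code in src/: every function name, `BATCH_SIZE`, and `MAX_DOWNLOADS` is a hypothetical stand-in, the download and database steps are simulated, and the model call is replaced by a stub so the example runs without a GPU or network access.

```python
import asyncio

BATCH_SIZE = 4      # hypothetical batch size for the model call
MAX_DOWNLOADS = 8   # hypothetical cap on concurrent downloads


async def download(url: str, limit: asyncio.Semaphore) -> bytes:
    """Stand-in for an S3/HTTP fetch; returns fake image bytes."""
    async with limit:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return url.encode()


def preprocess(raw: bytes) -> bytes:
    """Stand-in for resize/normalize."""
    return raw.strip()


def caption_batch(batch: list[bytes]) -> list[str]:
    """Stand-in for one batched forward pass of the captioning model."""
    return [f"caption for {item.decode()}" for item in batch]


async def persist(store: dict[str, str], urls: list[str], captions: list[str]) -> None:
    """Stand-in for an async database write; here it fills a dict."""
    await asyncio.sleep(0)
    store.update(zip(urls, captions))


async def run_pipeline(urls: list[str]) -> dict[str, str]:
    limit = asyncio.Semaphore(MAX_DOWNLOADS)
    store: dict[str, str] = {}
    for start in range(0, len(urls), BATCH_SIZE):
        chunk = urls[start:start + BATCH_SIZE]
        # Fetch every image in the chunk concurrently.
        raw = await asyncio.gather(*(download(u, limit) for u in chunk))
        batch = [preprocess(r) for r in raw]
        # Run the blocking model call in a worker thread so the event loop stays free.
        captions = await asyncio.to_thread(caption_batch, batch)
        await persist(store, chunk, captions)
    return store


result = asyncio.run(run_pipeline([f"img{i}.jpg" for i in range(5)]))
```

This sketch processes one chunk at a time for clarity. A higher-throughput design would overlap the stages, for example by connecting them with `asyncio.Queue` objects so downloads for the next batch proceed while the current batch is being captioned.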

Project structure

src/: implementation code
tests/: unit/integration tests
todo.md: tasks list
implementationPlanV2.md: architecture + design notes

Quickstart

Install dependencies:

pip install -r requirements.txt

Configure environment variables (see src/config.py for expected vars).
Run the pipeline (example):

python -m src.pipeline --dry-run
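
As a rough illustration of how environment-based configuration and a `--dry-run` flag might fit together, the sketch below reads two settings from a mapping and parses the flag with `argparse`. The variable names `CAPTION_DB_URL` and `CAPTION_BATCH_SIZE`, the `Settings` class, and the assumed meaning of `--dry-run` (run without persisting captions) are all hypothetical; src/config.py remains the reference for the variables the project actually expects.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    batch_size: int
    dry_run: bool


def load_settings(argv: list[str], env: dict[str, str]) -> Settings:
    """Combine command-line flags with environment-style settings."""
    parser = argparse.ArgumentParser(prog="src.pipeline")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="assumed meaning: run without persisting captions",
    )
    args = parser.parse_args(argv)
    return Settings(
        database_url=env.get("CAPTION_DB_URL", ""),            # hypothetical name
        batch_size=int(env.get("CAPTION_BATCH_SIZE", "16")),   # hypothetical name and default
        dry_run=args.dry_run,
    )


# In a real entry point, argv would be sys.argv[1:] and env would be os.environ.
settings = load_settings(["--dry-run"], {"CAPTION_BATCH_SIZE": "8"})
```

Passing `argv` and `env` explicitly, instead of reading `sys.argv` and `os.environ` inside the function, keeps the loader easy to exercise in unit tests.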

Notes

This repo is intended as a foundation for building a fast, async dataset captioning tool.