Clean duplicate files from the `resultsduplicate.txt` report. Use the Duplicate file finder tool to generate the report.

This repository contains a small script, `main.py`, that parses a report file (`resultsduplicate.txt`) produced by a specialized duplicate detection tool. It moves duplicate files located in the configured `VAR_DIRECTORY` to the system trash (or to a local recycle-bin fallback) instead of deleting them permanently.

File format (example block in `resultsduplicate.txt`):

```
- 2 equal files of size 5256842
"I:\01_AI\01_IMAGES\00_Input\old\2k-ComfyUI-faceD_00050_.png"
"I:\01_AI\01_IMAGES\55_Img2Img\20240206-A-selected\2k-ComfyUI-faceD_00050_.png"
```

Notes and behavior:

- The first line of a block indicates the number of identical files (2, 3, ...)
- The report lists groups of quoted paths under each header
- If any path in a group points into `VAR_DIRECTORY` (usually the `old` folder), those files are candidates for removal
- The script moves any file(s) inside `VAR_DIRECTORY` to the trash and leaves the other copies intact
- As a safety measure, if *all* copies in a group are inside `VAR_DIRECTORY`, the script *skips* that block (to avoid deleting the only remaining copies) and logs a warning

Configuration (in `main.py`):

- `DUPLICATES_FILE` - path to the duplicates report (default: `resultsduplicate.txt`)
- `VAR_DIRECTORY` - the directory whose files should be removed (e.g., the `old` folder)
- `DRY_RUN` - if `True`, the script only logs actions and does not move files
- `LOG_LEVEL` - logging level (`logging.INFO`, `logging.DEBUG`, etc.)

Installation (recommended):

```pwsh
python -m pip install --user send2trash
```

`send2trash` is recommended for moving files to the system recycle bin safely. If `send2trash` is not installed, the script falls back to moving files to a local `.recycle_bin` directory in the project.
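The block parsing and the "keep at least one copy" safety rule described above can be sketched as follows. This is an illustrative sketch, not the actual code in `main.py` — the function names (`parse_block`, `removable_paths`) and the header regex are assumptions based on the example report format:

```python
import re
from pathlib import PureWindowsPath

# Hypothetical helpers mirroring the behavior described in this README.
# The header regex assumes lines like "- 2 equal files of size 5256842".
HEADER_RE = re.compile(r"(\d+) equal files of size (\d+)")

def parse_block(lines):
    """Parse one report block into (count, size, [paths])."""
    m = HEADER_RE.search(lines[0])
    count, size = int(m.group(1)), int(m.group(2))
    # Remaining lines are quoted paths; strip whitespace and quotes.
    paths = [line.strip().strip('"') for line in lines[1:] if line.strip()]
    return count, size, paths

def removable_paths(paths, var_directory):
    """Return the paths inside var_directory, unless ALL copies are
    inside it -- in that case return [] (safety: skip the block)."""
    # Simplified containment check; main.py may handle case folding etc.
    inside = [p for p in paths
              if PureWindowsPath(p).is_relative_to(var_directory)]
    if len(inside) == len(paths):
        return []  # would delete every remaining copy -> skip
    return inside
```

With the example block above, only the copy under the `old` folder would be returned as removable; a block where every path lies under `VAR_DIRECTORY` yields an empty list and is skipped.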
Usage:

```pwsh
# dry-run first (only log what would be removed)
python main.py

# to actually remove files, edit main.py and set `DRY_RUN = False`
python main.py
```

Logging:

- The script logs events at levels DEBUG/INFO/WARNING/ERROR
- Configure `LOG_LEVEL` near the top of `main.py` to change log verbosity

Counters & progress:

- The script logs a running progress message every 200 processed candidate files (change `log_every_n` in `main.py`)
- The output summary includes:
  - `total_blocks` - number of blocks parsed
  - `candidate_files` - number of eligible files in `VAR_DIRECTORY`
  - `would_move_files` - number of files that *would* be moved when `DRY_RUN=True`
  - `moved_files` - number of files actually moved when `DRY_RUN=False`
  - `skipped_blocks` - blocks skipped for safety (no copies outside `VAR_DIRECTORY`)
  - `errors` - operation errors during the run

Safety tips:

- Run with `DRY_RUN=True` and inspect the logs before making changes
- Make sure `DUPLICATES_FILE` points to a fresh report and that backups exist for important data

If you want the script to behave differently (e.g., delete only single files or keep one copy per project), ask and I can implement more advanced options.
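The dry-run behavior and the `send2trash`-with-fallback move can be sketched like this. It is a minimal illustration of the approach, not the exact implementation in `main.py` — the function name `move_to_trash` and the return values are assumptions; only `send2trash(path)` and the `.recycle_bin` fallback come from this README:

```python
import shutil
from pathlib import Path

# Local fallback directory, as described above (assumed name from README).
RECYCLE_BIN = Path(".recycle_bin")

def move_to_trash(path, dry_run=True):
    """Move a file to the system trash via send2trash, falling back to
    a local .recycle_bin directory when send2trash is not installed.
    In dry-run mode, only report what would happen."""
    path = Path(path)
    if dry_run:
        print(f"DRY RUN: would move {path}")
        return "dry-run"
    try:
        from send2trash import send2trash
        send2trash(str(path))  # system recycle bin
        return "trash"
    except ImportError:
        # Fallback: move the file into the project-local recycle bin.
        RECYCLE_BIN.mkdir(exist_ok=True)
        shutil.move(str(path), RECYCLE_BIN / path.name)
        return "recycle_bin"
```

In dry-run mode the file is left untouched and the intended action is only logged, which is why running with `DRY_RUN=True` first is safe.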