Clean duplicate files from the resultsduplicate.txt report.
Use Duplicate file finder to generate the file
This repository contains a small script main.py that parses a report file
(resultsduplicate.txt) produced by a specialized duplicate detection tool.
It moves files that are duplicates and located in the configured VAR_DIRECTORY
to the system trash (or a local recycle bin fallback) instead of deleting them
permanently.
File format (example block in resultsduplicate.txt):
- 2 equal files of size 5256842
"I:\01_AI\01_IMAGES\00_Input\old\2k-ComfyUI-faceD_00050_.png"
"I:\01_AI\01_IMAGES\55_Img2Img\20240206-A-selected\2k-ComfyUI-faceD_00050_.png"
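A minimal sketch of how a report in this format could be parsed, assuming every block starts with a header of the form "N equal files of size S" and every path is wrapped in double quotes. The names `parse_report` and `DuplicateBlock` are hypothetical and are not taken from main.py.

```python
import re
from dataclasses import dataclass, field

# Header such as: - 2 equal files of size 5256842
HEADER_RE = re.compile(r"^\s*-?\s*(\d+) equal files of size (\d+)")
# Any double-quoted path on a line
PATH_RE = re.compile(r'"([^"]+)"')


@dataclass
class DuplicateBlock:
    count: int  # number of identical files announced by the header
    size: int   # file size in bytes
    paths: list = field(default_factory=list)


def parse_report(text):
    """Return the list of DuplicateBlock objects found in the report text."""
    blocks = []
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            blocks.append(DuplicateBlock(int(header.group(1)), int(header.group(2))))
        if blocks:
            # Paths may follow the header on the same line or on later lines.
            blocks[-1].paths.extend(PATH_RE.findall(line))
    return blocks
```

Quoted paths that appear before the first header are ignored, since they cannot be assigned to a block.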
Notes and behavior:
- The first line of a block indicates the number of identical files (2, 3, ...)
- The report contains groups of paths (quoted) listed under each header
- If any path in a group points to VAR_DIRECTORY (usually the "old" folder), those files are candidates to be removed
- The script will move any file(s) inside VAR_DIRECTORY to the trash and leave other copies intact
- As a safety measure, if all copies in a group are inside VAR_DIRECTORY, the script will skip that block (to avoid deleting the only remaining copies) and log a warning
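The selection rule described above can be sketched as follows. This is an illustration under assumptions, not the code in main.py: the helper names `is_inside` and `select_candidates` are hypothetical, and paths are assumed to be Windows-style, as in the report example.

```python
from pathlib import PureWindowsPath


def is_inside(path, directory):
    """True if path lies somewhere below directory (case-insensitive)."""
    return PureWindowsPath(directory) in PureWindowsPath(path).parents


def select_candidates(paths, var_directory):
    """Return (files_to_move, skipped) for one group of duplicate paths."""
    inside = [p for p in paths if is_inside(p, var_directory)]
    if inside and len(inside) == len(paths):
        # Every copy lives in VAR_DIRECTORY: move nothing and flag the block.
        return [], True
    return inside, False
```

Comparing whole path components rather than string prefixes means a sibling folder such as `old2` is not mistaken for `old`.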
Configuration (in main.py):
- DUPLICATES_FILE - path to the duplicates report (default: resultsduplicate.txt)
- VAR_DIRECTORY - the directory whose files should be removed (e.g., the "old" folder)
- DRY_RUN - if True, the script will only log actions and not move files
- LOG_LEVEL - logging level (logging.INFO, logging.DEBUG, etc.)
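For orientation, the configuration block near the top of main.py might look roughly like this. The variable names come from the list above. The VAR_DIRECTORY value is only an example based on the sample report path, and the DRY_RUN and LOG_LEVEL values shown are assumptions.

```python
import logging

# Path to the duplicates report (default named in this README).
DUPLICATES_FILE = "resultsduplicate.txt"
# Directory whose duplicate files should be moved to the trash (example value).
VAR_DIRECTORY = r"I:\01_AI\01_IMAGES\00_Input\old"
# True: only log what would be moved. False: actually move files.
DRY_RUN = True
# Verbosity of the log output (assumed value).
LOG_LEVEL = logging.INFO
```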
Installation (recommended):
python -m pip install --user send2trash
send2trash is recommended to move files to the system recycle bin safely. If
send2trash is not installed, the script falls back to moving files to a local
.recycle_bin directory in the project.
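A sketch of how the fallback could work, assuming the optional import pattern below. The function name `move_to_trash` and its parameters are hypothetical; only `send2trash.send2trash(path)` and the `.recycle_bin` directory name come from the library and this README.

```python
import shutil
from pathlib import Path

try:
    from send2trash import send2trash  # optional dependency
except ImportError:
    send2trash = None


def move_to_trash(path, recycle_dir=".recycle_bin", prefer_system=True):
    """Move path to the system trash, or to recycle_dir as a fallback.

    Returns the new location in the fallback case, or None when the
    system trash was used.
    """
    path = Path(path)
    if prefer_system and send2trash is not None:
        send2trash(str(path))
        return None
    target_dir = Path(recycle_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / path.name
    counter = 1
    while target.exists():
        # Avoid overwriting an earlier file with the same name.
        target = target_dir / f"{path.stem}_{counter}{path.suffix}"
        counter += 1
    shutil.move(str(path), str(target))
    return target
```

Adding a numeric suffix on name collisions is a design choice of this sketch, so that two duplicates with the same file name do not overwrite each other in the local recycle bin.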
Usage:
# dry-run first (only log what would be removed)
python main.py
# to actually remove files, edit main.py and set `DRY_RUN = False`
python main.py
- The script logs events at levels DEBUG/INFO/WARNING/ERROR
- Configure LOG_LEVEL near the top of main.py to change log verbosity
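As an illustration of what LOG_LEVEL controls, here is a small sketch using the standard logging module. The logger name and the message format are assumptions, not taken from main.py.

```python
import logging

LOG_LEVEL = logging.INFO  # switch to logging.DEBUG for more detail

logging.basicConfig(
    level=LOG_LEVEL,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("duplicate_cleaner")  # hypothetical logger name
logger.setLevel(LOG_LEVEL)

logger.debug("hidden while LOG_LEVEL is INFO")
logger.info("shown while LOG_LEVEL is INFO or DEBUG")
```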
Counters & progress:
- The script logs a running progress message every 200 processed candidate files (change log_every_n in main.py).
- Output summary includes:
  - total_blocks - number of blocks parsed
  - candidate_files - number of files eligible in VAR_DIRECTORY
  - would_move_files - number of files that would be moved when DRY_RUN=True
  - moved_files - number of files actually moved when DRY_RUN=False
  - skipped_blocks - blocks that were skipped for safety (no copies outside VAR_DIRECTORY)
  - errors - operation errors during the run
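The counters and the progress message could be maintained along these lines. This is a sketch, not the code in main.py: `process_groups` and its parameters are hypothetical, and it assumes that files in skipped blocks are not counted as candidate_files.

```python
import logging
from collections import Counter

logger = logging.getLogger("duplicate_cleaner")  # hypothetical logger name


def process_groups(groups, is_candidate, move, dry_run=True, log_every_n=200):
    """Walk duplicate groups and return a Counter with the summary fields.

    groups       -- iterable of lists of paths, one list per report block
    is_candidate -- callable(path) -> bool, True if path is in VAR_DIRECTORY
    move         -- callable(path) that moves one file to the trash
    """
    stats = Counter(total_blocks=0, candidate_files=0, would_move_files=0,
                    moved_files=0, skipped_blocks=0, errors=0)
    for paths in groups:
        stats["total_blocks"] += 1
        candidates = [p for p in paths if is_candidate(p)]
        if candidates and len(candidates) == len(paths):
            # No copy exists outside VAR_DIRECTORY, so leave the block alone.
            stats["skipped_blocks"] += 1
            logger.warning("Skipping block, no copy outside VAR_DIRECTORY: %s", paths)
            continue
        for path in candidates:
            stats["candidate_files"] += 1
            if dry_run:
                stats["would_move_files"] += 1
            else:
                try:
                    move(path)
                    stats["moved_files"] += 1
                except OSError as exc:
                    stats["errors"] += 1
                    logger.error("Could not move %s: %s", path, exc)
            if stats["candidate_files"] % log_every_n == 0:
                logger.info("Processed %d candidate files", stats["candidate_files"])
    return stats
```

Passing the `move` function in as a parameter keeps the counting logic independent of whether send2trash or the local recycle bin is used.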
Safety tips:
- Run with DRY_RUN=True and inspect the logs before making changes
- Make sure DUPLICATES_FILE points to a fresh report and that backups exist for important data
If you want the script to behave differently (e.g., delete only single files or keep one copy per project), ask and I can implement more advanced options.