MSVS-cleanDupli

Clean duplicate files listed in the resultsduplicate.txt report.

Use a duplicate file finder to generate the report file.

This repository contains a small script, main.py, that parses a report file (resultsduplicate.txt) produced by a specialized duplicate-detection tool. Files that are duplicates and located in the configured VAR_DIRECTORY are moved to the system trash (or to a local recycle-bin fallback) instead of being deleted permanently.

File format (example block in resultsduplicate.txt):

  • 2 equal files of size 5256842 "I:\01_AI\01_IMAGES\00_Input\old\2k-ComfyUI-faceD_00050_.png" "I:\01_AI\01_IMAGES\55_Img2Img\20240206-A-selected\2k-ComfyUI-faceD_00050_.png"

Notes and behavior:

  • The first line of a block indicates the number of identical files (2, 3, ...)
  • The report contains groups of paths (quoted) listed under each header
  • If any path in a group points to VAR_DIRECTORY (usually the old folder), those files are candidates to be removed
  • The script will move any file(s) inside VAR_DIRECTORY to the trash and leave other copies intact
  • As a safety measure, if all copies in a group are inside VAR_DIRECTORY, the script will skip that block (to avoid deleting the only remaining copies) and log a warning
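The block format and the safety rule above can be sketched in a few lines of Python. These helper names (`parse_block`, `select_candidates`) are illustrative, not the actual main.py API:

```python
import re

# Header looks like: 2 equal files of size 5256842 "path1" "path2"
HEADER_RE = re.compile(r"(\d+) equal files of size (\d+)")
PATH_RE = re.compile(r'"([^"]+)"')

def parse_block(line):
    """Parse one report line into (count, size, [paths]), or None if no header."""
    header = HEADER_RE.search(line)
    if not header:
        return None
    paths = PATH_RE.findall(line)
    return int(header.group(1)), int(header.group(2)), paths

def select_candidates(paths, var_dir):
    """Return the paths inside var_dir, unless *all* copies are inside it.

    If every copy lives in var_dir, return [] so the block is skipped
    (removing them would delete the only remaining copies).
    """
    inside = [p for p in paths if p.lower().startswith(var_dir.lower())]
    if len(inside) == len(paths):
        return []
    return inside
```

A group where one copy is in VAR_DIRECTORY and another elsewhere yields exactly the in-directory path; a group entirely inside VAR_DIRECTORY yields nothing.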

Configuration (in main.py):

  • DUPLICATES_FILE - path to the duplicates report (default: resultsduplicate.txt)
  • VAR_DIRECTORY - the directory whose files should be removed (e.g., the old folder)
  • DRY_RUN - if True, the script will only log actions and not move files
  • LOG_LEVEL - logging level (logging.INFO, logging.DEBUG, etc.)
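The four settings above live near the top of main.py; a typical configuration might look like this (the values shown are examples, not defaults baked into the script, except for DUPLICATES_FILE):

```python
import logging

# Example configuration (adjust paths to your setup)
DUPLICATES_FILE = "resultsduplicate.txt"   # duplicates report to parse
VAR_DIRECTORY = r"I:\01_AI\01_IMAGES\00_Input\old"  # files here get removed
DRY_RUN = True          # True: only log planned actions, move nothing
LOG_LEVEL = logging.INFO
```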

Installation (recommended):

python -m pip install --user send2trash

send2trash is recommended to move files to the system recycle bin safely. If send2trash is not installed, the script falls back to moving files to a local .recycle_bin directory in the project.

Usage:

# dry-run first (only log what would be removed)
python main.py

# to actually remove files, edit main.py and set `DRY_RUN = False`
python main.py

  • The script logs events at levels DEBUG/INFO/WARNING/ERROR
  • Configure LOG_LEVEL near the top of main.py to change log verbosity

Counters & progress:

  • The script logs a running progress message every 200 processed candidate files (change log_every_n in main.py).

  • Output summary includes:

    • total_blocks - number of blocks parsed
    • candidate_files - number of files eligible in VAR_DIRECTORY
    • would_move_files - number of files that would be moved when DRY_RUN=True
    • moved_files - number of files actually moved when DRY_RUN=False
    • skipped_blocks - blocks that were skipped for safety (no copies outside VAR_DIRECTORY)
    • errors - operation errors during the run
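A sketch of how such counters and the every-200-files progress message could be kept (names mirror the summary fields above; the structure is an assumption, not main.py's actual code):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cleanDupli")

log_every_n = 200  # progress interval described above

counters = {
    "total_blocks": 0,
    "candidate_files": 0,
    "would_move_files": 0,
    "moved_files": 0,
    "skipped_blocks": 0,
    "errors": 0,
}

def note_candidate():
    """Count one candidate file; log progress every log_every_n files."""
    counters["candidate_files"] += 1
    if counters["candidate_files"] % log_every_n == 0:
        log.info("progress: %d candidate files processed",
                 counters["candidate_files"])
```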

Safety tips:

  • Run with DRY_RUN=True and inspect the logs before making changes
  • Make sure DUPLICATES_FILE points to a fresh report and that backups exist for important data

If you want the script to behave differently (e.g., delete only single files or keep one copy per project), ask and I can implement more advanced options.