MSVS-cleanDupli/readme.md

Clean duplicate files from the `resultsduplicate.txt` report.
Use Duplicate file finder to generate this report.
This repository contains a small script `main.py` that parses a report file
(`resultsduplicate.txt`) produced by a specialized duplicate detection tool.
It moves duplicate files located in the configured `VAR_DIRECTORY`
to the system trash (or to a local recycle-bin fallback) instead of deleting them
permanently.
File format (example block in `resultsduplicate.txt`):
- 2 equal files of size 5256842
"I:\01_AI\01_IMAGES\00_Input\old\2k-ComfyUI-faceD_00050_.png"
"I:\01_AI\01_IMAGES\55_Img2Img\20240206-A-selected\2k-ComfyUI-faceD_00050_.png"
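
The block layout above can be read with a small parser. The following is a minimal sketch, not the code in `main.py`: the function name `parse_blocks` and the header regular expression are assumptions derived only from the example block shown here.

```python
import re
from typing import Iterable, List

# Assumed header shape, taken from the example: "- 2 equal files of size 5256842"
HEADER_RE = re.compile(r"^-\s*(\d+)\s+equal files of size\s+(\d+)")


def parse_blocks(lines: Iterable[str]) -> List[List[str]]:
    """Group the quoted paths that follow each header line into one block."""
    blocks: List[List[str]] = []
    current = None
    for raw in lines:
        line = raw.strip()
        if HEADER_RE.match(line):
            # A header starts a new block of duplicate paths
            current = []
            blocks.append(current)
        elif current is not None and len(line) >= 2 and line[0] == '"' and line[-1] == '"':
            # Strip the surrounding quotes and keep the path as-is
            current.append(line[1:-1])
    return blocks


sample = [
    '- 2 equal files of size 5256842',
    r'"I:\01_AI\01_IMAGES\00_Input\old\2k-ComfyUI-faceD_00050_.png"',
    r'"I:\01_AI\01_IMAGES\55_Img2Img\20240206-A-selected\2k-ComfyUI-faceD_00050_.png"',
]
blocks = parse_blocks(sample)
```

Lines that are neither a header nor a quoted path are ignored, so blank separators or report preambles (if the tool emits any) would not break the grouping.
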
Notes and behavior:
- The first line of a block indicates the number of identical files (2, 3, ...)
- The report contains groups of paths (quoted) listed under each header
- If any path in a group is inside `VAR_DIRECTORY` (usually the `old` folder),
that file is a candidate for removal
- The script moves every candidate inside `VAR_DIRECTORY` to the trash and leaves
the other copies intact
- As a safety measure, if *all* copies in a group are inside `VAR_DIRECTORY`,
the script will *skip* that block (to avoid deleting the only remaining
copies) and log a warning
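
The selection rule and the safety skip described above could look roughly like this. It is a sketch under assumptions: `is_inside` and `select_candidates` are hypothetical names, the paths in the usage note are made up, and the real script may test directory membership differently (for example with a plain string prefix).

```python
from pathlib import PureWindowsPath
from typing import List, Optional


def is_inside(path: str, directory: str) -> bool:
    """True if `path` lies under `directory` (pure path comparison, no disk access)."""
    return PureWindowsPath(directory) in PureWindowsPath(path).parents


def select_candidates(block: List[str], var_directory: str) -> Optional[List[str]]:
    """Return the paths to trash, or None when the block must be skipped for safety."""
    inside = [p for p in block if is_inside(p, var_directory)]
    if inside and len(inside) == len(block):
        # Every copy is in VAR_DIRECTORY: skip so at least one copy survives
        return None
    return inside
```

For a block holding `I:\imgs\old\a.png` and `I:\imgs\keep\a.png` with `var_directory` set to `I:\imgs\old`, only the first path is returned. `PureWindowsPath` compares case-insensitively, which matches how Windows treats these paths.
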
Configuration (in `main.py`):
- `DUPLICATES_FILE` - path to the duplicates report (default: `resultsduplicate.txt`)
- `VAR_DIRECTORY` - the directory whose files should be removed (e.g., the
`old` folder)
- `DRY_RUN` - if `True`, the script will only log actions and not move files
- `LOG_LEVEL` - logging level (`logging.INFO`, `logging.DEBUG`, etc.)
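
As an illustration, the configuration block near the top of `main.py` might resemble the sketch below. Only the four setting names come from this README; the `VAR_DIRECTORY` value, the log format, and the logger name `cleandupli` are assumptions, and `DRY_RUN = True` is inferred from the usage instructions rather than confirmed.

```python
import logging

# Hypothetical values; only the four names are taken from this README.
DUPLICATES_FILE = "resultsduplicate.txt"            # path to the duplicates report
VAR_DIRECTORY = r"I:\01_AI\01_IMAGES\00_Input\old"  # directory whose duplicates get trashed
DRY_RUN = True                                      # log only; do not move anything
LOG_LEVEL = logging.INFO                            # use logging.DEBUG for more detail

logging.basicConfig(level=LOG_LEVEL, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("cleandupli")
log.info("report=%s target=%s dry_run=%s", DUPLICATES_FILE, VAR_DIRECTORY, DRY_RUN)
```
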
Installation (recommended):
```pwsh
python -m pip install --user send2trash
```
`send2trash` is recommended to move files to the system recycle bin safely. If
`send2trash` is not installed, the script falls back to moving files to a local
`.recycle_bin` directory in the project.
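
The fallback described above can be sketched as follows. This is not the script's actual code: `move_to_trash`, its `use_system_trash` parameter, and the name-collision handling are assumptions added for illustration. The only third-party call used is `send2trash.send2trash(path)`.

```python
import shutil
from pathlib import Path

try:
    from send2trash import send2trash  # optional dependency
except ImportError:
    send2trash = None

RECYCLE_DIR = Path(".recycle_bin")  # fallback location named in this README


def move_to_trash(path, recycle_dir=RECYCLE_DIR, use_system_trash=True):
    """Send `path` to the system trash, or move it into `recycle_dir` as a fallback.

    Returns the new location when the fallback was used, or None when the
    system trash handled the file.
    """
    path = Path(path)
    if use_system_trash and send2trash is not None:
        send2trash(str(path))
        return None
    recycle_dir.mkdir(parents=True, exist_ok=True)
    target = recycle_dir / path.name
    counter = 1
    while target.exists():
        # Assumption: avoid overwriting an earlier file that had the same name
        target = recycle_dir / f"{path.stem}_{counter}{path.suffix}"
        counter += 1
    shutil.move(str(path), str(target))
    return target
```

Because the fallback flattens files into one directory, two duplicates with the same file name would collide; the numeric suffix here is one simple way around that, not necessarily what `main.py` does.
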
Usage:
```pwsh
# dry-run first (only log what would be removed)
python main.py
# to actually remove files, edit main.py and set `DRY_RUN = False`
python main.py
```
Logging:
- The script logs events at levels DEBUG/INFO/WARNING/ERROR
- Configure `LOG_LEVEL` near the top of `main.py` to change log verbosity
Counters & progress:
- The script logs a running progress message every 200 processed candidate files (change `log_every_n` in `main.py`).
- Output summary includes:
  - `total_blocks` - number of blocks parsed
  - `candidate_files` - number of files eligible in `VAR_DIRECTORY`
  - `would_move_files` - number of files that *would* be moved when `DRY_RUN=True`
  - `moved_files` - number of files actually moved when `DRY_RUN=False`
  - `skipped_blocks` - blocks that were skipped for safety (no copies outside `VAR_DIRECTORY`)
  - `errors` - operation errors during the run
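
To make the counters concrete, here is a rough sketch of a processing loop that fills them in. The counter names and `log_every_n` come from this README; the function `summarize`, its `is_candidate` and `move` callbacks, and the choice not to count files from skipped blocks as candidates are assumptions.

```python
import logging
from collections import Counter

log = logging.getLogger("cleandupli")


def summarize(blocks, is_candidate, dry_run=True, log_every_n=200, move=None):
    """Walk parsed blocks and count outcomes under the summary names listed above.

    `move` is a callable that trashes one path; it is required when dry_run is False.
    """
    stats = Counter(total_blocks=0, candidate_files=0, would_move_files=0,
                    moved_files=0, skipped_blocks=0, errors=0)
    for block in blocks:
        stats["total_blocks"] += 1
        candidates = [p for p in block if is_candidate(p)]
        if candidates and len(candidates) == len(block):
            # Safety rule: no copy would remain outside VAR_DIRECTORY
            stats["skipped_blocks"] += 1
            log.warning("skipping block: no copy outside VAR_DIRECTORY")
            continue
        for path in candidates:
            stats["candidate_files"] += 1
            if dry_run:
                stats["would_move_files"] += 1
                log.info("would move %s", path)
            else:
                try:
                    move(path)
                    stats["moved_files"] += 1
                except OSError as exc:
                    stats["errors"] += 1
                    log.error("could not move %s: %s", path, exc)
            if stats["candidate_files"] % log_every_n == 0:
                log.info("processed %d candidate files", stats["candidate_files"])
    return dict(stats)
```

Comparing `would_move_files` from a dry run with `moved_files` plus `errors` from the real run is a quick way to confirm that the second pass did what the first one predicted.
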
Safety tips:
- Run with `DRY_RUN=True` and inspect the logs before making changes
- Make sure `DUPLICATES_FILE` points to a fresh report and that backups exist
for important data
If you want the script to behave differently (e.g., delete only single files
or keep one copy per project), ask and I can implement more advanced options.