Clean duplicate files from the `resultsduplicate.txt` report.

Use Duplicate file finder to generate the report file.

This repository contains a small script, `main.py`, that parses a report file
(`resultsduplicate.txt`) produced by a dedicated duplicate detection tool.
It moves duplicate files located in the configured `VAR_DIRECTORY` to the
system trash (or to a local recycle bin fallback) instead of deleting them
permanently.

File format (example block in `resultsduplicate.txt`):

- 2 equal files of size 5256842
"I:\01_AI\01_IMAGES\00_Input\old\2k-ComfyUI-faceD_00050_.png"
"I:\01_AI\01_IMAGES\55_Img2Img\20240206-A-selected\2k-ComfyUI-faceD_00050_.png"

Notes and behavior:
- The first line of a block gives the number of identical files (2, 3, ...)
  and their size
- The quoted paths of each group are listed under its header line
- If any path in a group is inside `VAR_DIRECTORY` (usually the `old` folder),
  that file is a candidate for removal
- The script moves every file inside `VAR_DIRECTORY` to the trash and leaves
  the other copies intact
- As a safety measure, if *all* copies in a group are inside `VAR_DIRECTORY`,
  the script *skips* that block (to avoid removing the only remaining
  copies) and logs a warning

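The parsing and selection rules above can be sketched roughly as follows. This is an illustration, not the code in `main.py`: the names `HEADER_RE`, `parse_blocks`, `is_inside`, and `select_candidates` are hypothetical, and the header pattern is inferred from the single example block shown in this README.

```python
import re
from pathlib import PureWindowsPath

# Assumed header shape, based on the example: "- 2 equal files of size 5256842"
HEADER_RE = re.compile(r"^-\s*(\d+) equal files of size (\d+)\s*$")


def parse_blocks(lines):
    """Group the quoted paths found under each header line."""
    blocks = []
    current = None
    for raw in lines:
        line = raw.strip()
        if HEADER_RE.match(line):
            current = []
            blocks.append(current)
        elif (current is not None and len(line) >= 2
              and line.startswith('"') and line.endswith('"')):
            current.append(line[1:-1])  # drop the surrounding quotes
    return blocks


def is_inside(path, directory):
    """True if `path` lies under `directory` (Windows paths, case-insensitive)."""
    return PureWindowsPath(directory) in PureWindowsPath(path).parents


def select_candidates(block, var_directory):
    """Return the paths to trash, or [] when the block must be skipped."""
    inside = [p for p in block if is_inside(p, var_directory)]
    if len(inside) == len(block):
        return []  # every copy is inside VAR_DIRECTORY: keep them all
    return inside


# Usage with the example block from this README
example = [
    "- 2 equal files of size 5256842",
    r'"I:\01_AI\01_IMAGES\00_Input\old\2k-ComfyUI-faceD_00050_.png"',
    r'"I:\01_AI\01_IMAGES\55_Img2Img\20240206-A-selected\2k-ComfyUI-faceD_00050_.png"',
]
for block in parse_blocks(example):
    print(select_candidates(block, r"I:\01_AI\01_IMAGES\00_Input\old"))
```

Comparing path components with `PureWindowsPath` rather than raw string prefixes avoids treating a sibling folder such as `old_backup` as if it were inside `old`.
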
Configuration (in `main.py`):
- `DUPLICATES_FILE` - path to the duplicates report (default: `resultsduplicate.txt`)
- `VAR_DIRECTORY` - the directory whose files should be removed (e.g., the
  `old` folder)
- `DRY_RUN` - if `True`, the script only logs actions and does not move files
- `LOG_LEVEL` - logging level (`logging.INFO`, `logging.DEBUG`, etc.)

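For orientation, the settings block near the top of `main.py` might look something like this. The four names come from the list above; the values are assumptions for illustration (the `VAR_DIRECTORY` path is borrowed from the example block, and the `DRY_RUN` default is a guess).

```python
import logging

DUPLICATES_FILE = "resultsduplicate.txt"            # default named in this README
VAR_DIRECTORY = r"I:\01_AI\01_IMAGES\00_Input\old"  # assumed value, from the example block
DRY_RUN = True                                      # assumed default: log only, move nothing
LOG_LEVEL = logging.INFO                            # use logging.DEBUG for more detail
```
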
Installation (recommended):
```pwsh
python -m pip install --user send2trash
```

`send2trash` is recommended because it moves files to the system recycle bin
safely. If `send2trash` is not installed, the script falls back to moving files
to a local `.recycle_bin` directory in the project.

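A minimal sketch of that fallback, assuming the optional import is wrapped in `try`/`except`. The function name `move_to_trash`, its parameters, and the rename-on-collision rule are hypothetical and may differ from what `main.py` does.

```python
import shutil
from pathlib import Path

try:
    from send2trash import send2trash  # optional dependency
except ImportError:
    send2trash = None


def move_to_trash(path, recycle_dir=".recycle_bin", use_system_trash=True):
    """Send `path` to the system trash, or move it into `recycle_dir`."""
    src = Path(path)
    if use_system_trash and send2trash is not None:
        send2trash(str(src))
        return None
    target_dir = Path(recycle_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / src.name
    counter = 1
    while target.exists():  # do not overwrite an earlier file with the same name
        target = target_dir / f"{src.stem}_{counter}{src.suffix}"
        counter += 1
    shutil.move(str(src), str(target))
    return target
```

Returning the fallback target makes it easy to log where a file ended up; with the system trash there is no meaningful destination path, so the sketch returns `None`.
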
Usage:
```pwsh
# dry-run first (only log what would be removed)
python main.py

# to actually remove files, edit main.py and set `DRY_RUN = False`
python main.py
```

- The script logs events at levels DEBUG/INFO/WARNING/ERROR
- Configure `LOG_LEVEL` near the top of `main.py` to change log verbosity

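As a rough illustration of how `LOG_LEVEL` could drive verbosity with the standard `logging` module, consider the sketch below. The logger name `duplicate_cleaner` and the helper `configure_logging` are hypothetical; `main.py` may set logging up differently.

```python
import logging

LOG_LEVEL = logging.INFO  # mirrors the setting described above


def configure_logging(level=LOG_LEVEL):
    """Return a logger that emits records at `level` and above."""
    logger = logging.getLogger("duplicate_cleaner")  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = configure_logging()
log.debug("hidden at INFO level")
log.warning("shown at INFO level")
```
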
Counters & progress:
- The script logs a running progress message every 200 processed candidate
  files (change `log_every_n` in `main.py`).
- Output summary includes:
  - `total_blocks` - number of blocks parsed
  - `candidate_files` - number of files eligible in `VAR_DIRECTORY`
  - `would_move_files` - number of files that *would* be moved when `DRY_RUN=True`
  - `moved_files` - number of files actually moved when `DRY_RUN=False`
  - `skipped_blocks` - blocks that were skipped for safety (no copies outside `VAR_DIRECTORY`)
  - `errors` - operation errors during the run

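One way to keep these counters and the periodic progress message together is sketched below. The field names and `log_every_n` come from the list above; the `Counters` dataclass and the `record_candidate` helper are hypothetical, and the sketch assumes the caller has already performed (or simulated) the move.

```python
import logging
from dataclasses import dataclass, asdict

log = logging.getLogger("duplicate_cleaner")  # hypothetical logger name


@dataclass
class Counters:
    total_blocks: int = 0
    candidate_files: int = 0
    would_move_files: int = 0
    moved_files: int = 0
    skipped_blocks: int = 0
    errors: int = 0


def record_candidate(counters, dry_run, log_every_n=200):
    """Count one processed candidate and log progress every `log_every_n` files."""
    counters.candidate_files += 1
    if dry_run:
        counters.would_move_files += 1
    else:
        counters.moved_files += 1  # assumes the move itself already succeeded
    if counters.candidate_files % log_every_n == 0:
        log.info("Processed %d candidate files", counters.candidate_files)


counters = Counters()
for _ in range(3):
    record_candidate(counters, dry_run=True)
print(asdict(counters))  # summary as a plain dict
```
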
Safety tips:
- Run with `DRY_RUN=True` and inspect the logs before making changes
- Make sure `DUPLICATES_FILE` points to a fresh report and that backups exist
  for important data

If you want the script to behave differently (e.g., delete only single files
or keep one copy per project), ask and I can implement more advanced options.