Duplicate Image Finder Versions

difPy - Python package for finding duplicate or similar images within folders

v4.1.0

2 months ago

Major enhancements and new features:

  • Enhancement: provides a fix for #80. difPy now comes with an improved algorithm for handling larger datasets in order to be more memory efficient, see Using difPy with Large Datasets. As part of this enhancement, two new parameters were added:
    • processes has been added to difPy.build and difPy.search, and defines the number of worker processes for multiprocessing. Read more here.
    • chunksize has been added to difPy.search and sets the size of the batches into which the job is split when multiprocessing. Read more here.
  • Enhancement: difPy comes with improved performance due to major improvements in the comparison algorithms. As part of this enhancement, a new parameter was added:
    • lazy was added to difPy.search, which allows difPy to search more efficiently for exact duplicates (i.e. two exact file copies). By default, lazy is set to True; it should only be turned off when searching for images that are not exact duplicates (i.e. images with different dimensions, different file types, etc.). Read more here.
  • Enhancement: the default value of the similarity parameter was reduced from 50 to 5.
  • Enhancement: the progress bar has been improved.
  • New feature: difPy.search now supports the rotate parameter. If set to False, images will not be rotated on comparison, which can significantly reduce comparison times. Read more here.
  • New feature: the output structure of difPy has been adjusted for improved user-friendliness: the structure of search.result is now simpler, with fewer levels of depth, and search.lower_quality is now a list. When invoked via the CLI, the lower_quality output file is now in .txt format.
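
To picture what a chunksize controls, the sketch below splits a lazy stream of pairwise comparisons into fixed-size batches using only the standard library. It is an illustration under stated assumptions, not difPy's implementation: iter_chunks and the file names are hypothetical.

```python
from itertools import combinations, islice


def iter_chunks(iterable, chunksize):
    """Yield lists of at most `chunksize` items without materializing the whole input."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


# Hypothetical file names; 4 images give 6 unique pairs to compare.
images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
pairs = combinations(images, 2)  # lazy generator, nothing is stored yet

# Only one batch of pairs is held in memory at a time.
sizes = [len(chunk) for chunk in iter_chunks(pairs, chunksize=4)]
print(sizes)
```

Smaller batches keep peak memory lower at the cost of more scheduling overhead; larger batches do the opposite.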

See the difPy usage guide for more details. Happy deduplicating! 🎉
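
As a rough sketch of how the parameters described above fit together, the helper below collects the search options and passes them to difPy.search. The parameter and attribute names (lazy, rotate, processes, chunksize, search.result, search.lower_quality) are taken from these notes; search_kwargs and find_duplicates are hypothetical helper names, and the example assumes difPy 4.1.0 or later is installed. Check the usage guide for the authoritative signatures.

```python
def search_kwargs(exact_copies_only=True, rotate=True, processes=None, chunksize=None):
    """Collect keyword arguments for difPy.search (hypothetical helper)."""
    kwargs = {
        "lazy": exact_copies_only,  # per the notes: turn off for non-exact duplicates
        "rotate": rotate,           # False skips rotated comparisons
    }
    # Only pass the multiprocessing options when the caller sets them,
    # so the library's own defaults apply otherwise.
    if processes is not None:
        kwargs["processes"] = processes
    if chunksize is not None:
        kwargs["chunksize"] = chunksize
    return kwargs


def find_duplicates(folder, **options):
    """Build the image repository once, then run a single search on it."""
    import difPy  # third-party; imported lazily so search_kwargs works without it

    dif = difPy.build(folder)
    search = difPy.search(dif, **search_kwargs(**options))
    return search.result, search.lower_quality
```

A call might look like find_duplicates("path/to/folder", rotate=False), where the folder path is a placeholder.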

v4.1.0-beta

2 months ago

Initial beta version release of difPy v4.1.0.

v4.0.1

7 months ago

Minor bug fixes:

  • Fixed issue #77 where difPy would throw an error when extracting the file extension from files that contain a dot in the name.
  • Fixed issue #78 where search.delete() would never successfully delete the lower quality images.
  • Implemented and fixed handling for issue #75.

v4.0.0

8 months ago

difPy v4 comes with major updates, code improvements and new features 🎉

  • Enhancement: difPy now leverages Python's multiprocessing capabilities, which results in significant performance increases. In tests, difPy v4 was on average 10x faster than previous versions on the same datasets. Suggested by @TheLastGimbus and @thecodingchicken.
  • Enhancement: difPy's processes have been split into difPy.build and difPy.search so that multiple searches can be performed on the same image repository without having to rebuild it.
  • Enhancement: difPy is now more lightweight thanks to reduced dependency on external packages and increased usage of Numpy capabilities.
  • New feature: difPy now supports in_folder search, which searches within each directory separately instead of in the union of all directories. Suggested by @audiomuze in #53.
  • New feature: when running from the CLI, if no folder is specified, difPy will run by default in the working directory.
  • Various other minor code improvements
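
The difference between searching the union of all directories and searching each directory separately can be pictured by counting candidate pairs. This is a conceptual sketch, not difPy's code: candidate_pairs and the folder contents are hypothetical.

```python
from itertools import combinations

# Hypothetical folders and file names.
folders = {
    "folder_a": ["a1.jpg", "a2.jpg"],
    "folder_b": ["b1.jpg", "b2.jpg", "b3.jpg"],
}


def candidate_pairs(folders, in_folder):
    """Return the image pairs that would be compared in each mode."""
    if in_folder:
        # Compare only images that live in the same folder.
        return [pair for files in folders.values() for pair in combinations(files, 2)]
    # Compare every image against every other image across all folders.
    all_files = [name for files in folders.values() for name in files]
    return list(combinations(all_files, 2))


print(len(candidate_pairs(folders, in_folder=True)))
print(len(candidate_pairs(folders, in_folder=False)))
```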

The usage of difPy v4 differs significantly from previous versions. It is therefore recommended to consult the updated difPy Usage Documentation.

v.4.0.0-beta

8 months ago

Initial beta version release of difPy v4.

v3.0.10

1 year ago

New features and bug fixes:

  • New feature: limit_extensions option has been added to limit difPy's file search only to known image file extensions, leading to increased speed and performance. Suggested by @audiomuze and implemented by @UplandsDynamic.
  • Enhancement: added more details to the search.stats output related to limit_extensions. It now includes logs of which files were skipped. Implemented by @UplandsDynamic.
  • Bug fix: fixed issue #70 where difPy's dependencies would not be included with its installation, leading to a ModuleNotFoundError.
  • Added a Python version requirement to the PyPI package, ensuring difPy is only installed with Python versions >= 3.8.
  • Various other code improvements.
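
A minimal sketch of extension-based filtering that also records which files were skipped, in the spirit of limit_extensions and the skipped-file logs. The extension set and the helper name are assumptions for illustration, not difPy's actual list or API.

```python
from pathlib import Path

# Assumed set of image extensions; difPy's own list may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff"}


def split_by_extension(paths):
    """Separate paths into (kept, skipped) based on their final suffix."""
    kept, skipped = [], []
    for path in paths:
        if Path(path).suffix.lower() in IMAGE_EXTENSIONS:
            kept.append(path)
        else:
            skipped.append(path)
    return kept, skipped


kept, skipped = split_by_extension(["cat.JPG", "notes.txt", "dog.png", "archive.tar.gz"])
print(kept, skipped)
```

Filtering on the file name alone avoids opening every file just to find out that it is not an image, which is where the speed-up would come from.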

v3.0.9

1 year ago

Improvements and bug fixes:

  • @UplandsDynamic implemented a feature improvement: search.stats now includes the logs of the deleted_files if the logs parameter is set to True.
  • @UplandsDynamic fixed the issue where the count of deleted files would not be accurate.
  • Various other minor code improvements.

Announcement: 🎉 On March 24, 2023 difPy reached 20k downloads on PyPI - thank you! 💐 To celebrate, I am happy to announce the release of the official difPy.app, a web-based app that lets you compare images with difPy right from your browser! Read more.

v3.0.8

1 year ago

Improvements and new features:

  • New feature: to make difPy more user friendly, the similarity parameter now only accepts two options: 'duplicates' or 'similar'.
  • Enhancement: the difPy PyPI package now comes with a Python wheel for faster installation.
  • Other minor bug fixes.

v3.0.7

1 year ago

Minor improvements:

  • Enhancement: added validation to ensure that difPy does not compare directories that are nested within one another.
  • Minor code improvements
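
One way such a directory check could look is sketched below with the standard library. This is an assumption about the kind of validation involved, not difPy's actual code; nested_pairs and the paths are hypothetical, and PurePosixPath is used so the example behaves the same on every platform.

```python
from itertools import combinations
from pathlib import PurePosixPath


def nested_pairs(directories):
    """Return pairs of directories where one contains the other."""
    paths = [PurePosixPath(d) for d in directories]
    return [
        (str(a), str(b))
        for a, b in combinations(paths, 2)
        if a in b.parents or b in a.parents
    ]


# Hypothetical input: the second directory sits inside the first.
conflicts = nested_pairs(["/photos", "/photos/2023", "/backup"])
print(conflicts)
```

On real paths, resolving symlinks and relative segments first (for example with Path.resolve()) would make the check more reliable.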

v3.0.6

1 year ago

Minor improvements and fixes:

  • Enhancement: added the option to automatically move the search.lower_quality images to a different folder, as suggested by @ManthanRami.
  • Minor code improvements
  • difPy Usage Documentation has been moved to difPy.readthedocs.io.
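
For readers who want to see what moving the lower-quality images to a separate folder amounts to, here is a standard-library sketch. It does not use difPy's own option; move_files is a hypothetical helper, and the demo runs on empty placeholder files in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def move_files(files, destination):
    """Move each file into `destination`, creating the folder if needed."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    moved = []
    for file in files:
        target = destination / Path(file).name
        shutil.move(str(file), str(target))
        moved.append(target)
    return moved


# Demo with placeholder files standing in for a lower_quality list.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    source.mkdir()
    lower_quality = [source / "copy1.jpg", source / "copy2.jpg"]
    for file in lower_quality:
        file.write_bytes(b"")

    moved = move_files(lower_quality, Path(tmp) / "lower_quality")
    moved_names = sorted(path.name for path in moved)
    all_moved = all(path.exists() for path in moved) and not any(
        file.exists() for file in lower_quality
    )
```

Files from different folders that share a name would collide in the destination, so a real workflow may need a renaming step.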