Welcome to PyMinHash documentation
MinHashing is an efficient technique for finding similar records in a dataset by approximating their Jaccard similarity. PyMinHash implements minhashing for Pandas dataframes. See the instructions below or the example notebook to get started.
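To make the idea concrete, here is a minimal pure-Python sketch of how minhashing can estimate Jaccard similarity. It is an illustration of the general technique only and does not use or reflect PyMinHash's own API or internals. The helper names (`shingles`, `jaccard`, `minhash_signature`, `estimate_jaccard`), the choice of character 3-grams, and the use of salted BLAKE2b hashes are all assumptions made for this example.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of character k-grams of a lowercased string (hypothetical helper)."""
    text = text.lower()
    # Strings shorter than k yield a single shingle holding the whole string.
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def jaccard(a, b):
    """Exact Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(token, seed):
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, n_hashes=128):
    """For each seeded hash function, keep the minimum hash value over the token set."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(n_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of positions where two signatures agree estimates Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


a = shingles("Jonathan Smith")
b = shingles("Jonathon Smith")
exact = jaccard(a, b)
approx = estimate_jaccard(minhash_signature(a), minhash_signature(b))
print(f"exact={exact:.3f} estimated={approx:.3f}")
```

The efficiency comes from comparing short fixed-length signatures rather than full token sets. Using more hash functions tightens the estimate at the cost of longer signatures.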
Install directly from PyPI:
pip install pyminhash
or using conda-forge:
conda install -c conda-forge pyminhash