Unofficial Python wrapper for the nanoflann k-d tree
Unofficial Python wrapper for the nanoflann library [1] with a scikit-learn-compatible interface and additional multithreaded capabilities.
nanoflann's k-d tree implementation offers some of the best performance on many k-nearest-neighbour problems [2].
It is a good choice for exact k-NN and radius searches in low-dimensional spaces.
pip install git+https://github.com/u1234x1234/[email protected]
import numpy as np
import pynanoflann
index = np.random.uniform(0, 100, (100, 4))
queries = np.random.uniform(0, 100, (10, 4))
nn = pynanoflann.KDTree(n_neighbors=5, metric='L1', radius=100)
nn.fit(index)
# Get k-nearest neighbors
distances, indices = nn.kneighbors(queries)
# Radius search
distances, indices = nn.radius_neighbors(queries)
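For reference, the shape and ordering of the k-NN results can be checked against a pure-NumPy brute-force search. This is an illustrative sketch only; the `brute_force_knn` helper below is hypothetical and not part of pynanoflann. It uses the same L1 metric as the example above:

```python
import numpy as np

def brute_force_knn(index_pts, query_pts, k):
    # Pairwise L1 distances between every query and every indexed point.
    dists = np.abs(query_pts[:, None, :] - index_pts[None, :, :]).sum(axis=-1)
    # Indices of the k smallest distances per query, sorted ascending.
    idx = np.argsort(dists, axis=1)[:, :k]
    return np.take_along_axis(dists, idx, axis=1), idx

index_pts = np.random.uniform(0, 100, (100, 4))
query_pts = np.random.uniform(0, 100, (10, 4))
distances, indices = brute_force_knn(index_pts, query_pts, k=5)
# One row per query: distances.shape == (10, 5), indices.shape == (10, 5)
```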
If you need to save the model, there are two options:
The first option is to save only the built index. The original data is not stored, so the same data must be supplied again at load time:

kdtree = pynanoflann.KDTree()
kdtree.fit(X)
kdtree.save_index('index.bin')

prebuilt_kdtree = pynanoflann.KDTree()
# Must use the same data on which the index was built.
prebuilt_kdtree.fit(X, 'index.bin')
Please refer to the detailed example
The second option is to pickle the whole model, data included:

import pickle

kdtree.fit(X)
with open('kdtree.pkl', 'wb') as out_file:
    pickle.dump(kdtree, out_file)
with open('kdtree.pkl', 'rb') as in_file:
    unpickled_kdtree = pickle.load(in_file)
Please refer to the detailed example
Multicore parallelization is implemented on the C++ side to avoid Python's multiprocessing overhead.
Query parallelization: Example
Simultaneous indexing+querying parallelization: Example, Discussion
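The idea behind query parallelization can be sketched at the Python level: split the query set into chunks and search each chunk in a separate worker. pynanoflann performs this on the C++ side, so the stand-in below, which uses a hypothetical brute-force search in place of a tree query, is purely illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def knn_chunk(index_pts, query_chunk, k):
    # Stand-in for a k-d tree query: brute-force L1 search over one chunk.
    dists = np.abs(query_chunk[:, None, :] - index_pts[None, :, :]).sum(axis=-1)
    idx = np.argsort(dists, axis=1)[:, :k]
    return np.take_along_axis(dists, idx, axis=1), idx

index_pts = np.random.uniform(0, 100, (1000, 4))
queries = np.random.uniform(0, 100, (200, 4))

chunks = np.array_split(queries, 4)  # one chunk per worker
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda c: knn_chunk(index_pts, c, 5), chunks))

# Reassemble per-chunk results in the original query order.
distances = np.concatenate([d for d, _ in results])
indices = np.concatenate([i for _, i in results])
```

Because the heavy work here happens inside NumPy (which releases the GIL), threads suffice for the sketch; the real gain in pynanoflann comes from parallelizing inside the C++ query loop itself.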
Generally it is much faster than the brute-force or Cython k-d tree implementations in scikit-learn.
To run the benchmark:
python benchmark.py
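A minimal timing harness along these lines can be sketched as follows. The `time_queries` helper and the brute-force function are hypothetical stand-ins for whichever implementation is being measured:

```python
import time

import numpy as np

def time_queries(query_fn, queries, repeats=3):
    # Return the best wall-clock time over several runs to reduce noise.
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        query_fn(queries)
        best = min(best, time.perf_counter() - start)
    return best

index_pts = np.random.uniform(0, 100, (2000, 4))
queries = np.random.uniform(0, 100, (500, 4))

def brute_force(qs):
    # Baseline to compare a k-d tree implementation against.
    dists = np.abs(qs[:, None, :] - index_pts[None, :, :]).sum(axis=-1)
    return np.argsort(dists, axis=1)[:, :5]

elapsed = time_queries(brute_force, queries)
print(f'brute force: {elapsed:.4f}s')
```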