

GitHub - hora-search/hora: 🚀 efficient approximate nearest neighbor search algo...
source link: https://github.com/hora-search/hora
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

[Homepage] [Document] [Examples]
Hora Search Everywhere!
Hora is an approximate nearest neighbor search algorithm (wiki) library. We implement all code in Rust🦀
for reliability, high level abstraction and high speeds comparable to C++
.
Hora, 「ほら」
in Japanese, sounds like [hōlə]
, and means Wow
, You see!
or Look at that!
. The name is inspired by a famous Japanese song 「小さな恋のうた」
.
Demos
Face-Match [online demo], have a try!
Dream wine comments search [online demo], have a try!
Features
-
Performant
- SIMD-Accelerated (packed_simd)
- Stable algorithm implementation
- Multiple threads design
-
Supports Multiple Languages
Python
Javascript
Java
Go
(WIP)Ruby
(WIP)Swift
(WIP)R
(WIP)Julia
(WIP)- Can also be used as a service
-
Supports Multiple Indexes
-
Portable
- Supports
WebAssembly
- Supports
Windows
,Linux
andOS X
- Supports
IOS
andAndroid
(WIP) - Supports
no_std
(WIP, partial) - No heavy dependencies, such as
BLAS
- Supports
-
Reliability
Rust
compiler secures all code- Memory managed by
Rust
for all language libraries such asPython's
- Broad testing coverage
-
Supports Multiple Distances
Dot Product Distance
Euclidean Distance
Manhattan Distance
Cosine Similarity
-
Productive
- Well documented
- Elegant, simple and easy to learn API
Installation
Rust
in Cargo.toml
[dependencies] hora = "0.1.1"
Python
$ pip install horapy
Javascript (WebAssembly)
$ npm i horajs
Building from source
$ git clone https://github.com/hora-search/hora $ cargo build
Benchmarks
by aws t2.medium (CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz)
more information
Examples
Rust
example [more info]
use hora::core::ann_index::ANNIndex; use rand::{thread_rng, Rng}; use rand_distr::{Distribution, Normal}; pub fn demo() { let n = 1000; let dimension = 64; // make sample points let mut samples = Vec::with_capacity(n); let normal = Normal::new(0.0, 10.0).unwrap(); for _i in 0..n { let mut sample = Vec::with_capacity(dimension); for _j in 0..dimension { sample.push(normal.sample(&mut rand::thread_rng())); } samples.push(sample); } // init index let mut index = hora::index::hnsw_idx::HNSWIndex::<f32, usize>::new( dimension, &hora::index::hnsw_params::HNSWParams::<f32>::default(), ); for (i, sample) in samples.iter().enumerate().take(n) { // add point index.add(sample, i).unwrap(); } index.build(hora::core::metrics::Metric::Euclidean).unwrap(); let mut rng = thread_rng(); let target: usize = rng.gen_range(0..n); // 523 has neighbors: [523, 762, 364, 268, 561, 231, 380, 817, 331, 246] println!( "{:?} has neighbors: {:?}", target, index.search(&samples[target], 10) // search for k nearest neighbors ); }
Python
example [more info]
import numpy as np from horapy import HNSWIndex dimension = 50 n = 1000 # init index instance index = HNSWIndex(dimension, "usize") samples = np.float32(np.random.rand(n, dimension)) for i in range(0, len(samples)): # add node index.add(np.float32(samples[i]), i) index.build("euclidean") # build index target = np.random.randint(0, n) # 410 in Hora ANNIndex <HNSWIndexUsize> (dimension: 50, dtype: usize, max_item: 1000000, n_neigh: 32, n_neigh0: 64, ef_build: 20, ef_search: 500, has_deletion: False) # has neighbors: [410, 736, 65, 36, 631, 83, 111, 254, 990, 161] print("{} in {} \nhas neighbors: {}".format( target, index, index.search(samples[target], 10))) # search
JavaScript
example [more info]
import * as horajs from "horajs"; const demo = () => { const dimension = 50; var bf_idx = horajs.BruteForceIndexUsize.new(dimension); // var hnsw_idx = horajs.HNSWIndexUsize.new(dimension, 1000000, 32, 64, 20, 500, 16, false); for (var i = 0; i < 1000; i++) { var feature = []; for (var j = 0; j < dimension; j++) { feature.push(Math.random()); } bf_idx.add(feature, i); // add point } bf_idx.build("euclidean"); // build index var feature = []; for (var j = 0; j < dimension; j++) { feature.push(Math.random()); } console.log("bf result", bf_idx.search(feature, 10)); //bf result Uint32Array(10) [704, 113, 358, 835, 408, 379, 117, 414, 808, 826] } (async () => { await horajs.default(); await horajs.init_env(); demo(); })();
Java
example [more info]
public void demo() { final int dimension = 2; final float variance = 2.0f; Random fRandom = new Random(); BruteForceIndex bruteforce_idx = new BruteForceIndex(dimension); // init index instance List<float[]> tmp = new ArrayList<>(); for (int i = 0; i < 5; i++) { for (int p = 0; p < 10; p++) { float[] features = new float[dimension]; for (int j = 0; j < dimension; j++) { features[j] = getGaussian(fRandom, (float) (i * 10), variance); } bruteforce_idx.add("bf", features, i * 10 + p); // add point tmp.add(features); } } bruteforce_idx.build("bf", "euclidean"); // build index int search_index = fRandom.nextInt(tmp.size()); // nearest neighbor search int[] result = bruteforce_idx.search("bf", 10, tmp.get(search_index)); // [main] INFO com.hora.app.ANNIndexTest - demo bruteforce_idx[7, 8, 0, 5, 3, 9, 1, 6, 4, 2] log.info("demo bruteforce_idx" + Arrays.toString(result)); } private static float getGaussian(Random fRandom, float aMean, float variance) { float r = (float) fRandom.nextGaussian(); return aMean + r * variance; }
Roadmap
- Full test coverage
- Implement EFANNA algorithm to achieve faster KNN graph building
- Swift support and iOS/macOS deployment example
- Support
R
- support
mmap
Related Projects and Comparison
-
Hora
's implementation is strongly inspired by these libraries.Faiss
focuses more on the GPU scenerio, andHora
is lighter than Faiss (no heavy dependencies).Hora
expects to support more languages, and everything related to performance will be implemented by Rust.
Annoy
only supports theLSH (Random Projection)
algorithm.ScaNN
andFaiss
are less user-friendly, (e.g. lack of documentation).- Hora is ALL IN RUST
.
-
Milvus
andVald
also support multiple languages, but serve as a service instead of a libraryMilvus
is built upon some libraries such asFaiss
, whileHora
is a library with all the algorithms implemented itself
Contribute
We appreciate your help!
We are glad to have you participate, any contributions are welcome, including documentations and tests.
You can create a Pull Request
or Issue
on GitHub, and we will review it as soon as possible.
We use GitHub issues for tracking suggestions and bugs.
Clone the repo
git clone https://github.com/hora-search/hora
Build
cargo build
cargo test --lib
Try the changes
cd examples cargo run
License
The entire repository is licensed under the Apache License.
Recommend
-
33
Neighborhood Graph and Tree for Indexing High-dimensional Data
-
7
How to use Google search to find the nearest vaccination center in India
-
11
Nearest Neighbor Indexes for Similarity SearchVector simi...
-
8
PostGIS Nearest Neighbor Syntax 16 Dec 2021 It turns out that it is possible to get an indexed n-nearest-neighbor (KNN) search out of PostGIS along with a distance using only one distance calculation and one target litera...
-
9
K-Nearest Neighbor Classification and vehicle evaluationHello there!Are you purchasing a new car or want to know whether your current car was a good purchase was not?You are in good luck.In this blog,...
-
6
最近邻搜索 (Nearest Neighbor Search) 范叶亮 / 2020-08-01 分类: 机器学习 / 标签:
-
5
Home » UncategorizedImplementing kd-tree for fast range-search, nearest-neighbor search and k-nearest-neighb...
-
3
New approximate nearest neighbor benchmarks 2018-06-17As some of you may know, one of my side interests is approximate nearest neighbor algorithms. I'm the author of Annoy, a library w...
-
7
New benchmarks for approximate nearest neighbors 2018-02-15UPDATE(2018-06-17): There are is a later blog post with newer benchmarks! O...
-
5
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK