Pgvecto.rs

🚧 WIP 🚧 Vector database plugin for Postgres, written in Rust, specifically designed for LLMs.

pgvecto.rs

pgvecto.rs is a (🚧 work in progress) Postgres extension that provides vector similarity search functions. It is written in Rust and based on pgrx.

Why use pgvecto.rs

  • 💃 Easy to use: pgvecto.rs is a Postgres extension, which means that you can use it directly within your existing database. This makes it easy to integrate into your existing workflows and applications.
  • 🦀 Rewrite in Rust: Rewriting in Rust offers benefits such as improved memory safety, better performance, and reduced maintenance costs over time.
  • 🙋 Community: People love Rust. We are happy to help with any questions you may have. Join our Discord to get in touch with us.

Why not a specialty vector database?

Imagine that your existing data is stored in a Postgres database and you want to run vector similarity search over it. With a separate vector database, you have to move your data out of Postgres and maintain two databases at the same time. That means duplicated data and one more system to operate.

Why not just use Postgres to do the vector similarity search? That is why we are building pgvecto.rs. The intended user journey looks like this:

-- Update the embedding column for the documents table
UPDATE documents SET embedding = ai_embedding_vector(content) WHERE length(embedding) = 0;

-- Create an index on the embedding column
CREATE INDEX ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- Query the similar embeddings
SELECT * FROM documents ORDER BY embedding <-> ai_embedding_vector('hello world') LIMIT 5;

From SingleStore DB Blog:

Vectors and vector search are a data type and query processing approach, not a foundation for a new way of processing data. Using a specialty vector database (SVDB) will lead to the usual problems we see (and solve) again and again with our customers who use multiple specialty systems: redundant data, excessive data movement, lack of agreement on data values among distributed components, extra labor expense for specialized skills, extra licensing costs, limited query language power, programmability and extensibility, limited tool integration, and poor data integrity and availability compared with a true DBMS.

Setting up the development environment

You can use envd to set up the development environment with one command. It creates a Docker container and installs all the dependencies for you.

pip install envd
envd up

Build from source

cargo install cargo-pgrx
cargo pgrx init
cargo pgrx run

Getting Started

Installation

-- install the extension
DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;
-- check the extension related functions
\df+

Calculate the distance

We support three operators to calculate the distance between two vectors:

  • <->: squared Euclidean distance
  • <#>: dot product distance
  • <=>: cosine distance

-- call the distance functions through operators

-- squared Euclidean distance
SELECT array[1, 2, 3] <-> array[3, 2, 1];
-- dot product distance
SELECT array[1, 2, 3] <#> array[3, 2, 1];
-- cosine distance
SELECT array[1, 2, 3] <=> array[3, 2, 1];
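
For reference, the three quantities behind these operators can be sketched in plain Python. This is only an illustration of the math, not the extension's Rust code. The function names are made up for this example, and the sign conventions are assumptions: the README does not say whether `<#>` returns the raw or the negated inner product, so the sketch returns the raw value, and it takes cosine distance to be 1 minus cosine similarity.

```python
import math

def squared_euclidean(a, b):
    # Sum of squared per-component differences (no square root taken).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot_product(a, b):
    # Raw inner product; the sign the <#> operator uses is an assumption.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # Assumed definition: 1 - cos(angle between a and b).
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / (norm_a * norm_b)

a, b = [1, 2, 3], [3, 2, 1]
print(squared_euclidean(a, b))  # 8
print(dot_product(a, b))        # 10
print(cosine_distance(a, b))    # roughly 0.2857 under the assumed definition
```

Under these assumptions, the same two vectors used in the SQL above give 8, 10, and about 0.29.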

Create a table

You can use the CREATE TABLE statement to create a table with a vector column.

-- create table (real[] matches the ::real[] casts used in the queries below)
CREATE TABLE items (id bigserial PRIMARY KEY, emb real[]);
-- insert values
INSERT INTO items (emb) VALUES (ARRAY[1,2,3]), (ARRAY[4,5,6]);
-- query the similar embeddings
SELECT * FROM items ORDER BY emb <-> ARRAY[3,2,1]::real[] LIMIT 5;
-- query the neighbors within a certain distance
SELECT * FROM items WHERE emb <-> ARRAY[3,2,1]::real[] < 5;
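
Without an index, both queries amount to a scan over every row. The sketch below mimics that in plain Python for the two rows inserted above, assuming that `<->` is squared Euclidean distance and that the bigserial ids start at 1 on a fresh table. The names `items`, `ranked`, and `within` are illustrative only.

```python
def squared_euclidean(a, b):
    # Same quantity the <-> operator is described as computing.
    return sum((x - y) ** 2 for x, y in zip(a, b))

# id -> emb, mirroring the two inserted rows (ids assumed to start at 1)
items = {1: [1, 2, 3], 2: [4, 5, 6]}
query = [3, 2, 1]

# ORDER BY emb <-> query LIMIT 5
ranked = sorted(items, key=lambda i: squared_euclidean(items[i], query))[:5]

# WHERE emb <-> query < 5
within = [i for i in items if squared_euclidean(items[i], query) < 5]

print(ranked)  # [1, 2]
print(within)  # []
```

The squared distances are 8 and 35, so the first query returns row 1 before row 2, and the range query with threshold 5 matches nothing for this sample data.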

Create an index

We are planning to support the following index types (issue here):

  • IVF
  • HNSW
  • ScaNN

Contributions are welcome if you are also interested!
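
To give a rough idea of what an IVF index does, here is a toy sketch in plain Python. It is a generic illustration of the technique and says nothing about how pgvecto.rs implements or will implement it. The function names and the hand-picked centroids are assumptions made for the example; a real index would typically learn its centroids, for example with k-means.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Put each vector id into the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest_c = min(range(len(centroids)),
                        key=lambda c: squared_l2(vec, centroids[c]))
        lists[nearest_c].append(vid)
    return lists

def search_ivf(query, vectors, centroids, lists, k, probes=1):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: squared_l2(query, centroids[c]))
    candidates = [vid for c in order[:probes] for vid in lists[c]]
    candidates.sort(key=lambda vid: squared_l2(query, vectors[vid]))
    return candidates[:k]

vectors = [[1, 2, 3], [4, 5, 6], [1, 1, 1], [6, 5, 4]]
centroids = [[1, 1.5, 2], [5, 5, 5]]  # fixed by hand for this example
lists = build_ivf(vectors, centroids)
nearest = search_ivf([3, 2, 1], vectors, centroids, lists, k=2)

print(lists)    # [[0, 2], [1, 3]]
print(nearest)  # [2, 0]
```

The trade-off is that only some lists are scanned, so the search is faster than a full scan but can miss neighbors that fall in lists that were not probed. In this tiny example the result happens to match the exact top 2.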

Contributing

We need your help! Please check out the issues.

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • Alex Chi: 💻
  • Ce Gao: 💼 🖋 📖
  • Jinjing Zhou: 🎨 🤔 📆
  • Keming: 🐛 💻 📖 🤔 🚇
  • odysa: 📖 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Acknowledgements

Thanks to the following projects:

  • pgrx - Postgres extension framework in Rust
  • pgvector - Postgres extension for vector similarity search written in C
Open Source Agenda is not affiliated with "Pgvecto.rs" Project. README Source: tensorchord/pgvecto.rs