BenJoyenConseil Rmi Save

A learned index structure

Project README

RMI

Go PkgGoDev codecov

A goland implementation of a RMI (Recursive Model Indexes), a Learned Index structure based on the research work by Kraska & al.

Fig 1 from the Case for Learned Index Structures

usage

Create an index and make lookups

// load the age column and parse values into float64 values
ageColumn := extractAgeColumn("data/people.csv")

// create an index over the age column
index := index.New(ageColumn)

// search an age and get back its line position inside the file people.csv
search, _ := strconv.ParseFloat(os.Args[1], 64)
lines, _ := index.Lookup(search)

the main.go file contains an example of a learned index overdata/people.csv age column.

It outputs :

$ cat data/people.csv
name,age,sex
jeanne,90,F
jean,23,M
Carlos,3,M
Carlotta,45,F
Miguel,1,M
Martine,1.5,F
Georgette,23,F

$ go run main.go 23
2020/11/15 20:29:56 People who are 23 years old are located at [8 3] inside data/people.csv 

This is the plot showing the approximation (the linear regression), the cumulative distribution function for each value, and the current age's value (the Keys of the index) :

Fig 2 the LearnedIndex over people.csv

features

  • A simple linear regression model learning the CDF of a float64 array
  • A learned index structure fitted on keys of a collection
  • Finding rows id on a CSV file
  • Return a list of lines matching the key
  • Use max + min error bounding elements to search quickly
  • Benchmarks InMemory LearnedIndex against InMem BinarySearch
  • Store offset lines and a primary key index
  • Store the sortedTable
  • CLI to create indexes over CSV
  • Benchmarks Learned against BinarySearchTree
  • A two layer recursive index
  • Learn on integer
  • Index is persistent and durable (on hard drive)
  • A sort algorythm using learned structure
  • Learning on string type ?
  • Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). Association for Computing Machinery, New York, NY, USA, 489–504. DOI:https://doi.org/10.1145/3183713.3196909

  • Ryan Marcus RMI's reference implementation

Open Source Agenda is not affiliated with "BenJoyenConseil Rmi" Project. README Source: BenJoyenConseil/rmi
Stars
52
Open Issues
4
Last Commit
3 years ago
Repository
License

Open Source Agenda Badge

Open Source Agenda Rating