The automatic indexer for Postgres
The automatic indexer for Postgres
Read about how it works or watch the talk
First, install HypoPG on your database server. This doesn’t require a restart.
cd /tmp
curl -L https://github.com/HypoPG/hypopg/archive/1.3.1.tar.gz | tar xz
cd hypopg-1.3.1
make
make install # may need sudo
Note: If you have issues, make sure
postgresql-server-dev-*
is installed.
Enable logging for slow queries in your Postgres config file.
log_min_duration_statement = 10 # ms
And install the command line tool with:
gem install pgdexter
The command line tool is also available with Docker, Homebrew, or as a Linux package.
Dexter needs a connection to your database and a log file to process.
tail -F -n +1 <log-file> | dexter <connection-options>
This finds slow queries and generates output like:
Started
Processing 189 new query fingerprints
Index found: public.genres_movies (genre_id)
Index found: public.genres_movies (movie_id)
Index found: public.movies (title)
Index found: public.ratings (movie_id)
Index found: public.ratings (rating)
Index found: public.ratings (user_id)
Processing 12 new query fingerprints
To be safe, Dexter will not create indexes unless you pass the --create
flag. In this case, you’ll see:
Index found: public.ratings (user_id)
Creating index: CREATE INDEX CONCURRENTLY ON "public"."ratings" ("user_id")
Index created: 15243 ms
Dexter supports the same connection options as psql.
-h host -U user -p 5432 -d dbname
This includes URIs:
postgresql://user:[email protected]:5432/dbname
and connection strings:
host=localhost port=5432 dbname=mydb
There are many ways to collect queries. For real-time indexing, pipe your logfile:
tail -F -n +1 <log-file> | dexter <connection-options>
Pass a single statement with:
dexter <connection-options> -s "SELECT * FROM ..."
or pass files:
dexter <connection-options> <file1> <file2>
or collect running queries with:
dexter <connection-options> --pg-stat-activity
or use the pg_stat_statements extension:
dexter <connection-options> --pg-stat-statements
Note: Logs or running queries are highly preferred over pg_stat_statements, as pg_stat_statements often doesn’t store enough information to optimize queries.
To prevent one-off queries from being indexed, specify a minimum number of calls before a query is considered for indexing
dexter --min-calls 100
You can do the same for total time a query has run
dexter --min-time 10 # minutes
Specify the format
dexter --input-format csv
When streaming logs, specify the time to wait between processing queries
dexter --interval 60 # seconds
Ubuntu with PostgreSQL 12
tail -F -n +1 /var/log/postgresql/postgresql-12-main.log | sudo -u postgres dexter dbname
Homebrew on Mac
tail -F -n +1 /usr/local/var/postgres/server.log | dexter dbname
For best results, make sure your tables have been recently analyzed so statistics are up-to-date. You can ask Dexter to analyze tables it comes across that haven’t been analyzed in the past hour with:
dexter --analyze
You can exclude large or write-heavy tables from indexing with:
dexter --exclude table1,table2
Alternatively, you can specify which tables to index with:
dexter --include table3,table4
See how Dexter is processing queries with:
dexter --log-sql --log-level debug2
The hypopg
extension, which Dexter needs to run, is available on these providers.
For other providers, see this guide. To request a new extension:
Get the Docker image with:
docker pull ankane/dexter
And run it with:
docker run -ti ankane/dexter <connection-options>
For databases on the host machine, use host.docker.internal
as the hostname (on Linux, this requires Docker 20.04+ and --add-host=host.docker.internal:host-gateway
).
With Homebrew, you can use:
brew install ankane/brew/dexter
Run:
gem install pgdexter
To use master, run:
gem install specific_install
gem specific_install https://github.com/ankane/dexter.git
This software wouldn’t be possible without HypoPG, which allows you to create hypothetical indexes, and pg_query, which allows you to parse and fingerprint queries. A big thanks to Dalibo and Lukas Fittl respectively.
This is known as the Index Selection Problem (ISP).
Everyone is encouraged to help improve this project. Here are a few ways you can help:
To get started with development, run:
git clone https://github.com/ankane/dexter.git
cd dexter
bundle install
bundle exec rake install
To run tests, use:
createdb dexter_test
bundle exec rake test