Multicolumn Index Order in Postgres
We started to notice some speed issues yesterday with certain queries in one of the largest tables (~50 million rows) in our PostgresSQL database, and we wanted to share some of the results of the investigation.
During our tests today, we noticed that the slow queries ordered by id DESC
were running in about 5 seconds, and queries that were ordered by
created_at DESC
were running in 0.1 milliseconds. Multicolumn indexes were present for both queries to use, and EXPLAIN ANALYZE
showed that the queries hitting those indexes with an index scan.
This led to the obvious question: wtf?
After some investigation, we found that the order of columns in the index were reversed. One had the columns defined as (feed_id, created_at DESC)
and
the other had the columns defined as (id DESC, feed_id)
. After looking at the PostgresSQL docs for multicolumn indexes, the issue was clear.
It turns out that you need to define your indexes with a constraint (feed_id
) first, so that you are only scanning a portion of the index. By defining
id
first, the full index was scanned, which resulted in very slow queries. We switched the order of the columns in the index to (feed_id, id DESC)
and the queries
are now running in the ~0.1 ms range, so you should notice that things are now much more responsive.
Please check out the relevant docs in the PostgresSQL 9.5 manual if you would like more info about multicolumn indexes.