Redshift Sort Keys Explained

Amazon Redshift – When (and Why) to Use Sort Keys

In Amazon Redshift, sort keys determine the physical order in which table rows are stored on disk. Redshift stores columnar data in 1 MB blocks, along with metadata that describes each block — including its minimum and maximum values.

When a query filters on a range of the sort key column, Redshift compares the filter against these min/max values (the zone map) and skips entire blocks that cannot contain matching rows. For example, if a table contains five years of data sorted by date and you query just one month, Redshift can skip most of the blocks, greatly reducing I/O and query time.
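The block-skipping idea can be sketched in a few lines of Python. This is a toy model, not Redshift's storage engine: the `Block` class, the 30-rows-per-block size, and the generated dates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Block:
    """One storage block plus the min/max metadata kept alongside it."""
    values: list
    min_value: date
    max_value: date


def build_blocks(sorted_values, rows_per_block):
    """Split already-sorted values into fixed-size blocks and record min/max."""
    blocks = []
    for i in range(0, len(sorted_values), rows_per_block):
        chunk = sorted_values[i:i + rows_per_block]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def blocks_to_scan(blocks, lo, hi):
    """Keep only blocks whose [min, max] range overlaps the filter [lo, hi]."""
    return [b for b in blocks if b.max_value >= lo and b.min_value <= hi]


# Five years of daily rows, already sorted by date (hypothetical data).
start = date(2020, 1, 1)
days = [start + timedelta(days=n) for n in range(5 * 365)]
blocks = build_blocks(days, rows_per_block=30)

# Query a single month: only blocks overlapping June 2023 are read.
hits = blocks_to_scan(blocks, date(2023, 6, 1), date(2023, 6, 30))
print(f"scanned {len(hits)} of {len(blocks)} blocks")
```

Because the values are sorted, each block covers a narrow, non-overlapping date range, so the one-month filter touches only a couple of blocks. With unsorted data every block's min/max range would tend to span the whole five years and nothing could be skipped.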

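Sort keys are usually defined when a table is created, as part of the CREATE TABLE DDL, and can be changed later with ALTER TABLE ... ALTER SORTKEY. When data is first loaded into an empty table, rows are written in sorted order; rows added later sit in an unsorted region until the table is re-sorted by VACUUM or by Redshift's automatic table sort. Redshift also passes sort key information to the query planner, which can significantly improve performance, especially for analytical queries that use filters, joins, and aggregations.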
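To make the DDL concrete, here is a small Python helper that renders a CREATE TABLE statement with a sort key clause. The helper name `create_table_ddl`, the `sales` table, and its columns are hypothetical; only the `COMPOUND SORTKEY (...)` and `INTERLEAVED SORTKEY (...)` clauses reflect Redshift's CREATE TABLE syntax.

```python
def create_table_ddl(table, columns, sort_columns, style="COMPOUND"):
    """Render a CREATE TABLE statement with a COMPOUND or INTERLEAVED sort key."""
    style = style.upper()
    if style not in ("COMPOUND", "INTERLEAVED"):
        raise ValueError(f"unknown sort key style: {style}")

    # Every sort key column must be a column of the table.
    known = {name for name, _ in columns}
    missing = [c for c in sort_columns if c not in known]
    if missing:
        raise ValueError(f"sort columns not in table: {missing}")

    column_lines = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        f"{style} SORTKEY ({', '.join(sort_columns)});"
    )


# Hypothetical fact table sorted by date first, then region.
ddl = create_table_ddl(
    "sales",
    [("sale_date", "DATE"), ("region", "VARCHAR(16)"), ("amount", "DECIMAL(12,2)")],
    ["sale_date", "region"],
)
print(ddl)
```

Passing `style="INTERLEAVED"` produces the same statement with an `INTERLEAVED SORTKEY` clause instead.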

Compound Sort Keys

A compound sort key consists of one or more columns defined in a specific order. Redshift sorts data by the first column, then by the next, and so on. Compound sort keys work best when your queries consistently filter or join on the same set of columns, especially when predicates reference the leading column; the benefit drops off when queries filter only on secondary columns. They can also speed up JOIN, GROUP BY, and ORDER BY operations, as well as window functions that use PARTITION BY or ORDER BY.
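The importance of the leading column shows up clearly in a toy simulation. The sketch below assumes a hypothetical two-column table of (day, region) rows, a block size of 50 rows, and per-column min/max metadata per block; Python's tuple ordering stands in for a compound key on (day, region).

```python
from itertools import product

ROWS_PER_BLOCK = 50  # assumed block size for this toy model


def zone_maps(rows, rows_per_block):
    """Return, for each block, a (min, max) pair per column."""
    maps = []
    for i in range(0, len(rows), rows_per_block):
        chunk = rows[i:i + rows_per_block]
        columns = list(zip(*chunk))
        maps.append([(min(c), max(c)) for c in columns])
    return maps


def blocks_scanned(maps, column, value):
    """Count blocks whose min/max range for `column` could contain `value`."""
    return sum(1 for m in maps if m[column][0] <= value <= m[column][1])


# 100 days x 10 regions. Tuples compare left to right, which mirrors a
# compound sort key on (day, region).
rows = sorted(product(range(100), range(10)))
maps = zone_maps(rows, ROWS_PER_BLOCK)

by_day = blocks_scanned(maps, column=0, value=42)     # filter on the leading column
by_region = blocks_scanned(maps, column=1, value=3)   # filter on the second column only
print(f"{len(maps)} blocks total: day filter scans {by_day}, region filter scans {by_region}")
```

In this model the filter on the leading column reads a single block, while the filter on the second column alone reads every block, because each block contains the full range of regions.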

According to AWS best practices:

  • If recent data is queried most frequently, make the timestamp column the leading column in the sort key.
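  • If you frequently apply range or equality filters to one column, use that column as the sort key.
  • For tables that are joined frequently, such as dimension tables, use the join column as the sort key. AWS suggests making it the distribution key as well, which lets the planner choose a sort merge join.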

Compound sort keys are generally preferred for most workloads, especially those with frequent updates or incremental loads, as they are easier to maintain than interleaved keys.

Interleaved Sort Keys

An interleaved sort key gives equal weight to each column in the key, unlike compound keys where order determines priority. This design can improve performance for workloads that filter on multiple different columns — for example, queries that sometimes filter by customer_id, sometimes by region, and other times by date.

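However, interleaved sort keys require more maintenance. Frequent updates or uneven data distribution can cause key skew, reducing their effectiveness, and restoring the sort order requires VACUUM REINDEX, which takes longer than a regular vacuum. AWS also advises against interleaved keys on columns with monotonically increasing values, such as identity columns, dates, or timestamps. You can define up to eight columns in an interleaved sort key.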
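The trade-off between the two key types can be illustrated with a toy comparison. The post does not describe how Redshift implements interleaving, so this sketch substitutes a plain Z-order (bit interleaving) sort as a stand-in; the 16 x 16 grid of values, the block size, and all function names are assumptions.

```python
from itertools import product

ROWS_PER_BLOCK = 16  # assumed block size for this toy model


def interleave_bits(a, b, bits=4):
    """Build a Z-order key by alternating the bits of a and b."""
    key = 0
    for i in range(bits):
        key |= ((a >> i) & 1) << (2 * i + 1)
        key |= ((b >> i) & 1) << (2 * i)
    return key


def zone_maps(rows, rows_per_block):
    """Return, for each block, a (min, max) pair per column."""
    maps = []
    for i in range(0, len(rows), rows_per_block):
        chunk = rows[i:i + rows_per_block]
        maps.append([(min(c), max(c)) for c in zip(*chunk)])
    return maps


def blocks_scanned(maps, column, value):
    """Count blocks whose min/max range for `column` could contain `value`."""
    return sum(1 for m in maps if m[column][0] <= value <= m[column][1])


rows = list(product(range(16), range(16)))  # two columns, 16 values each

compound_maps = zone_maps(sorted(rows), ROWS_PER_BLOCK)
interleaved_maps = zone_maps(
    sorted(rows, key=lambda r: interleave_bits(*r)), ROWS_PER_BLOCK
)

compound = [blocks_scanned(compound_maps, col, 5) for col in (0, 1)]
interleaved = [blocks_scanned(interleaved_maps, col, 5) for col in (0, 1)]
print("compound    (first col, second col):", compound)
print("interleaved (first col, second col):", interleaved)
```

In this model the compound ordering is ideal for the first column and useless for the second, while the Z-order ordering prunes a similar share of blocks for either column. That is the sense in which an interleaved key weights its columns equally.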

Choosing the Right Key

Scenario | Recommended Sort Key Type
Queries always filter by the same column or time range | Compound (leading column = filter column)
Queries filter by several different columns | Interleaved
Tables updated frequently or incrementally loaded | Compound
Tables used mainly for analytical reads | Either, depending on filter patterns

This post is licensed under CC BY 4.0 by the author.