1

New ways to use composite indexes in Azure Cosmos DB

 3 years ago
source link: https://devblogs.microsoft.com/cosmosdb/new-ways-to-use-composite-indexes/?WT_mc_id=DOP-MVP-4025064
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

New ways to use composite indexes in Azure Cosmos DB

Avatar

February 16th, 2021

You can now use composite indexes in Azure Cosmos DB to optimize additional cases of the most common inefficient queries! Anytime you have a slow or high request unit (RU) query, you should consider optimizing it with a composite index.

Current use cases for composite indexes:

Use caseExampleQueries that ORDER BY multiple properties
SELECT * 
FROM c
ORDER BY c.name, c.age
Queries with an equality filter and ORDER BY
SELECT *
FROM c
WHERE c.name = “Tim”
ORDER BY c.age
(NEW) Queries with multiple equality and/or range filters
SELECT *
FROM c
WHERE c.name = “Tim” AND c.age > 18 and c._ts > 100
(NEW) Queries with aggregates and equality filters
SELECT SUM(c.total)
FROM c
WHERE c.name = “Tim”
(NEW) Queries with system functions
SELECT *
FROM c
WHERE c.name = “Tim” AND CONTAINS(c.biography, “developer”)
ORDER BY c.name, c.biography

This post looks at examples of the new use cases for composite indexes. For these tests, we’ve ingested 9 million items into an Azure Cosmos DB container. Here is an example item:

Each item is a different product in an online clothing store. Multiple items can belong to the same CartId but each item has its own unique id value.

Queries with multiple equality and range filters:

Queries can now use multiple composite indexes in the same filter expression. This allows you to optimize queries that have multiple equality and range filters in the same filter expression. Remember, a single composite index can only apply to one range filter. Therefore, using multiple composite indexes can help optimize queries with multiple range filters.

Query:

SELECT TOP 10 *
FROM c
WHERE c.BuyerState = "TN" and c._ts > 1611947901 
and c.id > "9c5e4e08-e1d2-4091-8062-78d1da832523"

Composite indexes:

  1. (BuyerState ASC, _ts ASC)
  2. (BuyerState ASC, id ASC)

Query charge with default indexing policy: 533.47 RUs

Query charge with both composite indexes: 72.23 RUs

This optimization tends to have a higher impact for multiple range filters than queries with only equality filters. Composite indexes will have a bigger impact on high cardinality properties, such as _ts or id.

Queries with aggregates and an equality filters:

Aggregate queries (SUM,  AVG, MIN, or MAX) can now benefit from composite indexes. Previously, composite indexes could only optimize queries with a COUNT(1) aggregate.

Query: 
SELECT AVG(c._ts)
FROM c 
WHERE c.BuyerState = "TN"

Composite index: (BuyerState ASC, _ts ASC)

Query charge with default indexing policy: 186.25 RUs

Query charge with composite index: 39.8 RUs

Queries with higher cardinality properties typically benefit more from composite indexes. For example, if the property in the AVG() system function was price, instead of _ts, you might not need a composite index in the first place since price has fewer possible values than _ts.

Here’s an example:

Query:

SELECT AVG(c.price)
FROM c
WHERE c.BuyerState = "TN"

Composite index: (BuyerState ASC, price ASC)

Query charge with default indexing policy: 5.08 RUs

Query charge with composite index: 2.9 RUs

Queries with ORDER BY and system functions:

Queries with ORDER BY and system functions can now benefit from composite indexes. When you use a high cardinality property in the system function, composite indexes may have a big impact. In fact, the impact can be so significant, that it can be worthwhile to rewrite some queries with system functions to use ORDER BY so that they can leverage composite indexes.

Here’s an example:

Original Query: 
SELECT *
FROM c
WHERE c.BuyerState = "NJ" AND Contains (c.id, "abc", true)
Query with added ORDER BY:
SELECT *
FROM c
WHERE c.BuyerState = "NJ" AND Contains (c.id, "abc", true)
ORDER BY c.BuyerState, c.id

Composite index: (BuyerState ASC, id ASC)

Query charge with default indexing policy: 4,138.66 RUs

Query charge with composite index: 51.3 RUs

Adding an ORDER BY clause for the property in a system function improves system function execution. This is broader than just composite indexes and can often help in any query with a system function on a high cardinality property. Here’s an example that doesn’t involve composite indexes:

Here’s an example:

Original Query:

SELECT *
FROM c
WHERE RegexMatch (c.id, "abc")

Query with added ORDER BY:

SELECT *
FROM c
WHERE RegexMatch (c.id, "abc")
ORDER BY c.id

Query charge without ORDER BY: 8,845.61 RUs

Query charge with ORDER BY: 92.71 RUs

Queries with the following system functions are most likely to see big improvements since their efficiency depends on cardinality of the property in the system function:

  • Contains
  • RegexMatch
  • EndsWith

Having query results sorted can improve the way the query engine utilizes the index for high cardinality properties. Even if a query with a system function doesn’t require ORDER BY, you may be able to significantly improve by adding it, along with the relevant composite indexes.

Monitoring index policy transformations:

When you modify your indexing policy, the existing policy is automatically transformed to include the new additional indexed properties. When modifying indexing policy, there are some important points to keep in mind:

  • Indexing policy changes do not affect read or write availability
  • When you add a new index, the Cosmos DB query engine will not use the new index until the index transformation is complete
  • It is safe to modify indexing policy at any time, including when in production. It is common to add composite indexes while in production.

Additionally, it’s good to keep in mind that the following factors affect index transformation time:

  • Amount of data – The more data, the more time it takes for indexing policy transformations.
  • Number of added or removed indexes – A more significant change to the indexing policy will take longer than simple change (for example, adding a single composite index).
  • Available provisioned throughput – Indexing policy transformations will use RUs that are leftover from all other database operations.

You can track indexing policy transformation progress in the Azure Portal. Once the loading bar disappears, the indexing policy transformation has been successfully applied:

We look forward to using composite indexes for many additional query optimizations in the future! Stay up-to-date by reading our query troubleshooting guide when optimizing queries.

Learn more:

Tim Sander

Program Manager, Azure Cosmos DB

Follow


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK