Friday, May 17, 2024
No menu items!
HomeDatabase ManagementImprove query performance using partial indexes in Amazon DocumentDB

Improve query performance using partial indexes in Amazon DocumentDB

Amazon DocumentDB (with MongoDB compatibility) is a fully managed native JSON document database that makes it easy and cost effective to operate critical document workloads at virtually any scale without managing infrastructure. Amazon DocumentDB simplifies your architecture by providing built-in security best practices, continuous backups, and native integrations with other AWS services. You can enhance your applications with generative artificial intelligence (AI) and machine learning (ML) capabilities using vector search for Amazon DocumentDB and integration with Amazon SageMaker Canvas. As a document database, Amazon DocumentDB makes it straightforward to store, query, and index JSON data.

Recently, Amazon DocumentDB enhanced its indexing capabilities by adding support for partial indexes, which you can use to index only those documents that meet a specified index criterion. In this post, we walk you through the benefits and use cases of this new indexing capability.

Benefits of partial indexes

Unlike a normal index that indexes all documents in a collection, a partial index narrows down the index scope by indexing a subset of documents. We recommend using partial indexes when you have a large number of queries returned by a filtered expression and you want to index only these subsets of documents that meet a specified partial filter expression. A partial index can be comprised of fields that are indexed or non-indexed.

The following are some of the benefits of using partial indexes:

Improved query performance – A partial index is smaller compared to a normal index, and queries using partial indexes can be faster because they scan fewer index entries.
Smaller index size – By only indexing documents that meet a specified filter expression, partial indexes are smaller in size and are more likely to fit within the instance’s memory. This results in a higher buffer cache hit ratio and lower read input/output per second (IOPS).
Selective uniqueness and deletion in collection – By using a unique option with a partial index, you can maintain selective uniqueness for documents in a collection. Similarly, you can use a Time to Live (TTL) index with a partial filter expression to selectively delete documents from a collection.
Optimized write performance – Partial indexes can also reduce the index maintenance overhead—the index structure is updated only when documents that meet the filter criterion are inserted or updated to collection. With a smaller index size, write operations become faster, enhancing overall database performance.

Partial indexes are currently supported in Amazon DocumentDB 5.0 instance-based clusters. If you are running your workloads on Amazon DocumentDB 3.6 or 4.0 clusters, you can upgrade your cluster to Amazon DocumentDB 5.0 using an in-place major version upgrade.

Create a partial index

To create a partial index, you can use the db.collection.createIndex() method with the partialFilterExpression option, which accepts a document that specifies the filter condition:

db.<collection name>.createIndex({ key1, key2 … },{ partialFilterExpression: <expression> })

You can list partial indexes with partialFilterExpression with getIndexes():

db.<collection name>.getIndexes()

In a partial index, an index scan is only used when the filter predicate in the query exactly matches the partialFilterExpression.

Query a subset of data

Let’s a look at some use cases for partial indexes using an ecommerce order management application as an example.

In this use case, a query only fetches order details from the order collection that have the order status “In-Progress”. A full index on the field orderID will include all documents with status “In-Progress”, “Cancelled”,”Returned”, and “Completed”. In most cases, “In-Progress” orders will make up a smaller proportion of total orders in the orders collections.

Because this query only fetches “In-Progress” orders, a partial index is a more suitable index for this query. By creating a partial index that indexes only “In-Progress” orders, you can create a smaller index and improve query performance. You can create a partial index on the orderID field that indexes documents with the status “In-Progress”:

db.orders.createIndex(
{“orderID”: 1},
{“partialFilterExpression”:
{
“status”: {“$eq”: “In-Progress”}
}
}
)

Your queries with the same filters now use the partial index:

db.orders.find({“orderID”:3563, “status”: {“$eq”: “In-Progress”}})

Combine multiple operators

You can also combine multiple operators in a partial filter expression. Consider another use case in which you have to examine “In-Progress” orders with payment issues. Your query needs to retrieve orders with status as “In-Progress” and payment as “Not Approved”. To index documents that meet both criteria, you can create a partial index using the following code:

db.orders.createIndex(
{“orderID”: 1},
{“partialFilterExpression”:
{
$and :[
{“status”: {“$eq”: “In-Progress”}},
{“payment”: {“$eq”: “Not Approved”}},
{“amount”: {$gt: 100}}]
}
})

Queries with the same filters now use the partial index:

db.orders.find({“orderID”:4673, “status”:”In-Progress”,”payment”: “Not Approved”,”amount”: {$gt: 100}})

The following queries don’t use the partial index because an index scan is only used when the filter predicate in the query matches the partial filter expression exactly:

db.orders.find({“orderID”:4673, “payment”: “Not Approved”})
db.orders.find({“orderID”:4673, “status”:”In-Progress”,”payment”:”Approved”}

Additionally, inequality queries in Amazon DocumentDB will only use a partial index when the query filter predicate exactly matches the partialFilterExpression and is of the same datatype.

db.reviews.find({“orderID”:4673, “status” : “In-Progress”, “amount”: { $gt: 120 } })

Use TTL indexes

TTL indexes enable you to remove documents based on their age based on the timeout condition for each document. When a document reaches the specified TTL age, it is deleted from the collection. You can now selectively delete documents from a collection by creating a TTL index with partial index properties. For example, you may want to automatically delete orders where the order was delivered over 180 days ago:

db.orders.createIndex(
{ “OrderTimestamp”: 1 },
{expireAfterSeconds: 155526000 ,
partialFilterExpression: { “isDelivered”: { $eq: true } }
}
)

Maintain uniqueness in a collection

You can selectively maintain uniqueness in a collection using a unique partial index. Imagine another use case of a product (like a software product with multiple versions) that evolves over time. You only query the latest active version in most situations, but also keep a record of past inactive versions of the product ID for historical reference or analysis. You can use an index with both unique and partial properties to query for active product documents while maintaining uniqueness:

db.products.createIndex(
{“productID”: 1},
{“unique”:true,
“partialFilterExpression”:
{
{
“status”: {“$eq”: “active”}
}
}
}
)

There can be multiple inactive versions of the same productID, but there will only be one active version of the product indexed in the partial index.

Conclusion

In this post, we introduced partial indexes in Amazon DocumentDB and presented some of the benefits when using this feature. A partial index is a unique indexing property in Amazon DocumentDB that enables you to create an index on a subset of documents in a collection, based on a specified filter condition. Unlike traditional indexes that cover the entire collection, partial indexes narrow down the scope, optimizing performance for targeted queries.

For more information about partial indexes in Amazon DocumentDB, see Partial index.

About the Authors

Pratik Das is a Senior Product Manager at AWS. He enjoys working with customers looking to build resilient workloads and strong data foundations in the cloud. He brings expertise working with enterprises on modernization, analytics, and data transformation initiatives.

Anshu Vajpayee is a Senior Amazon DocumentDB Specialist Solutions Architect at AWS, helping customers adopt NoSQL databases and modernize applications using Amazon DocumentDB. Before joining AWS, he worked extensively with relational and NoSQL databases.

Read MoreAWS Database Blog

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments