Unlock the power of parallel indexing in Amazon DocumentDB

By mullaned2002

June 19, 2024

Parallel indexing in Amazon DocumentDB (with MongoDB compatibility) significantly reduces the time to create indexes. In this post, we show you how parallel indexing works, its benefits, and best practices for implementation.

Amazon DocumentDB is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. You can use the same MongoDB API 3.6, 4.0, and 5.0 application code, drivers, and tools to run, manage, and scale workloads on Amazon DocumentDB without worrying about managing the underlying infrastructure. As a document database, Amazon DocumentDB makes it simple to store, query, and index JSON data.

Although indexes improve query performance, creating them can be time-consuming, especially for large collections. Amazon DocumentDB now supports parallel index creation to shorten index builds. Parallel indexing uses multiple concurrent workers to scan the collection, which is the longest stage of the index creation process. In this post, we show you how parallel indexing can reduce the time to create new indexes by up to 14x.

Parallel index creation reduces the time needed to create indexes by using multiple CPU cores. As a result, it temporarily increases the load on CPU and I/O resources and can affect existing operations. Review your instance's current resource consumption when deciding the degree of parallelism, and scale up the writer node for the operation if needed.

How to use parallel indexing

Parallel indexing is currently supported on Amazon DocumentDB version 4.0 and higher instance-based clusters, with instance types of 2xlarge and above.

To create an index in parallel, specify the workers option in the createIndexes command. The workers option sets the number of workers used to build the index. The default value is 2, but you can specify a higher value, up to 50% of the vCPU count of the primary instance, to speed up the build. For example, to build an index in parallel using four workers, use the following command:

db.runCommand({
  createIndexes: "collection",
  indexes: [{ key: { "field": 1 }, name: "idx_name", workers: 4 }]
});
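The 50% limit on workers can also be applied in code before the command is issued. The following is a minimal sketch rather than an official API: buildParallelIndexCommand is a hypothetical helper, and the vCPU count is an input supplied by the caller, not something read from the cluster. It clamps the requested worker count to half the vCPUs and returns a createIndexes document in the same shape as the command above.

```javascript
// Hypothetical helper (not part of any driver or of Amazon DocumentDB).
// Builds a createIndexes command document and caps `workers` at 50% of
// the vCPU count of the primary instance, as described in this post.
function buildParallelIndexCommand(collection, field, indexName, requestedWorkers, vcpuCount) {
  if (!Number.isInteger(requestedWorkers) || requestedWorkers < 1) {
    throw new RangeError("requestedWorkers must be a positive integer");
  }
  if (!Number.isInteger(vcpuCount) || vcpuCount < 2) {
    throw new RangeError("vcpuCount must be an integer of at least 2");
  }
  const maxWorkers = Math.floor(vcpuCount / 2); // 50% of the vCPUs
  const workers = Math.min(requestedWorkers, maxWorkers);
  return {
    createIndexes: collection,
    indexes: [{ key: { [field]: 1 }, name: indexName, workers: workers }],
  };
}

// Example: a request for 12 workers on an instance assumed to have
// 16 vCPUs is capped at half of that count.
const cmd = buildParallelIndexCommand("collection", "field", "idx_name", 12, 16);
console.log(JSON.stringify(cmd));
```

In the mongo shell, the returned document could then be passed to db.runCommand(cmd). Keeping the cap in one place avoids hard-coding a worker count that stops being valid after the instance is resized.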

Results

We conducted tests on Amazon DocumentDB 5.0 with different numbers of workers to assess the effectiveness of parallel indexing on a db.r6g.4xlarge instance. The following graph shows the index creation performance for a 256 GB dataset of 1 KB documents, with an index created on a single field.

These results demonstrate the performance improvement achieved with the new parallel indexing feature of Amazon DocumentDB. Across the tested worker counts, index creation was 1.46 to 7.42 times faster.
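To put that range in concrete terms, a speedup factor divides the baseline build time. The sketch below applies the two endpoints of the reported range to a made-up baseline of 10 hours. The baseline is purely illustrative and is not a measurement from these tests, and estimatedBuildHours is a hypothetical helper name.

```javascript
// Estimated build time after applying a speedup factor to a baseline.
function estimatedBuildHours(baselineHours, speedup) {
  if (!(baselineHours >= 0) || !(speedup > 0)) {
    throw new RangeError("baselineHours must be >= 0 and speedup must be > 0");
  }
  return baselineHours / speedup;
}

// Hypothetical baseline: an index build that takes 10 hours without
// the improvement. This number is an assumption for illustration only.
const baselineHours = 10;

// Endpoints of the improvement range reported in this post.
for (const speedup of [1.46, 7.42]) {
  const hours = estimatedBuildHours(baselineHours, speedup);
  console.log(`${speedup}x faster: about ${hours.toFixed(2)} hours`);
}
```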

Best practices

Using multiple workers can significantly reduce the time to create new indexes. However, it’s important to choose a number of workers that is appropriate for your workload and infrastructure.
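One way to reason about an appropriate number is to start from the 50% vCPU limit and reduce it according to how busy the writer already is. The following sketch is a hypothetical heuristic, not AWS guidance: suggestWorkers is an invented name, and both the vCPU count and the average CPU utilization (for example, a value read from your own monitoring) are inputs you supply. It never suggests fewer than the default of 2 workers.

```javascript
// Hypothetical heuristic, not AWS guidance: scale the 50% vCPU cap by the
// share of CPU that is currently idle, never going below the default of 2.
function suggestWorkers(vcpuCount, avgCpuUtilizationPercent) {
  if (!Number.isInteger(vcpuCount) || vcpuCount < 4) {
    throw new RangeError("vcpuCount must be an integer of at least 4");
  }
  if (!(avgCpuUtilizationPercent >= 0 && avgCpuUtilizationPercent <= 100)) {
    throw new RangeError("avgCpuUtilizationPercent must be between 0 and 100");
  }
  const cap = Math.floor(vcpuCount / 2); // upper limit: 50% of the vCPUs
  const idleFraction = 1 - avgCpuUtilizationPercent / 100;
  return Math.max(2, Math.floor(cap * idleFraction));
}

// Example inputs (assumed values, not measurements): 16 vCPUs at
// several levels of existing CPU utilization.
for (const utilization of [0, 25, 50, 90]) {
  console.log(`${utilization}% busy: ${suggestWorkers(16, utilization)} workers`);
}
```

A heuristic like this only accounts for CPU. I/O pressure and the latency sensitivity of the existing workload still need to be judged separately.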

Additionally, you can monitor the progress of the indexing process by using the db.currentOp() command in the mongo shell. Index creation details are available in Amazon DocumentDB 5.0 and higher. See the following code:

db.currentOp({ "command.createIndexes": { $exists: true } });

This will return an output like the following screenshot.
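If you want a compact view of running index builds, the result of db.currentOp() can be post-processed in the shell. The following is a minimal sketch under stated assumptions: summarizeIndexBuilds is a hypothetical helper, it relies only on the inprog array and the opid, ns, secs_running, and command fields of currentOp output, and the sample document is made up to illustrate the shape. Actual output contains more fields and may differ.

```javascript
// Hypothetical helper: given the document returned by db.currentOp(),
// list the operations whose command is createIndexes.
function summarizeIndexBuilds(currentOpResult) {
  const ops = (currentOpResult && currentOpResult.inprog) || [];
  return ops
    .filter((op) => op.command && op.command.createIndexes !== undefined)
    .map((op) => ({
      opid: op.opid,
      namespace: op.ns,
      collection: op.command.createIndexes,
      secsRunning: op.secs_running,
    }));
}

// Made-up sample in the general shape of currentOp output; it is not
// captured from a real cluster.
const sample = {
  inprog: [
    {
      opid: 101,
      ns: "test.collection",
      secs_running: 42,
      command: {
        createIndexes: "collection",
        indexes: [{ key: { field: 1 }, name: "idx_name", workers: 4 }],
      },
    },
    { opid: 102, ns: "test.other", secs_running: 1, command: { find: "other" } },
  ],
  ok: 1,
};

console.log(summarizeIndexBuilds(sample));
```

In the mongo shell, the same function could be called as summarizeIndexBuilds(db.currentOp()). When the filter shown earlier is passed to db.currentOp(), the server already returns only createIndexes operations, so the client-side filter acts as a safeguard.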

Finally, if possible, try to build indexes during off-peak hours to minimize the impact on your application.

Conclusion

In this post, we showed you how parallel indexing can reduce the time to create new indexes.

Parallel indexing is a powerful new feature that can help you supercharge your Amazon DocumentDB index creation. Adding the workers option to the createIndexes command is all it takes to use it.

Parallel indexing is available at no additional cost in all AWS Regions where Amazon DocumentDB is available. To learn more, refer to Managing Amazon DocumentDB indexes.

About the Authors

Srikar Kasireddy is a Database Specialist Solutions Architect at Amazon Web Services. He works with our customers to provide architecture guidance and database solutions, helping them innovate using AWS services to improve business value.

Tim Callaghan is a Principal DocumentDB Specialist Solutions Architect at AWS. He enjoys working with customers looking to modernize existing data-driven applications and build new ones. Prior to joining AWS, he was both a producer and consumer of relational and NoSQL databases for over 30 years.
