Scaling DynamoDB: How partitions, hot keys, and split for heat impact performance (Part 3: Summary and best practices)

By mullaned2002

January 30, 2023

868

In Part 1 of this series, you learned about Amazon DynamoDB data loading strategies and the behavior of DynamoDB during short runs. In Part 2, you learned about query performance and the adaptive behavior of DynamoDB during sustained activity.

In this third and final post, we review what you’ve learned, plus offer a few additional insights on best practices when scaling DynamoDB. This post is intended to be used as a reference.

Partitions

As a serverless NoSQL database, DynamoDB scales through partitions. Additional partitions provide more capacity for both reads and writes.

A table might add partitions for many reasons, such as when:

A provisioned table has its throughput values raised higher than any time previously.
An on-demand table sees a new high-water mark in throughput received.
A partition gets close to 10 GB in size.
A partition receives read or write traffic near its throughput capacity limit.

Partitions today are only split, never merged. A table provisioned to a high capacity then to a lower capacity, or switched to on-demand mode, will keep its partitions. When you anticipate heavy traffic to a new on-demand table, it’s best to pre-warm the table by creating the table provisioned at the necessary capacity, and then switch it to on-demand.

That said, having more partitions than necessary will reduce the efficiency (in latency and cost) of the table, for example by potentially requiring more partitions to be traversed to generate Query and Scan results. So, it’s best not to provision above anticipated needs.

Partitions and their topology determine the maximum amount of throughput a table can achieve during any given second. A request might receive a throughput-exceeded exception for a variety of reasons, including when:

It accesses a partition that has reached its limits during that second.
It’s a table in provisioned mode where the throughput exceeds the provisioned amount.
It’s a table in on-demand mode where the throughput exceeds the table-level read and write throughput limits.

Burst capacity allows temporary over-consumption of table-level limits, but partition level limits are always enforced.

Partition keys

Items with the same partition key value (called an item collection) tend to initially reside in the same partition but might reside in different partitions if the item collection has been split across partitions.

Also, it’s common for items with different partition key values to reside in the same partition because most designs have more partition key values than partitions. Items are assigned based on their partition key hash values. Each partition is responsible for a specific range of the key space and is home to items whose hash values lie within its range.

Designing a table schema to have wide dispersion (high cardinality) of partition key values will naturally spread the data across partitions and can improve both load rates and query rates. Low cardinality partition keys—including global secondary index (GSI) partition keys—will tend to spread the data less smoothly and can cause uneven usage between partitions.

Beware that loading sequential data (where items have been sorted by the partition key making the load focus on one partition key at a time) creates a rolling hot partition.

Split for heat

When a table has uneven traffic between its partitions, split for heat might be able to spread data items having the same partition key value across different partitions by splitting the partition that’s receiving uneven traffic. It’s an implementation detail to determine exactly when and how split for heat applies.

Adaptive capacity capabilities including split for heat and burst capacity work on both base tables and GSIs.

The presence of any local secondary indexes (LSIs) on a table prevents partition splits from happening within an item collection. For this reason, consider a GSI instead of an LSI, even if the GSI would have the same partition key as the base table. If our test table in Part 1 and Part 2 had been created with an LSI, our query rate would have never improved beyond the initial rate.

Split for heat will only execute when it determines the split would be sufficiently beneficial based on recent activity. One common write pattern where split for heat would be determined not beneficial is when writing items with a certain partition key value and an ever-increasing sort key value (such as a timestamp), because no matter the chosen cut point, all new writes using that partition key will be on the second partition. This write pattern will limit the write activity to that partition key to 1,000 WCUs. If the sort key were random, splitting would be beneficial and thus a partition key could support unbounded WCUs.

This fact is true for GSIs as well as base tables. If you design a GSI with a low cardinality partition key and an ever-increasing sort key, you might experience write throttling that split for heat cannot alleviate. A write-throttled GSI will create back pressure that throttles writes on the base table.

It takes several minutes for the table to detect and adapt with partition splits. There is no notice that indicates when a partition has split; the performance simply improves. Most customers are never aware that DynamoDB has automatically adapted to their workload.

Closing and costing

So, what did AWS tell Accenture Federal Services? Remember it was their question that started this experimentation when they wanted to know:

How well DynamoDB would work for a lookup service.
What table design would work best.
What design would be most efficient in terms of runtime, cost, and scalability.
What would be the theoretical maximum lookups per second DynamoDB could achieve.
They were also concerned that their use case didn’t seem like a classic DynamoDB use case, because there was no obvious partition key. They wanted to know if that would limit performance.

AWS told them DynamoDB would work well, that using one static partition key value would work but using the first IP number would help it scale faster, and that there was no practical limit on the query rate the table could achieve.

As for costs, AWS estimated that a solution based on DynamoDB would cost about $0.18 to load the items, $0.05 per month ongoing in storage, and with fully flexible on-demand mode they could do eight million lookups for $1.

Conclusion

In this three-part series, you learned about DynamoDB internals by reviewing the results of testing the performance of loading and querying using different partition key designs. We discussed table partitions, partition key values, hot partitions, split for heat, burst capacity, and table-level throughput limits.

As always, you’re welcome to leave questions or feedback in the comments.

About the authors

Jason Hunter is a California-based Principal Solutions Architect specializing in DynamoDB. He’s been working with NoSQL Databases since 2003. He’s known for his contributions to Java, open source, and XML.

Vivek Natarajan is a CS major at Purdue and a Solutions Architect intern at AWS.

Scaling DynamoDB: How partitions, hot keys, and split for heat impact performance (Part 3: Summary and best practices)

Partitions

Partition keys

Split for heat

Closing and costing

Conclusion

About the authors

Schneider Electric automates Salesforce account hierarchy management with generative artificial intelligence (AI) using Amazon Aurora and Amazon Bedrock

Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

Implement UUIDv7 in Amazon RDS for PostgreSQL using Trusted Language Extensions

LEAVE A REPLY Cancel reply

Most Popular

Schneider Electric automates Salesforce account hierarchy management with generative artificial intelligence (AI) using Amazon Aurora and Amazon Bedrock

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

Recent Comments

EDITOR PICKS

Exploring the Click Element Variable in Google Tag Manager

How to track events with Google Tag Manager and Google Analytics

Data Layer Variable in GTM: What, Why, and Where?

POPULAR POSTS

BlackRock scales its data operations with a unified platform on Google Cloud

Salesforce ERP Integration Streamlines Business Automation

Standardization, security, and governance across environments with Anthos Multi-Cloud

POPULAR CATEGORY