
Implement auto-increment with Amazon DynamoDB

When developing an application with Amazon DynamoDB, you sometimes want new items inserted into a table to have an ever-increasing sequence number. Some databases call this auto-increment and populate the value automatically on insert. Example use cases include assigning a numeric identifier to a customer order or a support ticket.

DynamoDB doesn’t provide auto-increment as an attribute type, but there are several approaches to achieve an ever-increasing sequence number with DynamoDB. In this post, we demonstrate two simple and low-cost approaches.

Solution overview

Before we begin, consider if you truly need an ever-increasing sequence number. Randomly generated identifiers tend to scale better because they require no central coordination point. The situations where it’s appropriate to simulate auto-increment with DynamoDB tend to fall into two categories:

When migrating from a relational database where people or systems have grown accustomed to preexisting auto-increment style behavior
When the application must deliver a human-friendly growing numeric identifier to new items, like an employee number or ticket number

In the following sections, we show how to achieve an ever-increasing sequence number by using a counter or a sort key.

Implementation with a counter

The first approach to generating an ever-increasing sequence number uses an atomic counter. It’s a two-step process. First, make a request to increment the counter and receive the new value in the response. Second, use that new value in a subsequent write.

The following Python example updates an atomic counter to get the next value for an order ID, then inserts an order with the ID used as the partition key. With this approach, it’s possible to use a different value for the partition key and store the ID in another attribute.

import boto3

table = boto3.resource('dynamodb').Table('orders')

# Add one to the counter and ask for the new value to be returned
response = table.update_item(
    Key={'pk': 'orderCounter'},
    UpdateExpression='ADD #cnt :val',
    ExpressionAttributeNames={'#cnt': 'count'},
    ExpressionAttributeValues={':val': 1},
    ReturnValues='UPDATED_NEW'
)

# Retrieve the new value
nextOrderId = response['Attributes']['count']

# Use the new value
table.put_item(
    Item={'pk': str(nextOrderId), 'deliveryMethod': 'expedited'}
)
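As mentioned, the sequence value doesn't have to be the partition key. The following minimal sketch stores the counter value as a regular attribute instead; the random uuid partition key and the orderId attribute name are illustrative assumptions, not part of the original example.

import uuid

# Sketch: random partition key, sequence number stored in an 'orderId' attribute
table.put_item(
    Item={
        'pk': str(uuid.uuid4()),
        'orderId': nextOrderId,
        'deliveryMethod': 'expedited'
    }
)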

There are no race conditions with this design because all writes to a single item in DynamoDB are applied serially. This ensures that each counter value will never be returned more than once.

The cost of this approach is 1 write to update the counter item, plus the usual write costs to store the new item. The maximum throughput of this approach is limited by the counter item. The maximum throughput for a single small item in DynamoDB is the same as the maximum throughput for a partition.

It's possible for gaps to appear in the sequence if a failure occurs between updating the counter and writing the new item. For example, the client application may stop between the two steps, or the automatic retry functionality in the AWS SDK may increment the counter more than once if the response carrying the first value was lost to a network failure. Note that gaps are possible with auto-increment columns in relational databases as well.

Multiple counters could be maintained simultaneously if you need more than one sequence value for your table.
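For example, a small helper can wrap the update shown earlier so that each named counter item drives its own sequence. This is a minimal sketch reusing the orders table from the previous example; the counter names ('orderCounter', 'ticketCounter') are illustrative.

def next_value(counter_name):
    # Increment the named counter item and return the new sequence value
    response = table.update_item(
        Key={'pk': counter_name},
        UpdateExpression='ADD #cnt :val',
        ExpressionAttributeNames={'#cnt': 'count'},
        ExpressionAttributeValues={':val': 1},
        ReturnValues='UPDATED_NEW'
    )
    return response['Attributes']['count']

nextOrderId = next_value('orderCounter')
nextTicketId = next_value('ticketCounter')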

Implementation with a sort key

The second approach uses the maximum value of the sort key within an item collection to track the maximum sequence value for that item collection.

Items stored in a DynamoDB table can have a partition key and an optional sort key as part of their primary key. Items in an item collection have the same partition key but different sort keys. A DynamoDB query can target an item collection to retrieve all items in the collection, or can provide a sort key condition to retrieve a subset.

If you design the sort key to represent the item's position in the sequence, a query can efficiently retrieve the maximum value of the sequence. The following example table holds projects and their issues. The project identifier is the partition key. The issue number is the sort key. The issue number increments independently for each project, and the highest issue number for any project is the highest sort key value within that project's item collection.

| Partition Key (a Project ID) | Sort Key (an Issue Number) | Priority |
|------------------------------|----------------------------|----------|
| projectA                     | 1                          | low      |
| projectA                     | 2                          | medium   |
| projectB                     | 1                          | low      |
| projectB                     | 2                          | high     |
| projectB                     | 3                          | low      |

Adding a new item to an item collection with the next sequence value is a two-step process. First, perform a query to retrieve the highest sort key value in that item collection. Second, attempt to write the new item using that highest value plus 1. The write must include a condition expression requiring that the item does not already exist in the table for the write to succeed. This avoids a race condition with clients that read the same value around the same time and attempt to insert items with the same primary key.

If the condition fails (because another client got there first and used the value), you can choose from two approaches to continue: go back to the start and query again for the highest value used, or try again with the sort key value increased by 1.

The following Python example demonstrates querying for the highest value used so far in an item collection (representing a project), then writing an item with the next value as the sort key. The example keeps retrying with an increasing sort key value until successful.

import boto3
from boto3.dynamodb.conditions import Key

PROJECT_ID = 'projectA'

dynamo = boto3.resource('dynamodb')
client = dynamo.Table('projects')
highestIssueId = 0
saved = False

# Query for the last sorted value in the given item collection
response = client.query(
    KeyConditionExpression=Key('pk').eq(PROJECT_ID),
    ScanIndexForward=False,
    Limit=1
)

# Retrieve the sort key value
if response['Count'] > 0:
    highestIssueId = int(response['Items'][0]['sk'])

while not saved:
    try:
        # Write using the next value in the sequence, but only if the item doesn't exist
        response = client.put_item(
            Item={
                'pk': PROJECT_ID,
                'sk': highestIssueId + 1,
                'priority': 'low'
            },
            ConditionExpression='attribute_not_exists(pk)'
        )
        saved = True
    # An exception indicates we lost the race, so increment the value and loop again
    except dynamo.meta.client.exceptions.ConditionalCheckFailedException:
        highestIssueId = highestIssueId + 1
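The other retry approach described earlier, going back to the start and querying again, looks similar. The following is a minimal sketch under the same assumptions (the same projects table and key names); on a conditional failure it re-queries for the current highest value instead of incrementing locally.

saved = False
while not saved:
    # Re-read the highest issue number for this project on every attempt
    response = client.query(
        KeyConditionExpression=Key('pk').eq(PROJECT_ID),
        ScanIndexForward=False,
        Limit=1
    )
    nextIssueId = int(response['Items'][0]['sk']) + 1 if response['Count'] > 0 else 1
    try:
        client.put_item(
            Item={'pk': PROJECT_ID, 'sk': nextIssueId, 'priority': 'low'},
            ConditionExpression='attribute_not_exists(pk)'
        )
        saved = True
    except dynamo.meta.client.exceptions.ConditionalCheckFailedException:
        # Another writer claimed this issue number; loop and query again
        pass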

The cost of this approach is 0.5 read capacity units to query for the highest value used so far, plus the usual write costs to store the new item. You are also charged for write attempts that are rejected because the condition detected the item already exists, so the cost of this approach increases with contention and retries. If you anticipate contention, you can make the query a strongly consistent read, which costs 1.0 read capacity units but always returns the latest value.
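For example, the query from the previous example can request strong consistency with one extra parameter (a sketch, using the same table and key names as above):

response = client.query(
    KeyConditionExpression=Key('pk').eq(PROJECT_ID),
    ScanIndexForward=False,
    Limit=1,
    ConsistentRead=True  # 1.0 RCU per query instead of 0.5, never returns stale data
)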

The maximum throughput for this approach is the maximum throughput for a partition for each item collection (assuming your items are below 1 KB in size). This throughput includes any attempts to write that are rejected because of the condition.

This approach does not produce gaps in the sequence, even if a client stops unexpectedly or a temporary network issue occurs, because no value is reserved before the item itself is written.

Conclusion

For situations where you can’t use a randomly generated ID value, you can use the techniques presented in this post to generate an ever-increasing sequence value in your DynamoDB tables.

If you enjoyed reading this post, check out Implement Resource Counters with DynamoDB or Scaling DynamoDB: How partitions, hot keys, and split for heat impact performance.

To learn how to get started with DynamoDB, please see our developer guide.

About the Authors

Chris Gillespie is a UK-based Senior Solutions Architect. He spends most of his work time with fast-moving “born in the cloud” customers. Outside of work, he fills his time with family and trying to get fit.

Jason Hunter is a California-based Principal Solutions Architect specializing in Amazon DynamoDB. He’s been working with NoSQL databases since 2003. He’s known for his contributions to Java, open source, and XML.
