Sunday, April 28, 2024

Rate limited bulk operations in DynamoDB Shell

DynamoDB Shell (ddbsh) is an open-source command line interface for Amazon DynamoDB. For a simple introduction, refer to Query data with DynamoDB Shell – a command line interface for Amazon DynamoDB. The ddbsh README.md file has detailed command and usage examples. One of the objectives of ddbsh is to provide a simple and intuitive environment for newcomers to DynamoDB that allows them to get started by running familiar SQL-like commands.

DynamoDB provides APIs to update, delete, and replace items, but each of these operates on exactly one item. However, you may need to perform data maintenance on many items at once. To do so, you have to write a bespoke application that uses the DynamoDB API to fetch the items to be modified and then modifies them one at a time. You also have to implement code that ensures these operations don’t impact the foreground application’s use of the same tables. DynamoDB Shell provides a simple way to do this. It supports SQL-like constructs that can operate on many or all items in a table, and perform UPDATE, DELETE, or REPLACE operations in a controlled manner.

Note that ddbsh is provided for your use on an as-is basis and is not supported for production use cases. It should only be used for non-production and experimental use cases. Refer to section 7 of the License for more details.

ddbsh operates directly against your DynamoDB tables, so deletes and drops affect your tables and are irreversible. The scans and queries that ddbsh performs count against your table capacity and could incur significant costs. In this post, we show you how to rate limit bulk operations in ddbsh.

For details on how to restrict access to specific APIs, refer to Fine Grained Access Control in DynamoDB.

It is strongly advised that you understand what ddbsh is doing, and experiment with DynamoDB Local first.

Examples

Let’s take a look at some simple examples. For this, we use a table with a Global Secondary Index (GSI). Recall that DynamoDB Shell provides DDL support and the table created in the following example will be set to on-demand billing mode (the default because none is specified):

us-east-1> create table bulkops ( a number, b number, c number ) primary key ( a hash ) gsi (bulkgsi on (b hash, c range) projecting all );

We populate it with 2,000 items, each of which looks something like the following code:

insert into bulkops ( a, b, c, d ) values
( 0, 1, 2, [… long string …] ),
[…]
( 1999, 2000, 2001, [… long string …] );

Each long string is 5,000 characters long. Reading all the data from this table ends up consuming 1227.5 Read Capacity Units (RCUs):

us-east-1> select * from bulkops return total;
[… 2000 rows of data …]
SELECT (bulkops, 1227.5, 0, 0)
us-east-1>

Even if we fetch only the three numbers (a, b, c), the scan consumes the same number of RCUs because it has to read each item in full. Also, this entire SELECT operation took just over 1 second:

$ /usr/bin/time ddbsh -c 'select a, b, c from bulkops return total' | tail -n 5
1.18 real 0.06 user 0.02 sys
{a: 1076, b: 1077, c: 1078}
{a: 1209, b: 1210, c: 1211}
{a: 641, b: 642, c: 643}
{a: 735, b: 736, c: 737}
SELECT (bulkops, 1227.5, 0, 0)
$
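The reported 1227.5 RCUs can be sanity-checked with some rough arithmetic. The following Python sketch assumes an average item size of about 5,020 bytes (the 5,000-character string plus the small attributes; this figure is our assumption, not a measurement) and assumes the scan is eventually consistent, so that every 4 KB read costs 0.5 RCU. The helper name estimate_scan_rcu is hypothetical.

```python
import math

ITEMS = 2000
ITEM_BYTES = 5020            # assumption: 5,000-character string plus small attributes
RCU_BLOCK_BYTES = 4 * 1024   # one read unit covers up to 4 KB
EVENTUAL_FACTOR = 0.5        # assumption: eventually consistent scan, half the cost


def estimate_scan_rcu(items: int, item_bytes: int) -> float:
    """Rough RCU estimate for scanning `items` items of `item_bytes` each.

    Ignores per-page rounding, so the real figure can be slightly higher.
    """
    total_bytes = items * item_bytes
    blocks = math.ceil(total_bytes / RCU_BLOCK_BYTES)
    return blocks * EVENTUAL_FACTOR


estimate = estimate_scan_rcu(ITEMS, ITEM_BYTES)
print(f"estimated RCUs for a full scan: {estimate}")
```

Under these assumptions the estimate lands within a few RCUs of the reported total. Projecting only a, b, and c does not change it, because the cost depends on the size of the items read rather than on the attributes returned.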

Introduction to rate limiting

Suppose we want to ensure that this command never consumes more than 50 RCUs in a second. To do that, we add a ratelimit to the SELECT operation:

$ /usr/bin/time ddbsh -c 'select a, b, c from bulkops with ratelimit (50 rcu)' | tail -n 5
23.32 real 0.07 user 0.02 sys
{a: 99, b: 100, c: 101}
{a: 1076, b: 1077, c: 1078}
{a: 1209, b: 1210, c: 1211}
{a: 641, b: 642, c: 643}
{a: 735, b: 736, c: 737}
$

The command now took 23 seconds to complete!

The following are similar commands that use the -q (quiet) command line option, and pipe the results to a line counter (wc -l). The first one has no rate limit (takes 1.17 seconds), the second has a 50 RCU limit (takes 23.39 seconds), and the last one has a 10 RCU limit (takes 115.33 seconds). All of them produce 2,000 rows of output.

$ /usr/bin/time ddbsh -q -c 'select a, b, c from bulkops' | wc -l
1.17 real 0.07 user 0.02 sys
2000
$ /usr/bin/time ddbsh -q -c 'select a, b, c from bulkops with ratelimit (50 rcu)' | wc -l
23.39 real 0.06 user 0.02 sys
2000
$ /usr/bin/time ddbsh -q -c 'select a, b, c from bulkops with ratelimit (10 rcu)' | wc -l
115.33 real 0.06 user 0.02 sys
2000
115.33 real 0.06 user 0.02 sys
2000
$
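These timings are roughly what you would expect from dividing the total read cost by the rate limit. The following Python sketch does that arithmetic using the 1227.5 RCUs reported earlier; the function name estimated_seconds is hypothetical, and the calculation is a back-of-the-envelope bound rather than a model of what ddbsh does internally.

```python
TOTAL_RCU = 1227.5  # total reported by the unthrottled scan of bulkops


def estimated_seconds(total_units: float, units_per_second: float) -> float:
    """Time needed to spend `total_units` at a steady `units_per_second`."""
    return total_units / units_per_second


# (rate limit in RCU per second, elapsed time measured in the runs above)
for limit, observed in ((50, 23.39), (10, 115.33)):
    estimate = estimated_seconds(TOTAL_RCU, limit)
    print(f"{limit} RCU/s: estimated {estimate:.2f} s, observed {observed} s")
```

The measured times are somewhat lower than these estimates. One plausible reason, which we have not verified, is that the command does not have to wait for the cost of its final request to be paid back before it exits.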

How rate limiting works

Rate limiting in DynamoDB Shell is implemented using a simple token bucket algorithm. A token bucket accumulates tokens at a predetermined rate. An operation is allowed to proceed only when the bucket holds a positive number of tokens. When the operation is complete, the resources it consumed are computed and the corresponding number of tokens is removed from the bucket (the number of remaining tokens is allowed to go negative). The bucket never holds more than 1 second’s worth of tokens. For the complete implementation of the token bucket, refer to the GitHub repo.
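To make the description concrete, the following is a minimal Python sketch of a token bucket with the properties listed above: it refills at a fixed rate, holds at most 1 second’s worth of tokens, admits an operation only when the balance is positive, and lets the balance go negative once the real cost is known. It is not the ddbsh implementation. The class and function names are hypothetical, and starting with a full bucket is our assumption.

```python
import time


class TokenBucket:
    """Illustrative token bucket; not the ddbsh implementation."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate_per_second)
        self.capacity = self.rate       # never hold more than 1 second's worth
        self.tokens = self.capacity     # assumption: the bucket starts full
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def wait_until_positive(self):
        # Block until the balance is positive, sleeping just long enough each time.
        self._refill()
        while self.tokens <= 0:
            self._sleep(-self.tokens / self.rate + 0.001)
            self._refill()

    def consume(self, units):
        # Charge the actual cost after the operation; the balance may go negative.
        self._refill()
        self.tokens -= units


def run_rate_limited(costs, bucket):
    """Run one simulated operation per entry in `costs` under the bucket."""
    for cost in costs:
        bucket.wait_until_positive()
        # A real caller would issue the request here and read its consumed capacity.
        bucket.consume(cost)
```

Because the cost is only known after a request completes, a single large request can overdraw the bucket. The next request then waits until the deficit has been repaid, which keeps the average rate at or below the limit even though individual requests may exceed 1 second’s worth of tokens.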

For each command, DynamoDB Shell implements two token buckets. One is used for read tokens and one is used for write tokens. Therefore, you are able to do a rate limited update as follows:

$ /usr/bin/time ddbsh -d -q -c "update bulkops set updated = true with ratelimit ( 10 rcu, 30 wcu )"
UPDATE (2000 read, 2000 modified, 0 ccf)
666.66 real 4.03 user 1.77 sys
$

The write took over 11 minutes and updated all 2,000 items. The same update without rate limiting takes less than 5 seconds:

$ /usr/bin/time ddbsh -d -q -c "update bulkops set updated = true"
UPDATE (2000 read, 2000 modified, 0 ccf)
4.85 real 1.20 user 0.51 sys
$
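The duration of the rate limited update suggests that the write limit, not the read limit, is the bottleneck. The following Python sketch shows one set of assumptions that fits the measurement: each item is about 5,020 bytes (our assumption), a write is charged in 1 KB units, and each update is charged once for the table and once for the GSI, because the index projects all attributes. The helper name wcu_per_update is hypothetical, and this is an estimate rather than output from ddbsh.

```python
import math

ITEMS = 2000
ITEM_BYTES = 5020          # assumption: 5,000-character string plus small attributes
WCU_BLOCK_BYTES = 1024     # one write unit covers up to 1 KB
WRITE_LIMIT = 30           # the wcu value in the ratelimit clause


def wcu_per_update(item_bytes: int, index_copies: int = 1) -> int:
    """Estimated WCUs for one update: the table write plus one write per index copy."""
    base = math.ceil(item_bytes / WCU_BLOCK_BYTES)
    return base * (1 + index_copies)


total_wcu = ITEMS * wcu_per_update(ITEM_BYTES)
seconds = total_wcu / WRITE_LIMIT
print(f"{total_wcu} WCUs at {WRITE_LIMIT} WCU/s is about {seconds:.1f} s ({seconds / 60:.1f} min)")
```

Under these assumptions the estimate is within a second of the 666.66 seconds measured above.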

A command can specify a read limit, a write limit, or both. The syntax for the rate limit clause is as follows:

ratelimit := WITH RATELIMIT ( RR RCU, WW WCU ) |
WITH RATELIMIT ( RR RCU ) |
WITH RATELIMIT ( WW WCU )
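If you generate ddbsh commands from a script, a small helper can build the clause in each of the three forms. The following Python sketch is illustrative only: the function ratelimit_clause is hypothetical and is not part of ddbsh. It uses the lowercase spelling that appears in the commands in this post.

```python
def ratelimit_clause(rcu=None, wcu=None):
    """Build a rate limit clause; returns "" when neither limit is given."""
    parts = []
    if rcu is not None:
        parts.append(f"{rcu} rcu")   # read limit comes first when both are given
    if wcu is not None:
        parts.append(f"{wcu} wcu")
    if not parts:
        return ""                    # no limits: omit the clause entirely
    return "with ratelimit ( " + ", ".join(parts) + " )"


# Append the clause to a statement before passing it to ddbsh with -c.
statement = "update bulkops set updated = true " + ratelimit_clause(rcu=10, wcu=30)
print(statement)
```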

Updating with indexes

Suppose we want to update the table bulkops and set found = true on all items where b = 30, as follows:

UPDATE bulkops
SET found = true
WHERE b = 30;

This statement would need to scan the table. But there’s a way to make this easier because we have a GSI. As an extension to SQL, you can specify the index as the update target. In reality, this will only update the table, but it will use the index to find the items to update, as shown in the following code. The update on bulkops.bulkgsi allows DynamoDB Shell to perform a query against the index and use the key value it finds there to perform the single update.

us-east-1> explain update bulkops.bulkgsi set found = true where b = 30;
Query({
  "TableName": "bulkops",
  "IndexName": "bulkgsi",
  "ConsistentRead": false,
  "ReturnConsumedCapacity": "NONE",
  "ProjectionExpression": "#afaa1",
  "KeyConditionExpression": "#afaa2 = :vfaa1",
  "ExpressionAttributeNames": {
    "#afaa1": "a",
    "#afaa2": "b"
  },
  "ExpressionAttributeValues": {
    ":vfaa1": {
      "N": "30"
    }
  }
})
UpdateItem({
  "TableName": "bulkops",
  "Key": {
    "a": {
      "N": "29"
    }
  },
  "UpdateExpression": "SET #aeaa1 = :veaa1",
  "ConditionExpression": "attribute_exists(#aeaa2) AND #aeaa3 = :veaa2",
  "ExpressionAttributeNames": {
    "#aeaa1": "found",
    "#aeaa2": "a",
    "#aeaa3": "b"
  },
  "ExpressionAttributeValues": {
    ":veaa1": {
      "BOOL": true
    },
    ":veaa2": {
      "N": "30"
    }
  }
})
us-east-1> update bulkops.bulkgsi set found = true where b = 30;
UPDATE (1 read, 1 modified, 0 ccf)
us-east-1> select a, b, c, found from bulkops.bulkgsi where b = 30;
{a: 29, b: 30, c: 31, found: TRUE}
us-east-1>

Observe that the query against the index projects the table’s key attribute (a), and that this value is then used as the Key in the UpdateItem call that follows.

Remember to delete the table when you are done.

us-east-1> drop table bulkops;
DROP
us-east-1>

Conclusion

DynamoDB Shell provides some SQL-like constructs and extensions that allow you to perform bulk UPDATE, DELETE, and REPLACE operations with rate limiting. This rate limiting makes sure that the operations don’t consume more than a certain number of RCUs and WCUs. This can be useful when you want to perform these operations without impacting other traffic that may be going to these tables.

If you have questions about DynamoDB Shell, or suggestions for improvements, please contact us or provide feedback. If you would like to learn more about cost-effective bulk processing with DynamoDB, refer to this blog post.

About the author

Amrith Kumar is a Senior Principal Engineer in Amazon Web Services and works on Amazon DynamoDB.

Read more: AWS Database Blog
