Time to Live (TTL) is a widely used feature in Apache Cassandra. TTL helps developers manage storage costs and simplify application logic by expiring data automatically at a specified time. For example, you can use TTL in time-series workloads to remove older data automatically and save on storage costs. You can also use TTL to simplify automation for use cases such as expiring advertisements on an ad platform after a given amount of time.
However, managing TTL in Cassandra can be complex. Whenever TTL deletes or updates a row, Cassandra creates a record of that operation called a tombstone. As a developer, you’re responsible for cleaning up these tombstones periodically by using a process called compaction. Unfortunately, compaction competes for the same compute and I/O resources that your application uses to serve table traffic. This can degrade performance over time and introduce availability risks, such as out-of-memory exceptions caused by increased heap pressure. Developers need in-depth knowledge of how compaction works to avoid issues such as zombie data (deleted data that later reappears due to synchronization issues), and they need to tune their compaction strategy so that compaction doesn’t create availability issues for their cluster. On the other hand, if compaction isn’t run frequently enough, tombstones proliferate, which increases storage costs and slows queries. In extreme cases, so many tombstones accumulate that Cassandra can’t run the compaction process successfully.
In this post, we explain how we use the serverless nature of Amazon Keyspaces (for Apache Cassandra) to address the challenges of using TTL in Cassandra workloads by offering a fully managed version of TTL that doesn’t impact application performance or introduce availability risks for applications.
Introducing Amazon Keyspaces TTL
Amazon Keyspaces now helps developers use TTL in their applications more easily and safely by using a scalable, highly available, and fully managed database service. With Amazon Keyspaces TTL, developers don’t have to worry about tombstones or low-level system operations such as compaction. Data expires automatically at the time you specify. After data expires, it’s no longer returned in queries. Because Amazon Keyspaces is serverless, Amazon Keyspaces TTL doesn’t compete with your application for a fixed amount of table resources. As a result, Amazon Keyspaces can clean up expired data from your storage automatically (typically within 10 days) without impacting the performance or availability of your tables. You don’t have to size your tables to account for the cleanup process, or configure how the cleanup process works. Keyspaces manages the cleanup process and all the underlying resources automatically. Note that expired data continues to count towards your metered storage and row size quotas until Amazon Keyspaces removes it from storage.
Using Amazon Keyspaces TTL
You can use Amazon Keyspaces TTL with the same Cassandra Query Language (CQL) application code that you use today. For example, to create a table with a default TTL setting of 1 year, you use the default_time_to_live option in your CREATE TABLE CQL statement:
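As a minimal sketch, the statement below creates a table whose rows expire one year after they are written. The keyspace, table, and column names are hypothetical and chosen only for illustration; the value 31536000 is one year expressed in seconds (365 × 86,400).

```cql
-- Hypothetical keyspace, table, and columns, for illustration only.
-- default_time_to_live is given in seconds: 31536000 s = 365 days.
CREATE TABLE my_keyspace.my_table (
    userid  uuid,
    time    timeuuid,
    subject text,
    body    text,
    PRIMARY KEY (userid, time)
) WITH default_time_to_live = 31536000;
```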
You can also use the USING TTL clause to set TTL values on individual rows or columns in insert and update operations:
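The following sketch shows both forms, assuming a hypothetical table my_keyspace.my_table with a uuid column userid, a timeuuid column time, and text columns subject and body. The UUID values are placeholders, and 259200 seconds corresponds to three days.

```cql
-- Hypothetical table and placeholder key values, for illustration only.
-- Row-level TTL: the inserted row expires 259200 seconds (3 days) after the write.
INSERT INTO my_keyspace.my_table (userid, time, subject, body)
VALUES (123e4567-e89b-12d3-a456-426614174000,
        96a29100-5e25-11ec-90d7-b5d91eceda0a,
        'Message', 'Hello')
USING TTL 259200;

-- Column-level TTL: only the updated column (subject) receives the new TTL.
UPDATE my_keyspace.my_table USING TTL 259200
SET subject = 'Updated message'
WHERE userid = 123e4567-e89b-12d3-a456-426614174000
  AND time = 96a29100-5e25-11ec-90d7-b5d91eceda0a;
```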
If a table doesn’t have a default TTL setting, you must first enable TTL on that table before you can set TTL values for individual rows or columns in insert or update operations. You can enable TTL on a table by using the ttl custom property, as shown in the following code. After TTL has been enabled on a table, it can’t be disabled.
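A sketch of what this might look like, assuming the custom property syntax documented in the Amazon Keyspaces Developer Guide and a hypothetical table name; check the guide for the exact form supported by your setup before relying on it.

```cql
-- Hypothetical table name. Assumes the Amazon Keyspaces custom property syntax.
-- Enabling TTL is a one-way change: it can't be disabled afterwards.
ALTER TABLE my_keyspace.my_table
WITH CUSTOM_PROPERTIES = {'ttl': {'status': 'enabled'}};
```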
Finally, you can query for the TTL value of a column by using the TTL function:
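For instance, assuming a hypothetical table my_keyspace.my_table with a text column subject and a uuid partition key userid, a query along these lines returns the TTL value for that column. In Apache Cassandra, the TTL function reports the time remaining in seconds.

```cql
-- Hypothetical table, column, and key value, for illustration only.
SELECT TTL(subject)
FROM my_keyspace.my_table
WHERE userid = 123e4567-e89b-12d3-a456-426614174000;
```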
Amazon Keyspaces TTL pricing is based on the size of the rows that you delete or update. TTL operations are metered in TTLDeletes. Each TTLDelete provides enough capacity to delete or update up to 1 KB of data per row, so the number of TTLDeletes consumed is the row size in KB rounded up to the next whole number. For example, deleting a 2.5 KB row using TTL requires 3 TTLDeletes. Updating a 3.5 KB row to delete a subset of columns within the row requires 4 TTLDeletes.
In this post, we introduced Amazon Keyspaces TTL and how it helps you manage your storage costs and simplify your application code for use cases that require deleting data automatically at specified times. We also shared how Amazon Keyspaces TTL helps address the management challenges of using TTL in Cassandra workloads.
If you have any questions, comments, or suggestions, please leave a comment below. You can also visit the AWS Forums for Amazon Keyspaces.
Keyspaces TTL is generally available in all AWS Regions where Keyspaces is offered. To learn more about Amazon Keyspaces TTL, see Expiring data by using Amazon Keyspaces Time to Live (TTL) in the Amazon Keyspaces Developer Guide.