Migrate time series data to Amazon Timestream for LiveAnalytics using AWS DMS

We are excited to announce Amazon Timestream for LiveAnalytics as a newly supported target endpoint for AWS Database Migration Service (AWS DMS). This addition allows you to move time series data from an AWS DMS-supported source database to Timestream.

Timestream is a fully managed, scalable, and serverless time series database service that makes it straightforward to store and analyze trillions of events per day. Customers across a broad range of industry verticals have adopted Timestream to derive real-time insights, monitor critical business applications, and analyze millions of real-time events across websites and applications. With the migration capabilities of AWS DMS, you can now migrate your existing time series data and replicate ongoing time series data to Timestream with reduced downtime. To take advantage of the service’s high throughput ingestion capabilities, Timestream also supports AWS DMS Parallel Load and Parallel Apply features so you can migrate a large volume of data in parallel, allowing for significantly faster migrations.

In this post, we show you how to use Timestream as a target for an example PostgreSQL source endpoint in AWS DMS.

Solution overview

When creating your migration, the source database is where your data currently resides, and the target database is where it will be transferred to. In this post, we migrate data from a PostgreSQL source database to a Timestream target database. The replication instance is the component that runs the migration task, and it needs access to both the source and target within your VPC. This post guides you through setting up your Timestream database and assigning the required AWS Identity and Access Management (IAM) permissions for AWS DMS to migrate your data. When the setup is complete, you can run your task and verify that the data is migrated to your Timestream database.

Note that the AWS DMS Timestream endpoint supports only relational (RDBMS) sources.

Prerequisites

You should have a basic understanding of how AWS DMS works. If you’re just getting started with AWS DMS, review the AWS DMS documentation. You should also have a supported AWS DMS source endpoint and a Timestream target to perform the migration.

Set up IAM resources for Timestream

It is straightforward to set up a Timestream database as an AWS DMS target and start migrating data. To get started, you first need to create an IAM role with the necessary privileges.

  1. On the IAM console, choose Policies in the navigation pane.
  2. Choose Create policy.
  3. Use the following code to create your IAM policy. Update the AWS Region, account ID, and database name accordingly. For this post, we name the policy dms-timestream-access.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowDescribeEndpoints",
                "Effect": "Allow",
                "Action": [
                    "timestream:DescribeEndpoints"
                ],
                "Resource": "*"
            },
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "timestream:ListTables",
                    "timestream:DescribeDatabase"
                ],
                "Resource": "arn:aws:timestream:{region}:{account_id}:database/{database_name}"
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "timestream:WriteRecords",
                    "timestream:UpdateTable",
                    "timestream:CreateTable"
                ],
                "Resource": "arn:aws:timestream:{region}:{account_id}:database/{database_name}/*"
            }
        ]
    }

    This policy encompasses three key permissions for Timestream:

    • Describe endpoints permission – Grants the ability to call DescribeEndpoints, which Timestream clients use to discover the endpoint that serves your account. This call must succeed before any of the other Timestream permissions, such as DescribeDatabase, can be used.
    • Database description and table listing access – Allows for the description of a specific database to verify its existence and accessibility to AWS DMS. It also permits listing tables within that database, which is important for identifying existing tables and creating new ones as necessary.
    • Data insertion and table management capabilities – Essential for enabling AWS DMS to insert data into Timestream and for updating tables, especially when changes are needed in the memory or magnetic store durations.

    Now you can attach your IAM policy to the role that you’ll use.

  4. On the IAM console, choose Roles in the navigation pane.
  5. Choose Create role.
  6. For Trusted entity type, select AWS service.
  7. For Service or use case, choose DMS (this configures the trust relationship shown after these steps).
  8. Choose Next.
  9. Give the role a name (for this post, we name it dms-timestream-role).
  10. Attach the policy you created.
  11. Create the role.
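When you choose DMS as the trusted service, the console creates the role's trust policy for you. For reference, the trust relationship that allows AWS DMS to assume the role looks like the following (this is also what you'll verify later in the error handling section):

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "dms.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }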

Create a Timestream database

To create a Timestream database, complete the following steps:

  1. On the Timestream console, choose Databases in the navigation pane.
  2. Choose Create database.
  3. For Choose a configuration, select Standard database.
  4. For Name, enter a name for your database.
  5. For KMS key, choose a key ID.
  6. Choose Create database.
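If you prefer to script this step, the console is calling the Timestream CreateDatabase API under the hood. A minimal request body looks like the following sketch (dms-blog-database is a placeholder name; KmsKeyId is optional, and Timestream uses an AWS managed key if you omit it):

    {
        "DatabaseName": "dms-blog-database"
    }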

Create a target Timestream endpoint

To set up the Timestream endpoint with AWS DMS, complete the following steps:

  1. On the AWS DMS console, choose Endpoints in the navigation pane.
  2. Choose Create endpoint.
  3. For Endpoint type, select Target endpoint.
  4. For Endpoint identifier, enter a name for your endpoint (for this post, we use timestream-target).
  5. For Target engine, choose Amazon Timestream.
  6. For Amazon Resource Name (ARN) for service access role, enter the ARN of the role you created earlier (dms-timestream-role).
  7. For Memory store retention, set the retention period according to your use case.
    The memory store retention period in AWS DMS is calculated in hours with a supported range of 1–8,736 hours. The memory store is optimized for high-speed ingestion. As a best practice, set the memory store retention period to accommodate the majority of the data you’re migrating.
  8. For Magnetic store retention, set the duration according to your use case.
    The magnetic store retention period in AWS DMS is measured in days, with a supported range of 1–73,000 days. The magnetic store is read-only by default, but you can enable writes by setting EnableMagneticStoreWrites to true in the endpoint settings. After you make this change, data outside of the memory store's retention period is automatically transferred to the magnetic store for longer-term storage. The magnetic store is designed for updating historical data and doesn't support large-volume ingestion; if you intend to load historical data into the magnetic store, we recommend using AWS DMS to migrate to Amazon Simple Storage Service (Amazon S3) in CSV format and then using batch load in Timestream. Records older than the magnetic store retention period are automatically deleted, so consider your retention period setting carefully.
  9. Optionally, you can configure CdcInsertsAndUpdates and EnableMagneticStoreWrites under Endpoint settings.
    CdcInsertsAndUpdates is a Boolean field that is false by default. When true, AWS DMS skips deletes from the source database during change data capture (CDC). When false, it applies deletes from the source database as updates in Timestream. This can be the desired behavior because, as of this writing, Timestream doesn't support deleting records. Instead, AWS DMS nulls out the records by writing 0 values for number types, NULL strings for VARCHAR, and false for Boolean values, so you can tell that a record no longer contains data. If you would rather keep the original record as it was before the delete, set this field to true.

    EnableMagneticStoreWrites is also a Boolean field that is false by default. When true, records outside of the memory store retention period, but within the magnetic store retention period, are written directly to the magnetic store. This isn't recommended for most use cases because large inputs will cause throttling; it's intended for cases where some records fall outside the memory store retention period and you don't want to skip them. Both settings appear in the endpoint settings sketch after these steps.
  10. Choose Create endpoint to create your target endpoint.
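If you create the endpoint programmatically instead of through the console, the same choices map onto the endpoint's Timestream settings. The following is a sketch of that structure, assuming the field names of the AWS DMS TimestreamSettings type, with illustrative values (memory store retention in hours, magnetic store retention in days):

    {
        "DatabaseName": "dms-blog-database",
        "MemoryDuration": 24,
        "MagneticDuration": 365,
        "CdcInsertsAndUpdates": false,
        "EnableMagneticStoreWrites": false
    }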

Create a migration task

To create a migration task with your Timestream target endpoint, complete the following steps:

  1. On the AWS DMS console, choose Database migration tasks in the navigation pane.
  2. Choose Create database migration task.
  3. Specify the replication instance and source database endpoint that you’ll use with the Timestream target.
  4. Choose the Timestream endpoint you created as the target.
  5. For Migration type, choose the type of migration you want to complete.
  6. In the Task settings section, select JSON editor and adjust the settings accordingly.
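    The parallel load and parallel apply settings live in the TargetMetadata section of the task settings JSON. The following is a minimal sketch with illustrative values, not one-size-fits-all recommendations:

    {
        "TargetMetadata": {
            "ParallelLoadThreads": 8,
            "ParallelLoadBufferSize": 100,
            "ParallelLoadQueuesPerThread": 50,
            "ParallelApplyThreads": 8,
            "ParallelApplyBufferSize": 100,
            "ParallelApplyQueuesPerThread": 50
        }
    }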

    The code contains the following key parameters:
    • ParallelLoadThreads – Determines the number of threads AWS DMS uses for initially loading each table into a Timestream target, with a maximum limit of 32 threads
    • ParallelLoadBufferSize – Sets the maximum number of records for the buffer used by parallel load threads for a Timestream target, ranging from a default of 50 to a maximum of 1,000
    • ParallelLoadQueuesPerThread – Defines the number of queues each thread accesses to process data records for batch loads to the target, with a default setting of 1 and a range of 5–512 for Timestream targets
    • ParallelApplyThreads – Specifies the number of concurrent threads used by AWS DMS to push data to a Timestream target during a CDC load, with values ranging from 0–32
    • ParallelApplyBufferSize – Indicates the maximum number of records each buffer queue can hold, used by concurrent threads to push data to a Timestream target during a CDC load, with a default of 100 and a maximum of 1,000
    • ParallelApplyQueuesPerThread – Specifies the number of queues per thread for extracting data records for batch loads to a Timestream endpoint during CDC, with a range from 1–512

    You can use these fields to enable faster migrations by batching record writes:

    • We recommend a ParallelLoadBufferSize and ParallelApplyBufferSize of 100, because 100 records is the maximum per write to Timestream, so this gives you the best throughput
    • You should adjust the thread counts depending on your throughput
    • You should scale queues per thread with your load, but keep the value relatively low, around 50, so the queues don't take up too much instance memory

    The most important part when configuring an AWS DMS migration task with a Timestream target is how your source tables will map to Timestream tables. Timestream has a unique mapping rule when being used as a target. Let’s explore how this works with some example data.

    Say we have two tables: one called sensor_data and the other called sensors. The sensor_data table holds time-stamped sensor readings (temperature, humidity, light, CO2, and so on).

    The SQL command used to create this table is the following:

    CREATE TABLE sensor_data (
        serial SERIAL PRIMARY KEY,
        sensor_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        temperature NUMERIC,
        humidity NUMERIC,
        light NUMERIC,
        co2 NUMERIC,
        humidityratio BOOLEAN
    );

    We populate the table with sample data using the following command:

    INSERT INTO sensor_data (sensor_time, temperature, humidity, light, co2, humidityratio)
    SELECT
      NOW() - (random() * (20 * INTERVAL '1 day')),
      (random() * (30 - 15) + 15)::NUMERIC(5,2),
      (random() * (90 - 10) + 10)::NUMERIC(5,2),
      (random() * (1000 - 10) + 10)::NUMERIC,
      (random() * (2000 - 300) + 300)::NUMERIC,
      (random() < 0.5)::BOOLEAN
    FROM generate_series(1, 10000);

    For sensors, the data itself isn't important; the table exists to show how table mappings affect Timestream migrations.

  7. In this example, configure table mappings in the following way:
    {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "1",
                "object-locator": {
                    "schema-name": "{your_schema}",
                    "table-name": "sensor_data"
                },
                "rule-action": "include"
            },
            {
                "rule-type": "object-mapping",
                "rule-id": "2",
                "rule-name": "timestream-map",
                "rule-action": "map-record-to-record",
                "object-locator": {
                    "schema-name": "{your_schema}",
                    "table-name": "sensor_data"
                },
                "target-table-name": "sensor_data_timestream",
                "mapping-parameters": {
                    "timestream-dimensions": [
                        "serial"
                    ],
                    "timestream-timestamp-name":"sensor_time"
                }
            }
        ]
    }

    In the preceding code, you’re mapping the sensor_data table only. You specify timestream_dimension to be serial in the sample data, and set sensor_time as the timestamp name. timestream-dimensions and timestream-timestamp-name are required for each Timestream mapping rule.

    Additionally, only tables that have a mapping rule for Timestream will be migrated.

    There are two types of mapping available:

    • Single-measure mapping – In this case, every column that isn't the timestamp or a dimension (sensor_time and serial, respectively) is mapped to its own record. For example, consider the table sensor_data: for a single record in the source database, there would be five records in Timestream. Each would have the same serial and sensor_time values, but there would be one record for temperature, another for humidity, another for light, and so on.
    • Multi-measure mapping – To use multi-measure mapping, you add a setting in the same section as "timestream-timestamp-name": "sensor_time", where you specify which column from the source you want to use as the measure name for the multi-measure record. In this case, let's reuse serial: "timestream-multi-measure-name": "serial". This means that the value of serial is used as the measure name as well as the dimension in Timestream. The result is that for each record in the source database, there is a single record in Timestream with a dimension serial, a timestamp sensor_time, and a measure name equal to the value of serial (for the first row of the example data, this would be 1). That record then has measure values for each of the remaining columns in the source database. When using multi-measure mapping, you should generally set "timestream-hash-measure-name": true so that you don't exceed the Timestream limit of 8,192 unique measure names (see the sketch after this list). For more information, refer to Recommendations for partitioning multi-measure records.
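    For reference, a multi-measure version of the earlier mapping rule would change only its mapping-parameters block, roughly as follows (a sketch assembled from the parameters described in the preceding list):

    "mapping-parameters": {
        "timestream-dimensions": [
            "serial"
        ],
        "timestream-timestamp-name": "sensor_time",
        "timestream-multi-measure-name": "serial",
        "timestream-hash-measure-name": true
    }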

    For this post, we use single-measure mapping. Therefore, you set up sensor_data to be mapped, and sensors is ignored, even if you were to add it in the first selection rule.

  8. Choose Create task to create your migration task with a Timestream target endpoint.

Run your migration task

To start your task, complete the following steps:

  1. On the AWS DMS console, choose Database migration tasks in the navigation pane.
  2. Select the task you created.
  3. On the Actions menu, choose Start.

You can monitor the Timestream database to view the migration activity.
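If you automate this step through the AWS DMS API, starting the task is a single StartReplicationTask call. A sketch of the request body, using the same placeholder style as earlier:

    {
        "ReplicationTaskArn": "arn:aws:dms:{region}:{account_id}:task:{task_id}",
        "StartReplicationTaskType": "start-replication"
    }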

Validate the migration

To verify your data is in Timestream, complete the following steps:

  1. On the Timestream console, under Management tools in the navigation pane, choose Query editor.
  2. For Choose a database to query, choose the database you’re migrating to.
  3. To see the number of records migrated, enter the following query:
    select count(*) from {your_db_name}."{your_table_name}"
  4. To see some sample data, enter the following query:
    select * from {your_db_name}."{your_table_name}" limit 10

Assuming you have at least 10 rows migrated, the results should show 10 of the records in your selected table.

Error handling

The following are some common issues that you might face when migrating to Timestream:

  • When creating table mappings, make sure that you include timestream-dimensions and timestream-timestamp-name. Also, make sure that timestream-timestamp-name is an actual column in your source database table, is of type timestamp, and has the correct case, because it’s case-sensitive. See the following code:
    "mapping-parameters": {
        		"timestream-dimensions": [
            			"serial"
        		],
        		"timestream-timestamp-name":"sensor_time"
     }
  • Verify that your IAM role for the access policy has AWS DMS as a trusted relationship, as shown in the trust policy example earlier in this post. To check this, navigate to your role on the IAM console and look under Trust relationships.
  • When using the parallel load or parallel apply settings, your instance will use more RAM. If your task fails because it runs out of memory, you should scale up your instance. For instance sizing recommendations, see Choosing the right AWS DMS replication instance for your migration.

Clean up

Now that the migration is complete, it’s time to clean up the AWS DMS, Timestream, and IAM resources:

  1. On the AWS DMS console, choose Database migration tasks in the navigation pane.
  2. Select the replication task you created and on the Actions menu, choose Delete.
  3. On the AWS DMS console, choose Endpoints in the navigation pane.
  4. Select the endpoint you created and on the Actions menu, choose Delete.
  5. On the AWS DMS console, choose Replication instances in the navigation pane.
  6. Select the replication instance you created and on the Actions menu, choose Delete.
  7. On the Timestream console, choose Databases in the navigation pane.
  8. Select the database you want to delete and choose Delete.
  9. On the IAM console, choose Roles in the navigation pane.
  10. Select the role you created and choose Delete.
  11. On the IAM console, choose Policies in the navigation pane.
  12. Select the policy you created and choose Delete.

After you finish these steps, all the resources you created for this post should be cleaned up!

Conclusion

In this post, we showed you how to set up an AWS DMS migration by creating a Timestream database, giving AWS DMS the necessary permissions to migrate to it, and creating the AWS DMS replication instance, endpoints, and task. You then ran that task and verified that the data was successfully migrated to your Timestream database. You can now run this workflow on your own workload. The final steps (creating and running the migration task) are the same for any AWS DMS task, which means you can run a full end-to-end migration from whichever supported relational database you like and have the data appear in your Timestream database.

Let us know in the comments section what you are going to build with this new migration capability!


About the author

Matthew Carroll is a Software Development Engineer with the AWS DMS team at AWS. He is dedicated to enhancing the versatility and scale of the tools used to migrate customer data across databases. He is the lead developer for the Amazon Timestream endpoint feature in AWS DMS.
