Best practices for migrating auto-incrementing keys to Spanner

By mullaned2002

January 23, 2024

143

Editor’s note: In the previous post in this serieswe introduced primary key default values in Spanner. We also showed how to use the UUIDs and integer SEQUENCEs to automatically generate keys in the database. In this post we’ll show how to use these new capabilities to migrate schemas and data from other databases to Spanner, minimizing changes to downstream applications and ensuring Spanner best practices.

Spanner is a distributed relational database that is designed for the highest levels of availability and consistency at any scale. It allows users to seamlessly scale resources up and down to optimize costs based on their real-time needs, while maintaining continuous operation. Customers in gaming, retail, financial services, and many other industries rely on Spanner for their most demanding workloads.

Migrating to Spanner

Many of these workloads did not start on Spanner, though. Customers come to Spanner from different relational and non-relational databases, looking to take advantage of Spanner’s seamless scaling and fully managed experience. Spanner provides a set of tools and best practices to facilitate migrations. The Spanner Migration Tools include assessment, schema translation, and data movement for terabyte-sized databases coming from MySQL and PostgreSQL. For broader migration guidance, you can refer to the Spanner documentation.

In this post we’ll focus specifically on migrating databases that use auto-generated keys, in particular, auto-incrementing sequential integers and UUIDs.

Each of the migration strategies below addresses the key requirements:

Ensure the fidelity and correctness of the migrated keysMinimize downstream application changes, such as changing types or values of the keys themselvesSupport replication scenarios where either the source or target database generates the keys and data is synchronized between them, for example, to do a live cutover between systemsImplement Spanner best practices for performance and scalability

Migrating sequential keys

We’ll start with the most common scenario for relational workloads coming to Spanner: migrating from a single-instance database that uses sequential monotonic keys, for example AUTO_INCREMENT in MySQL, SERIAL in PostgreSQL, or the standard IDENTITY type in SQL Server or Oracle. For databases that manage writes on a single machine, a counter to provide sequential keys is simple. However, ordered keys can cause performance hotspots in a distributed system like Spanner.

At a high level, the strategy to migrate sequential keys to Spanner is:

Define a copy of the table in Spanner using an integer primary key, just as in the source database.Create a sequence in Spanner and set the table’s primary key to use it for its default value.Load the data with its keys as-is from the source database into Spanner, for example using the Spanner Migration Tool or the lower-level Dataflow templates.Optionally set foreign key constraints for any dependent tables.Before inserting new data, configure the Spanner sequence to skip values in the range of the existing keys.Insert new data, as before, allowing the sequence to generate keys by default.

Let’s start by defining the table and related sequence. In Spanner you define a new SEQUENCE object and set it as the default primary value of the destination table, for example using the GoogleSQL dialect:

code_block<ListValue: [StructValue([(‘code’, “CREATE SEQUENCE singer_id_sequence OPTIONS (rn sequence_kind = ‘bit_reversed_positive’rn);rnrnCREATE TABLE singers (rn singer_id INT64 DEFAULTrn (GET_NEXT_SEQUENCE_VALUE(SEQUENCE singer_id_sequence)),rn name STRING(1024),rn biography STRING(MAX),rn) PRIMARY KEY (singer_id);rnrnCREATE TABLE albums (rn album_id INT64,rn singer_id INT64rn album_nam STRING(1024),rn song_list STRING(MAX),rn CONSTRAINT FK_singer_albumrn FOREIGN KEY (album_id) REFERENCES singers (singer_id)rn) PRIMARY KEY (album_id);”), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3efb71519df0>)])]>

The requiredbit_reversed_positiveoption indicates that the numbers generated by the sequence will be greater than zero, but not ordered (see theintroductory postfor more information on bit-reversed sequences). Generated values are of type INT64.

As you migrate existing rows from your source database to Spanner, the rows’ keys remain unchanged. For new inserts that don’t specify a primary key, Spanner automatically calls theGET_NEXT_SEQUENCE_VALUE()function to retrieve a new number. Since these values distribute uniformly across the range[1, 263],there could be collisions with the existing keys. If this occurred your insert would fail with a “key already exists” error. To prevent this, you can configure the sequence toskipthe range of values covered by the existing keys.

For example, assuming that the tablesingerswas migrated from PostgreSQL, where its key,singer_id, was inSERIALtype:

code_block<ListValue: [StructValue([(‘code’, ‘CREATE TABLE singers (rn singer_id SERIAL PRIMARY KEY,rn name varchar(1024),rn biography varcharrn);’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3efb715193a0>)])]>

The column values are monotonically increasing. After migration, we retrieve the maximum value of thesinger_id:

code_block<ListValue: [StructValue([(‘code’, ‘SELECT MAX(singer_id) FROM singers;’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3efb715198b0>)])]>

Assuming the returned value is 20,000, you configure the sequence in Spanner to skip the range[1, 21000]. The extra 1,000 serves as a buffer to accommodate writes to the source database after the initial bulk migration. These values would typically be replicated later and we want to ensure they also will not conflict.

code_block<ListValue: [StructValue([(‘code’, ‘ALTER SEQUENCE singer_id_sequence SET OPTIONS (rn skip_range_min = 1,rn skip_range_max = 21000rn);’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3efb715192e0>)])]>

The diagram below illustrates a few migrated rows along with new rows inserted in Spanner after migration:

Now new keys generated in Spanner are guaranteed to not conflict with the range of keys generated in the source PostgreSQL database.

Multi-database usage

You can take this skipped range concept one step further to support scenarios where either Spanner or the upstream database generates primary keys, for example to enable replication in either direction for disaster recovery during a migration cutover.

To support this, you can configure each database to have a non-overlapping key value range. When you define a range for the other database, you can tell the Spanner sequence to skip over that range with the skipped range syntax.

For example, after the bulk migration of our music tracking application, we’ll replicate data from PostgreSQL to Spanner to reduce the amount of time it takes to cut over. Once we’ve updated and tested the application on Spanner, we’ll cut over from PostgreSQL to Spanner, making it the system of record for updates and new primary keys. When we do, we‘ll reverse the flow of data between databases and replicate data back to the PostgreSQL instance just in case we need to revert if there’s a problem.

In this scenario, sinceSERIALkeys in PostgreSQL are 32-bit signed integers, while our keys in Spanner are larger 64-bit numbers, we will do the following steps:

1. In PostgreSQL, alter the key column to be a 64-bit column, orbigint,

code_block<ListValue: [StructValue([(‘code’, ‘ALTER TABLE singers ALTER COLUMN singer_id TYPE bigint;’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3efb71519a90>)])]>

2. Since the sequencesingers_singer_id_seqused bysinger_idis still of typeint, its maximum value is already 231-1. To be safe, we can optionally set a CHECK constraint to the table in the source PostgreSQL database to ensure thatsinger_idvalues are always smaller or equal to 231-1:

code_block<ListValue: [StructValue([(‘code’, ‘ALTER TABLE singers ADD CHECK (singer_id <= 2147483647);’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3efb71519a30>)])]>

3. In Spanner, we’ll alter the sequence to skip the range [1, 231-1].

code_block<ListValue: [StructValue([(‘code’, ‘ALTER SEQUENCE singer_id_sequence SET OPTIONS (rn skip_range_min = 1,rn skip_range_max = 2147483647 — 2^31-1rn);’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3efb71519d60>)])]>

4. Deploy and test your usage, including from PostgreSQL to Spanner and vice versa.

Using this technique, PostgreSQL will always generate keys in the 32-bit integer space, while Spanner’s keys are restricted to the 64-bit integer space, larger than all of the 32-bit numbers and wide enough for future growth. This ensures that both systems can independently generate keys that are guaranteed not to conflict.

Migrating UUIDs

UUID primary keys are generally easier to migrate than sequential integer keys.UUIDs, v4 in particular, are effectively unique regardless of where they are generated. (The math behind this is an interesting application of thebirthday problemin statistics.) As a result, UUID keys generated elsewhere will integrate easily with new UUID keys generated in Spanner and vice versa.

The high-level strategy for migrating UUID keys is as follows:

Define your UUID keys in Spanner using string columns with a default expression,GENERATE_UUID()orspanner.generate_uuid()in the PostgreSQL dialect.

Export data from the source system, serializing the UUID keys as strings.

Import the keys into Spanner as-is.

Optionally enable foreign keys.

In Spanner, you define a UUID primary key column as aSTRINGorTEXTtype, and assignGENERATE_UUID()as its default value. During migration, you bring all values of existing rows from the source database to Spanner, including key values. (See thismigration guidefor more details.) After migration, as new rows are inserted, Spanner callsGENERATE_UUID()to generate new UUID values for them. For example, the primary keyFanClubIdwill get a UUIDv4 value when we insert a new row into the table,FanClubs:

code_block<ListValue: [StructValue([(‘code’, ‘CREATE TABLE fan_clubs (rn fan_club_id STRING(36) DEFAULT (GENERATE_UUID()),rn club_name STRING(1024),rn) PRIMARY KEY (fan_club_id);’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3efb715199d0>)])]>

code_block<ListValue: [StructValue([(‘code’, ‘INSERT INTO fan_clubs (club_name) VALUES (“SwiftFanClub”);’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3efb71519820>)])]>

Migrating your own primary keys

Bit-reversed sequences and UUIDs provide unique values that won’t hotspot at scale when used as a primary key in Spanner. But they don’t provide any guarantees on the ordering of their values… by design! However, some applications rely on the order of the keys to determine recency or to sequence newly created data. Databases manually sharded for scale typically rely on a global counter, coordinated outside of any independent database instances.

To use ordered keys generated externally in Spanner you create acomposite keythat combines a uniformly distributed value, such as a shard ID or a hash, as the first component and a sequential number as the second component. This preserves the ordered key values, but won’t hotspot at scale.

In this example, we are migrating a MySQL table with anAUTO_INCREMENTprimary key,students, to Spanner. The downstream application generates student IDs, and the IDs are shared to end users (students, faculty, etc.)

code_block<ListValue: [StructValue([(‘code’, ‘// This is the table to be migrated from MySQLrnCREATE TABLE students (rn student_id INT NOT NULL AUTO_INCREMENT,rn info VARCHAR(2048),rn PRIMARY KEY (student_id)rn);’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3efb71519460>)])]>

In Spanner, we add a generated column containing a hash of theStudentIdcolumn:

code_block<ListValue: [StructValue([(‘code’, ‘CREATE TABLE student (rn student_id_hash INT64 AS (FARM_FINGERPRINT(student_id)) STORED,rn student_id INT64 NOT NULL,rn info STRING(2048),rn) PRIMARY KEY (student_id_hash, student_id);’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3efb71519af0>)])]>

Get started today

We recently introduced new capabilities that help users implement best practices for primary keys in Spanner using the SQL concepts they already know. The strategies detailed above minimize downstream application changes and maximize performance and availability in Spanner when migrating auto-incrementing and UUID from other relational databases.

You can learn more about what makes Spanner unique and how it’s being used today. Or try it yourself for free for 90-days or for as little as $65 USD/month for a production-ready instance that grows with your business without downtime or disruptive rearchitecture.

Cloud BlogRead More

Previous articleReal-time data processing for machine learning with Striim and BigQuery

Next articleOracle has a better generative AI strategy, analysts say

Best practices for migrating auto-incrementing keys to Spanner

Migrating to Spanner

Migrating sequential keys

Multi-database usage

Migrating UUIDs

Migrating your own primary keys

Get started today

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Hex-LLM: High-efficiency large language model serving on TPUs in Vertex AI Model Garden

LEAVE A REPLY Cancel reply

Most Popular

Schneider Electric automates Salesforce account hierarchy management with generative artificial intelligence (AI) using Amazon Aurora and Amazon Bedrock

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

Recent Comments

EDITOR PICKS

Exploring the Click Element Variable in Google Tag Manager

How to track events with Google Tag Manager and Google Analytics

Data Layer Variable in GTM: What, Why, and Where?

POPULAR POSTS

Meet the inaugural class of the Google for Startups Accelerator: Climate Change in Europe

Releasing Artifact Registry assets across Organizations and Projects with serverless

How to choose a cloud-based CI/CD platform

POPULAR CATEGORY