Create custom PostgreSQL data types using Trusted Language Extensions

By mullaned2002

September 7, 2023

244

PostgreSQL ships with many robust data types that accommodate most customer workloads in a performant manner. Although PostgreSQL has the capabilities to deploy custom data types natively, introducing new data types at scale in architectures spanning multiple AWS accounts and Regions poses a unique challenge for builders. With Trusted Language Extensions (TLE), you can create and manage your custom data types, allowing the quick and easy deployment of PostgreSQL data types across your infrastructures in a secure and efficient manner.

In this post, we demonstrate how to create custom PostgreSQL data types using TLE.

Trusted Language Extensions (pg_tle) Overview

PostgreSQL is known for its extensibility and support for a large breadth of database extensions. Trusted Language Extension for PostgreSQL (pg_tle) is an open-source framework that allows developers to build and package PostgreSQL extensions, including into Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL. The TLE framework enhances these capabilities by providing a simple and safe way to add database extensions to Amazon Aurora and Amazon RDS, written in trusted languages in a way that can be easily repackaged and deployed as soon as they’re ready. TLE provides a controlled environment for running user-defined database extensions, allowing for quick and iterative expansion of your PostgreSQL database functionality.

PostgreSQL custom data types overview

Applications may demand specialized attributes, validation rules, or behaviors that can’t be accurately expressed with standard PostgreSQL data types. For example, if your application deals with geographic coordinates, sensor readings with complex units, or intricate financial calculations, creating custom data types can help maintain data integrity, enhance query efficiency, and improve overall code clarity.

Custom data types can also contribute to a more organized and semantically meaningful database design. When your data exhibits distinct characteristics that don’t align neatly with the general-purpose data types, custom types offer a way to encapsulate the complexity and provide a more intuitive interface for data manipulation. By creating custom data types, you not only streamline your application’s logic, but also establish a clear representation of your domain’s concepts within the database schema, making the code base more comprehensible for developers and maintainers alike.

Custom Datatypes data types and TLE in Amazon Aurora or Amazon RDS

Trusted Language Extension’s latest functionality allows you to create new data types, complete with their own rules and properties. Although PostgreSQL contains the capabilities to create custom data types, using TLE allows you to easily wrap custom data types as a TLE extension such that it can be easily redeployed across AWS accounts and Regions. The additional benefit of using TLE to create and manage custom data types lies in the ability to use trusted PostgreSQL languages other than SQL. In this post’s example, we create an unsigned int2 data type (written as a TLE using the PL/pgSQL trusted language). This functionality will be included in pg_tle v1.1.1, and is available on database instances in Amazon RDS running PostgreSQL 15.4 and higher, 14.9 and higher, and 13.12 and higher in all applicable AWS Regions, including the AWS GovCloud (US) Regions. Please see the related “What’s new” post for more information: Create Custom PostgreSQL Data types using Trusted Language Extensions

Solution overview

Although signed integer data types can support both positive and negative numbers, unsigned integer data types can only store positive numbers (or zero). Using unsigned integer data types can be advantageous, especially in vector workloads where it can be important to reduce storage space. In the following sections, we explore the requirements and show a basic example of how to create an unsigned int2 data type, which contains 16 bits per instance, allowing for values from 0–65,535 for PostgreSQL using TLE.

Prerequisites

You must have an Amazon Aurora or Amazon RDS instance with the pg_tle extension installed. For more information about setting up TLE, refer to New – Trusted Language Extensions for PostgreSQL on Amazon Aurora and Amazon RDS.

Create an unsigned int2 data type using TLE

Complete the following steps:

Connect to a database and check that “pg_tle” is installed. You do this with the following SQL command:

CREATE EXTENSION IF NOT EXISTS pg_tle;

On Amazon Aurora and Amazon RDS, you need to explicitly grant the pgtle_admin permission to your user. If you’re using the postgres user, you can do this with the following command:

GRANT pgtle_admin TO postgres;

Now you’re ready to create the uint2 data type. You can do this with the following code, which installs a TLE called pg_uint2:

SELECT pgtle.install_extension
(
‘pg_uint2’,
‘0.1’,
‘PostgreSQL unsigned int2’,
$_pg_tle_$
SELECT pgtle.create_shell_type(‘public’, ‘pg_uint2’);
CREATE FUNCTION public.pg_uint2_in(input text) RETURNS bytea AS
$$
DECLARE
result bytea;
BEGIN
result := input;
IF input::INT <= -1 OR input::INT >= 65536 THEN
RAISE EXCEPTION ‘Input value does not fit into this unsigned int2 datatype’;
END IF;
RETURN result;
END
;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
CREATE FUNCTION public.pg_uint2_out(input bytea) RETURNS text AS
$$
DECLARE
decoded TEXT;
BEGIN
decoded := encode(input, ‘escape’);
return decoded;
END
;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
SELECT pgtle.create_base_type(‘public’, ‘pg_uint2’, ‘pg_uint2_in(text)’::regprocedure, ‘pg_uint2_out(bytea)’::regprocedure, -1);
$_pg_tle_$
);

If the preceding commands are successful, you’ll see the following output:

install_extension
——————-
t
(1 row)

Now, enable your new TLE extension:

CREATE EXTENSION pg_uint2;

If successful, you can observe the following message:

CREATE EXTENSION

To verify that the new data type is installed, use the following command to list all user-managed data types:

tle_test=> dT
List of data types
Schema | Name | Description
——–+———-+————-
public | pg_uint2 |
(1 row)

Now that you have observed that your custom data type is installed, create a table that utilizes it:

CREATE TABLE employee_info(
empl_id pg_uint2,
empl_age SMALLINT,
empl_name TEXT
);

CREATE TABLE

Insert some data into your test table:

INSERT INTO employee_info(empl_id, empl_age, empl_name)
VALUES (‘1′, ’47’, ‘Billy’),
(‘2′, ’73’, ‘Steve’),
(‘347′, ’36’, ‘John’),
(‘867′, ’47’, ‘Sue’),
(‘5309′, ’67’, ‘Randy’),
(‘1942′, ’34’, ‘Dominic’),
(‘1337′, ’19’, ‘Scott’),
(‘1207′, ’35’, ‘Amy’),
(‘1941′, ’29’, ‘Gregory’),
(‘0821′, ’36’, ‘Robert’),
(‘1992′, ’31’, ‘Missi’),
(‘0228′, ’47’, ‘Rachel’),
(‘1993′, ’22’, ‘Paul’),
(‘2020′, ’52’, ‘Brad’);

INSERT 0 14

Select the contents of the table and observe what you inserted:

SELECT * from employee_info ;

empl_id | empl_age | empl_name
———+———-+———–
1 | 47 | Billy
2 | 73 | Steve
347 | 36 | John
867 | 47 | Sue
5309 | 67 | Randy
1942 | 34 | Dominic
1337 | 19 | Scott
1207 | 35 | Amy
1941 | 29 | Gregory
0821 | 36 | Robert
1992 | 31 | Missi
0228 | 47 | Rachel
1993 | 22 | Paul
2020 | 52 | Brad
(14 rows)

Let’s see what happens if we try to set invalid data in the uint2 data type. An unsigned int2 data type should only allow values between 0–65535, so attempting negative values or positive values beyond 65535 should fail:

INSERT INTO employee_info(empl_id, empl_age, empl_name)
VALUES (‘-546’, ‘591’, ‘Nosferatu’);

ERROR: Input value does not fit into this unsigned int2 datatype
LINE 2: VALUES (‘-546’, ‘591’, ‘Nosferatu’);
^
CONTEXT: PL/pgSQL function pg_uint2_in(text) line 7 at RAISE

Our unit2 implementation disallows negative values. But what about values that are greater than 65535?

INSERT INTO employee_info(empl_id, empl_age, empl_name)
VALUES (‘65550′, ’36’, ‘Walter’);

ERROR: Input value does not fit into this unsigned int2 datatype
LINE 2: VALUES (‘65550′, ’36’, ‘Walter’);
^
CONTEXT: PL/pgSQL function pg_uint2_in(text) line 7 at RAISE

As we can see, values larger then 65535 are disallowed, validating that our pg_uint2 data type is functioning correctly.

Now that we’ve validated that our new unsigned two-byte integer data type is functioning correctly, we can use the preceding code to redeploy this data type across both managed and unmanaged PostgreSQL instances as a PostgreSQL extension. Note that this is provided as a basic example of how to implement a custom data type using TLE. Additional production hardening and testing with your specific workload is highly recommended. Operators can be used for this data type when authoring TLEs. For details, see Trusted Language Base types.

Best practices

In this section, we share some best practices when creating custom PostgreSQL data types using TLE.

Check base PostgreSQL data types against your use case to confirm a custom data type needs to be written

PostgreSQL contains (as of version 15) over 43 base data types. These data types should accommodate most workloads, but in some circumstances, you may need additional check or validation constraints that scale more effectively when written into a custom data type for your specific use case. Writing custom data types (via TLE or otherwise) means more aspects of your data model that must be maintained and deployed over the lifetime of your infrastructure, and the technical debt should be understood beforehand.

Carefully evaluate the performance of your TLE custom data type prior to production deployment

Test your custom data types and benchmark them against similar base PostgreSQL data types to ensure your custom data types meet your needs in function and when scaled out to your production use cases.

You can use tools like pgbench to simulate your workload first using default data types available in PostgreSQL. If benchmarking results are not as performant as expected (or default data types complicate writing queries around your data model), then consider authoring a custom data type using TLE.

Write your TLE custom data type to closely match your needs

If the decision is made to write (and maintain) custom data types using TLE, ensure that the scope of any custom data types closely matches your business needs.

Much can be accomplished using the default PostgreSQL data types, but it depends on the needs of a given workload. For example, check constraints can be used to imitate unsigned integer data types in some cases, but redistributing TLE custom data types across multiple clusters can add to the technical overhead of maintaining those resources over time. Additionally, customized data types are one more layer in the schema layer that must be maintained over time, and minimizing of technical debt is always recommended where possible.

Conclusion

Although PostgreSQL contains many data types to accommodate most customer workloads, sometimes it’s necessary to create custom data types to accommodate the specialized semantics of new workloads. Although the ability to create custom data types is available natively in PostgreSQL, TLE enables you to easily redeploy custom data types across your AWS relational database infrastructure, and provides both self-managed and AWS managed services. In this post, we covered the how and why behind custom PostgreSQL data types and provided a tutorial demonstrating the creation of a TLE custom data type for storing uint2 in PostgreSQL. Finally, we covered several best practices around custom TLE data types. Custom data type functionality is included in pg_tle v1.1.1, and is available on database instances in Amazon RDS running PostgreSQL 15.4 and higher, 14.9 and higher, and 13.12 and higher in all applicable AWS Regions, including the AWS GovCloud (US) Regions.

We welcome your comments and feedback in the comments section.

About the Authors

Peter Celentano is a Senor Specialist Solutions Architect with Amazon Web Services, focusing on managed PostgreSQL. He works with AWS customers to design scalable, secure, performant, and robust database architectures on the cloud.

Sukhpreet Kaur Bedi is a Database Specialist Solutions Architect with AWS focusing on Amazon RDS/Aurora PostgreSQL engines. She helps customers innovate on the AWS platform by building highly available, scalable, and secure database architectures.

Create custom PostgreSQL data types using Trusted Language Extensions

Trusted Language Extensions (pg_tle) Overview

PostgreSQL custom data types overview

Custom Datatypes data types and TLE in Amazon Aurora or Amazon RDS

Solution overview

Prerequisites

Create an unsigned int2 data type using TLE

Best practices

Check base PostgreSQL data types against your use case to confirm a custom data type needs to be written

Carefully evaluate the performance of your TLE custom data type prior to production deployment

Write your TLE custom data type to closely match your needs

Conclusion

About the Authors

Use Single-AZ read replicas in Amazon RDS for SQL Server

Manage AWS DMS with PowerShell

GQL: The ISO standard for graphs has arrived

LEAVE A REPLY Cancel reply

Most Popular

The overwhelmed person’s guide to Google Cloud: week of April 18

Databricks DBRX is now available in Amazon SageMaker JumpStart

Introducing new ML model monitoring capabilities in BigQuery

Knowledge Bases in Amazon Bedrock now simplifies asking questions on a single document

Recent Comments

EDITOR PICKS

Exploring the Click Element Variable in Google Tag Manager

How to track events with Google Tag Manager and Google Analytics

Data Layer Variable in GTM: What, Why, and Where?

POPULAR POSTS

OLTP vs. OLAP: The Differences You Need To Know

Introducing GKE cost estimator, built right into the Google Cloud console

Informatica’s Data Management Cloud gets new data engineering, MLOps tools

POPULAR CATEGORY