Amazon DynamoDB single-table design using DynamoDBMapper and Spring Boot

By mullaned2002

June 4, 2021


A common practice when creating a data model design, especially in the relational database management system (RDBMS) world, is to start by creating an entity relationship diagram (ERD). Afterwards, you normalize your data by creating a table for each entity type in your ERD design.

The term normalization refers to the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy. Creating an ERD is a useful practice even with NoSQL database systems such as Amazon DynamoDB.

Modules such as Spring Data, which Spring Boot–based applications use for data access, still depend heavily on these patterns from the RDBMS world. However, normalizing your data in this way doesn’t yield optimal results when you’re using a nonrelational database. Relational databases use joins to combine records from two or more tables, but joins are expensive, and DynamoDB doesn’t support them at all. Instead, data is pre-joined and denormalized into a single table.

This blog post shows how to implement an ERD design by using a single-table design approach instead of using multiple tables. We use the higher-level programming interface for DynamoDB called DynamoDBMapper to demonstrate an example implementation based on Spring Boot.

Solution overview

In this post, we use the Ski Resort Data Model that is provided as an example in NoSQL Workbench for DynamoDB. This example model provides several entities and defines the following access patterns:

Retrieval of all dynamic and static data for a given ski lift or the overall resort, facilitated by the SkiLifts table
Retrieval of all dynamic data (including unique lift riders, snow coverage, avalanche danger, and lift status) for a ski lift or the overall resort on a specific date, facilitated by the SkiLifts table
Retrieval of all static data (including whether the lift is for experienced riders only, the vertical feet the lift rises, and the lift ride time) for a certain ski lift, facilitated by the SkiLifts table
Retrieval of the dates of data recorded for a certain ski lift or the overall resort, sorted by total unique riders, facilitated by the SkiLifts table’s global secondary index SkiLiftsByRiders
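To make the link between access patterns and keys concrete, here is a minimal Java sketch, not taken from the example project, of the key condition that each of the first three access patterns could use against the base table. The class name SkiLiftKeyConditions and the :pk and :sk placeholders are hypothetical; the attribute names PK and SK and the values RESORT_DATA, LIFT#, DATE#, and STATIC_DATA come from the sample data. Plain strings stand in for the SDK’s AttributeValue objects.

```java
import java.util.Map;

// Hypothetical sketch (not from the example project): the key condition each
// base-table access pattern needs. PK/SK and the key values follow the sample
// table; the placeholder names :pk and :sk are arbitrary choices.
final class SkiLiftKeyConditions {

    // Immutable pair of a KeyConditionExpression and its placeholder values.
    static final class KeyCondition {
        final String expression;
        final Map<String, String> values;

        KeyCondition(String expression, Map<String, String> values) {
            this.expression = expression;
            this.values = values;
        }
    }

    private SkiLiftKeyConditions() {}

    // A null lift number stands for the overall resort partition.
    static String partitionKey(Integer liftNumber) {
        return liftNumber == null ? "RESORT_DATA" : "LIFT#" + liftNumber;
    }

    // Pattern 1: every static and dynamic item in one partition.
    static KeyCondition allData(Integer liftNumber) {
        return new KeyCondition("PK = :pk", Map.of(":pk", partitionKey(liftNumber)));
    }

    // Pattern 2: the dynamic item of a lift or the resort for one date.
    static KeyCondition dynamicDataOn(Integer liftNumber, String date) {
        return new KeyCondition("PK = :pk AND SK = :sk",
                Map.of(":pk", partitionKey(liftNumber), ":sk", "DATE#" + date));
    }

    // Pattern 3: the static item of one lift (the resort has no static item).
    static KeyCondition staticData(int liftNumber) {
        return new KeyCondition("PK = :pk AND SK = :sk",
                Map.of(":pk", partitionKey(liftNumber), ":sk", "STATIC_DATA"));
    }
}
```

All three patterns read from a single partition. The fourth pattern needs a different sort order, which is why it is served by the global secondary index instead of the base table.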

With dynamic and static data in a single table, we can construct queries that return all needed data in a single interaction with the database. This speeds up the application for these specific access patterns. There is a potential downside, however: the data model is tailored toward these specific access patterns, which could conflict with other access patterns and make those less efficient. Because of this trade-off, it’s important to prioritize your access patterns and to optimize for both performance and cost based on that priority.

To apply the single-table design successfully in your application, you need to understand your application’s data access patterns. Your design is dictated by your access patterns, and using a single-table design requires a different way of thinking about data modeling. You can learn more about this pattern from the AWS re:Invent 2020 talks by Alex DeBrie (AWS Data Hero), Data modeling with DynamoDB – Part 1 and Data modeling with DynamoDB – Part 2. Additionally, the Amazon DynamoDB Office Hours with Rick Houlihan (senior practice manager at AWS) are a great source of information, including examples of modeling real-world applications.

Usually, you don’t know all the access patterns beforehand. Iterate your design and continue to improve it before actually putting the application into use.

In this blog post’s example application, we use the following stack:

Amazon Corretto 11, the no-cost, multiplatform, production-ready distribution of the Open Java Development Kit (OpenJDK)
Spring Boot version 2.4, Spring’s convention-over-configuration solution for creating stand-alone, production-grade Spring-based applications
Apache Maven, a software project management and comprehension tool
Amazon DynamoDB Local, the downloadable version of DynamoDB you can use to develop and test applications in your development environment
AWS SDK for Java v1, specifically for the higher-level programming interface for DynamoDB, which is called DynamoDBMapper
Project Lombok, a Java library that reduces boilerplate code by using annotations in your classes
JUnit 5, a unit testing framework for Java-based applications

The first iteration of our data model is shown in the following table.

| PK | SK | Date | TotalUniqueLiftRiders | AverageSnowCoverageInches | AvalancheDanger | OpenLifts / LiftStatus | ExperiencedRidersOnly | VerticalFeet | LiftTime | LiftNumber |
|---|---|---|---|---|---|---|---|---|---|---|
| RESORT_DATA | DATE#07-03-2021 | 07-03-2021 | 7788 | 50 | HIGH | 60 | | | | |
| RESORT_DATA | DATE#08-03-2021 | 08-03-2021 | 6699 | 40 | MODERATE | 60 | | | | |
| RESORT_DATA | DATE#09-03-2021 | 09-03-2021 | 5678 | 65 | EXTREME | 53 | | | | |
| LIFT#1234 | STATIC_DATA | | | | | | TRUE | 1230 | 7:00 | 4545 |
| LIFT#1234 | DATE#07-03-2021 | 07-03-2021 | 3000 | 60 | HIGH | OPEN | | | | |
| LIFT#1234 | DATE#08-03-2021 | 08-03-2021 | 3500 | 50 | MODERATE | OPEN | | | | |
| LIFT#6789 | STATIC_DATA | | | | | | FALSE | 2340 | 13:00 | 1122 |
| LIFT#6789 | DATE#08-03-2021 | 08-03-2021 | 4000 | 60 | MODERATE | OPEN | | | | |
| LIFT#6789 | DATE#09-03-2021 | 09-03-2021 | 2000 | 88 | EXTREME | OPEN | | | | |

This table uses a DynamoDB concept called a composite primary key, which is composed of two attributes: the partition key (PK) and the sort key (SK). DynamoDB uses the partition key value as input to an internal hash function, and the output of the hash function determines the partition (physical storage internal to DynamoDB) in which the item is stored. All items with the same partition key value are stored together, in sorted order by sort key value. Most partition key and sort key values in this table start with a prefix such as LIFT# or DATE#, which makes the values easier to understand. Such a prefix also lets you write simple queries that select only the items whose sort key starts with a certain prefix.
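The following minimal sketch models one partition in memory to show how sort key order and prefixes work together. It is not the AWS SDK: the class PartitionSketch and its methods are hypothetical, and a TreeSet stands in for the sorted storage of items that share a partition key. In a real query, the prefix case corresponds to the key condition PK = :pk AND begins_with(SK, :prefix).

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical in-memory model of one partition (not the AWS SDK). Items that
// share a partition key are kept in ascending sort key order, the way DynamoDB
// orders string sort keys, so a prefix match is one contiguous range.
final class PartitionSketch {

    private final NavigableSet<String> sortKeys = new TreeSet<>();

    // Builds a sort key that follows the <PREFIX># convention, e.g. DATE#07-03-2021.
    static String dateSortKey(String date) {
        return "DATE#" + date;
    }

    void put(String sortKey) {
        sortKeys.add(sortKey);
    }

    // Comparable to the key condition: PK = :pk
    List<String> queryAll() {
        return List.copyOf(sortKeys);
    }

    // Comparable to the key condition: PK = :pk AND begins_with(SK, :prefix)
    List<String> queryBeginsWith(String prefix) {
        return sortKeys.tailSet(prefix, true).stream()
                .takeWhile(sortKey -> sortKey.startsWith(prefix))
                .collect(Collectors.toList());
    }
}
```

For a lift partition holding STATIC_DATA, DATE#07-03-2021, and DATE#08-03-2021, a prefix of DATE# returns only the two dated items. Because sort keys compare as strings, a date format that doesn’t start with the year sorts chronologically only within limited ranges: DATE#01-01-2021 sorts before DATE#07-03-2020 even though it is the later date. A year-first format such as ISO 8601 avoids this.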

Prerequisites for this solution

For this walkthrough, you should have the following prerequisites:

A Java Development Kit (JDK) version 11 or higher, such as Amazon Corretto
Apache Maven, which you can install locally, or the Maven wrapper that is provided with the example project

Implementing the solution

We focus on two access patterns in this post and provide integration tests that demonstrate the functionality by using DynamoDB Local. Integration tests provide examples that can be a good starting point when you plan to implement a similar access pattern in your own application.

We focus on the following two patterns:

Retrieval of all dynamic and static data for a given ski lift or overall resort.
Retrieval of the date of dynamic data recorded for a certain ski lift or the overall resort sorted by total unique riders. To make this query efficient, we use a global secondary index on the DynamoDB table.

Follow these steps to create an environment in which to test these access patterns:

Create the Spring Boot application.
Add domain classes, providing a mapper between Java POJOs and the DynamoDB model. To reduce the amount of boilerplate code we need to write, we use Project Lombok annotations to generate most of this code.
Add integration tests to validate the access patterns by using DynamoDB Local.

The example project can be found in this GitHub repo.

Combining Spring Boot with Project Lombok is common practice, because Lombok minimizes boilerplate code and thereby improves developer productivity when creating Spring Boot–based applications. Spring Data is often used for database access. However, implementing the data access layer of your application without Spring Data, using the higher-level programming interface of the AWS SDK for Java instead, has some advantages. For example, you can create a dedicated data access project and use that library not only in your Spring Boot applications but also in plain Java code.

Combining Project Lombok with the AWS SDK for Java also makes it easier to create the domain classes that map between the application logic and DynamoDB. The following code example uses Project Lombok annotations and DynamoDBMapper annotations together to create a Java POJO representing the static lift stats domain class. The Lombok annotations minimize boilerplate code, and the DynamoDBMapper annotations map the class and its properties to a table and attributes in DynamoDB. For example, the @DynamoDBTable and @DynamoDBHashKey annotations allow DynamoDBMapper to link the getPK() method to the partition key of the SkiLifts table.

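import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAttribute;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBRangeKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

// Lombok generates constructors, a builder, getters, setters, equals/hashCode,
// and toString; the DynamoDB annotations map the class to the SkiLifts table.
@AllArgsConstructor
@Builder
@Data
@DynamoDBTable(tableName = "SkiLifts")
@NoArgsConstructor
public class LiftStaticStats {

    @DynamoDBAttribute(attributeName = "ExperiencedRidersOnly")
    private boolean experiencedRidersOnly;

    @DynamoDBAttribute(attributeName = "VerticalFeet")
    private int verticalFeet;

    @DynamoDBAttribute(attributeName = "LiftTime")
    private String liftTime;

    @DynamoDBAttribute(attributeName = "LiftNumber")
    private int liftNumber;

    // Partition key derived from the lift number, for example "LIFT#1234".
    @DynamoDBHashKey(attributeName = "PK")
    public String getPK() {
        return "LIFT#" + liftNumber;
    }

    // Every static item of a lift uses the same fixed sort key.
    @DynamoDBRangeKey(attributeName = "SK")
    public String getSK() {
        return "STATIC_DATA";
    }
}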

The following code creates a QueryRequest that asks DynamoDB for all items in the table that share the partition key held in liftPK, and then retrieves them from DynamoDB by running the query:

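import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.QueryResult;
import java.util.Map;

// Query every item (static and dynamic) in the lift's partition.
AttributeValue liftPK = new AttributeValue("LIFT#" + liftNumber);
QueryRequest queryRequest = new QueryRequest()
        .withTableName("SkiLifts")
        .withKeyConditionExpression("PK = :v_pk")
        .withExpressionAttributeValues(Map.of(":v_pk", liftPK));
QueryResult queryResult = amazonDynamoDB.query(queryRequest);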

The results of this query can contain items of different types: both LiftDynamicStats and LiftStaticStats objects. DynamoDBMapper isn’t suited to this query, because its typed methods don’t allow a query result that contains objects of different types. For this access pattern, however, it’s important to retrieve the mixed data set with a single query to DynamoDB. Because the QueryRequest and QueryResult classes can handle results that contain different types of items, they are the best alternative for implementing this query.
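With the SDK, queryResult.getItems() returns a List<Map<String, AttributeValue>>, so the application has to decide for each item which type it represents. The following minimal sketch assumes that the sort key is enough to tell the types apart (STATIC_DATA versus DATE#…) and splits a mixed result accordingly. It uses plain Java maps instead of the SDK types so that it runs on its own; MixedResultSplitter and its nested classes are hypothetical and are not the LiftStaticStats and LiftDynamicStats classes of the example project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: splits one mixed query result into typed objects by
// looking at each item's sort key. Map<String, String> stands in for the SDK's
// Map<String, AttributeValue>; class and field names are illustrative only.
final class MixedResultSplitter {

    static final class StaticStats {
        final int verticalFeet;
        final String liftTime;

        StaticStats(int verticalFeet, String liftTime) {
            this.verticalFeet = verticalFeet;
            this.liftTime = liftTime;
        }
    }

    static final class DynamicStats {
        final String date;
        final int totalUniqueLiftRiders;

        DynamicStats(String date, int totalUniqueLiftRiders) {
            this.date = date;
            this.totalUniqueLiftRiders = totalUniqueLiftRiders;
        }
    }

    // Result holder: one list per item type, in the order the items arrived.
    static final class Split {
        final List<StaticStats> staticStats = new ArrayList<>();
        final List<DynamicStats> dynamicStats = new ArrayList<>();
    }

    static Split split(List<Map<String, String>> items) {
        Split result = new Split();
        for (Map<String, String> item : items) {
            String sortKey = item.getOrDefault("SK", "");
            if (sortKey.equals("STATIC_DATA")) {
                result.staticStats.add(new StaticStats(
                        Integer.parseInt(item.get("VerticalFeet")), item.get("LiftTime")));
            } else if (sortKey.startsWith("DATE#")) {
                result.dynamicStats.add(new DynamicStats(
                        item.get("Date"), Integer.parseInt(item.get("TotalUniqueLiftRiders"))));
            }
            // Items with any other sort key are skipped in this sketch.
        }
        return result;
    }
}
```

With the real SDK types, one option is to pass each item map to DynamoDBMapper’s marshallIntoObject(Class, Map) method together with LiftStaticStats.class or LiftDynamicStats.class, which reuses the annotated domain classes instead of hand-written parsing.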

Second access pattern

Our second access pattern is the retrieval of the dates of dynamic data recorded for a certain ski lift or the overall resort, sorted by total unique riders. The base table design doesn’t support an efficient query that sorts by the number of unique riders, so we introduce a global secondary index for this access pattern. The partition key (PK) remains the same, but we use the total unique riders property as the sort key (SK). Do we need more data for this access pattern? Only the date; the other attributes aren’t relevant, so they aren’t included in the global secondary index.
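The following sketch models in memory what such an index contains. It relies on a DynamoDB behavior worth knowing: an item is written to a global secondary index only if it has the index key attributes. Assuming, as in the sample table, that static items carry no rider count, the STATIC_DATA items never appear in the index, which makes it a sparse index. RidersIndexSketch and its methods are hypothetical, not part of the AWS SDK or the example project, and the sketch orders entries by the numeric rider count.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the rider-count index (not the AWS SDK).
// Assumption: the index sort key is the rider count, which static items lack.
final class RidersIndexSketch {

    // One index entry: partition key, sort key (rider count), projected Date.
    static final class Entry {
        final String pk;
        final int riders;
        final String date;

        Entry(String pk, int riders, String date) {
            this.pk = pk;
            this.riders = riders;
            this.date = date;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Called for each newly written base item (updates are not modeled).
    // Items without the sort key attribute are not indexed (sparse index),
    // and only Date is projected besides the keys.
    void onBaseItemWritten(Map<String, String> item) {
        String riders = item.get("TotalUniqueLiftRiders");
        if (riders == null) {
            return;
        }
        entries.add(new Entry(item.get("PK"), Integer.parseInt(riders), item.get("Date")));
    }

    // Query one partition of the index; ascending = false corresponds to
    // ScanIndexForward = false on a real Query.
    List<Entry> query(String pk, boolean ascending) {
        Comparator<Entry> byRiders = Comparator.comparingInt(entry -> entry.riders);
        List<Entry> result = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.pk.equals(pk)) {
                result.add(entry);
            }
        }
        result.sort(ascending ? byRiders : byRiders.reversed());
        return result;
    }
}
```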

The following table provides some example data in which the items are sorted by the total unique lift riders.

| PK | SK | TotalUniqueLiftRiders | Date |
|---|---|---|---|
| RESORT_DATA | TOTAL_UNIQUE_LIFT_RIDERS#7788 | 7788 | 07-03-2021 |
| RESORT_DATA | TOTAL_UNIQUE_LIFT_RIDERS#6699 | 6699 | 08-03-2021 |
| RESORT_DATA | TOTAL_UNIQUE_LIFT_RIDERS#5678 | 5678 | 09-03-2021 |
| LIFT#1234 | TOTAL_UNIQUE_LIFT_RIDERS#3500 | 3500 | 08-03-2021 |
| LIFT#1234 | TOTAL_UNIQUE_LIFT_RIDERS#3000 | 3000 | 07-03-2021 |
| LIFT#6789 | TOTAL_UNIQUE_LIFT_RIDERS#4000 | 4000 | 08-03-2021 |
| LIFT#6789 | TOTAL_UNIQUE_LIFT_RIDERS#2000 | 2000 | 09-03-2021 |
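The sort key values in this index are strings such as TOTAL_UNIQUE_LIFT_RIDERS#7788, and DynamoDB orders string sort keys by their UTF-8 bytes, not by numeric value. All counts in the sample have four digits, so the order happens to be numeric, but a count of 999 would sort after 7788. The following sketch uses a hypothetical RiderCountKeys helper to show the effect and one common remedy, zero-padding the number to a fixed width. Storing the count in a Number-typed sort key attribute is another option.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not from the example project) for index sort keys such
// as TOTAL_UNIQUE_LIFT_RIDERS#7788.
final class RiderCountKeys {
    static final String PREFIX = "TOTAL_UNIQUE_LIFT_RIDERS#";

    private RiderCountKeys() {}

    // Unpadded, as in the sample table: string order differs from numeric
    // order as soon as counts have different numbers of digits.
    static String plain(int riders) {
        return PREFIX + riders;
    }

    // Zero-padded to 10 digits (enough for any non-negative int), so string
    // order matches numeric order. Locale.ROOT keeps ASCII digits.
    static String padded(int riders) {
        if (riders < 0) {
            throw new IllegalArgumentException("riders must not be negative");
        }
        return PREFIX + String.format(Locale.ROOT, "%010d", riders);
    }

    // Ascending string order; for these ASCII keys, String.compareTo agrees
    // with the byte order DynamoDB uses for string sort keys.
    static List<String> sortAscending(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        return sorted;
    }
}
```

Sorting the plain keys for 7788 and 999 puts TOTAL_UNIQUE_LIFT_RIDERS#7788 first; sorting the padded keys puts TOTAL_UNIQUE_LIFT_RIDERS#0000000999 first, as numeric order requires.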

With just one query, we get a list for a specific lift sorted by total unique lift riders. The only additional attribute retrieved by this query is the date. The integration test GlobalSecondaryIndexTestIT.testRetrieveDateOfLiftDataSortedByTotalUniqueLift() in the project implements this scenario. In the following code, we use DynamoDBMapper to query the global secondary index with an expression that returns only objects of the type LiftDynamicStats:

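import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBQueryExpression;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import java.util.List;
import java.util.Map;

// Query the GSI_1 index; global secondary indexes support only eventually consistent reads.
// Adding .withScanIndexForward(false) would return the highest rider counts first.
List<LiftDynamicStats> results = mapper.query(LiftDynamicStats.class,
        new DynamoDBQueryExpression<LiftDynamicStats>()
                .withConsistentRead(false)
                .withExpressionAttributeValues(
                        Map.of(":val1", new AttributeValue().withS("LIFT#" + lift1)))
                .withIndexName("GSI_1")
                .withKeyConditionExpression("GSI_1_PK = :val1"));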

Run tests in the project by using Maven

To run the tests, run the following command in the root folder of the project:

./mvnw clean verify

The output shows the results of running the tests, including access to DynamoDB Local. The test results themselves are not the point: the tests demonstrate how different access patterns can be implemented, and thereby provide a starting point for integrating the single-table design into Java applications.

You also can find the test results in <root-folder>/target/surefire-reports/.

Summary

This post showed how to complement the functionality provided by the AWS SDK for Java with the functionality provided by Project Lombok. Such an approach allows for an efficient programming model in Spring Boot–based applications as well as any other Java application.

Furthermore, you can extend the same concept in this post to simple functions, including AWS Lambda functions. Within a project, you can use this data access layer in applications based on Spring Boot and deployed on Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Similarly, you can use the data access layer within the same project in smaller scoped functions deployed as lightweight Lambda functions. This way, you can avoid the added overhead of Spring Boot. This is one of the main advantages of using the components provided by the AWS SDK for Java instead of implementations based on modules such as Spring Data.

This post’s example project demonstrates functionality by using DynamoDB Local, but also provides a great stepping stone to start developing your own Java-based applications and functions.

About the author

Arjan Schaaf is a cloud infrastructure architect at AWS Professional Services, based in the Netherlands. He helps customers solve complex challenges by providing solutions that use AWS services. When not working, Arjan likes Alpine activities, backyard BBQ, and spending time with family and friends.
