How Agnostic Engineering improved storage latency for running Polygon nodes on AWS

By mullaned2002

May 22, 2024

95

This is a guest post co-written by Arnaud Briche, the Founder of Agnostic.

At Agnostic, our mission is to democratize access to well-structured blockchain data. We aim to provide a swift, user-friendly, and robust method for querying the vast volumes of data generated by smart contract blockchains. As a company, for performance reasons we first started with using bare metal service providers, but then decided to use AWS global presence to expand our business and achieve better reliability of our service.

One of the blockchain networks we support is Polygon, which is a high throughput network requiring low digit millisecond read operation latency on block storage volumes. This is critical to enable the blockchain client, Erigon in this case, to keep up to date with the latest state of the network. When we tested the Amazon Elastic Block Store (Amazon EBS) gp3 volume, we discovered that the tail latency can’t always meet our targets, and the EBS io2 volume has challenging cost-efficiency for our setup. Tail latency is the duration of the small percentage of response times from a system that take the longest in comparison to the rest of the response times.

In this post, we show an alternative solution we created to achieve the ultra-low-latency read operations that we needed.

Solution overview

The solution we implemented combines the advantages of Amazon Elastic Compute Cloud (Amazon EC2) instance store volumes with general purpose durable EBS gp3 block storage. We chose to use instance store volumes as a cost-effective, low-latency storage option. These volumes are located on discs physically attached to the host computers and which allow us to use temporary low-latency block-level storage in some EC2 instance types. The following diagram illustrates how the overall solution works.

The solution consists of the following key components:

We selected an EC2 instance with multiple SSD-powered instance store volumes. To achieve even better latency, we combined them into a single striped storage using RAID 0 configuration.
To prevent potential data loss, we attached an EBS gp3 volume with the total size of the new RAID 0 array and set up data mirroring with RAID 1 configuration.
We also made the EBS gp3 volume write-mostly within the RAID 1 setup to make sure read operations will be served by the faster RAID 0 array of combined instance store volumes.

After setting up the RAID arrays, we installed and configured an OpenZFS file system to benefit from its built-in data compression feature and further optimize space consumption.

We also created an AWS CloudFormation template with a user-data script to simplify deployment and configuration. To try a preconfigured instance similar to ours in your AWS account, use the following stack:

Make sure you review all the configuration parameters and select I acknowledge that AWS CloudFormation might create IAM resources with custom names.

Note that this template will deploy resources outside of free-tier and will incur additional costs. You can estimate the costs based on the infrastructure considerations we discuss in the next section.

Solution deep dive

We chose the i3en.6xlarge instance model that comes with two NVMe SSD instance store volumes 7500 GB in size each. We then attached an EBS gp3 14000 GiB volume to match the combined size of the instance store volumes. We also used Ubuntu Linux version 20.04 for simplicity. Then we finished the setup using the following steps:

Configure the RAID 0 array with instance volumes:

mdadm –verbose –create /dev/md/0 –level=0 –raid-devices=2 /dev/nvme2n1 /dev/nvme3n1

Configure the RAID 1 array between the combined instance store volumes and the attached EBS volume with write-mostly setting:

mdadm –create –verbose –assume-clean –force /dev/md/1 –level=1 –name=1 –raid-devices=2 /dev/md/0 –write-mostly /dev/nvme1n1

Install the OpenZFS file system on Ubuntu Linux version 20.04:

apt -yqq install zfsutils-linux

Configure the OpenZFS file system with compression for the new RAID0+1 array:

# ZFS can have a hard time deducing the actual underlying hardware block size,
# so we usually set it explicitly to be 4KiB (and max volume size limited to 16 TiB) on ZPOOL creation.

zpool create -o ashift=12 data /dev/md/1

# Checking the status of our new zfs pool

zpool status data

# Enabling compression

zfs set compression=lz4 data

# Default ZFS recodsize of 128K does not align well with the access pattern of our
# workload (blockchain nodes). We thus set it to much lower values.

zfs set recordsize=16K data

# Access time tracking is usually not useful for our workload and can
# cost a lot of useless IOs so we disable it.
zfs set atime=off data

# Move eXtended ATTRibutes from hidden subdirectories to System Attributes
zfs set xattr=sa data

The new data volume will be mounted to the file system under /data, and we continue with deploying archive node with Polygon Erigon client.

If you used our sample CloudFormation template to create your instance, the user-data script automates the preceding steps for you.

After the deployment is complete, you can connect to your EC2 instance using Sessions Manager, a capability of AWS Systems Manager. For more details, see Connect to your Linux instance with AWS Systems Manager Session Manager.

Then you can check the storage configuration using the following commands.

Check the size of the new /data volume:

df -h | grep /data

Get a summary of all RAID configs:

mdadm –detail –scan –verbose

Check the configuration of all block devices:

lsblk

Get the OpenZFS configuration of your new /data volume:

zfs get all data

Then you can continue with installing your own Erigon client for Polygon. Make sure you configure /data in the –datadir parameter and add inbound rules to open ports 30303 and 26656 in the security group associated with the new EC2 instance.

Cleanup

To delete all resources deployed with the “Launch Stack” link above, navigate to CloudFormation service in your AWS console, select the stack you have deployed (default name is “single-bc-node-with-RAID10”) and click “Delete” button in top-right.

Conclusion

In this post, we shared how Agnostic successfully created a node that matches the speed of a bare-metal setup while significantly enhancing durability and enabling us to efficiently back up and replicate data for rapid node provisioning. This achievement exemplifies the convergence of top-tier performance and unmatched flexibility. Learn more about Agnostic Engineering and, if you have further questions, please ask them on AWS re:Post with the tag “blockchain.” If you have trouble with the provided CloudFormation template, open an issue with the AWS Blockchain Node Runners repository on GitHub.

About the Authors

Arnaud Briche is the Founder of Agnostic. His specialties are building and managing distributed systems at scale.

Nikolay Vlasov is a Senior Solutions Architect with the AWS Worldwide Specialist Solutions Architect organization, focused on blockchain-related workloads. He helps clients run the workloads supporting decentralized web and ledger technologies on AWS.

How Agnostic Engineering improved storage latency for running Polygon nodes on AWS

Solution overview

Solution deep dive

Cleanup

Conclusion

About the Authors

Schneider Electric automates Salesforce account hierarchy management with generative artificial intelligence (AI) using Amazon Aurora and Amazon Bedrock

Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

Implement UUIDv7 in Amazon RDS for PostgreSQL using Trusted Language Extensions

LEAVE A REPLY Cancel reply

Most Popular

Schneider Electric automates Salesforce account hierarchy management with generative artificial intelligence (AI) using Amazon Aurora and Amazon Bedrock

Leverage enterprise data with Denodo and Vertex AI for generative AI applications

TypeScript takes aim at truthy and nullish bugs

Make relevant movie recommendations using Amazon Neptune, Amazon Neptune Machine Learning, and Amazon OpenSearch Service

Recent Comments

EDITOR PICKS

Exploring the Click Element Variable in Google Tag Manager

How to track events with Google Tag Manager and Google Analytics

Data Layer Variable in GTM: What, Why, and Where?

POPULAR POSTS

The Untapped Potential of Data Engineers

Build Oracle Enterprise Manager with a repository in an Amazon RDS Custom for Oracle database

Few-click segmentation mask labeling in Amazon SageMaker Ground Truth Plus

POPULAR CATEGORY