Monday, April 15, 2024

Choose AWS Graviton and cloud storage for your Ethereum nodes infrastructure on AWS

The first question facing anyone who wants to manage their own Ethereum nodes on AWS is how to select the right compute and storage. To answer it, we ran a series of tests and observed how two popular Ethereum Execution Layer (EL) clients, go-ethereum with LevelDB (Geth) and Erigon, perform on Amazon Elastic Compute Cloud (Amazon EC2) instances powered by AWS Graviton processors, with Amazon Elastic Block Store (Amazon EBS) gp3 and io2 volumes as well as Amazon FSx for OpenZFS. We used Lighthouse as the accompanying Consensus Layer (CL) client in both cases, but focused our tests on the JSON RPC API exposed by the EL clients because it matters most for the end-user experience. First, we benchmarked the EL clients to compare the performance of different EC2 instance types and used EBS io2 volumes with 64,000 I/O operations per second (IOPS) to avoid storage bottlenecks. Then we ran similar tests with different storage configurations on an EC2 instance size with ample CPU and memory, to isolate the performance comparison of the storage options. Based on the data we gathered, we identified the set of configurations most suitable for nodes that sync with the rest of the network and serve JSON RPC requests.

Testing EC2 instances powered by AWS Graviton

AWS Graviton is a family of ARM-based CPUs designed by AWS to deliver the best price performance for workloads running in Amazon EC2. Multiple generations of the AWS Graviton processor have been released since 2018. AWS Graviton3 processors are the latest in the family.

We compared the performance of two of the most popular EL client configurations: Geth running as a full node, and Erigon running as an archive node, which is the only setup available for it. We ran them on c6g.2xlarge (AWS Graviton2) and c7g.2xlarge (AWS Graviton3) EC2 instance types and chose the 2xlarge size to meet the recommended minimum of 16 GB RAM for both Geth and Erigon.

For performance testing, we used the K6 load testing tool by Grafana Labs to generate the load from AWS Cloud9 instances. During the tests, we issued as many read requests as possible for a specified time, typically 20–30 minutes. For both Geth and Erigon, we tested two scripted scenarios:

Scenario 1 – We called a single JSON RPC method, eth_estimateGas, which is CPU-intensive and therefore well suited to testing the processor. To prevent the EL clients from caching responses, we created a pool of thousands of Ethereum addresses and picked from them randomly during the test.
Scenario 2 – We triggered a mix of commonly used JSON RPC methods, each invoked with the same weight: eth_gasPrice, eth_blockNumber, eth_getBalance, eth_getCode, eth_estimateGas, eth_getBlockByNumber, eth_getTransactionByHash, eth_getTransactionReceipt, and eth_getLogs. We also randomized the data used in requests by choosing random block numbers and picking from a pool of thousands of transaction hashes and addresses for every request.
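
As a rough illustration of Scenario 2, the sketch below builds one randomized JSON RPC request body per iteration. It is not the K6 script used in the tests. The address and transaction hash pools are filled with random placeholder values, and MAX_BLOCK is an assumed upper bound on the block number; a real run would load these from on-chain data.

```python
import json
import random

# Methods from Scenario 2, each picked with equal probability.
METHODS = [
    "eth_gasPrice", "eth_blockNumber", "eth_getBalance", "eth_getCode",
    "eth_estimateGas", "eth_getBlockByNumber", "eth_getTransactionByHash",
    "eth_getTransactionReceipt", "eth_getLogs",
]

MAX_BLOCK = 16_000_000  # assumption: highest block available on the node


def random_hex(num_bytes, rng):
    """Return a 0x-prefixed hex string of the given byte length."""
    return "0x" + "".join(rng.choice("0123456789abcdef") for _ in range(num_bytes * 2))


def make_pools(rng, size=1000):
    """Placeholder pools; a real test would load known addresses and hashes."""
    addresses = [random_hex(20, rng) for _ in range(size)]
    tx_hashes = [random_hex(32, rng) for _ in range(size)]
    return addresses, tx_hashes


def build_request(request_id, addresses, tx_hashes, rng):
    """Build one JSON RPC 2.0 request with randomized parameters."""
    method = rng.choice(METHODS)
    address = rng.choice(addresses)
    block = hex(rng.randint(1, MAX_BLOCK))
    if method in ("eth_gasPrice", "eth_blockNumber"):
        params = []
    elif method in ("eth_getBalance", "eth_getCode"):
        params = [address, "latest"]
    elif method == "eth_estimateGas":
        params = [{"from": address, "to": rng.choice(addresses), "value": "0x1"}]
    elif method == "eth_getBlockByNumber":
        params = [block, False]  # False: return transaction hashes only
    elif method == "eth_getLogs":
        params = [{"fromBlock": block, "toBlock": block}]
    else:  # eth_getTransactionByHash, eth_getTransactionReceipt
        params = [rng.choice(tx_hashes)]
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


if __name__ == "__main__":
    rng = random.Random(42)
    addresses, tx_hashes = make_pools(rng)
    for i in range(3):
        print(json.dumps(build_request(i, addresses, tx_hashes, rng)))
```

Each returned dictionary can be serialized and sent as the body of an HTTP POST to the node's JSON RPC endpoint. Drawing the address, hash, and block number fresh on every call is what keeps the client from answering out of a cache.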

To keep the tests consistent between runs, we used the same operating system (Amazon Linux 2) and the same versions of the Geth (v1.10.26) and Erigon (v2.34.0) clients across all instance types. We also used the same storage configuration: Amazon EBS gp3 volumes with 16,000 IOPS, 1000 MB/s throughput, and a capacity of 2 TB for Geth and 3 TB for Erigon. For the duration of the tests, we stopped blockchain synchronization on all clients to prevent unpredictable interference from the syncing process. Finally, we used m5.4xlarge instances for the AWS Cloud9 environments that generated JSON RPC requests with the K6 tool, and placed them in the same subnet and Availability Zone as the target nodes to avoid possible bottlenecks in compute and network.

The following tables show the maximum iterations per second we measured in both scenarios.

The first table summarizes the results for Scenario 1 (single JSON RPC method eth_estimateGas).

| Instance | Erigon Max Iterations/Sec | Geth Max Iterations/Sec |
|---|---|---|
| c6g.2xlarge | 10466.6637 | 10051.32125 |
| c7g.2xlarge | 13931.81629 | 13875.40971 |

The second table summarizes the results for Scenario 2 (mix of commonly used JSON RPC methods).

| Instance | Erigon Max Iterations/Sec | Geth Max Iterations/Sec |
|---|---|---|
| c6g.2xlarge | 5190.03825 | 325.32374 |
| c7g.2xlarge | 6358.26578 | 488.86294 |

In both scenarios, c7g.2xlarge delivered the highest maximum iterations per second for both Geth and Erigon. Our observations on the efficiency of AWS Graviton-based instances match those of Polygon, the Ethereum-based scaling platform, which also saw better performance and lower cost with AWS Graviton for EVM-compatible nodes. You may also notice that Erigon outperformed Geth with LevelDB in Scenario 2, the mix of commonly used JSON RPC methods. We found that for disk-intensive methods, such as eth_getBlockTransactionCountByNumber, Erigon can deliver six to thirteen times more iterations per second than Geth with LevelDB, while for CPU-intensive methods like eth_estimateGas the performance is about the same.

Even though C7g costs about 6% more than C6g, which is powered by AWS Graviton2, it still comes out ahead on performance per unit of cost.
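
The price-performance claim can be checked with a little arithmetic on the result tables above. The sketch below divides each c7g.2xlarge result by the matching c6g.2xlarge result and then by 1.06. It takes the roughly 6% price difference as given instead of using actual instance prices, so the output is an approximation.

```python
# Max iterations/sec from the two result tables: (c6g.2xlarge, c7g.2xlarge).
RESULTS = {
    ("Scenario 1", "Erigon"): (10466.6637, 13931.81629),
    ("Scenario 1", "Geth"): (10051.32125, 13875.40971),
    ("Scenario 2", "Erigon"): (5190.03825, 6358.26578),
    ("Scenario 2", "Geth"): (325.32374, 488.86294),
}

PRICE_RATIO = 1.06  # assumption: C7g costs about 6% more than C6g


def price_performance_gain(c6g, c7g, price_ratio=PRICE_RATIO):
    """Throughput per unit of cost on C7g relative to C6g."""
    return (c7g / c6g) / price_ratio


if __name__ == "__main__":
    for (scenario, client), (c6g, c7g) in RESULTS.items():
        speedup = c7g / c6g
        gain = price_performance_gain(c6g, c7g)
        print(f"{scenario} {client}: speedup x{speedup:.2f}, per-cost gain x{gain:.2f}")
```

A value above 1.0 means C7g delivers more iterations per second for the same spend. With these numbers, all four cases land above 1.0, ranging from roughly 1.16 to 1.42.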

Choosing the storage

Blockchain nodes need storage that can be mounted directly to the file system and sustain a high number of IOPS with low latency. Based on these requirements, we tested Amazon EBS gp3 and io2 volumes and FSx for OpenZFS. We decided to skip EC2 instances with an instance store for now: although the data on an instance store is preserved across instance reboots, it doesn’t persist if the instance is stopped, hibernated, or terminated. That adds operational overhead, because a replacement node may spend up to several hours copying the state data from elsewhere before it becomes operational. Nevertheless, instance store is a popular option among those who run nodes for high-throughput blockchain networks, so we will come back to it when we start analyzing those.

Both Amazon EBS gp3 and io2 provide single-digit millisecond latency, but io2 has more consistent performance and therefore lower tail latencies than gp3. Tail latency is the response time of the slowest small percentage of requests, for example the slowest 1%. However, in our observations for Geth and Erigon, there is no noticeable difference between the performance of io2 and the more cost-efficient gp3 when configured with up to 16,000 IOPS. That amount of IOPS is usually sufficient for nodes that need to follow the chain head and serve 10–15 JSON RPC requests per second. For more performant nodes that need to serve hundreds of JSON RPC requests per second, multiple EBS gp3 volumes can be combined with RAID 0 to serve 32,000 IOPS (two gp3 volumes) or 64,000 IOPS (four gp3 volumes). This configuration is more cost-effective than a single EBS io2 volume, at the price of slightly lower performance and more operational overhead.
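
The RAID 0 sizing above is simple arithmetic, sketched here as a small helper. It assumes ideal linear scaling across striped volumes and uses the 16,000 IOPS per-volume figure from this post. It ignores RAID overhead and any EBS limits of the EC2 instance itself, so treat the result as an upper bound. The function names are illustrative only.

```python
import math

GP3_MAX_IOPS = 16_000  # per-volume IOPS figure used in this post


def gp3_volumes_for(target_iops, per_volume_iops=GP3_MAX_IOPS):
    """Number of equally provisioned gp3 volumes to stripe for target_iops."""
    if target_iops <= 0:
        raise ValueError("target_iops must be positive")
    return math.ceil(target_iops / per_volume_iops)


def striped_iops(volume_count, per_volume_iops=GP3_MAX_IOPS):
    """Nominal aggregate IOPS of a RAID 0 stripe (ignores instance-level limits)."""
    return volume_count * per_volume_iops


if __name__ == "__main__":
    for target in (16_000, 32_000, 64_000):
        count = gp3_volumes_for(target)
        print(f"{target} IOPS -> {count} volume(s), nominal {striped_iops(count)} IOPS")
```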

FSx for OpenZFS can be mounted to the nodes’ file system over the Network File System (NFS) protocol. It showed performance similar to both Amazon EBS options, but with slightly higher latency. Its advantage is fast provisioning of new volumes on the same file system. Built-in volume cloning saves the time otherwise spent copying state data during initialization, so new nodes get up and running quickly. It also saves space, because new volumes are created as children of older volumes and file reads are served from parent blocks until the data in them changes. Data is copied to the new volume only when it is modified. Unfortunately, Erigon can’t work with disk volumes over NFS, so in our tests FSx for OpenZFS only worked for Geth. Considering the cost of FSx for OpenZFS, if you don’t need to provision new nodes as quickly as possible and can accept about 30 minutes to 1 hour of initialization time, Amazon EBS gp3 is still the more cost-effective option.
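
The cloning behavior described above is a copy-on-write scheme. The toy model below shows only the idea and is not how OpenZFS stores data on disk. The Volume class and its methods are hypothetical, and the model assumes the parent is not written to after cloning (OpenZFS creates clones from read-only snapshots).

```python
class Volume:
    """Toy copy-on-write volume: stores only the blocks written to it."""

    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}  # block number -> data owned by this volume

    def write(self, block_no, data):
        # Writes always land in this volume, never in the parent.
        self.blocks[block_no] = data

    def read(self, block_no):
        # Walk up the chain until some ancestor holds the block.
        volume = self
        while volume is not None:
            if block_no in volume.blocks:
                return volume.blocks[block_no]
            volume = volume.parent
        return None

    def clone(self):
        # Creating a clone copies no data; it only records the parent.
        return Volume(parent=self)

    def own_block_count(self):
        return len(self.blocks)


if __name__ == "__main__":
    base = Volume()
    for n in range(1000):
        base.write(n, f"state-{n}")
    node_b = base.clone()      # instant: no blocks copied
    node_b.write(7, "updated")  # only this block is stored in the clone
    print(node_b.read(7), node_b.read(8), node_b.own_block_count())
```

In this model the clone starts out owning zero blocks and grows only as the new node diverges from the parent state. That is why provisioning is fast and the extra space used is small at first.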

Choosing the configurations

The goal of our analysis was to find combinations of Graviton-powered EC2 instances and storage that are both cost-effective and performant. Most Ethereum clients combine two functions: syncing with other nodes in the blockchain network and serving the JSON RPC API to decentralized applications (dApps). Processing JSON RPC requests takes more resources than just syncing the data, but not everyone needs nodes for dApps. Some use nodes only as a reliable feed of blockchain network data for analytics systems or locally stored snapshots. Therefore, we came up with multiple configurations based on the role the node plays.

Configurations for sync nodes

The purpose of the sync node is to catch up with the chain head. In addition to stress tests that we ran to identify better-suited compute and storage, we also left sync nodes running for a week and then let AWS Compute Optimizer advise us on the sizes for compute and storage. Finally, we ran those setups for a few more days. The following table summarizes the final configurations.

| EL + CL Combination | Instance Type | EBS Volume |
|---|---|---|
| Geth + Lighthouse/Prysm | r6g.2xlarge | gp3, 3000 IOPS, 200 MB/s throughput |
| Erigon + Lighthouse | r6g.4xlarge | gp3, 5900 IOPS, 125 MB/s throughput |
| Nethermind + Teku | r6g.2xlarge | gp3, 3000 IOPS, 125 MB/s throughput |
| Besu + Teku | m6g.4xlarge | gp3, 3000 IOPS, 125 MB/s throughput |

Configurations for RPC nodes

RPC nodes serve requests from dApps while also staying in sync with the rest of the Ethereum network. The configurations in the following table assume that nodes running Erigon handle fewer than 30 requests per second of the storage-intensive trace_block RPC method while still following the chain head.

| EL + CL Combination | Instance Type | EBS Volume |
|---|---|---|
| Geth + Lighthouse/Prysm | m7g.4xlarge | gp3, 16000 IOPS, 1000 MB/s throughput |
| Erigon + Lighthouse | m7g.4xlarge | gp3, 16000 IOPS, 1000 MB/s throughput |

Further tests also showed that the Erigon with Lighthouse combination can sustain about 300 RPC requests per second on m7g.16xlarge with four EBS gp3 volumes, each with 16,000 IOPS and 1000 MB/s throughput, configured as a RAID 0 volume. We also found that, in terms of performance on a mixed JSON RPC test scenario, one Erigon-powered node can replace two Geth-powered nodes on a similar setup.

Considerations

The analysis and the recommendations in this post assume that you run Ethereum nodes for your own needs. Those who provide nodes as a service to others, like node operators, might consider building more sophisticated architectures and end up with different configurations. For example, you may consider using EC2 instance types with an instance store to get storage performance at the level of EBS io2 and benefit from Compute Savings Plans to pay less for both compute and storage. If you’d prefer to get new nodes up and running as fast as possible, you might still use FSx for OpenZFS to benefit from fast volume provisioning.

Conclusion

In this post, we shared the compute and storage configurations that we found suitable for running Ethereum nodes on AWS. Through our tests and observations, we found that AWS Graviton-based instances, particularly the C7g and M7g instance types, demonstrated good performance and cost-effectiveness for popular EL clients like Geth and Erigon. The M7g has more memory than the C7g, which helps it both serve JSON RPC requests and follow the chain head.

As for the storage, both Amazon EBS with gp3 and io2 volume types and FSx for OpenZFS provided suitable options. Although io2 volumes offer more consistent performance, gp3 volumes configured with sufficient IOPS proved to be a cost-effective choice. FSx for OpenZFS also showed good results, especially for speeding up horizontal scaling, although it’s not compatible with the Erigon client.

Our recommended configurations vary based on the node’s role, whether it’s a sync node or an RPC node serving dApps. The suggested instance types and EBS volumes provide a good balance between performance and cost-effectiveness, enabling nodes to synchronize with the network and handle RPC requests efficiently.

As the Ethereum ecosystem continues to evolve rapidly, it’s essential to stay informed about new developments and updates that may impact node configurations in the future. Regular monitoring and evaluation of performance and cost-efficiency are necessary to ensure optimal node operation and maintain a reliable service for the Ethereum network and dApps. If you have further questions, ask them on AWS re:Post with tag “blockchain” or reach out to your AWS team.

About the Authors

Nikolay Vlasov is a Senior Solutions Architect at AWS, focused on blockchain-related workloads. He helps clients run workloads supporting decentralized web and ledger technologies on AWS.

Aldred Halim is a Solutions Architect at AWS. He works closely with customers in designing architectures and building components to ensure success in running blockchain workloads on AWS.

Read more: AWS Database Blog
