One of the challenges that software as a service (SaaS) providers face as their business grows is how to maintain their tenants’ experience. This includes ensuring acceptable performance and response times as the tenant base grows. Relational databases, such as Amazon Relational Database Service (Amazon RDS) and Amazon Aurora, are commonly used by SaaS providers. As the business grows, they must also decide how to scale the database.
Your challenge as a SaaS builder is to decide which scaling mechanism is right for your stage in your SaaS journey. As your application grows, you may need to change your database architecture when your existing solution no longer meets your needs. Implementing new technologies and architecture patterns can take focus away from driving growth and investing in your core business. Sometimes it’s more beneficial to optimize an existing architecture to keep the knowledge in your teams, rather than migrating to a new technology or architecture.
In this series, we provide guidance for SaaS providers who have an existing relational database and are looking for guidance on how to scale effectively. In Part 1, we explore the benefits of common relational database scaling patterns from a SaaS perspective, so that you can make an informed decision on a scaling approach that will fit your SaaS solution and serve as a starting point for deeper discovery. In Part 2, we explore sharding and tenant routing.
Database scaling and SaaS partitioning models
Before we dive into database scaling, it’s important to understand SaaS data partitioning models. The three high-level SaaS partitioning models you can use are silo, bridge, and pool:
Silo model – Each tenant has their own physical resource, such as a database instance per tenant
Bridge model – Tenants share physical resources but have logical separation, such as a table or schema per tenant
Pool model – Tenants share the same physical and logical resource, for example a single shared table
A SaaS solution can support many tenants and their users’ data, and that data needs to scale as your business grows. A single database may eventually become a performance bottleneck, so you should understand what scaling options you have.
Scaling your relational database can broadly fall into three categories:
Physical resources – First, you can increase the physical resources available to your database by scaling up, or scale out with read replicas or sharding (explored in Part 2 of this series).
Database operations – Second, you can optimize database operations with techniques such as better indexing or a data model change.
Database architecture – Finally, you can change your database architecture for all or part of your dataset. This could take the form of using a purpose-built database, adding a caching mechanism, or larger architectural changes like creating an asynchronous process or implementing the Command Query Responsibility Segregation (CQRS) pattern. This approach is discussed in more detail in Modernize legacy databases using event sourcing and CQRS with AWS DMS.
A mix of scaling mechanisms will often deliver the best outcome, and some scaling patterns are more suitable for certain partitioning models and usage patterns. We investigate these differences throughout this series.
Scaling your existing architecture with additional physical resources
Adding more physical resources is generally the first approach SaaS providers take when scaling relational databases. There are upper limits to how far you can grow your physical resources, but until you reach them, this is an effective way to scale your database without making architectural changes that would divert resources from core business goals, albeit at the cost of higher infrastructure spend.
Scaling vertically
SaaS providers commonly choose vertical scaling as a first scaling approach. You provide more physical resources (CPU, memory, network), which lets you handle more load and more concurrent connections. You keep your existing partitioning model and scale your database in place without changing your connectivity configuration. With Amazon RDS, you can scale your resources vertically to any available instance size by selecting a new instance type. This restarts the instance, which incurs brief downtime; with Amazon Aurora Serverless v2, however, you can scale up and down without service interruption, making it a good choice for SaaS providers with variable database workloads.
For the silo model, there are no SaaS-specific differences compared to scaling in a non-SaaS application. However, in the bridge and pool models, you should assess the drivers for scaling and consider potential risks. Scaling to add more tenants will increase the area of impact for performance-impacting events, such as a noisy neighbor or infrastructure failure. Similarly, if scaling is required due to a single tenant impacting availability for other tenants, consider whether throttling tenant performance, or migrating such tenants to a silo database, will offer a better service and operational experience.
Scaling horizontally with read replicas
If you need to solve a read load problem, using read replicas is a way to increase performance without needing to change your data partitioning model. This can also provide other benefits—for example, you can promote an RDS read replica to a standalone instance or clone an Aurora storage volume to a new cluster, which can increase your availability, or facilitate database sharding. An Aurora cluster can contain up to 15 read replicas that can be provisioned at the click of a button. Additionally, Aurora read replicas use storage-layer replication, so they don’t have the same performance impact on the primary as traditional approaches to replication.
You can use multiple read replicas for different purposes. For example, you can introduce additional read-based service features to your SaaS product without affecting services that use the primary instance, and you can reduce operational costs by running these workloads on different instance types.
Adding or removing a read replica doesn’t require downtime. You can automatically manage read replica scaling in Aurora if you enable Auto Scaling with Aurora replicas, reducing your operational complexity.
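When routing reads to replicas yourself (Aurora’s reader endpoint already load balances connections for you, so explicit routing is only needed when you want finer control), the pattern can be sketched as follows. The endpoint names are hypothetical placeholders, not real hosts:

```python
from itertools import cycle

class ReadWriteRouter:
    """Route writes to the primary endpoint and reads across replicas.

    Endpoints here are placeholder strings; in practice you would use
    your cluster's writer endpoint and a list of reader endpoints.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads if no replicas exist
        self._readers = cycle(replicas or [primary])

    def endpoint_for(self, is_write):
        return self.primary if is_write else next(self._readers)

router = ReadWriteRouter(
    primary="primary.cluster-xyz.example.rds.amazonaws.com",
    replicas=[
        "replica-1.cluster-ro-xyz.example.rds.amazonaws.com",
        "replica-2.cluster-ro-xyz.example.rds.amazonaws.com",
    ],
)
```

Because replicas can be added or removed without downtime, a router like this can be refreshed with a new replica list as Auto Scaling changes cluster membership.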
Optimizing your existing architecture
Another way to scale your database performance is to optimize your existing architecture to improve database efficiency without changing your partitioning model. This can delay the need to scale, or simply improve your tenants’ service experience with increased performance. Each optimization approach addresses specific use cases, so it’s important to understand which can apply to your product.
Micro-batching
Every database request has a performance overhead associated with it. If you have a SaaS application with many users making frequent requests, your database may suffer from the cumulative overhead of handling many concurrent requests. You can reduce this impact by batching similar requests together in a pattern called micro-batching.
This approach can be effective in all partitioning models when you have many users making frequent small requests. It can be especially beneficial in the pool model, because it can help you support more tenants per database instance, further increasing operational efficiency.
You may need to create a solution to handle micro-batching, increasing your operational complexity. In the following example, instead of writing directly to the database, the application writes to an Amazon Simple Queue Service (Amazon SQS) queue. Then an AWS Lambda function polls this SQS queue, consuming a batch of messages and writing them to the database atomically, at the cost of eventual consistency.
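The batch-writing step can be sketched in a few lines. This is a minimal illustration using SQLite as a stand-in database; the `events` table and column names are hypothetical, and in the architecture described above the batch would come from SQS messages consumed by a Lambda function:

```python
import sqlite3

def flush_batch(conn, batch):
    """Write a batch of queued events to the database in one transaction.

    One transaction per batch replaces one round trip per event,
    amortizing the per-request overhead across the whole batch.
    """
    with conn:  # commits once for the whole batch
        conn.executemany(
            "INSERT INTO events (tenant_id, payload) VALUES (?, ?)", batch
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (tenant_id TEXT, payload TEXT)")

# Simulate ten queued messages arriving as one micro-batch
batch = [(f"tenant-{i % 3}", f"event-{i}") for i in range(10)]
flush_batch(conn, batch)
```

The trade-off is visible in the code: all ten events commit together, so readers see them at once (eventual consistency) rather than one by one.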
Database table partitioning
Most relational databases support table partitioning, where you divide a logically large database table into smaller physical pieces. When you access data within a partition, you don’t need to traverse the entire parent table to find the specific data you’re looking for. This can be beneficial in the pool model, where your SaaS application uses a large multi-tenant dataset.
Table partitioning segments a table with a partition key, for example by date or tenant ID. Depending on the database engine, you can do table partitioning using a range, a specified list of keys, or a hash value against the partition key. The root table remains accessible as it was, and you don’t need to re-architect queries to take advantage of table partitioning.
By using tenant_id as the partition key in the pool model, you improve the performance of tenant-specific queries without having to re-architect any application code. An example implementation is shown in the following diagram.
You can scale table partitioning with tenant_id beyond a single physical resource by extending the concept with database sharding. We explore this in the second part of the series.
Database table partitioning can introduce hot spots, where a single partition becomes a performance bottleneck and requires rebalancing. It also introduces operational overhead to manage partitions. If you’re using Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition, you can use the PostgreSQL pg_partman extension to automate the creation and maintenance of table partitions. You should consider the partition limits for your chosen database engine and the locking behavior of maintenance operations, because table partitioning may not be suitable when many tenants share the same database. As with any change to your environment, monitor that you’re achieving the expected results and adjust if they deviate.
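To see how a hash partition key distributes tenants, and how you might check for the hot spots mentioned above, here is a minimal Python sketch. The partition count and hashing scheme are illustrative and not tied to any specific database engine, which would compute its own hash internally:

```python
import hashlib
from collections import Counter

def partition_for(tenant_id, num_partitions=8):
    """Map a tenant to a partition the way hash partitioning would.

    Uses a stable hash (not Python's per-process randomized hash())
    so the mapping is consistent everywhere it is computed.
    """
    digest = hashlib.sha256(tenant_id.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# Count tenants per partition to spot skew before it becomes a hot spot
assignments = Counter(partition_for(f"tenant-{i}") for i in range(1000))
```

With a reasonable hash, the counts per partition should be roughly even; a heavily skewed distribution is an early signal that a partition will need rebalancing.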
Aggregating data
SaaS providers often perform per-user or per-tenant aggregations over a dataset to provide analytical insights to tenants. You may perform per-tenant or cross-tenant aggregations for metrics or metering, or to improve the multi-tenant performance of your dataset. This may lead to your database serving both online transaction processing (OLTP) and online analytical processing (OLAP) queries.
Read queries that perform aggregations over a dataset are expensive and can affect your tenants. If this activity isn’t predictable, it becomes a challenge to maintain a consistent level of performance for your tenants, degrading the service experience. This impact is greater in the bridge and pool models, where multiple tenants compete for the same physical database resources and may run competing aggregation activities.
If your service offering doesn’t need real-time data from these aggregation activities, you can pre-compute the results. For example, you may generate analytics on tenant data every 3 hours instead of calculating them on demand. This helps deliver more uniform database performance, and the tenant service experience improves because results are available immediately instead of being calculated on request. You can even use pg_cron on RDS for PostgreSQL or Aurora PostgreSQL databases to automate this activity. Using read replicas to deliver these results can further improve performance.
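A pre-computation job is essentially an aggregation query whose results are written to a summary table that tenant-facing queries read cheaply. The sketch below uses SQLite and hypothetical table and column names; in practice the refresh function would be what your scheduled job (for example, via pg_cron) runs:

```python
import sqlite3

def refresh_tenant_summaries(conn):
    """Pre-compute per-tenant order totals into a summary table.

    Runs on a schedule instead of aggregating on demand, so tenant
    dashboards read one summary row instead of scanning all orders.
    """
    with conn:
        conn.execute("DELETE FROM tenant_order_summary")
        conn.execute(
            """
            INSERT INTO tenant_order_summary (tenant_id, order_count, total)
            SELECT tenant_id, COUNT(*), SUM(amount)
            FROM orders GROUP BY tenant_id
            """
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (tenant_id TEXT, amount REAL)")
conn.execute(
    "CREATE TABLE tenant_order_summary "
    "(tenant_id TEXT, order_count INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("t1", 10.0), ("t1", 5.0), ("t2", 7.5)],
)
refresh_tenant_summaries(conn)
```

The delete-and-rebuild shown here is the simplest refresh strategy; incremental refreshes or materialized views trade complexity for a cheaper recompute.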
This can be taken further by offloading the analytics queries to a data warehouse, such as Amazon Redshift. Zero-ETL integration between Aurora and Amazon Redshift allows you to automatically sync data in near real time between your transactional database and your analytical database, without having to develop complex data pipelines.
You should assess your use cases to validate whether the overhead of maintaining pre-computation processes is worthwhile. If your data access patterns rarely perform such aggregations, then it is unlikely to be a positive return on effort.
Scaling database connectivity
A database has a limited number of concurrent connections available, and you may find that you reach this limit when using bridge or pool deployment models as the number of connections rises with the number of tenants. This can lead to timeouts and inconsistent performance.
Implementing a database proxy, such as Amazon RDS Proxy, lets you pool and share database connections, scaling your database further. For more details on the performance characteristics of RDS Proxy, see Build and load test a multi-tenant SaaS database proxy solution with Amazon RDS Proxy. Alternatively, consider a connectionless interface to the database, such as the RDS Data API.
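The core idea behind connection pooling, whether done by RDS Proxy or in your application, is to bound the number of open database connections and let many workers share them. A minimal sketch, using SQLite connections as stand-ins for real database connections:

```python
import queue
import sqlite3

class ConnectionPool:
    """A minimal connection pool illustrating the pooling idea.

    Opens a fixed number of connections up front; callers borrow one
    and must return it, so the database never sees more than `size`
    concurrent connections regardless of how many workers run.
    """

    def __init__(self, size, connect):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self, timeout=5.0):
        # Blocks until a connection is free, bounding concurrency
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# `connect` is a placeholder; swap in your real database driver here
pool = ConnectionPool(size=2, connect=lambda: sqlite3.connect(":memory:"))
conn = pool.acquire()
conn.execute("SELECT 1")
pool.release(conn)
```

A managed proxy adds much more than this sketch (multiplexing, failover handling, IAM authentication), which is why it is usually preferable to hand-rolled pooling in a multi-tenant environment.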
Bringing in purpose-built databases
To maintain database performance in line with your growth, you may need to look at changing your database architecture to optimize for specific use cases. One way to approach this is to use separate logical schemas in the same database, each optimized for a separate use case.
As the functionality of your SaaS application develops, it can be beneficial to break one dataset into several. You can isolate individual access patterns and move them to purpose-built databases. Rather than trying to optimize for multiple different access patterns and ending up with the lowest common denominator, you instead align the access pattern to the database technology and optimize specifically for that access pattern, choosing the right tool for the right job.
For example, you may use a NoSQL database like Amazon DynamoDB for high-throughput transactional data, or as a key-value store for tenant management. You can use a data warehouse like Amazon Redshift to break out data used in aggregation activities, introducing a new service tier for tenants needing higher performance levels. You may even introduce new functionality that is difficult to implement in your existing relational database, such as fraud detection using Amazon Neptune, capturing a new market for your existing SaaS product.
Bringing in purpose-built databases will change how you implement tenant isolation and may require that you reevaluate your partitioning model. Even if you keep your partitioning model, how you implement tenant isolation is likely to differ, so consider this when evaluating a new database technology.
For further information, we recommend following the guidance provided in the AWS Well-Architected Framework, specifically database architecture selection in the performance pillar, and the Data Analytics Lens.
Verifying your database scaling approach
You should thoroughly test and verify that your scaling approach meets your expectations. Metrics will play an important part in providing data for these decisions. For example, you can use Amazon RDS Performance Insights to monitor database performance metrics in a straightforward, automated manner, or enable Enhanced Monitoring to view all the system metrics for your RDS database instance on the Amazon RDS console. Aim to implement performance metrics into your operational management tasks, to identify when scaling is not working as expected and when new scaling mechanisms are required.
Conclusion
Your database architecture will develop as your SaaS application grows. During your SaaS journey, you will need to decide whether to scale an existing architecture in place, or to change your architecture to meet your growing requirements. You should back your decisions with metrics, verifying that your scaling approach is effective, and maintain these insights to identify when additional scaling methods are required.
In this post, we covered several relational database scaling approaches and how they relate to SaaS applications to help you make an informed decision on how to scale your relational database. In Part 2, we investigate database sharding and how you can manage routing of tenants to their respective datasets.
About AWS SaaS Factory
AWS SaaS Factory helps organizations at any stage of the SaaS journey. Whether you’re looking to build new products, migrate existing applications, or optimize SaaS solutions on AWS, we can help. Visit the AWS SaaS Factory Insights Hub to discover more technical and business content and best practices.
We encourage you to reach out to your account representative to inquire about engagement models and to work with the AWS SaaS Factory team.
About the Authors
Dave Roberts is a Senior Solutions Architect and member of the AWS SaaS Factory team where he guides AWS partners building SaaS products on AWS. When he’s not talking SaaS, he enjoys building guitar effects pedals and spending time in the forest with his family.
Josh Hart is a Principal Solutions Architect at Amazon Web Services. He works with ISV customers in the UK to help them build and modernize their SaaS applications on AWS.