QuillBot, a Chicago-based company founded in 2017, is a natural language processing (NLP) company recently acquired by Course Hero (coursehero.com). QuillBot’s platform of tools leverages state-of-the-art NLP to assist users with paraphrasing, summarization, grammar checking, and citation generation. With over ten million monthly active users around the world, QuillBot continues to improve writers’ efficiency and productivity on a global scale.
QuillBot differentiates itself from competitors through its broad range of writing services. Beyond grammar and spelling checks, QuillBot offers services such as paraphrasing text while retaining its meaning, with modifications such as changing the tone from formal to informal or making the text longer or shorter. In addition, a suite of complementary tools lets users complete their writing workflows end to end.
Google Cloud has been the infrastructure partner for QuillBot from the start. Their business interests converged: QuillBot aspires to constantly push the envelope of state-of-the-art natural language processing, making computing demands that test hardware’s limits. As a result, QuillBot was drawn to Google’s roadmap for future hardware development to support larger AI models.
SADA, a Google Cloud premier partner and 2020 North America Partner of the Year, provided key insights to QuillBot on Google Cloud’s advanced GPU instance types and how to optimize their complex machine learning (ML) workload. QuillBot also leverages SADA’s consulting and Technical Account Management services to roadmap new releases and effectively scale for growth.
Preparing for huge growth in capacity without a huge increase in costs
QuillBot experiments with new artificial intelligence models whose computing demands scale rapidly; traffic can spike in an instant, potentially choking the infrastructure. It would have been overly expensive to build spare capacity in-house, or even to purchase on-demand computing capacity sized for peak demand.
Instead, QuillBot needed the flexibility to deploy and pay for infrastructure proportionate to usage without changing purchase plans mid-course. “From an economics perspective, we needed our cloud computing partner to have the hardware capacity to scale as much as 100X and remain stable without a proportionate increase in costs,” Rohan Gupta, CEO and Co-founder of QuillBot, stated.
QuillBot’s entrepreneurial staff needed to make ML easy to understand and execute. At the time, they were not using Terraform, nor did they have a DevOps professional to simplify and automate model development and testing processes. “Our priority was to keep the approach simple and avoid downtime when we upgraded our models to ensure a seamless deployment. Our past efforts to migrate, including our multi-cloud deployment as an example, were fraught with the risk of a painful transition,” David Silin, Co-founder and Chief Science Officer at QuillBot, revealed.
Google Cloud Compute Engine solutions support scalability
Google Cloud uses the latest hardware to scale to millions of users and meet the computing needs of the state-of-the-art AI models that QuillBot builds and deploys. In addition, it has sufficient redundant capacity to distribute computing loads when traffic spikes.
“Google Cloud’s user interface blew us away. It was unbelievably superior to other cloud providers. We used virtual machine (VM) instance groups and easily distributed load across them with Pub/Sub,” David Silin gushed.
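The decoupling Silin describes can be sketched with Pub/Sub: requests are published to a topic, and VMs in the instance groups pull work from a shared subscription so load spreads across workers. A minimal provisioning fragment (topic and subscription names are hypothetical, not from the source):

```shell
# Create a topic that front-end services publish work to (names are placeholders).
gcloud pubsub topics create paraphrase-requests

# A single pull subscription shared by all worker VMs; Pub/Sub delivers each
# message to one worker, which distributes load across the instance groups.
gcloud pubsub subscriptions create paraphrase-workers \
  --topic=paraphrase-requests \
  --ack-deadline=60
```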
QuillBot uses Google Cloud’s A2 VMs with NVIDIA A100 Tensor Core GPUs for training models and inference, and N1 VMs with NVIDIA T4 Tensor Core GPUs for serving. With A2 VMs, Google remains the only public cloud provider to offer up to 16 NVIDIA A100 GPUs in a single VM, making it possible to train the largest AI models used for high-performance computing. In addition, users can start with one NVIDIA A100 GPU and scale to 16 GPUs without configuring multiple VMs for single-node ML training. When using NVIDIA A100 GPUs with the sparsity feature, a single VM delivers effective performance of up to 10 petaFLOPS of FP16 or 20 petaOPS of INT8. Seamless scaling becomes possible with containerized, pre-configured software that shortens the lead time for running on Compute Engine A100 instances.
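As a rough sketch of what provisioning a 16-GPU A2 instance looks like (instance name, zone, and image are illustrative assumptions; the a2-megagpu-16g machine type bundles 16 A100 GPUs, so no separate accelerator flag is needed):

```shell
# Hypothetical example: create a single A2 VM with 16 NVIDIA A100 GPUs.
gcloud compute instances create training-node \
  --zone=us-central1-a \
  --machine-type=a2-megagpu-16g \
  --maintenance-policy=TERMINATE \
  --image-family=common-cu110 \
  --image-project=deeplearning-platform-release \
  --boot-disk-size=200GB
```

Smaller A2 shapes (for example, a2-highgpu-1g with one A100) let a team start small and move up the same family without re-architecting for multiple VMs.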
Google Cloud also provides a choice of N1 VMs with NVIDIA T4 Tensor Core GPUs, with varying sizing and pricing plans, to help control costs. NVIDIA T4 GPUs offer advanced networking with up to 100 Gbps. In addition, T4 GPUs have a worldwide footprint, and users can choose capacities in individual regions based on their market size. As a result, they have the flexibility to serve demand incrementally as it grows with smaller, cheaper GPUs, and to deploy more than one in areas where T4 GPUs are available in proximity, improving stability while keeping latencies low.
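A comparable serving-side sketch: on N1, the GPU is attached explicitly via an accelerator flag, so the same machine shape can carry one or several T4s (instance name, zone, and image are illustrative assumptions):

```shell
# Hypothetical example: a modest N1 serving VM with a single T4 GPU.
gcloud compute instances create serving-node \
  --zone=europe-west4-b \
  --machine-type=n1-standard-8 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --maintenance-policy=TERMINATE \
  --image-family=common-cu110 \
  --image-project=deeplearning-platform-release
```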
QuillBot implements best practices when rolling out GPUs
Users need to consider the trade-off between going directly to 16 NVIDIA GPUs or starting small and growing incrementally. “For the bigger models, it makes sense to go straight to 16. To be sure, it is not always easy to figure out how to optimize for that level of scaling,” David Silin cautioned. “We experimented and learned that 16 works best for our core models.”
Similarly, Silin noted, “Serving and distributing preemptible VMs across regions and in production was not something we did immediately.” QuillBot leverages preemptible VMs primarily for their unit economics. Given they are preemptible and subject to being shut down in a given region if capacity is full, distributing them across regions allows the company to diversify and prevent all preemptibles from going down at once.
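The region-spreading approach described above can be sketched as a loop that places one preemptible serving VM in each of several zones, so a capacity reclaim in one region does not take down the whole fleet (zone list, names, and image are illustrative assumptions):

```shell
# Hypothetical example: spread preemptible T4 serving VMs across regions so a
# preemption event in one region leaves capacity elsewhere intact.
for zone in us-central1-a europe-west4-b asia-east1-c; do
  gcloud compute instances create "serve-${zone}" \
    --zone="${zone}" \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --preemptible \
    --maintenance-policy=TERMINATE \
    --image-family=common-cu110 \
    --image-project=deeplearning-platform-release
done
```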
Silin and team have been able to use Google Kubernetes Engine (GKE) to manage their NVIDIA GPUs on Google Cloud for model training and serving. This lightens the load of managing their platform, gives time back to their engineers, and helps realize cost savings from gained efficiencies.
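Managing GPUs through Kubernetes typically means attaching a GPU node pool to an existing cluster and letting the cluster autoscaler add or remove GPU nodes with demand. A hedged sketch (cluster and pool names are hypothetical; on GKE, NVIDIA drivers are then installed via Google's driver-installer DaemonSet):

```shell
# Hypothetical example: add an autoscaling T4 node pool to a GKE cluster.
gcloud container node-pools create t4-pool \
  --cluster=serving-cluster \
  --zone=us-central1-a \
  --machine-type=n1-standard-8 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --num-nodes=1 \
  --enable-autoscaling --min-nodes=0 --max-nodes=8
```

Scaling the pool to zero when idle is one of the cost levers a managed node pool offers over hand-run GPU VMs.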
Scaling with Google Cloud is easy and saves on costs
QuillBot found the trade-offs of scaling with Google Cloud to be favorable, with downsides far outweighed by the benefits. “With Google Cloud, we can scale 4X and maintain the customer experience knowing that enough spare capacity is available without increasing costs disproportionately. As a result, we are comfortable trying larger models,” Rohan Gupta surmised.
Over-provisioning capacity does not increase unit costs because of an integrated training and deployment stack on Google Cloud. Additionally, the time-to-market is shorter.
“We can scale with ease on Google Cloud because GPU unit costs grow more slowly than performance as we scale from one to sixteen GPUs. The gains in the rate of training are 3X, compared with only a 2X higher unit cost,” David Silin reported. “We grabbed NVIDIA A100 GPUs with 40 Gigabytes of memory as soon as we could, and we can’t wait for what’s next for Google Cloud’s GPU offering,” he added.
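A back-of-the-envelope check of the trade-off Silin describes, using only the figures from the quote: if training runs 3X faster while unit cost rises 2X, the cost per unit of training throughput improves by 3/2, i.e. 1.5X.

```shell
# Figures taken from the quote above: 3X training speedup at 2X unit cost.
speedup=3
cost_multiplier=2
# Cost-efficiency gain = speedup / cost increase.
gain=$(awk -v s="$speedup" -v c="$cost_multiplier" 'BEGIN{printf "%.2f", s/c}')
echo "Cost-efficiency gain: ${gain}x"   # prints: Cost-efficiency gain: 1.50x
```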
The twin benefits of scaling with relatively low downsides have proved to be an overwhelming advantage for QuillBot over its competitors. As a result, QuillBot has experienced a hockey-stick pattern of growth, which it expects to maintain in the future. “We could afford freemium services to acquire customers very rapidly because our unit costs are low. Both the A2 VM family and NVIDIA T4 GPUs on Google Cloud contribute to our business growth. A2 VMs enable us to build state-of-the-art technology in-house,” Rohan Gupta explained.
QuillBot looks ahead to super-scaling
With the success QuillBot has experienced so far using Google Cloud, the company is planning its future growth on Google’s competitive hardware. “Provisioning our clusters and scaling them is a big priority over the next three months; we will calibrate based on the traffic on our site,” Rohan Gupta revealed.
“Our efficiencies will improve because we have a DevOps person on board. We expect cost savings from predictive auto-scaling implemented through Managed Instance Groups. We are also encouraged by our tests of Google Cloud against competitors that show it has better cold start times than other clouds we tested—a key consideration for super-scaling,” David Silin said.
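Predictive autoscaling on a Managed Instance Group, as mentioned above, can be enabled with a single autoscaling policy; the predictive method forecasts load and scales out ahead of spikes rather than reacting to them (group name, zone, and thresholds are illustrative assumptions):

```shell
# Hypothetical example: enable predictive autoscaling on a serving MIG.
gcloud compute instance-groups managed set-autoscaling serving-mig \
  --zone=us-central1-a \
  --min-num-replicas=2 \
  --max-num-replicas=32 \
  --target-cpu-utilization=0.6 \
  --cpu-utilization-predictive-method=optimize-availability
```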