Editor’s note: Today, we’re hearing from data engineering and operations company Crux Informatics about how BigQuery helps them achieve rock-solid, cost-effective data delivery.
At Crux Informatics, our mission is to get data flowing by removing obstacles in the delivery and ingestion of data at scale. We want to remove any friction across the data supply chain that stops companies from getting the most value out of data, so they can make smarter business decisions.
But as you may know, if you’re in the business of data, this industry never stands still. It’s constantly evolving and changing. Data at rest is a bit of a misnomer in our book—the data might be at rest, but the computation around it never is. It’s therefore critical that we have solid, scalable, and cost-effective infrastructure. Without it, we would never make it off the starting block.
That’s why when it came to building a centralized large-scale data cloud, we needed to invest in a solution that would not only suit our current data storage needs but also enable us to tackle what’s coming, supporting a massive ecosystem of data delivery and operations for thousands of companies.
Low-cost ingestion with uncompromised performance
At Crux, we use BigQuery as our data warehouse. An interesting note about our journey to BigQuery is that, unlike many other organizations, we weren't looking for a new solution to modernize our data analytics platform. We already had a modern, cloud-based infrastructure, composed of Snowflake on AWS, that had worked well for us. We had previously partnered with Google to deliver high-performance data access and processing to our customers through our data lake. We weren't under pressure to move because our legacy solutions were ill-suited to today's fast-paced data analytics requirements.
Though there are many additional advantages to choosing BigQuery, arguably the most significant factor was the pricing model. We don't use technology because it's cost-effective; rather, it has to be cost-effective in the way that we use it.
The volume of data that we consume, the number of suppliers that we add each day, the quantity of data users we support, and the increasing demand for sharing data mean that our data consumption is astronomical. But Crux doesn't generate revenue from consuming data; our business value is in the description, delivery, validation, transformation, and distribution of data. Therefore, we had to have a core system where the cost of ingestion was close to zero.
There is no charge for loading data into BigQuery, which matches our commercial model extremely well. Of course, we have to pay for storage, but not having to pay compute costs on top of that to load data into BigQuery allows us to build a very solid foundation at a competitive price point that other large-scale data warehouse solutions cannot easily match.
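As a rough illustration of that model, here is a minimal sketch of the request body for a BigQuery batch load job (the shape the REST API's `jobs.insert` method accepts). The project, dataset, and bucket names are hypothetical placeholders, not Crux's.

```python
import json

# Hypothetical sketch: the "configuration.load" body for a BigQuery batch
# load job. Batch loads run on a shared slot pool at no charge; only the
# bytes that land in storage are billed. All names below are illustrative.
load_job = {
    "configuration": {
        "load": {
            "sourceUris": ["gs://example-bucket/prices/2024-01-01.csv"],
            "destinationTable": {
                "projectId": "example-project",
                "datasetId": "market_data",
                "tableId": "daily_prices",
            },
            "sourceFormat": "CSV",
            "skipLeadingRows": 1,   # skip the CSV header row
            "autodetect": True,     # let BigQuery infer the schema
        }
    }
}

body = json.dumps(load_job)  # serialized request body for jobs.insert
```

Submitting a job like this (via the REST API or any client library) incurs no query or compute charge for the load itself, which is what makes high-volume ingestion economical.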
Combining low-to-no-cost, high-speed ingestion with the ability to store all of our data, in multiple formats, in one place offers some very tangible benefits and cost savings.
Platform integrations create a better data experience
Beyond pricing, choosing BigQuery provided several other advantages to us, including the ability to integrate and access other Google Cloud Platform (GCP) tools and services. For instance, we can set up connections to external data sources that live in the larger GCP ecosystem and run federated queries or create external data tables.
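To sketch what those two patterns can look like in practice, the snippets below build the corresponding BigQuery Standard SQL as plain strings. The connection ID, dataset, and bucket names are illustrative placeholders, not our actual resources.

```python
# 1) A federated query: EXTERNAL_QUERY pushes the inner SQL down to an
#    external database (e.g., Cloud SQL) registered as a BigQuery
#    connection. The connection ID here is hypothetical.
federated_sql = """
SELECT supplier_id, updated_at
FROM EXTERNAL_QUERY(
  'example-project.us.example-cloudsql-conn',
  'SELECT supplier_id, updated_at FROM suppliers;')
"""

# 2) An external table over files in Cloud Storage: BigQuery queries the
#    files in place without loading them into managed storage.
external_table_ddl = """
CREATE EXTERNAL TABLE market_data.raw_prices
OPTIONS (
  format = 'CSV',
  uris = ['gs://example-bucket/prices/*.csv'],
  skip_leading_rows = 1
)
"""
```

The first pattern suits data that should stay in its source system; the second suits data that lives in object storage but still needs to be queryable alongside warehouse tables.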
Besides static and semi-static reference data, we also work with real-time streaming data. For instance, we wanted to ingest live market prices and stream them into BigQuery. This isn't necessarily for immediate data analysis. We see a high demand for capturing data in real time so that it can be used later for historical analysis or, indeed, for near-real-time use.
We also leverage solutions like Dataflow, which gives us a lot of extra value without the extra effort. Using Dataflow with BigQuery for streaming data processing made the whole integration process seamless.
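For illustration, here is a minimal, hypothetical sketch of the payload shape for BigQuery's streaming-insert path (the REST `tabledata.insertAll` request body), which is one of the write paths a streaming pipeline can use for low-latency rows. The market-price ticks are made up.

```python
import json
import uuid

# Illustrative rows: live market-price ticks as they might arrive off a feed.
rows = [
    {"symbol": "ABC", "price": 101.25, "ts": "2024-01-01T14:30:00Z"},
    {"symbol": "XYZ", "price": 55.10, "ts": "2024-01-01T14:30:01Z"},
]

# Hypothetical sketch of a tabledata.insertAll request body. Field names
# follow the BigQuery REST API; insertId lets BigQuery de-duplicate
# retried rows on a best-effort basis.
payload = {
    "kind": "bigquery#tableDataInsertAllRequest",
    "rows": [
        {"insertId": str(uuid.uuid4()), "json": row}
        for row in rows
    ],
}

body = json.dumps(payload)  # serialized streaming-insert request
```

In practice a managed pipeline such as Dataflow assembles and retries these writes for you, which is where the "extra value without the extra effort" comes from.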
From the perspective of our end data users, BigQuery’s integrations also help us increase overall customer satisfaction. We have the ability to easily transfer data anywhere they need it, including Google Cloud Storage, Amazon, Azure, and more. Our customers can also integrate their favorite business intelligence tools without additional overhead to their own processes. One of our most common use cases is the ability to create instant ODBC connections to BigQuery. This has huge implications for our ability to deliver a better overall experience to the people using our data.
In addition, BigQuery makes it easy to handle cross-region replication without any additional costs. BigQuery really is a cloud, rather than an application running on a cloud. Instead of having to orchestrate database replication and pay additional storage and egress fees, we can host our datasets globally at a fraction of our previous cost.
The right customer support makes all the difference
The final crucial differentiator in our decision was the Google team itself. Google’s support is exceptional—and this is not always the case with other providers. It’s rare to have such a close relationship where you have the ability to report an issue and know that it will end up on the desk where it needs to be.
The support team is always quick to respond and help us keep everything running smoothly. And to be honest, while cost may have been the primary concern when we chose BigQuery, it is the support team that has solidified the relationship.
In many ways, flexibility, reliability, and scalability are almost table stakes for many cloud providers. But this level of care and attention from support is certainly not found everywhere. It is, and will continue to be, a key component of our success with BigQuery.
It’s the real cloud data deal
Rarely do organizations ever want one type of data access, and it's rare that there is a consolidated approach to data even under one roof. Google Cloud and BigQuery are helping us provide our customers with all the benefits of a data cloud, including fully managed global scale, load management, high performance, and genuinely good support. We put data in and data comes back out, and that's really the experience we want for our suppliers, our data users, and ourselves. This foundation enables our hub to meet our customers' needs, delivering data from any source, validated and transformed as they want it, to the destination they want.