We are excited to announce the public preview of BigQuery differential privacy, an SQL building block that analysts and data scientists can use to anonymize their data. In the future, we’ll integrate differential privacy with BigQuery data clean rooms to help organizations anonymize and share sensitive data, all while preserving privacy.
This launch adds differential privacy to GoogleSQL for BigQuery, building on the open-source differential privacy library that is used by Ads Data Hub and the COVID-19 Community Mobility Reports. Google’s research in differentially private SQL was published in a 2019 paper and was recognized with the Future of Privacy Forum’s 2021 Award for Research Data Stewardship.
We’re also excited to announce our partnership with Tumult Labs, a leader in differential privacy for companies and government agencies. Tumult Labs offers technology and professional services to help Google Cloud customers with their differential privacy implementations. Learn more about how Tumult Labs can help you below.
What is differential privacy?
Differential privacy is an anonymization technique that limits the personal information that is revealed by an output. Differential privacy is commonly used to allow inferences and to share data while preventing someone from learning information about an entity in that dataset.
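To make the definition concrete, here is a minimal Python sketch of one textbook way to achieve differential privacy: adding Laplace noise to a count. It is an illustration only, not BigQuery's implementation. The helper names (`laplace_noise`, `dp_count`) and the sample data are hypothetical, and the sketch assumes each entity contributes exactly one record.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of `records`.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return len(records) + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: 1,000 records, one per entity.
rng = random.Random(0)
visits = [f"patient_{i}" for i in range(1000)]
print(dp_count(visits, epsilon=1.0, rng=rng))  # near 1000, rarely exact
```

A smaller epsilon means a larger noise scale: stronger privacy, less accurate results.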
Advertising, financial services, healthcare, and education companies use differential privacy to perform analysis without exposing individual records. Differential privacy is also used by public sector organizations like the U.S. Census Bureau and by companies that comply with the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), the Family Educational Rights and Privacy Act (FERPA), and the California Consumer Privacy Act (CCPA).
What can I do with BigQuery differential privacy?
With BigQuery differential privacy, you can:
Anonymize results with individual-record privacy
Anonymize results without copying or moving your data, including data from AWS and Azure with BigQuery Omni
Anonymize results that are sent to Dataform pipelines so that they can be consumed by other applications
Anonymize results that are sent to Apache Spark stored procedures
Use additional differential privacy features by calling external frameworks and platforms like PipelineDP.io and Tumult Analytics
[Coming soon] Use differential privacy with authorized views and authorized routines
[Coming soon] Share anonymized data with BigQuery Data Clean Rooms
BigQuery differential privacy also works with your existing security controls so you can:
Anonymize results while using row- and column-level security, dynamic data masking, and column-level encryption
Prevent sensitive data from being queried without proper permission using Data profiles for BigQuery data
How do I get started?
Differential privacy is now part of GoogleSQL for BigQuery and is available in all editions and the on-demand pricing model.
You can apply differential privacy to the following aggregate functions to anonymize the results:
SUM
COUNT
AVG
PERCENTILE_CONT
Here is a sample differential privacy query on a BigQuery public dataset that computes the 50th and 90th percentiles of Medicare beneficiaries by provider type. This query anonymizes the percentile results that are calculated using the physician identifier to protect physician privacy.
Note: The parameters in DIFFERENTIAL_PRIVACY OPTIONS in this sample query are not recommendations. You can learn more about how privacy parameters work in the differential privacy clause and can work with your privacy officer or with a Google partner to determine the optimal privacy parameters for your dataset and organization.
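Conceptually, a differentially private aggregate such as SUM limits how much any one privacy unit (for example, one physician identifier) can contribute, then adds noise calibrated to that limit and to epsilon. The Python sketch below illustrates the idea under stated assumptions: it uses Laplace noise, clamps each unit's total to hypothetical bounds `lower` and `upper`, and is not how BigQuery implements its aggregate functions. All function and parameter names are illustrative.

```python
import random
from collections import defaultdict


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_sum(rows, epsilon: float, lower: float, upper: float,
           rng: random.Random) -> float:
    """Noisy sum over (privacy_unit_id, value) rows.

    Each unit's total contribution is clamped to [lower, upper], so adding
    or removing one unit changes the sum by at most max(|lower|, |upper|).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if lower > upper:
        raise ValueError("lower must not exceed upper")

    # Combine all rows that belong to the same privacy unit.
    per_unit = defaultdict(float)
    for unit_id, value in rows:
        per_unit[unit_id] += value

    # Bound each unit's influence before aggregating.
    clamped_total = sum(min(max(v, lower), upper) for v in per_unit.values())

    sensitivity = max(abs(lower), abs(upper))
    return clamped_total + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical rows: unit "b" is an outlier and gets clamped to 10.
rows = [("a", 5.0), ("a", 7.0), ("b", 1000.0), ("c", 3.0)]
print(dp_sum(rows, epsilon=1.0, lower=0.0, upper=10.0, rng=random.Random(1)))
```

Tighter bounds reduce the noise needed but distort results when real contributions fall outside them. This trade-off is one reason privacy parameters should be chosen per dataset.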
How can Tumult Labs help me?
Some uses of differential privacy require features like privacy accounting or variants like zero-concentrated differential privacy. Through our partnership with Tumult Labs, you can ensure that your use of BigQuery differential privacy:
Aligns with compliance and regulatory requirements
Is certified to provide end-to-end privacy guarantees
Balances data sharing with privacy risk
Learn more about how Tumult Labs can help you with BigQuery differential privacy here.
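As a rough illustration of what privacy accounting means, the Python sketch below tracks a total epsilon budget across queries using basic sequential composition, where the epsilons of successive queries on the same data add up. It does not represent Tumult Labs' products or BigQuery functionality, and it does not cover variants such as zero-concentrated differential privacy. The class name `PrivacyBudget` and its methods are hypothetical.

```python
class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon: float):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.total_epsilon - self.spent

    def spend(self, epsilon: float) -> float:
        """Reserve `epsilon` for one query and return the remaining budget.

        Raises RuntimeError if the query would exceed the total budget.
        """
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject
        # a query that exactly uses up the budget.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon
        return self.remaining


budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.4)   # first query
budget.spend(0.6)   # second query uses the rest
try:
    budget.spend(0.1)
except RuntimeError as err:
    print(err)
```

Once the budget is spent, further queries on the same data are refused, which keeps the overall privacy loss bounded.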
Where can I learn more?
Learn more about BigQuery differential privacy at:
Differentially private aggregate functions
The differential privacy clause
Let us know where you need help with BigQuery differential privacy.