Mistakes are part of human nature. Who hasn’t left their car unlocked or accidentally hit “reply all” on an email intended to be private? But making mistakes in your enterprise data warehouse, such as accidentally deleting or modifying data, can have a major impact on your business.
BigQuery time travel, which is automatically enabled for all datasets, lets you quickly access the state of a table as of any point in time within the last 7 days. However, recovering tables using this feature can be tricky as you need to keep track of the “last known good” time. Also, you may want to maintain the state of your data beyond the 7 day window, for example, for auditing or regulatory compliance requirements. This is where the new BigQuery table snapshots feature comes into play.
Table snapshots are available via the BigQuery API, SQL, command line interface, or the Google Cloud Console. Let’s look at a quick example in the Cloud Console.
First, we’ll create a new dataset and table to test out the snapshot functionality:
Next, open to the properties page for the newly created table by selecting it in the Explorer pane. The source table for a snapshot is called the base table.
While you can use SQL or the BigQuery command line tool to create a snapshot, for this example we’ll create a snapshot of the inventory table using the Snapshot button in the Cloud Console toolbar.
BigQuery has introduced a new IAM permission (bigquery.tables.createSnapshot) that is required for the base table in addition to the existing bigquery.tables.get and bigquery.tables.getData permissions. This new permission has been added to the bigQuery.dataViewer and bigQuery.dataEditor roles, but will need to be added to any custom roles that you have created.
Table snapshots are treated just like regular tables, except you can’t make any modifications (either data or schema) to them. If you create a snapshot in the same dataset as the base table, you will need to give it a unique name or use the suggested name which appends a timestamp on to the end of the table name.
If you want to use the original table name as the snapshot name, you will need to create it in a new dataset so there will be no naming conflicts. For example, you could write a script to create a new dataset and create snapshots of all of the tables from a source dataset, preserving their original names. Note that when you create a snapshot in another dataset, it will inherit the security configuration of the destination dataset, not the source.
You can optionally enter a value into the Expiration time field and have BigQuery automatically delete the snapshot at that point in time. You can also optionally specify a value in the Snapshot time field to create the snapshot from a historical version of the base table within the time travel window. For example, you could create a snapshot from the state of a base table as of 3 hours ago.
For this example, I’ll use the name inventory-snapshot. A few seconds after I click Save, the snapshot is created. It will appear in the list of tables in the Explorer pane with a different icon.
The equivalent SQL for this operation would be:
Now, let’s take a look at the properties page for the new table snapshot in the Cloud Console.
In addition to the general snapshot table information, you see information about the base table that was used to create the snapshot, as well as the date and time that the snapshot was created. This will be true even if the base table is deleted. Although the snapshot size displays the size of the full table, you will only be billed (using standard BigQuery pricing) for the difference in size between the data maintained in the snapshot and what is currently maintained in the base table. If no data is removed, or changed in the base table, there will be no additional charge for the snapshot.
As a snapshot is read-only, If you attempt to modify the snapshot table data via DML or change the schema of the snapshot via DDL, you will get an error. However, you can change snapshot properties such as description, expiration time, or labels. You can also use table access controls to change who has access to the snapshot, just like any other table.
Let’s say we accidentally deleted some data from the base table. You can simulate this by running the following commands in the SQL workspace.
You will see that the base table now has only 6 rows, while the number of rows and size of the snapshot has not changed. If you need to access the deleted data, you can query the snapshot directly. For example, the following query will show you that the snapshot still has 7 rows:
However, if you want to update the data in a snapshot, you will need to restore it to a writable table. To do this, click the Restore button in the Cloud Console.
By default, the snapshot will be restored into a new table. However, if you would like to replace an existing table, you can use the existing table name and select the Overwrite table if it exists checkbox.
This operation can also be performed with the BigQuery API, SQL, or CLI. The equivalent SQL for this operation would be:
In this blog, we’ve demonstrated how to use the Google Cloud Console and the new table snapshots feature to easily create backups of your BigQuery tables. You can also create periodic (daily, monthly, etc.) snapshots of tables using the BigQuery scheduled query functionality. Learn more about table snapshots in the BigQuery documentation.
Cloud BlogRead More