BigQuery Storage API Promoted to Beta
Google Cloud, BigQuery, Apache Beam
The BigQuery Storage API unifies the data warehouse and storage by exporting table data over multiple streams, which a multithreaded or distributed client can consume in parallel. It is much faster than the existing alternatives: bulk export to files, or paginated access through the BigQuery client APIs.
The key features are:
- Multiple streams for fast read access.
- A subset of columns may be selected to be read, further increasing performance.
- Simple filtering of data to be read before sending to the client.
- Snapshot consistency: data is read as of a specific point in time.
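The features above can be illustrated with a small simulation. This is only a sketch of the consumption pattern, not the Storage API client library: the in-memory `TABLE`, the `open_streams` helper, and the `read_total` function are all hypothetical stand-ins, and the filtering and column selection that the real service performs before sending data are done here in plain Python.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory rows standing in for a BigQuery table.
TABLE = [
    {"id": i, "country": "US" if i % 2 == 0 else "DE", "amount": i * 10}
    for i in range(12)
]


def open_streams(table, columns, row_filter, stream_count):
    """Simulate a read session (assumed helper, not a real API call).

    Takes a fixed view of the table, keeps only rows matching the filter,
    keeps only the selected columns, and splits the result into streams.
    """
    snapshot = list(table)  # rows appended to `table` afterwards are not seen
    selected = [
        {column: row[column] for column in columns}
        for row in snapshot
        if row_filter(row)
    ]
    # Distribute rows round-robin so each stream gets a similar share.
    return [selected[i::stream_count] for i in range(stream_count)]


def consume(stream):
    """Work done by one reader: here, sum the `amount` column."""
    return sum(row["amount"] for row in stream)


def read_total(table, stream_count=3):
    """Read all streams in parallel threads and combine the partial results."""
    streams = open_streams(
        table,
        columns=["id", "amount"],
        row_filter=lambda row: row["country"] == "US",
        stream_count=stream_count,
    )
    with ThreadPoolExecutor(max_workers=stream_count) as pool:
        partial_sums = list(pool.map(consume, streams))
    return sum(partial_sums)


if __name__ == "__main__":
    print(read_total(TABLE))
```

Each stream is independent of the others, so the same split works whether the readers are threads in one process, as here, or workers in a distributed framework.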
In addition, the Apache Beam BigQueryIO connector (starting with version 2.11) supports reading using the BigQuery Storage API. This is a good example of how the BigQuery Storage API can work in combination with a distributed processing framework.
This video from Google Cloud Next provides more information about the Storage API and demonstrates the performance benefits.
Apache Beam SQL
Apache Beam, Beam SQL
A review of the new Apache Beam SQL feature, which allows SQL queries to be embedded in a pipeline.