March Roundup

Curated summary of interesting articles from the world of mobile and data

Posted by DataFormation on March 31, 2019

BigQuery Storage API Promoted to Beta

Google Cloud, BigQuery, Apache Beam

The BigQuery Storage API brings the data warehouse and its storage closer together by exporting data quickly over multiple streams, which can be consumed by a multithreaded or distributed client. It is much faster than the existing alternatives: bulk export to file, or paginated access through the BigQuery client APIs.

The key features are:

  • Multiple streams for fast read access.
  • A subset of columns may be selected to be read, further increasing performance.
  • Simple filtering of data to be read before sending to the client.
  • Snapshot consistency: data is read as of a specific point in time.
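
To make these features concrete, here is a minimal conceptual sketch in plain Python. It does not call the BigQuery Storage API or any Google client library. The in-memory TABLE and the helpers create_read_session and read_stream are hypothetical names invented for illustration. It only models the pattern described above: freeze a snapshot, filter rows and select columns before handing data to the client, then split the result into streams that parallel workers consume.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "table" standing in for data held in the warehouse.
TABLE = [
    {"id": i, "country": "US" if i % 2 == 0 else "DE", "amount": i * 10}
    for i in range(12)
]


def create_read_session(table, columns, row_filter, stream_count):
    """Freeze a snapshot, filter and project rows, then split into streams."""
    # Snapshot consistency: rows appended to `table` later are not seen.
    snapshot = list(table)
    # Filtering and column selection happen before data reaches the consumer.
    selected = [
        {col: row[col] for col in columns}
        for row in snapshot
        if row_filter(row)
    ]
    # Distribute rows round-robin across the requested number of streams.
    return [selected[i::stream_count] for i in range(stream_count)]


def read_stream(stream):
    """Consume one stream; each worker runs this independently."""
    return sum(row["amount"] for row in stream)


streams = create_read_session(
    TABLE,
    columns=["id", "amount"],
    row_filter=lambda row: row["country"] == "US",
    stream_count=3,
)

# One thread per stream, mirroring a multithreaded consumer.
with ThreadPoolExecutor(max_workers=len(streams)) as pool:
    partial_sums = list(pool.map(read_stream, streams))

total = sum(partial_sums)
print(partial_sums, total)
```

In a distributed setting, each stream would go to a separate worker process or machine rather than a thread, but the division of labour is the same.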

In addition, the Apache Beam BigQueryIO connector (starting with version 2.11) supports reading using the BigQuery Storage API. This is a good example of how the BigQuery Storage API can work in combination with a distributed processing framework.

This video from Google Cloud Next provides more information about the Storage API and demonstrates the performance benefits.

Apache Beam SQL

Apache Beam, Beam SQL

A review of the new Apache Beam SQL feature, which allows SQL queries to be embedded in a pipeline.