A summary of data engineering, cloud, and mobile posts that caught the eye.
Geospatial Functions for BigQuery
BigQuery, GIS
Geospatial functions for BigQuery have been placed in beta release, along with BigQuery Geo Viz, a web-based tool for visualising query results.
Some new public data sets have also been announced to help users with their geospatial analysis. The current data sets are focused on the United States, but the free storage tier for public data sets gives other organizations an incentive to share their own public data sets using BigQuery.
This product video by Google demonstrates an example of using BigQuery GIS to identify suitable retail locations using zip code data.
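As a rough illustration of the kind of query these functions make possible, here is a minimal sketch that builds a "stores within a radius" query in Python. ST_GEOGPOINT, ST_DISTANCE, and ST_DWITHIN are BigQuery geography functions; the table name, the column names (name, longitude, latitude), and the helper function itself are hypothetical and only there for illustration.

```python
def nearby_stores_query(longitude, latitude, radius_m,
                        table="my_project.my_dataset.stores"):
    """Build a BigQuery SQL string listing stores within radius_m metres.

    The table and its columns (name, longitude, latitude) are hypothetical.
    """
    # ST_GEOGPOINT takes longitude first, then latitude.
    point = f"ST_GEOGPOINT({float(longitude)}, {float(latitude)})"
    return (
        "SELECT name,\n"
        f"       ST_DISTANCE(ST_GEOGPOINT(longitude, latitude), {point}) AS distance_m\n"
        f"FROM `{table}`\n"
        # ST_DWITHIN is true when the two geographies are within the given distance in metres.
        f"WHERE ST_DWITHIN(ST_GEOGPOINT(longitude, latitude), {point}, {float(radius_m)})\n"
        "ORDER BY distance_m"
    )


query = nearby_stores_query(-122.4, 37.8, 1000)
print(query)
```

The string is built by interpolation only to keep the sketch short. Real code would more likely pass the coordinates as query parameters through a BigQuery client library.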
BigQuery Scheduled Queries
BigQuery
Until now it was necessary to write your own query and output code using a client library or the BigQuery CLI, and then schedule it to run using Google Cloud cron jobs (App Engine), Cloud Composer (Apache Airflow), or local cron jobs that initiate queries using the BigQuery command line.
BigQuery Scheduled Queries now offers a simple way to execute a scheduled query without the complexity of the other solutions.
Notifications of query completions may be posted to Google Cloud PubSub, allowing subsequent processing to be performed by any service that can process PubSub data. For example, the query completion notification might be a trigger for a Cloud Function.
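A minimal sketch of such a trigger is shown below, written in the style of a Pub/Sub-triggered Python background Cloud Function, which receives an event dictionary whose "data" field is base64 encoded. The assumption here is that the notification body is JSON containing "state" and "name" fields; check the actual message format before relying on those names. The function name is hypothetical.

```python
import base64
import json


def on_query_complete(event, context=None):
    """Handle a Pub/Sub message announcing that a scheduled query finished.

    Assumes the decoded message is JSON with 'state' and 'name' fields.
    """
    # Pub/Sub delivers the message body base64 encoded in event["data"].
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    state = payload.get("state", "UNKNOWN")
    name = payload.get("name", "<unnamed>")
    if state == "SUCCEEDED":
        # A real function would start the next processing step here.
        return f"run {name} succeeded; starting downstream step"
    return f"run {name} finished with state {state}; skipping downstream step"


# Local usage example with a hand-built message.
body = json.dumps({"state": "SUCCEEDED", "name": "run-1"}).encode("utf-8")
message = {"data": base64.b64encode(body).decode("ascii")}
print(on_query_complete(message))
```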
At present the scheduled query must be created using the web interface. Hopefully this will be extended to allow programmatic creation, which a solid operational solution needs in order to automate the creation of environments.
Serverless Data Aggregation with Firebase
Firestore, Dataflow, Apache Beam, Cloud PubSub, Cloud Functions
Here is a nice serverless architecture for aggregating a high rate of inputs and publishing the results quickly. It makes use of Cloud Firestore, Cloud Functions, Cloud Pub/Sub, and Cloud Dataflow (Apache Beam). It is well suited for use in mobile applications. The example used is a game where thousands of players are producing inputs to the game state simultaneously. As it uses Cloud Firestore, it is straightforward to send input data from iOS or Android apps, websites, or applications written in a variety of languages.
The data pipeline is structured as follows:
- The data sources (e.g. Android and iOS apps) write to a new Firestore document for each data entry. This avoids contention over updating a single document and updates can be written at a rate that is essentially unlimited.
- Firestore provides the ability to trigger a Cloud Function for each document write. The Cloud Function sends a message to a Cloud Pub/Sub topic.
- A streaming Cloud Dataflow job then reads each Pub/Sub message, performs the desired aggregation across a time window, and publishes the result to another Pub/Sub topic. The results could also be saved to BigQuery or to a file on Cloud Storage at this stage.
- A second Cloud Function is triggered by the Pub/Sub result messages and writes the results to Firestore. Firestore's change listeners can then be used to display the changes in the client applications asynchronously as each new aggregation update is published.
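The windowed aggregation step is the heart of this pipeline. The sketch below is not Apache Beam code; it is a plain Python stand-in that shows what summing inputs over fixed, non-overlapping time windows means. The event shape (timestamp, team, score) and the 10 second window are assumptions chosen to match the game example.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # assumed window length


def aggregate_by_window(events, window_seconds=WINDOW_SECONDS):
    """Sum scores per (window_start, team) over fixed, non-overlapping windows.

    events is an iterable of (timestamp_seconds, team, score) tuples.
    """
    totals = defaultdict(int)
    for timestamp, team, score in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        totals[(window_start, team)] += score
    return dict(totals)


events = [(0.5, "red", 1), (3.2, "blue", 2), (9.9, "red", 4), (10.0, "red", 1)]
print(aggregate_by_window(events))
# {(0, 'red'): 5, (0, 'blue'): 2, (10, 'red'): 1}
```

A streaming Dataflow job does the same grouping continuously, emitting one result per window as it closes, and also has to deal with late and out-of-order data, which this sketch ignores.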
Robolectric 4
Android, Robolectric, AndroidX, unit testing, automated testing
Robolectric is an excellent open source tool that simplifies Android automated testing. Robolectric 4 takes that a step further and brings tighter integration with the new repackaged AndroidX test framework. This includes the ability to use the AndroidX test runner to launch a test that uses Robolectric, as well as adopting the Android binary resource processing chain in place of Robolectric's own resource handling.
Recent Google presentations on Android testing have also discussed Robolectric, reinforcing its position as a key tool for Android unit testing.