Question 196
You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By running bq query --dry_run you learn that the query triggers a full scan of the table, even though the filters on timestamp and ID select only a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?
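For reference, a dry run can be issued through the BigQuery Python client as well as the bq CLI; it validates the query and reports estimated bytes scanned without executing it or incurring cost. The project, dataset, table, and column names below are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Dry run: BigQuery validates the query and estimates bytes scanned
# without actually running it.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

query = """
    SELECT id, payload
    FROM `my_project.my_dataset.events`        -- hypothetical table
    WHERE ts >= TIMESTAMP('2024-01-01')        -- hypothetical columns
      AND id = 'abc123'
"""
job = client.query(query, job_config=job_config)
print(f"Estimated bytes processed: {job.total_bytes_processed}")
```

If the estimate matches the full table size, the filter is not pruning any storage, which is the symptom the question describes.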
Question 197
Which of the following is not true about Dataflow pipelines?
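As background for evaluating statements about Dataflow, the minimal shape of a pipeline in the Apache Beam SDK (the programming model Dataflow executes) is sketched below; the input data and transforms are illustrative only.

```python
import apache_beam as beam

# A minimal batch pipeline. With DirectRunner this executes locally;
# with DataflowRunner the same code runs as a managed Dataflow job.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["alpha", "beta", "gamma"])  # hypothetical input
        | "Upper" >> beam.Map(str.upper)
        | "Print" >> beam.Map(print)
    )
```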
Question 198
You work for a manufacturing plant that batches application log files together into a single log file once a day at 2:00 AM. You have written a Google Cloud Dataflow job to process that log file. You need to make sure the log file is processed once per day as inexpensively as possible. What should you do?
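One hedged sketch of such a job: a Beam batch pipeline that reads the day's consolidated file from Cloud Storage, which could then be launched by a scheduled trigger (for example, a cron entry or Cloud Scheduler) shortly after 2:00 AM. The bucket path and the transform are hypothetical.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Batch pipeline over the day's consolidated log file. Pass
# --runner=DataflowRunner (plus project/region options) on the command
# line to run it as a Dataflow job; the GCS path below is a placeholder.
options = PipelineOptions()
with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadLog" >> beam.io.ReadFromText("gs://my-bucket/logs/daily.log")
        | "CountLines" >> beam.combiners.Count.Globally()  # hypothetical processing
        | "Print" >> beam.Map(print)
    )
```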
Question 199
You work for a global shipping company. You want to train a model on 40 TB of data to predict which ships in each geographic region are likely to cause delivery delays on any given day. The model will be based on multiple attributes collected from multiple sources. Telemetry data, including location in GeoJSON format, will be pulled from each ship and loaded every hour. You want to have a dashboard that shows how many and which ships are likely to cause delays within a region. You want to use a storage solution that has native functionality for prediction and geospatial processing. Which storage solution should you use?
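As a point of reference for the "native prediction and geospatial processing" requirement: BigQuery, for instance, offers GIS functions such as ST_GEOGFROMGEOJSON alongside BigQuery ML for in-database prediction. The sketch below illustrates only the geospatial side, with hypothetical table, column, and coordinate values.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical illustration of native geospatial processing: parse a
# GeoJSON location into a GEOGRAPHY value and test region membership.
query = """
    SELECT
      ship_id,
      ST_GEOGFROMGEOJSON(location_geojson) AS position   -- hypothetical columns
    FROM `my_project.shipping.telemetry`                 -- hypothetical table
    WHERE ST_DWITHIN(
      ST_GEOGFROMGEOJSON(location_geojson),
      ST_GEOGPOINT(-122.4, 37.8),   -- hypothetical region center (lon, lat)
      50000                         -- radius in meters
    )
"""
for row in client.query(query):
    print(row.ship_id)
```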
Question 200
Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.
Which approach should you take?
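A minimal sketch of the subscriber side, assuming messages are JSON and the destination table is partitioned on an event timestamp column so that historical, time-based analysis stays cheap; the project, subscription, and table names are placeholders.

```python
import json
from google.cloud import bigquery, pubsub_v1

# Hypothetical subscriber: pull tracking messages from Pub/Sub and
# append them to BigQuery for historical analysis.
bq = bigquery.Client()
subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-project", "tracking-sub")

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    row = json.loads(message.data)
    # Streaming insert into a table assumed to be time-partitioned;
    # unacked messages are redelivered, giving at-least-once handling.
    errors = bq.insert_rows_json("my-project.tracking.packages", [row])
    if not errors:
        message.ack()

future = subscriber.subscribe(subscription, callback=callback)
future.result()  # block and process messages until interrupted
```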