Question 51
Your analytics team wants to build a simple statistical model to determine which customers are most likely
to work with your company again, based on a few different metrics. They want to run the model on Apache
Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud
Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on
a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly.
How should you optimize the cluster for cost?
Question 52
Your company is in a highly regulated industry. One of your requirements is to ensure individual users have access only to the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery.
Which three approaches can you take? (Choose three.)
Question 53
What are all of the BigQuery operations that Google charges for?
Question 54
You are designing a Dataflow pipeline for a batch processing job. You want to mitigate multiple zonal failures at job submission time. What should you do?
Question 55
What are two of the benefits of using denormalized data structures in BigQuery?
