  • Question 51

    Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly.
    How should you optimize the cluster for cost?
  • Question 52

    Your company is in a highly regulated industry. One of your requirements is to ensure that individual users have access to only the minimum amount of information required to do their jobs. You want to enforce this requirement with Google BigQuery.
    Which three approaches can you take? (Choose three.)
  • Question 53

    What are all of the BigQuery operations that Google charges for?
  • Question 54

    You are designing a Dataflow pipeline for a batch processing job. You want to mitigate multiple zonal failures at job submission time. What should you do?
  • Question 55

    What are two of the benefits of using denormalized data structures in BigQuery?