Question 16

You are using Google BigQuery as your data warehouse. Your users report that the following simple query is running very slowly, no matter when they run the query:
SELECT country, state, city FROM [myproject:mydataset.mytable] GROUP BY country

You check the query plan for the query and see the following output in the Read section of Stage:1:

What is the most likely cause of the delay for this query?
  • Question 17

    Which of these is NOT a way to customize the software on Dataproc cluster instances?
  • Question 18

    You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?
  • Question 19

    Government regulations in your industry mandate that you have to maintain an auditable record of access to certain types of data. Assuming that all expiring logs will be archived correctly, where should you store data that is subject to that mandate?
  • Question 20

    You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possible for long-running batch jobs. You want to use a managed service. What should you do?