Question 16
You are using Google BigQuery as your data warehouse. Your users report that the following simple query is running very slowly, no matter when they run the query:
SELECT country, state, city FROM [myproject:mydataset.mytable] GROUP BY country
You check the query plan for the query and see the following output in the Read section of Stage:1:

What is the most likely cause of the delay for this query?
Question 17
Which of these is NOT a way to customize the software on Dataproc cluster instances?
Question 18
You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?
Question 19
Government regulations in your industry mandate that you have to maintain an auditable record of access to certain types of data. Assuming that all expiring logs will be archived correctly, where should you store data that is subject to that mandate?
Question 20
You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possible for long-running batch jobs.
You want to use a managed service. What should you do?
