  • Question 1

    Your startup has never implemented a formal security policy. Currently, everyone in the company has access to the datasets stored in Google BigQuery. Teams have freedom to use the service as they see fit, and they have not documented their use cases. You have been asked to secure the data warehouse. You need to discover what everyone is doing. What should you do first?
  • Question 2

    You are designing a cloud-native historical data processing system to meet the following conditions:
    * The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools including Cloud Dataproc, BigQuery, and Compute Engine.
    * A streaming data pipeline stores new data daily.
    * Performance is not a factor in the solution.
    * The solution design should maximize availability.
    How should you design data storage for this solution?
  • Question 3

    You are designing the database schema for a machine learning-based food ordering service that will predict what users want to eat. Here is some of the information you need to store:
    * The user profile: What the user likes and doesn't like to eat
    * The user account information: Name, address, preferred meal times
    * The order information: When orders are made, from where, to whom
    The database will be used to store all the transactional data of the product. You want to optimize the data schema. Which Google Cloud Platform product should you use?
  • Question 4

    Which of the following is NOT one of the three main types of triggers that Dataflow supports?
  • Question 5

    You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? (Choose two.)