  • Question 36

    You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they fail due to insufficient compute resources. How should you adjust the database design?
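
    For context, the sketch below shows the kind of single-table, self-join report the pilot design implies. The table name, columns, and JDBC connection details are hypothetical; the question does not specify them.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PilotSelfJoinReport {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection string; the question does not name the database engine.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/clinic_pilot", "report_user", "secret");
                 Statement stmt = conn.createStatement()) {
                // One wide table holds both patient rows and visit rows, so the report
                // joins the table to itself to pair each patient with their visits.
                String reportSql =
                    "SELECT p.value AS patient_name, v.value AS visit_date "
                  + "FROM records p "
                  + "JOIN records v ON v.patient_id = p.patient_id "
                  + "WHERE p.record_type = 'PATIENT' AND v.record_type = 'VISIT'";
                try (ResultSet rs = stmt.executeQuery(reportSql)) {
                    while (rs.next()) {
                        System.out.println(rs.getString("patient_name")
                            + " visited on " + rs.getString("visit_date"));
                    }
                }
            }
        }
    }
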
  • Question 37

    Your company's customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations. What should you do?
  • Question 38

    Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour. The data scientists have written the following code to read the data for new key features in the logs.
    BigQueryIO.Read
    .named("ReadLogData")
    .from("clouddataflow-readonly:samples.log_data")
    You want to improve the performance of this data read. What should you do?
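
    For reference, here is a minimal sketch of how the quoted read sits inside a pipeline written against the pre-Beam Dataflow Java SDK. The pipeline options and the placement of downstream transforms are assumptions, not part of the question.

    import com.google.api.services.bigquery.model.TableRow;
    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.BigQueryIO;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.values.PCollection;

    public class ReadLogDataPipeline {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

            // The read exactly as quoted above: a full scan of the sample log table,
            // returning every column of every row as a TableRow.
            PCollection<TableRow> logs = p.apply(
                BigQueryIO.Read
                    .named("ReadLogData")
                    .from("clouddataflow-readonly:samples.log_data"));

            // ...the team's analysis transforms would be applied to `logs` here...

            p.run();
        }
    }
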
  • Question 39

    You are choosing a NoSQL database to handle telemetry data submitted from millions of Internet-of-Things (IoT) devices. The volume of data is growing at 100 TB per year, and each data entry has about 100 attributes. The data processing pipeline does not require atomicity, consistency, isolation, and durability (ACID). However, high availability and low latency are required. You need to analyze the data by querying against individual fields. Which three databases meet your requirements? (Choose three.)
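
    As an illustration of the data shape only, the sketch below shows a hypothetical telemetry entry and the kind of per-field filter the analysis needs; every attribute name is invented, and in practice the filtering would be pushed down to the chosen database rather than done in application code.

    import java.util.List;
    import java.util.Map;

    public class TelemetryQuerySketch {
        public static void main(String[] args) {
            // Hypothetical telemetry entry: a wide, flat record with roughly 100 attributes
            // (only a few shown here).
            Map<String, Object> entry = Map.of(
                "device_id", "sensor-0042",
                "timestamp", "2019-07-01T12:00:00Z",
                "temperature_c", 21.7,
                "battery_pct", 83
                // ...and dozens more attributes in the real payload
            );
            List<Map<String, Object>> entries = List.of(entry);

            // "Querying against individual fields" means filters on single attributes like this.
            entries.stream()
                   .filter(e -> ((Double) e.get("temperature_c")) > 20.0)
                   .forEach(e -> System.out.println(e.get("device_id")));
        }
    }
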
  • Question 40

    You are building a new data pipeline to share data between two different types of applications: job generators and job runners. Your solution must scale to accommodate increases in usage and must accommodate the addition of new applications without negatively affecting the performance of existing ones. What should you do?
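
    To make the two roles concrete, here is a minimal in-process sketch, with a shared queue standing in for whatever managed pipeline you would actually choose; the class and job names are hypothetical.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class JobPipelineSketch {
        public static void main(String[] args) throws InterruptedException {
            // Stand-in for the shared pipeline between the two application types.
            BlockingQueue<String> jobs = new LinkedBlockingQueue<>();

            // Job generator: publishes work without knowing which runner will pick it up.
            Thread generator = new Thread(() -> {
                for (int i = 0; i < 3; i++) {
                    jobs.add("job-" + i);
                }
            });

            // Job runner: consumes work at its own pace; more runners (or new kinds of
            // consumers) can be added without changing the generator.
            Thread runner = new Thread(() -> {
                try {
                    for (int i = 0; i < 3; i++) {
                        System.out.println("Running " + jobs.take());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            generator.start();
            runner.start();
            generator.join();
            runner.join();
        }
    }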