Question 251

Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing in the dashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do next?
  • Question 252

    You're training a model to predict housing prices based on an available dataset with real estate properties.
    Your plan is to train a fully connected neural net, and you've discovered that the dataset contains latitude and longtitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you'd like to engineer a feature that incorporates this physical dependency.
    What should you do?
  • Question 253

    Your company's on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?
  • Question 254

    You are building a new application that you need to collect data from in a scalable way. Data arrives continuously from the application throughout the day, and you expect to generate approximately 150 GB of JSON data per day by the end of the year. Your requirements are:
    Decoupling producer from consumer
    Space and cost-efficient storage of the raw ingested data, which is to be stored indefinitely Near real-time SQL query Maintain at least 2 years of historical data, which will be queried with SQ Which pipeline should you use to meet these requirements?
  • Question 255

    You need to compose visualization for operations teams with the following requirements:
    Telemetry must include data from all 50,000 installations for the most recent 6 weeks (sampling once every minute)
    The report must not be more than 3 hours delayed from live data.
    The actionable report should only show suboptimal links.
    Most suboptimal links should be sorted to the top.
    Suboptimal links can be grouped and filtered by regional geography.
    User response time to load the report must be <5 seconds.
    You create a data source to store the last 6 weeks of data, and create visualizations that allow viewers to see multiple date ranges, distinct geographic regions, and unique installation types. You always show the latest data without any changes to your visualizations. You want to avoid creating and updating new visualizations each month. What should you do?