Question 131

Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to design a solution that allows your world-wide marketing teams to perform ANSI SQL and other types of analysis on up-to-date YouTube channel log data. How should you set up the log data transfer into Google Cloud?
Question 132

Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost?
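The scenario points at the usual cost lever for a weekly batch job: create the cluster on demand, run the 30-minute job, then delete the cluster, rather than paying for 15 idle nodes all week. A back-of-the-envelope comparison, using a hypothetical per-node hourly rate (not a real Dataproc price):

```python
# Rough weekly cost comparison for a 30-minute Spark job on a 15-node cluster.
# NODE_HOURLY_RATE is a made-up illustrative figure, NOT a real Dataproc price.
NODE_HOURLY_RATE = 0.50   # hypothetical $/node/hour
NODES = 15
JOB_HOURS = 0.5           # the job runs in ~30 minutes
HOURS_PER_WEEK = 24 * 7

def weekly_cost_always_on() -> float:
    """Cluster left running 24/7 between weekly jobs."""
    return NODES * NODE_HOURLY_RATE * HOURS_PER_WEEK

def weekly_cost_ephemeral() -> float:
    """Cluster created just before the job and deleted right after it finishes."""
    return NODES * NODE_HOURLY_RATE * JOB_HOURS

print(f"always-on: ${weekly_cost_always_on():.2f}/week")
print(f"ephemeral: ${weekly_cost_ephemeral():.2f}/week")
```

Using preemptible (spot) secondary workers would lower the ephemeral figure further; the key point is that an ephemeral cluster only pays for the half-hour of actual compute each week.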
Question 133

Your company uses a proprietary system to send inventory data every 6 hours to a data ingestion service in the cloud. Transmitted data includes a payload of several fields and the timestamp of the transmission. If there are any concerns about a transmission, the system re-transmits the data. How should you deduplicate the data most efficiently?
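Because every transmission carries its timestamp alongside the payload, a natural deduplication key is that timestamp: a re-transmission reuses the original timestamp, so duplicate records collapse onto the same key. A minimal in-memory sketch of the idea (the record field names here are hypothetical, not from the question):

```python
from typing import Iterable

def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep only the first record seen for each transmission timestamp.

    Assumes each record is a dict with a 'timestamp' field identifying the
    transmission, and that a re-transmission reuses the original timestamp.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = record["timestamp"]  # hypothetical field name
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

records = [
    {"timestamp": "2024-01-01T00:00:00Z", "sku": "A", "qty": 10},
    {"timestamp": "2024-01-01T06:00:00Z", "sku": "A", "qty": 12},
    {"timestamp": "2024-01-01T00:00:00Z", "sku": "A", "qty": 10},  # re-transmission
]
print(deduplicate(records))  # two unique records remain
```

In a streaming pipeline the same keying idea maps onto grouping or windowing by the transmission timestamp rather than an in-memory set.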
Question 134

Your neural network model is taking days to train. You want to increase the training speed. What can you do?
Question 135

The _________ for Cloud Bigtable makes it possible to use Cloud Bigtable in a Cloud Dataflow pipeline.