Question 76

What are two of the characteristics of using online prediction rather than batch prediction?
  • Question 77

    Why do you need to split a machine learning dataset into training data and test data?
  • Question 78

    You're training a model to predict housing prices based on an available dataset with real estate properties.
    Your plan is to train a fully connected neural net, and you've discovered that the dataset contains latitude and longtitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you'd like to engineer a feature that incorporates this physical dependency.
    What should you do?
  • Question 79

    Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour.
    The data scientists have written the following code to read the data for a new key features in the logs.
    BigQueryIO.Read
    .named("ReadLogData")
    .from("clouddataflow-readonly:samples.log_data")
    You want to improve the performance of this data read. What should you do?
  • Question 80

    You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps.
    You have the following requirements:
    * You will batch-load the posts once per day and run them through the Cloud Natural Language API.
    * You will extract topics and sentiment from the posts.
    * You must store the raw posts for archiving and reprocessing.
    * You will create dashboards to be shared with people both inside and outside your organization.
    You need to store both the data extracted from the API to perform analysis as well as the raw social media posts for historical archiving. What should you do?