Question 61

Your team is building a data engineering and data science development environment.
The environment must support the following requirements:
support Python and Scala
compose data storage, movement, and processing services into automated data pipelines
the same tool should be used for the orchestration of both data engineering and data science
support workload isolation and interactive workloads
enable scaling across a cluster of machines
You need to create the environment.
What should you do?
    Question 62

    A coworker registers a datastore in a Machine Learning services workspace by using the following code:

    You need to write code to access the datastore from a notebook.

    Question 63

    You use Data Science Virtual Machines (DSVMs) for Windows and Linux in Azure.
    You need to access the DSVMs.
    Which utilities should you use? To answer, select the appropriate options in the answer area.
    NOTE: Each correct selection is worth one point.

    Question 64

    Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
    You are analyzing a numerical dataset that contains missing values in several columns.
    You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
    You need to analyze a full dataset to include all values.
    Solution: Replace each missing value using the Multiple Imputation by Chained Equations (MICE) method.
    Does the solution meet the goal?
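
To illustrate why a chained-equations approach leaves the feature set's dimensionality unchanged, here is a minimal sketch in plain Python. It is a deliberate simplification, not full MICE: it uses two columns, deterministic linear regressions, and produces a single completed dataset rather than multiple imputed ones. The names `fit_line` and `chained_impute` and the sample data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a * x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant predictor: fall back to the mean
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def chained_impute(rows, n_iter=10):
    """Fill None cells in a two-column dataset by alternating regressions."""
    missing = [[v is None for v in row] for row in rows]
    # Start from the column means of the observed values.
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(2)]
    filled = [
        [col_means[j] if r[j] is None else float(r[j]) for j in range(2)]
        for r in rows
    ]
    # Re-estimate each column's missing cells from the other column, in turn.
    for _ in range(n_iter):
        for target in range(2):
            other = 1 - target
            observed = [r for r, m in zip(filled, missing) if not m[target]]
            a, b = fit_line(
                [r[other] for r in observed], [r[target] for r in observed]
            )
            for r, m in zip(filled, missing):
                if m[target]:
                    r[target] = a * r[other] + b
    return filled


# Hypothetical data: two numeric features, one missing value in each column.
data = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [None, 8.0], [5.0, 10.0]]
completed = chained_impute(data)
print(len(completed), "rows x", len(completed[0]), "columns")
```

Every row and column is retained and only the missing cells are filled in, which is the property the question asks for.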
    Question 65

    You create an experiment in Azure Machine Learning Studio. You add a training dataset that contains 10,000 rows. The first 9,000 rows represent class 0 (90 percent).
    The remaining 1,000 rows represent class 1 (10 percent).
    The training set is imbalanced between the two classes. You must increase the number of training examples for class 1 to 4,000 by using 5 data rows. You add the Synthetic Minority Oversampling Technique (SMOTE) module to the experiment.
    You need to configure the module.
    Which values should you use? To answer, select the appropriate options in the dialog box in the answer area.
    NOTE: Each correct selection is worth one point.
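
The arithmetic behind the percentage setting can be sketched as follows. This assumes the module's SMOTE percentage means the percentage of additional synthetic minority rows (so 100 doubles the minority class), and that "5 data rows" refers to the number of nearest neighbors. The function name `smote_percentage` is hypothetical.

```python
def smote_percentage(minority_rows, target_minority_rows):
    """Percentage of extra synthetic minority rows needed to reach the target."""
    extra_rows = target_minority_rows - minority_rows
    return 100 * extra_rows / minority_rows


# Figures from the question: class 1 has 1,000 rows and must grow to 4,000.
percentage = smote_percentage(1_000, 4_000)
nearest_neighbors = 5  # assumed reading of "by using 5 data rows"

print(percentage)  # 300.0
```

Under these assumptions, 3,000 synthetic rows are added to the existing 1,000, which is 300 percent of the original minority count.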