  • Question 86

    You are designing a statistical analysis solution that will use custom proprietary Python functions on near real-time data from Azure Event Hubs.
    You need to recommend which Azure service to use to perform the statistical analysis. The solution must minimize latency.
    What should you recommend?
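    A minimal sketch of the kind of workload this question describes, assuming Spark Structured Streaming (for example, on Azure Databricks): a custom Python function applied to an Event Hubs stream. All names, the connection string, and the statistic itself are placeholders, and the "eventhubs" source assumes the azure-eventhubs-spark connector is installed on the cluster.

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.appName("stats-stream").getOrCreate()

    # Placeholder connection string; on Databricks the connector expects it
    # encrypted via EventHubsUtils.encrypt.
    eh_conf = {"eventhubs.connectionString": "<event-hubs-connection-string>"}

    stream = spark.readStream.format("eventhubs").options(**eh_conf).load()

    # Stand-in for the proprietary statistical function, wrapped as a UDF.
    @udf(returnType=DoubleType())
    def proprietary_score(payload: str) -> float:
        return float(len(payload))

    scored = stream.select(
        proprietary_score(col("body").cast("string")).alias("score")
    )

    query = scored.writeStream.format("memory").queryName("scores").start()
    ```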
  • Question 87

    You are designing a solution that will copy Parquet files stored in an Azure Blob storage account to an Azure Data Lake Storage Gen2 account.
    The data will be loaded daily to the data lake and will use a folder structure of {Year}/{Month}/{Day}/.
    You need to design a daily Azure Data Factory data load to minimize the data transfer between the two accounts.
    Which two configurations should you include in the design? Each correct answer presents part of the solution.
    NOTE: Each correct selection is worth one point.
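    As an illustration of the {Year}/{Month}/{Day}/ layout, the sketch below derives the single folder a daily load would need to read, so that only that day's partition is transferred. The date value is hypothetical; in Data Factory the load date would typically come from the trigger's scheduled time.

    ```python
    from datetime import date

    def daily_folder(load_date: date) -> str:
        """Build the {Year}/{Month}/{Day}/ folder path for one load date."""
        return f"{load_date:%Y}/{load_date:%m}/{load_date:%d}/"

    print(daily_folder(date(2024, 5, 17)))  # -> 2024/05/17/
    ```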
  • Question 88

    Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
    After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
    You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
    A workload for data engineers who will use Python and SQL.
    A workload for jobs that will run notebooks that use Python, Scala, and SQL.
    A workload that data scientists will use to perform ad hoc analysis in Scala and R.
    The enterprise architecture team at your company identifies the following standards for Databricks environments:
    The data engineers must share a cluster.
    The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.
    All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.
    You need to create the Databricks clusters for the workloads.
    Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a High Concurrency cluster for the jobs.
    Does this meet the goal?
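    For reference, the 120-minute auto-termination requirement maps to a single cluster setting. The sketch below creates one cluster per data scientist through the Databricks Clusters REST API; the workspace URL, token, runtime version, and VM size are all assumptions.

    ```python
    import requests

    WORKSPACE_URL = "https://<workspace>.azuredatabricks.net"  # placeholder
    TOKEN = "<personal-access-token>"                          # placeholder

    def create_scientist_cluster(name: str) -> dict:
        spec = {
            "cluster_name": name,
            "spark_version": "13.3.x-scala2.12",  # assumed runtime version
            "node_type_id": "Standard_DS3_v2",    # assumed VM size
            "num_workers": 2,
            "autotermination_minutes": 120,       # terminate after 120 min idle
        }
        resp = requests.post(
            f"{WORKSPACE_URL}/api/2.0/clusters/create",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json=spec,
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    for scientist in ("ds1", "ds2", "ds3"):  # the three current data scientists
        create_scientist_cluster(f"adhoc-{scientist}")
    ```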
  • Question 89

    You have an Azure data factory.
    You need to examine the pipeline failures from the last 60 days.
    What should you use?
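    For context, if the factory's diagnostic logs are routed to a Log Analytics workspace, failed runs can be queried over an arbitrary window such as 60 days. A minimal sketch, assuming the azure-monitor-query package and a hypothetical workspace ID:

    ```python
    from datetime import timedelta
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    client = LogsQueryClient(DefaultAzureCredential())

    # ADFPipelineRun is the table Data Factory diagnostics write to.
    query = """
    ADFPipelineRun
    | where Status == 'Failed'
    | project TimeGenerated, PipelineName, RunId, Error
    """

    response = client.query_workspace(
        workspace_id="<log-analytics-workspace-id>",  # placeholder
        query=query,
        timespan=timedelta(days=60),  # the 60-day window from the question
    )
    for table in response.tables:
        for row in table.rows:
            print(row)
    ```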
  • Question 90

    You are designing a date dimension table in an Azure Synapse Analytics dedicated SQL pool. The date dimension table will be used by all the fact tables.
    Which distribution type should you recommend to minimize data movement?
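    For reference, the distribution type of a dedicated SQL pool table is declared in the CREATE TABLE statement's WITH clause (the available options are HASH, ROUND_ROBIN, and REPLICATE). A minimal sketch with hypothetical server, pool, and table names, using REPLICATE as the example option:

    ```python
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=<pool>;"  # placeholders
        "UID=<user>;PWD=<password>"
    )

    ddl = """
    CREATE TABLE dbo.DimDate
    (
        DateKey  INT  NOT NULL,
        FullDate DATE NOT NULL
    )
    WITH
    (
        DISTRIBUTION = REPLICATE,       -- distribution type is set here
        CLUSTERED COLUMNSTORE INDEX
    );
    """

    conn.cursor().execute(ddl)
    conn.commit()
    ```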