Question 1

You have a self-hosted integration runtime in Azure Data Factory.
The current status of the integration runtime shows the following configuration:
Status: Running
Type: Self-Hosted
Running / Registered Node(s): 1/1
High Availability Enabled: False
Linked Count: 0
Queue Length: 0
Average Queue Duration: 0.00s
The integration runtime has the following node details:
Name: X-M
Status: Running
Available Memory: 7697MB
CPU Utilization: 6%
Network (In/Out): 1.21KBps/0.83KBps
Concurrent Jobs (Running/Limit): 2/14
Role: Dispatcher/Worker
Credential Status: In Sync
Use the drop-down menus to select the answer choice that completes each statement based on the information presented.
NOTE: Each correct selection is worth one point.
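For context, the same status fields can be read programmatically. Below is a hypothetical sketch using the azure-mgmt-datafactory SDK; the subscription, resource group, factory, and runtime names are placeholders, not values from the question.

```python
# Hypothetical sketch: fetch self-hosted IR status with the Azure SDK.
# All identifiers below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
)

status = client.integration_runtimes.get_status(
    resource_group_name="<resource-group>",   # placeholder
    factory_name="<data-factory>",            # placeholder
    integration_runtime_name="<self-hosted-ir>",  # placeholder
)
print(status.properties.state)  # runtime state, e.g. "Online"
```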

Question 2

You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to an Azure Blob storage account.
You need to output the count of tweets during the last five minutes every five minutes. Each tweet must only be counted once.
Which windowing function should you use?
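For context, a tumbling window is one of the windowing functions Stream Analytics offers: it partitions the stream into fixed-size, non-overlapping intervals, so each event belongs to exactly one window. The following minimal Python sketch (not Stream Analytics query language) illustrates that property with made-up timestamps.

```python
# Minimal sketch: count events per fixed, non-overlapping five-minute window.
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)

def tumbling_window_counts(events, window=WINDOW):
    """events: iterable of (timestamp, payload) pairs."""
    counts = Counter()
    for ts, _payload in events:
        # Floor each timestamp to the start of its window; windows never
        # overlap, so every event is counted exactly once.
        window_start = datetime.min + ((ts - datetime.min) // window) * window
        counts[window_start] += 1
    return dict(counts)

# Hypothetical sample data:
events = [
    (datetime(2024, 1, 1, 0, 1), "tweet A"),
    (datetime(2024, 1, 1, 0, 4), "tweet B"),
    (datetime(2024, 1, 1, 0, 7), "tweet C"),
]
print(tumbling_window_counts(events))
# {datetime(2024, 1, 1, 0, 0): 2, datetime(2024, 1, 1, 0, 5): 1}
```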
Question 3

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
A workload for data engineers who will use Python and SQL.
A workload for jobs that will run notebooks that use Python, Scala, and SQL.
A workload that data scientists will use to perform ad hoc analysis in Scala and R.
The enterprise architecture team at your company identifies the following standards for Databricks environments:
The data engineers must share a cluster.
The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.
All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.
You need to create the Databricks clusters for the workloads.
Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.
Does this meet the goal?
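For context, whichever cluster mode satisfies the requirements, the 120-minute auto-termination standard maps to a single field in the Databricks Clusters API (POST /api/2.0/clusters/create). The sketch below is hypothetical; the workspace URL, token, runtime version, and node type are placeholders, not part of the question.

```python
# Hypothetical sketch: create one auto-terminating cluster per data scientist
# via the Databricks Clusters REST API. All values below are placeholders.
import requests

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "dapi-..."  # placeholder personal access token

def create_cluster(name: str) -> dict:
    payload = {
        "cluster_name": name,
        "spark_version": "13.3.x-scala2.12",  # assumed runtime version
        "node_type_id": "Standard_DS3_v2",    # assumed VM size
        "num_workers": 1,
        # Terminate automatically after 120 minutes of inactivity,
        # per the enterprise architecture standard.
        "autotermination_minutes": 120,
    }
    resp = requests.post(
        f"{WORKSPACE_URL}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # contains the new cluster_id

for scientist in ("ds1", "ds2", "ds3"):  # three data scientists, per the scenario
    create_cluster(f"adhoc-{scientist}")
```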
Question 4

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You convert the files to compressed delimited text files.
Does this meet the goal?
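For context, gzip is among the compression codecs that Azure Synapse Analytics can read for delimited text files. A minimal sketch of the conversion step, assuming the source files are already CSV and the directory path is a placeholder:

```python
# Minimal sketch: gzip-compress delimited text files before the copy.
# The "./export" directory is a placeholder.
import gzip
import shutil
from pathlib import Path

def gzip_file(src: Path) -> Path:
    """Compress src to src.gz using streaming copy (constant memory)."""
    dst = src.with_suffix(src.suffix + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst

for csv_path in Path("./export").glob("*.csv"):
    gzip_file(csv_path)
```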
Question 5

You have an Azure Data Lake Storage Gen2 container.
Data is ingested into the container, and then transformed by a data integration application. The data is NOT modified after that. Users can read files in the container but cannot modify the files.
You need to design a data archiving solution that meets the following requirements:
New data is accessed frequently and must be available as quickly as possible.
Data that is older than five years is accessed infrequently but must be available within one second when requested.
Data that is older than seven years is NOT accessed. After seven years, the data must be persisted at the lowest cost possible.
Costs must be minimized while maintaining the required availability.
How should you manage the data? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
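For context, age-based tiering between the Blob storage access tiers is typically automated with a lifecycle management policy. The sketch below is hypothetical: the rule name is a placeholder, years are approximated as 365 days, and the tier choices are offered as an illustration of the mechanism rather than as the answer.

```python
# Hypothetical sketch: generate a lifecycle-management policy that moves
# blobs to cooler tiers as they age. Day counts approximate years as 365 days.
import json

policy = {
    "rules": [
        {
            "name": "tier-by-age",  # placeholder rule name
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"]},
                "actions": {
                    "baseBlob": {
                        # Cool tier after five years: infrequent access,
                        # still readable online within milliseconds.
                        "tierToCool": {"daysAfterModificationGreaterThan": 5 * 365},
                        # Archive tier after seven years: lowest cost,
                        # offline until rehydrated.
                        "tierToArchive": {"daysAfterModificationGreaterThan": 7 * 365},
                    }
                },
            },
        }
    ]
}

with open("policy.json", "w") as f:
    json.dump(policy, f, indent=2)

# The policy could then be applied with, for example:
#   az storage account management-policy create \
#     --account-name <account> --resource-group <rg> --policy @policy.json
```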