Question 41

What should you recommend to prevent users outside the Litware on-premises network from accessing the analytical data store?
Question 42

You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.

FactPurchase will have 1 million rows of data added daily and will contain three years of data.
Transact-SQL queries similar to the following query will be executed daily.

SELECT SupplierKey, StockItemKey, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
  AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey;

Which table distribution will minimize query times?
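For context on the distribution options being asked about, a table in a dedicated SQL pool declares its distribution in the CREATE TABLE statement. The sketch below is illustrative only: the column names come from the query above, but the data types, the extra Quantity column, and the choice of HASH column are assumptions, since the question's column list is not reproduced here.

```
-- Illustrative DDL sketch only; types and the HASH column choice are assumptions.
CREATE TABLE dbo.FactPurchase
(
    DateKey       int NOT NULL,
    SupplierKey   int NOT NULL,
    StockItemKey  int NOT NULL,
    Quantity      int NOT NULL
)
WITH
(
    -- The three distribution options to weigh: HASH, ROUND_ROBIN, REPLICATE.
    DISTRIBUTION = HASH(StockItemKey),
    CLUSTERED COLUMNSTORE INDEX
);
```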
Question 43

You have an Apache Spark DataFrame named temperatures. A sample of the data is shown in the following table.

You need to produce the following table by using a Spark SQL query.

How should you complete the query? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.

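As background for the kind of Spark SQL query involved, reshaping a long table of readings into one column per period is typically done with the PIVOT clause. The sketch below is hypothetical: the column names (date, temp) and the listed months are assumptions, since the sample and target tables are not reproduced here.

```
-- Hypothetical sketch: average temperature pivoted by month.
-- Column names (date, temp) and the months listed are assumptions.
SELECT * FROM
(
    SELECT year(date) AS year, month(date) AS month, temp
    FROM temperatures
)
PIVOT
(
    avg(temp)
    FOR month IN (6 AS JUN, 7 AS JUL)
);
```
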
Question 44

You have an Azure Data Lake Storage Gen2 account that contains a JSON file for customers. The file contains two attributes named FirstName and LastName.
You need to copy the data from the JSON file to an Azure Synapse Analytics table by using Azure Databricks. A new column must be created that concatenates the FirstName and LastName values.
You create the following components:
  • A destination table in Azure Synapse
  • An Azure Blob storage container
  • A service principal
Which five actions should you perform in sequence next in a Databricks notebook? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Question 45

You plan to build a structured streaming solution in Azure Databricks. The solution will count new events in five-minute intervals and report only events that arrive during the interval. The output will be sent to a Delta Lake table.
Which output mode should you use?