Question 96

Which statement characterizes the general programming model used by Spark Structured Streaming?
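For context on what this question is probing: Structured Streaming models a live data stream as a continuously growing, unbounded input table, and maintains a result that is updated incrementally as new data arrives rather than reprocessing everything from scratch. A minimal pure-Python sketch of that idea (illustrative only, not Spark API):

```python
# Illustrative sketch: a stream is treated as an unbounded input table, and a
# running result is updated incrementally per micro-batch of new rows.
from collections import defaultdict

class IncrementalCount:
    """Maintains a running count per key, updated one micro-batch at a time."""
    def __init__(self):
        self.result = defaultdict(int)  # the continuously updated result table

    def process_micro_batch(self, rows):
        # Only the newly arrived rows are processed; prior state is reused.
        for key in rows:
            self.result[key] += 1
        return dict(self.result)

stream = IncrementalCount()
stream.process_micro_batch(["device_a", "device_b"])
print(stream.process_micro_batch(["device_a"]))  # {'device_a': 2, 'device_b': 1}
```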
  • Question 97

    A DLT pipeline includes the following streaming tables:
    raw_iot ingests raw device measurement data from a heart rate tracking device.
    bpm_stats incrementally computes user statistics based on BPM measurements from raw_iot.
    How can the data engineer configure this pipeline to retain manually deleted or updated records in the raw_iot table while recomputing the downstream table when a pipeline update is run?
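As background, the relevant DLT mechanism is the `pipelines.reset.allowed` table property, which, when set to `false`, prevents a full refresh from truncating and recomputing that table. A hedged sketch of how the pipeline could be declared (the source path and column names are illustrative assumptions, not part of the question):

```python
import dlt
from pyspark.sql.functions import avg

# Setting pipelines.reset.allowed=false on the raw table means a full refresh
# will not clear and re-ingest it, so manual deletes/updates in raw_iot are
# retained while downstream tables are still recomputed.
@dlt.table(
    table_properties={"pipelines.reset.allowed": "false"}
)
def raw_iot():
    # Source location is illustrative.
    return spark.readStream.format("cloudFiles").load("/data/iot")

@dlt.table
def bpm_stats():
    # Recomputed on pipeline update from whatever currently exists in raw_iot.
    return dlt.read("raw_iot").groupBy("user_id").agg(avg("bpm").alias("avg_bpm"))
```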
  • Question 98

    The data engineering team maintains the following code:

    Assuming that this code produces logically correct results and the data in the source tables has been de-duplicated and validated, which statement describes what will occur when this code is executed?
  • Question 99

    A junior data engineer is working to implement logic for a Lakehouse table named silver_device_recordings.
    The source data contains 100 unique fields in a highly nested JSON structure.
    The silver_device_recordings table will be used downstream for highly selective joins on a number of fields, and will also be leveraged by the machine learning team to filter on a handful of relevant fields. In total, 15 fields have been identified that will often be used for filter and join logic.
    The data engineer is trying to determine the best approach for dealing with these nested fields before declaring the table schema.
    Which of the following accurately presents information about Delta Lake and Databricks that may impact their decision-making process?
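As background for the trade-off this question targets: by default Delta Lake collects file-level statistics on the first 32 columns of a table, and fields buried in nested structures are harder to use for data skipping, so promoting frequently filtered or joined fields to top-level columns is a common pattern. A minimal pure-Python sketch of flattening a nested record before writing it (field names are illustrative):

```python
def flatten(record, parent_key="", sep="_"):
    """Recursively flatten a nested dict so hot fields become top-level columns."""
    out = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            out.update(flatten(value, new_key, sep))
        else:
            out[new_key] = value
    return out

nested = {"device": {"id": "d1", "firmware": "2.1"}, "bpm": 72}
print(flatten(nested))  # {'device_id': 'd1', 'device_firmware': '2.1', 'bpm': 72}
```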
  • Question 100

    The data engineering team has provided 10 queries and asked the data analyst team to build a dashboard and refresh the data every day at 8 AM. Identify the best approach to set up the data refresh for this dashboard.
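For context, scheduled refreshes in Databricks are expressed with Quartz cron syntax; a daily 8 AM trigger in a Jobs-style schedule payload can be sketched as follows (the timezone and pause status values are illustrative assumptions):

```python
# Illustrative schedule payload: Quartz cron expression for a daily 8 AM run.
# Quartz field order: second minute hour day-of-month month day-of-week
schedule = {
    "quartz_cron_expression": "0 0 8 * * ?",  # fires at 08:00:00 every day
    "timezone_id": "UTC",                     # illustrative; pick the team's timezone
    "pause_status": "UNPAUSED",
}
print(schedule["quartz_cron_expression"])  # 0 0 8 * * ?
```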