Question 126

A Delta Lake table was created with the query below:

Consider the following statement:

DROP TABLE prod.sales_by_store

If this statement is executed by a workspace admin, which result will occur?
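Whether this statement removes the underlying data depends on how the table was created. The distinction can be modeled with a pure-Python sketch (the catalog/storage dictionaries and drop_table function are illustrative toys, not a real API): a managed table loses both its metastore entry and its data files, while an external table loses only the metastore entry.

```python
def drop_table(catalog, storage, name):
    """Toy model of DROP TABLE semantics in Delta Lake: the metastore
    entry is always removed; the underlying data files are deleted only
    for managed tables (those created without an explicit LOCATION)."""
    entry = catalog.pop(name)
    if entry["managed"]:
        # Managed table: data files are removed along with the metadata.
        storage.pop(entry["path"], None)
    # External table: files at the registered path are left in place.
```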
Question 127

An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day, as indicated by the date variable:

Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order.

If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?
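The crux here is that a batch job can only deduplicate entries that appear within the data it actually reads; duplicates that land in different days' directories survive a per-day job. Within a single batch, deduplication on the composite key might look like this pure-Python sketch (the dedupe_orders helper is hypothetical; the field names follow the question):

```python
def dedupe_orders(orders):
    """Keep the first record seen for each (customer_id, order_id)
    composite key; later duplicates in the same batch are dropped."""
    seen = set()
    unique = []
    for rec in orders:
        key = (rec["customer_id"], rec["order_id"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```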
Question 128

The data engineering team maintains a table of aggregate statistics through nightly batch updates. This includes total sales for the previous day, alongside totals and averages for a variety of time periods including the 7 previous days, year-to-date, and quarter-to-date. This table is named store_sales_summary and its schema is as follows:

The table daily_store_sales contains all the information needed to update store_sales_summary. The schema for this table is:

store_id INT, sales_date DATE, total_sales FLOAT

If daily_store_sales is implemented as a Type 1 table and the total_sales column might be adjusted after manual data auditing, which approach is the safest way to generate accurate reports in the store_sales_summary table?
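Because daily_store_sales is Type 1, audited corrections overwrite rows in place, so incremental updates to the summary can silently miss them; the safe pattern is to recompute the aggregates from the current state of the source on every run. A pure-Python sketch of that full recomputation (recompute_summary is a hypothetical helper; the column names follow the schema above):

```python
def recompute_summary(daily_rows):
    """Rebuild per-store totals from scratch on every run, so any
    Type 1 corrections to total_sales are reflected in the summary."""
    summary = {}
    for store_id, sales_date, total_sales in daily_rows:
        summary[store_id] = summary.get(store_id, 0.0) + total_sales
    return summary
```

An incremental job that only added yesterday's rows to a running total would never see a later correction to an already-ingested row; the full recomputation above does.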
Question 129

A junior data engineer seeks to leverage Delta Lake's Change Data Feed functionality to create a Type 1 table representing all of the values that have ever been valid for all rows in a bronze table created with the property delta.enableChangeDataFeed = true. They plan to execute the following code as a daily job:

Which statement describes the execution and results of running the above query multiple times?
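For context, the Change Data Feed exposes each row-level change with a _change_type column (insert, update_preimage, update_postimage, delete). Applying such a feed to a Type 1 target, where each key keeps only its latest value, can be sketched in pure Python (apply_changes and the event dictionaries are illustrative, not the Delta API):

```python
def apply_changes(target, changes):
    """Apply change-feed events to a Type 1 target dict: inserts and
    post-update images overwrite the key's value, deletes remove the
    key, and pre-update images are ignored."""
    for event in changes:
        if event["_change_type"] in ("insert", "update_postimage"):
            target[event["key"]] = event["value"]
        elif event["_change_type"] == "delete":
            target.pop(event["key"], None)
    return target
```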
Question 130

A Delta Lake table representing metadata about content posts from users has the following schema:

* user_id LONG
* post_text STRING
* post_id STRING
* longitude FLOAT
* latitude FLOAT
* post_time TIMESTAMP
* date DATE

Based on the above schema, which column is a good candidate for partitioning the Delta Table?
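A good partition column is low-cardinality and commonly used in query filters; the date column fits, whereas high-cardinality columns such as post_id or user_id would produce a huge number of tiny partitions. The one-directory-per-value layout that partitioning implies can be mimicked with this pure-Python sketch (plan_partitions is a hypothetical helper):

```python
from collections import defaultdict

def plan_partitions(posts):
    """Group rows by the low-cardinality date column, mirroring the
    directory-per-partition-value layout of a partitioned Delta table."""
    partitions = defaultdict(list)
    for post in posts:
        partitions[post["date"]].append(post)
    return dict(partitions)
```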