A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is. Which of the following approaches can be used to identify the owner of new_table?
Correct Answer: C
The approach that can be used to identify the owner of new_table is to review the Owner field on the table's page in Data Explorer. Data Explorer is a web-based interface that allows users to browse, create, and manage data objects such as tables, views, and functions in Databricks [1]. The table's page in Data Explorer provides various information about the table, such as its schema, partitions, statistics, history, and permissions [2]. The Owner field shows the name and email address of the user who created or owns the table [3]. The data engineer can use this information to contact the table owner and request permission to access the table.
The other options are not correct or reliable for identifying the owner of new_table. Reviewing the Permissions tab on the table's page in Data Explorer shows the users and groups who have access to the table, but not necessarily the owner [4]. Reviewing the Owner field in the cloud storage solution can be misleading, as the owner of the data files may not be the same as the owner of the table [5]. Because there is a way to identify the owner of the table, as explained above, option E is false.
References:
* 1: Data Explorer | Databricks on AWS
* 2: Table details | Databricks on AWS
* 3: Set owner when creating a view in databricks sql - Databricks - 9978
* 4: Table access control | Databricks on AWS
* 5: External tables | Databricks on AWS
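As a programmatic cross-check (a sketch, assuming a running Spark session on Databricks and that new_table is reachable from it), the same Owner metadata that Data Explorer displays is also reported in the detailed output of DESCRIBE TABLE EXTENDED:

```python
# Sketch: find the table owner programmatically. DESCRIBE TABLE EXTENDED
# returns rows of (col_name, data_type, comment); the owner appears as a
# metadata row whose col_name is "Owner", with the value in data_type.
details = spark.sql("DESCRIBE TABLE EXTENDED new_table")

owner_rows = details.filter(details.col_name == "Owner").collect()
if owner_rows:
    print(f"Table owner: {owner_rows[0].data_type}")
```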
Question 82
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table. The code block used by the data engineer is below: If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?
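The original code block did not survive in this dump, but the line that fills the blank is Spark's standard trigger for fixed-interval micro-batches: trigger(processingTime="5 seconds"). A hedged sketch of the surrounding job follows; source_table, new_table, and the checkpoint path are illustrative assumptions:

```python
from pyspark.sql import functions as F

# Sketch of the kind of job the question describes (the original block is
# not reproduced here). The key line is the trigger, which runs one
# micro-batch every 5 seconds.
(spark.readStream
    .table("source_table")
    .withColumn("processed_at", F.current_timestamp())   # example manipulation
    .writeStream
    .trigger(processingTime="5 seconds")                  # fills the blank
    .option("checkpointLocation", "/tmp/checkpoints/new_table")
    .outputMode("append")
    .toTable("new_table"))
```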
Question 83
An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they must manually rerun the query and wait for the results each time. Which of the following approaches can the manager use to ensure the results of the query are updated each day?
Correct Answer: C
Databricks SQL allows users to schedule queries to run automatically at a specified frequency and time zone, which keeps dashboards and alerts updated with the latest data. To schedule a query:
* In the Query Editor, click Schedule > Add schedule to open a menu with schedule settings.
* Choose when to run the query. Use the dropdown pickers to specify the frequency, period, starting time, and time zone. Optionally, select the Show cron syntax checkbox to edit the schedule in Quartz Cron Syntax (see the example below).
* Choose More options to show optional settings, such as a name for the schedule and a SQL warehouse to power the query.
* Click Create. The query will then run automatically according to the schedule.
The other options are incorrect because they do not refer to the correct location or frequency for scheduling the query. The query's page in Databricks SQL is where users can edit, run, or schedule the query. The SQL endpoint's page in Databricks SQL is where users manage SQL warehouses and SQL endpoints. The Jobs UI is where users create, run, or schedule jobs that execute notebooks, JARs, or Python scripts. Reference: Schedule a query, What are Databricks SQL alerts?, Jobs.
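For reference, Quartz cron expressions use six fields (seconds, minutes, hours, day-of-month, month, day-of-week), plus an optional year. The expression below is a minimal illustration of a daily schedule, an assumption not tied to any specific query in this guide:

```python
# Quartz cron fields: sec min hour day-of-month month day-of-week
# Illustrative expression (an assumption) for a run every day at 09:00;
# "?" means "no specific value" for day-of-week when day-of-month is "*".
daily_9am = "0 0 9 * * ?"
```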
Question 84
Which of the following describes the storage organization of a Delta table?
Correct Answer: C
Delta Lake is the optimized storage layer that provides the foundation for storing data and tables in the Databricks lakehouse. Delta Lake is open source software that extends Parquet data files with a file-based transaction log for ACID transactions and scalable metadata handling [1]. Delta Lake stores its data and metadata as a collection of files in a directory on a cloud storage system, such as AWS S3 or Azure Data Lake Storage [2]. Each Delta table has a transaction log that records the history of operations performed on the table, such as insert, update, delete, and merge; the log also stores the table's schema and partitioning information [2]. The transaction log enables Delta Lake to provide ACID guarantees, time travel, schema enforcement, and other features [1].
References:
* 1: What is Delta Lake? | Databricks on AWS
* 2: Quickstart - Delta Lake Documentation
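To make the layout concrete, here is a hedged sketch (the table path is illustrative, and dbutils is assumed available on a Databricks cluster) that writes a small Delta table and lists its files, showing the Parquet data files alongside the _delta_log directory of JSON commit files:

```python
# Sketch: inspect the on-disk organization of a Delta table.
path = "/tmp/delta/events"  # illustrative path
spark.range(10).write.format("delta").mode("overwrite").save(path)

# Expect part-*.parquet data files plus a _delta_log/ directory.
for f in dbutils.fs.ls(path):
    print(f.path)

# The transaction log holds numbered JSON commit files,
# e.g. 00000000000000000000.json.
for f in dbutils.fs.ls(path + "/_delta_log"):
    print(f.path)
```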
Question 85
Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?
Correct Answer: E
The MERGE command can be used to upsert data from a source table, view, or DataFrame into a target Delta table. It allows you to specify conditions for matching and updating existing records, and for inserting new records when no match is found. This way, you can avoid writing duplicate records into a Delta table [1]. The other commands (DROP, IGNORE, APPEND, INSERT) do not have this functionality and may result in duplicate records or data loss [2][3][4].
References:
* 1: Upsert into a Delta Lake table using merge | Databricks on AWS
* 2: SQL DELETE | Databricks on AWS
* 3: SQL INSERT INTO | Databricks on AWS
* 4: SQL UPDATE | Databricks on AWS
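As a concrete illustration, here is a minimal sketch of such an upsert; the table names target_table and updates and the join key id are assumptions for the example:

```python
# Sketch: idempotent write with MERGE. Rows in `updates` that match an
# existing id overwrite the target row; unmatched rows are inserted, so
# re-running the write does not create duplicates.
spark.sql("""
    MERGE INTO target_table AS t
    USING updates AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```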