Which of the following describes the relationship between Gold tables and Silver tables?
Correct Answer: A
According to the medallion lakehouse architecture, gold tables are the final layer of data that powers analytics, machine learning, and production applications. They are often highly refined and aggregated, containing data that has been transformed into knowledge rather than just information. Silver tables, on the other hand, are the intermediate layer that represents a validated, enriched version of the raw data from the bronze layer. They provide an enterprise-wide view of key business entities, concepts, and transactions, but they may not contain all the aggregations and calculations required for specific use cases. Therefore, gold tables are more likely to contain aggregations than silver tables. References: What is the medallion lakehouse architecture?, What is a Medallion Architecture?
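To make the distinction concrete, here is a minimal SQL sketch (all table and column names are illustrative, not from the question): a gold table typically aggregates row-level silver data into analytics-ready metrics.

-- Silver: validated, enriched, row-level records (illustrative names)
-- Gold: refined, aggregated metrics ready for analytics
CREATE TABLE sales_gold AS
SELECT region,
       order_date,
       SUM(amount) AS total_sales,
       COUNT(*) AS order_count
FROM sales_silver
GROUP BY region, order_date;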
Question 12
In which of the following file formats is data from Delta Lake tables primarily stored?
Correct Answer: C
Delta Lake is an open source project that provides ACID transactions, time travel, and other features on top of Apache Parquet, a columnar file format widely used for big data analytics. Delta Lake stores your data in cloud storage as versioned Parquet files, alongside a JSON-based transaction log (with periodic Parquet checkpoint files) that tracks changes and ensures data integrity. Data can be ingested into a Delta table from formats such as CSV, JSON, or Avro, but the table's underlying data files are always stored as Parquet for better performance and compression. References: How to Create Delta Lake tables, 5 reasons to choose Delta Lake format (on Databricks), Parquet vs Delta format in Azure Data Lake Gen 2 store, What is Delta Lake? - Azure Databricks, Lakehouse and Delta tables - Microsoft Fabric
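As a rough illustration (all file names below are hypothetical), a Delta table's storage layout consists of Parquet data files plus a _delta_log directory holding the transaction log:

my_table/
  _delta_log/
    00000000000000000000.json                  -- transaction log entry (JSON)
    00000000000000000001.json
    00000000000000000010.checkpoint.parquet    -- periodic checkpoint (Parquet)
  part-00000-1a2b3c4d.snappy.parquet           -- data file (Parquet)
  part-00001-5e6f7a8b.snappy.parquet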
Question 13
A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables. Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?
Correct Answer: A
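For context: Delta Live Tables supports table definitions in both Python and SQL within a single pipeline, and it handles streaming inputs natively. A minimal sketch of a gold-layer declaration in DLT SQL, reading from a silver table (all names are illustrative):

CREATE OR REFRESH LIVE TABLE gold_daily_summary
AS SELECT event_date, COUNT(*) AS event_count
FROM LIVE.silver_events
GROUP BY event_date;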
Question 14
A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the pipeline's previous run, and set up the pipeline to ingest only those new files with each run. Which of the following tools can the data engineer use to solve this problem?
Correct Answer: E
Auto Loader is a tool that can incrementally and efficiently process new data files as they arrive in cloud storage, without any additional setup. Auto Loader provides a Structured Streaming source called cloudFiles, which automatically detects and processes new files in a given input directory on cloud file storage. Auto Loader also tracks ingestion progress and ensures exactly-once semantics when writing data into Delta Lake. It can ingest a variety of file formats, such as JSON, CSV, XML, PARQUET, AVRO, ORC, TEXT, and BINARYFILE, and it supports both Python and SQL in Delta Live Tables, which is a declarative way to build production-quality data pipelines with Databricks. References: What is Auto Loader?, Get started with Databricks Auto Loader, Auto Loader in Delta Live Tables
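A minimal sketch of Auto Loader inside a Delta Live Tables SQL pipeline (the path and format are illustrative):

CREATE OR REFRESH STREAMING LIVE TABLE new_files_bronze
AS SELECT * FROM cloud_files("/mnt/shared/source_dir", "json");

The cloudFiles source checkpoints which files it has already processed, so each pipeline run picks up only the files added since the previous run, even though the old files remain in the directory.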
Question 15
A data engineer has been given a new record of data:
id STRING = 'a1'
rank INTEGER = 6
rating FLOAT = 9.4
Which of the following SQL commands can be used to append the new record to an existing Delta table my_table?
Correct Answer: A
To append a new record to an existing Delta table, you can use the INSERT INTO statement with a VALUES clause, which inserts one or more rows into the table with the specified values. Option A is the only code block that follows this syntax correctly. Option B is incorrect: UNION combines the results of two queries into a single result set; it does not append rows to an existing table. Option C is incorrect: INSERT VALUES is not valid SQL syntax. Option D is incorrect: UPDATE modifies existing rows in the table rather than appending new ones. Option E is incorrect: UPDATE VALUES is also not valid SQL syntax. References: Insert data into a table using SQL | Databricks on AWS, Insert data into a table using SQL - Azure Databricks, Delta Lake Quickstart - Azure Databricks
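For reference, the correct append for the record in the question looks like this:

INSERT INTO my_table VALUES ('a1', 6, 9.4);

The VALUES clause can also list multiple comma-separated rows to append several records in a single statement.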