Which of the elements that are labeled with a circle and a number contain an error or are misrepresented?
Correct Answer: B
Explanation
1: Correct - This should just read "API" or "DataFrame API". The DataFrame is not part of the SQL API. To make a DataFrame accessible via SQL, you first need to create a DataFrame view. That view can then be accessed via SQL.
4: Although "K_38_INU" looks odd, it is a completely valid name for a DataFrame column.
6: No, StringType is a correct type.
7: Although a StringType may not be the most efficient way to store a phone number, there is nothing fundamentally wrong with using this type here.
8: Correct - TreeType is not a type that Spark supports.
9: No, Spark DataFrames support ArrayType variables. In this case, the variable would represent a sequence of elements with type LongType, which is also a valid type for Spark DataFrames.
10: There is nothing wrong with this row.
More info: Data Types - Spark 3.1.1 Documentation (https://bit.ly/3aAPKJT)
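As a quick illustration of the valid types discussed above, a minimal schema sketch could look as follows (column names other than K_38_INU are made up for illustration):

    from pyspark.sql.types import StructType, StructField, StringType, LongType, ArrayType

    schema = StructType([
        StructField("K_38_INU", StringType()),               # unusual, but a valid column name
        StructField("phoneNumber", StringType()),            # StringType works for phone numbers
        StructField("measurements", ArrayType(LongType())),  # ArrayType of LongType is supported
    ])

There is no TreeType in pyspark.sql.types, which is why element 8 in the question is an error.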
Question 17
Which of the following code blocks returns the number of unique values in column storeId of DataFrame transactionsDf?
Correct Answer: A
Explanation
transactionsDf.select("storeId").dropDuplicates().count()
Correct! After dropping all duplicates from column storeId, the remaining rows get counted, representing the number of unique values in the column.
transactionsDf.select(count("storeId")).dropDuplicates()
No. transactionsDf.select(count("storeId")) just returns a single-row DataFrame showing the number of non-null rows. dropDuplicates() does not have any effect in this context.
transactionsDf.dropDuplicates().agg(count("storeId"))
Incorrect. transactionsDf.dropDuplicates() removes duplicate rows, but it considers all columns rather than only storeId, so it eliminates full-row duplicates instead.
transactionsDf.distinct().select("storeId").count()
Wrong. transactionsDf.distinct() identifies unique rows across all columns, not unique values with respect to column storeId. This may leave duplicate values in the column, so the count would not represent the number of unique values in that column.
transactionsDf.select(distinct("storeId")).count()
False. There is no distinct method in pyspark.sql.functions.
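To see the difference between column-level and row-level deduplication in action, here is a minimal sketch (the sample data is made up for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import countDistinct

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 10), (2, 10), (3, 20)], ["transactionId", "storeId"])

    df.select("storeId").dropDuplicates().count()   # 2 -- unique values in storeId
    df.dropDuplicates().count()                     # 3 -- every full row is already unique
    df.select(countDistinct("storeId")).show()      # aggregate-based alternative, also 2

Note that countDistinct ignores nulls, while dropDuplicates keeps a null as its own distinct value, so the two approaches can differ on data containing nulls.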
Question 18
Which of the following code blocks performs an inner join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively, excluding columns value and storeId from DataFrame transactionsDf and column attributes from DataFrame itemsDf?
Correct Answer: E
Explanation
This question offers a wide variety of answers to a seemingly simple task. However, this variety reflects the many ways one can express a join in PySpark. You need to understand some SQL syntax to get to the correct answer here.
transactionsDf.createOrReplaceTempView('transactionsDf')
itemsDf.createOrReplaceTempView('itemsDf')
statement = """
SELECT * FROM transactionsDf
INNER JOIN itemsDf ON transactionsDf.productId==itemsDf.itemId
"""
spark.sql(statement).drop("value", "storeId", "attributes")
Correct - this answer uses SQL to perform the inner join and afterwards drops the unwanted columns. This is totally fine. If you are unfamiliar with the triple quote """ in Python: it allows you to express a string across multiple lines.
transactionsDf \
  .drop(col('value'), col('storeId')) \
  .join(itemsDf.drop(col('attributes')), col('productId')==col('itemId'))
No, this answer option is a trap, since DataFrame.drop() does not accept a list of Column objects. You could use transactionsDf.drop('value', 'storeId') instead.
transactionsDf.drop("value", "storeId").join(itemsDf.drop("attributes"), "transactionsDf.productId==itemsDf.itemId")
Incorrect - Spark does not evaluate "transactionsDf.productId==itemsDf.itemId" as a valid join expression. This would work if it were a Column expression rather than a string.
transactionsDf.drop('value', 'storeId').join(itemsDf.select('attributes'), transactionsDf.productId==itemsDf.itemId)
Wrong, this statement incorrectly uses itemsDf.select instead of itemsDf.drop, so it keeps only column attributes instead of excluding it.
transactionsDf.createOrReplaceTempView('transactionsDf')
itemsDf.createOrReplaceTempView('itemsDf')
spark.sql("SELECT -value, -storeId FROM transactionsDf INNER JOIN itemsDf ON productId==itemId").drop("attributes")
No, here the SQL expression syntax is incorrect. Simply specifying -columnName does not drop a column.
More info: pyspark.sql.DataFrame.join - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 3
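For comparison, a pure DataFrame-API equivalent of the correct SQL answer could look like this (a sketch, assuming the column names from the question):

    transactionsDf.drop("value", "storeId") \
        .join(itemsDf.drop("attributes"),
              transactionsDf.productId == itemsDf.itemId,
              "inner")

Here the join condition is passed as a Column expression rather than a plain string, which is exactly what the third option above gets wrong.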
Question 19
Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?
Correct Answer: D
Explanation
Static notebook | Dynamic notebook: See test 1 (https://flrs.github.io/spark_practice_tests_code/#1/23.html, https://bit.ly/sparkpracticeexams_import_instructions)
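The explanation is light here, so as a point of reference: the standard DataFrameReader pattern for parquet files looks like this (a sketch; that the correct option follows this pattern is an assumption, since the option text is not shown):

    spark.read.parquet("/FileStore/imports.parquet")
    # or, equivalently, via the generic reader:
    spark.read.format("parquet").load("/FileStore/imports.parquet")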
Question 20
The code block shown below should return a DataFrame with columns transactionId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this. transactionsDf.__1__(__2__)
Correct Answer: C
Explanation
Correct code block:
transactionsDf.select(["transactionId", "predError", "value", "f"])
DataFrame.select() returns specific columns from the DataFrame and accepts a list of column names as an argument. Thus, this is the correct choice here.
The option using col(["transactionId", "predError", "value", "f"]) is invalid, since col() only accepts a single column name, not a list.
Likewise, specifying all columns in a single string like "transactionId, predError, value, f" is not valid syntax.
filter and where filter rows based on conditions; they do not control which columns are returned.
Static notebook | Dynamic notebook: See test 2
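As a small sketch of the forms select() accepts (column names taken from the question):

    transactionsDf.select(["transactionId", "predError", "value", "f"])  # list form, as in the answer
    transactionsDf.select("transactionId", "predError", "value", "f")    # equivalent variadic form

Both forms are valid PySpark; the list form is the one that fits the single blank __2__ in this question.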