Which of the following code blocks adds a column predErrorSqrt to DataFrame transactionsDf that is the square root of column predError?
Correct Answer: D
Explanation

transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))
Correct. The DataFrame.withColumn() operator adds a new column to a DataFrame. It takes two arguments: the name of the new column (here: predErrorSqrt) and a Column expression for the new column. In PySpark, a Column expression means referring to a column via col("predError"), or by other means such as transactionsDf.predError, or even just the column name as a string, "predError". The question asks for the square root. sqrt() is a function in pyspark.sql.functions that calculates the square root. It takes a value or a Column as input; here it receives the predError column of DataFrame transactionsDf, expressed through col("predError").

transactionsDf.withColumn("predErrorSqrt", sqrt(predError))
Incorrect. sqrt(predError) is invalid syntax: Python interprets predError as a (non-existent) variable name, so this fails before Spark ever evaluates the expression. You could pass transactionsDf.predError, col("predError") (as in the correct solution), or even just "predError" instead.

transactionsDf.select(sqrt(predError))
Wrong. The explanation just above about how to refer to predError applies here as well.

transactionsDf.select(sqrt("predError"))
No. While this is correct syntax, it returns a single-column DataFrame containing only the square root of column predError. The question, however, asks for a column to be added to the original DataFrame transactionsDf.

transactionsDf.withColumn("predErrorSqrt", col("predError").sqrt())
No. The issue with this statement is that column col("predError") has no sqrt() method: sqrt() is a member of pyspark.sql.functions, not of pyspark.sql.Column.

More info: pyspark.sql.DataFrame.withColumn - PySpark 3.1.2 documentation and pyspark.sql.functions.sqrt - PySpark 3.1.2 documentation
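For reference, a minimal runnable sketch of the correct approach. The sample rows and values are hypothetical; only the withColumn()/sqrt() pattern is taken from the answer.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sqrt

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for transactionsDf; the real schema may differ.
transactionsDf = spark.createDataFrame([(1, 4.0), (2, 9.0)], ["transactionId", "predError"])

# withColumn() adds predErrorSqrt; sqrt() comes from pyspark.sql.functions, not from Column.
transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError"))).show()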
Question 2
Which of the following code blocks concatenates rows of DataFrames transactionsDf and transactionsNewDf, omitting any duplicates?
Correct Answer: B
Explanation DataFrame.unique() and DataFrame.concat() do not exist, union() is not a method of the SparkSession, and there is no union option for the how parameter of the DataFrame.join() method. More info: pyspark.sql.DataFrame.union - PySpark 3.1.2 documentation
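The explanation does not spell out the correct block, but given the linked union() documentation and the requirement to omit duplicates, it is presumably transactionsDf.union(transactionsNewDf).distinct(). A minimal sketch with hypothetical data:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-ins; in the question, both DataFrames share the same schema.
transactionsDf = spark.createDataFrame([(1, "a"), (2, "b")], ["transactionId", "value"])
transactionsNewDf = spark.createDataFrame([(2, "b"), (3, "c")], ["transactionId", "value"])

# union() appends rows by position and keeps duplicates; distinct() then removes them.
transactionsDf.union(transactionsNewDf).distinct().show()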
Question 3
Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?
Correct Answer: A
Explanation

spark.read.json(filePath)
Correct. spark.read accesses Spark's DataFrameReader. Spark then identifies the file type to be read as JSON by passing filePath into the DataFrameReader.json() method.

spark.read.path(filePath)
Incorrect. Spark's DataFrameReader does not have a path() method. A universal way to read in files is provided by the DataFrameReader.load() method (link below).

spark.read.path(filePath, source="json")
Wrong. A DataFrameReader.path() method does not exist (see above).

spark.read().json(filePath)
Incorrect. spark.read is a way to access Spark's DataFrameReader. However, the DataFrameReader is not callable, so calling it via spark.read() will fail.

spark.read().path(filePath)
No, Spark's DataFrameReader is not callable (see above).

More info: pyspark.sql.DataFrameReader.json - PySpark 3.1.2 documentation, pyspark.sql.DataFrameReader.load - PySpark 3.1.2 documentation
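A minimal sketch of the correct option. The file path and its contents are hypothetical and only serve to make the example self-contained:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical path; write a tiny line-delimited JSON file so the read works locally.
filePath = "/tmp/example.json"
with open(filePath, "w") as f:
    f.write('{"transactionId": 1, "predError": 4.0}\n{"transactionId": 2, "predError": 9.0}\n')

# spark.read returns a DataFrameReader; .json() parses the file at filePath as JSON.
df = spark.read.json(filePath)
df.printSchema()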
Question 4
The code block shown below should return a two-column DataFrame with columns transactionId and supplier, with combined information from DataFrames itemsDf and transactionsDf. The code block should merge rows in which column productId of DataFrame transactionsDf matches the value of column itemId in DataFrame itemsDf, but only where column storeId of DataFrame transactionsDf does not match column itemId of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this. Code block: transactionsDf.__1__(itemsDf, __2__).__3__(__4__)
Correct Answer: C
Explanation

This question is pretty complex and, in its complexity, is probably above what you would encounter in the exam. However, by reading the question carefully, you can use your logic skills to weed out the wrong answers.

First, examine the join statement that is common to all answers. The first argument of the join() operator (documentation linked below) is the DataFrame to be joined with. Where join fills gap 3, the first argument in gap 4 would therefore have to be another DataFrame, which is not the case for any of those answers. So you can immediately discard two answers.

In all remaining answers, join fills gap 1 and is followed by (itemsDf, ... according to the code block, leaving three candidates. Looking further at the join() statement, the second argument (on=) expects "a string for the join column name, a list of column names, a join expression (Column), or a list of Columns", according to the documentation. One answer option passes a list of join expressions (transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId), which is unsupported according to the documentation, so we can discard that answer as well, leaving two candidates.

Both remaining candidates have valid syntax, but only one of them fulfills the condition in the question: "only where column storeId of DataFrame transactionsDf does not match column itemId of DataFrame itemsDf". That remaining answer option has to be the correct one.

As you can see, although sometimes overwhelming at first, even more complex questions can be figured out by rigorously applying the knowledge you can gain from the documentation during the exam.

More info: pyspark.sql.DataFrame.join - PySpark 3.1.2 documentation
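Putting the pieces together, the completed block presumably looks like the sketch below. The sample rows and supplier names are hypothetical; the point is the single join expression combining the two conditions with &, followed by select().

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-ins for the two DataFrames in the question.
transactionsDf = spark.createDataFrame(
    [(1, 10, 25), (2, 11, 11)], ["transactionId", "productId", "storeId"]
)
itemsDf = spark.createDataFrame([(10, "Acme"), (11, "Globex")], ["itemId", "supplier"])

# Join where productId matches itemId but storeId does not, then keep the two requested columns.
joined = transactionsDf.join(
    itemsDf,
    (transactionsDf.productId == itemsDf.itemId) & (transactionsDf.storeId != itemsDf.itemId),
).select("transactionId", "supplier")
joined.show()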
Question 5
The code block shown below should return a column that indicates through boolean variables whether rows in DataFrame transactionsDf have values greater than or equal to 20 and smaller than or equal to 30 in column storeId, and have the value 2 in column productId. Choose the answer that correctly fills the blanks in the code block to accomplish this. Code block: transactionsDf.__1__((__2__.__3__) __4__ (__5__))
Correct Answer: D
Explanation

Correct code block: transactionsDf.select((col("storeId").between(20, 30)) & (col("productId")==2))

Although this question may make you think it asks for a filter or where statement, it does not. It explicitly asks to return a column with booleans, which should point you to the select statement. Another trick here is the rarely used between() method: it exists and resolves to ((storeId >= 20) AND (storeId <= 30)) in SQL. geq() and leq() do not exist. A further riddle is how to chain the two conditions. The only valid answer here is &; operators like && or and are not valid. Other boolean operators that are valid on Spark Columns are | (or) and ~ (not).
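A minimal runnable sketch of the correct code block, with hypothetical sample rows:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for transactionsDf.
transactionsDf = spark.createDataFrame(
    [(1, 25, 2), (2, 35, 2), (3, 25, 3)], ["transactionId", "storeId", "productId"]
)

# between(20, 30) resolves to ((storeId >= 20) AND (storeId <= 30)); & chains the two
# boolean Columns, so select() returns a single boolean column.
transactionsDf.select(
    (col("storeId").between(20, 30)) & (col("productId") == 2)
).show()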