Which of the following code blocks returns a DataFrame with a single column in which all items from column attributes of DataFrame itemsDf that contain the letter i are listed?

Sample of DataFrame itemsDf:

+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+
Correct Answer: D
Explanation
Result of correct code block:

+-------------------+
|attributes_exploded|
+-------------------+
|             winter|
|            cooling|
+-------------------+

To solve this question, you need to know about explode(). This operation helps you split up arrays into single rows. If you have not yet had a chance to familiarize yourself with this method, find more examples in the documentation (link below). Note that explode() is a method made available through pyspark.sql.functions - it is not available as a method of a DataFrame or of a Column, as some of the answer options claim.

More info: pyspark.sql.functions.explode - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 2
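Since the answer options are not reproduced here, the exact correct code block is an assumption; a minimal sketch that produces the result shown above could look like this:

from pyspark.sql.functions import explode, col

# Explode the array column into one row per element, then keep only the
# elements that contain the letter i (here: "winter" and "cooling").
(itemsDf
 .select(explode("attributes").alias("attributes_exploded"))
 .filter(col("attributes_exploded").contains("i")))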
Question 12
The code block shown below should write DataFrame transactionsDf to disk at path csvPath as a single CSV file, using tabs (\t characters) as separators between columns, expressing missing values as string n/a, and omitting a header row with column names. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__.write.__2__(__3__, "\t").__4__.__5__(csvPath)
Correct Answer: C
Explanation
Correct code block:

transactionsDf.repartition(1).write.option("sep", "\t").option("nullValue", "n/a").csv(csvPath)

It is important here to understand that the question specifically asks for writing the DataFrame as a single CSV file. This should trigger you to think about partitions. By default, every partition is written as a separate file, so you need to include repartition(1) in your call. coalesce(1) works here, too! Secondly, the question is very much an invitation to search through the parameters in the Spark documentation that work with DataFrameWriter.csv (link below). You will also need to know that you need an option() statement to apply these parameters. The final concern is the general call structure: once you have accessed write on your DataFrame, the options follow, and then you write the DataFrame with csv. Instead of csv(csvPath), you could also use save(csvPath, format='csv') here.

More info: pyspark.sql.DataFrameWriter.csv - PySpark 3.1.1 documentation
Static notebook | Dynamic notebook: See test 1
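For reference, here are the equivalent variants mentioned above as minimal sketches (csvPath is assumed to hold the target path; each variant writes the same output, so only one would be run against a given path):

# Variant 1: as in the correct answer, with repartition(1).
transactionsDf.repartition(1).write.option("sep", "\t").option("nullValue", "n/a").csv(csvPath)

# Variant 2: coalesce(1) also reduces the DataFrame to a single partition.
transactionsDf.coalesce(1).write.option("sep", "\t").option("nullValue", "n/a").csv(csvPath)

# Variant 3: save() with an explicit format instead of the csv() shortcut.
transactionsDf.coalesce(1).write.option("sep", "\t").option("nullValue", "n/a").save(csvPath, format="csv")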
Question 13
Which of the following code blocks returns a single-row DataFrame that only has a column corr which shows the Pearson correlation coefficient between columns predError and value in DataFrame transactionsDf?
Correct Answer: D
Explanation
In difficulty, this question is above what you can expect from the exam. What this question wants to teach you, however, is to pay attention to the useful details included in the documentation. pyspark.sql.functions.corr is not a very common method, but it deals with Spark's data structure in an interesting way. The command takes two columns over multiple rows and returns a single row - similar to an aggregation function. When examining the documentation (linked below), you will find this code example:

a = range(20)
b = [2 * x for x in range(20)]
df = spark.createDataFrame(zip(a, b), ["a", "b"])
df.agg(corr("a", "b").alias('c')).collect()
[Row(c=1.0)]

See how corr just returns a single row? Once you understand this, you should be suspicious about answers that include first(), since there is no need to select just a single row. A further reason to eliminate those answers is that DataFrame.first() returns an object of type Row, not a DataFrame, as requested in the question.

transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))
Correct! After calculating the Pearson correlation coefficient, the resulting column is correctly renamed to corr.

transactionsDf.select(corr(predError, value).alias("corr"))
No. In this answer, Python will interpret column names predError and value as variable names.

transactionsDf.select(corr(col("predError"), col("value")).alias("corr")).first()
Incorrect. first() returns a Row, not a DataFrame (see above and the linked documentation below).

transactionsDf.select(corr("predError", "value"))
Wrong. While this statement returns a DataFrame in the desired shape, the column will be named corr(predError, value) and not corr.

transactionsDf.select(corr(["predError", "value"]).alias("corr")).first()
False. In addition to first() returning a Row, this code block also uses the wrong call structure for corr, which takes two arguments (the two columns to correlate).

More info:
- pyspark.sql.functions.corr - PySpark 3.1.2 documentation
- pyspark.sql.DataFrame.first - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 3
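As a self-contained version of the correct answer (the import line is an addition for runnability, not part of the answer option):

from pyspark.sql.functions import corr, col

# Returns a single-row DataFrame whose only column is named corr,
# holding the Pearson correlation coefficient of the two columns.
transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))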
Question 14
Which of the following code blocks reorders the values inside the arrays in column attributes of DataFrame itemsDf from last to first one in the alphabet?

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+
Correct Answer: D
Explanation
Output of correct code block:

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[winter, cozy, blue]         |Sports Company Inc.|
|2     |[summer, red, fresh, cooling]|YetiX              |
|3     |[travel, summer, green]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

It can be confusing to differentiate between the different sorting functions in PySpark. In this case, a particularity of sort_array has to be considered: the sort direction is given by the second argument, not by the desc method. Luckily, this is documented in the documentation (link below). Also, to solve this question you need to understand the difference between sort and sort_array. With sort, you cannot sort values inside arrays. Also, sort is a method of DataFrame, while sort_array is a function in pyspark.sql.functions.

More info: pyspark.sql.functions.sort_array - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 2
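Since the answer options are not reproduced here, the exact correct code block is an assumption; a minimal sketch matching the output above:

from pyspark.sql.functions import sort_array

# asc=False sorts each array in descending (reverse alphabetical) order;
# note that the sort direction is the second argument, not a .desc() call.
itemsDf.withColumn("attributes", sort_array("attributes", asc=False))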
Question 15
Which of the following code blocks applies the boolean-returning Python function evaluateTestSuccess to column storeId of DataFrame transactionsDf as a user-defined function?
Correct Answer: A
Explanation
Recognizing that the UDF specification requires a return type (unless it is a string, which is the default) is important for solving this question. In addition, you should make sure that the generated UDF (evaluateTestSuccessUDF), and not the plain Python function (evaluateTestSuccess), is applied to column storeId.

More info: pyspark.sql.functions.udf - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 2
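A minimal sketch of the pattern the explanation describes; the UDF name evaluateTestSuccessUDF matches the explanation, while the target column name result is an assumption, since the answer options are not reproduced here:

from pyspark.sql.functions import udf, col
from pyspark.sql.types import BooleanType

# Register the boolean-returning Python function as a UDF; the return
# type must be given explicitly because StringType is the default.
evaluateTestSuccessUDF = udf(evaluateTestSuccess, returnType=BooleanType())

# Apply the generated UDF - not the plain Python function - to storeId.
transactionsDf.withColumn("result", evaluateTestSuccessUDF(col("storeId")))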