The code block displayed below contains an error. The code block should configure Spark so that DataFrames up to a size of 20 MB will be broadcast to all worker nodes when performing a join. Find the error. Code block:
Correct Answer: B
Explanation This question is hard. Let's assess the different answers one by one. Spark will only broadcast DataFrames that are much smaller than the default value. This is correct. The default value of spark.sql.autoBroadcastJoinThreshold is 10 MB (10485760 bytes). Since this configuration expects a number of bytes (and not megabytes), the code block sets the limit to merely 20 bytes, instead of the requested 20 * 1024 * 1024 (= 20971520) bytes. The command is evaluated lazily and needs to be followed by an action. No, this command is evaluated right away! Spark will only apply the limit to threshold joins and not to other joins. There are no "threshold joins", so this option does not make any sense. The correct option to write configurations is through spark.config and not spark.conf. No, it is indeed spark.conf! The passed limit has the wrong variable type. The configuration expects the number of bytes as a number, so the type of the 20 provided in the code block is fine; the problem is its value, not its type.
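For reference, a minimal sketch of how the intended configuration might look (assuming an active SparkSession named spark); the key point is that the threshold is specified in bytes:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 20 MB expressed in bytes; passing just 20 would set a 20-byte threshold instead
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 20 * 1024 * 1024)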
Question 32
In which order should the code blocks shown below be run in order to assign articlesDf a DataFrame that lists all items in column attributes ordered by the number of times these items occur, from most to least often? Sample of DataFrame articlesDf:
+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+
Question 33
The code block shown below should return a DataFrame with columns transactionId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this. transactionsDf.__1__(__2__)
Correct Answer: C
Explanation Correct code block: transactionsDf.select(["transactionId", "predError", "value", "f"]) DataFrame.select() returns specific columns from the DataFrame and accepts a list of column names as its only argument. Thus, this is the correct choice here. The option using col(["transactionId", "predError", "value", "f"]) is invalid, since col() only accepts a single column name, not a list. Likewise, specifying all columns in a single string like "transactionId, predError, value, f" is not valid syntax. filter and where filter rows based on conditions; they do not control which columns are returned.
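A minimal, self-contained sketch of the accepted answer; the sample values below are made up for illustration, only the column names come from the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data; only the column names are taken from the question
transactionsDf = spark.createDataFrame(
    [(1, 3, 4.0, 7), (2, 6, 7.0, 3)],
    ["transactionId", "predError", "value", "f"],
)

# select() accepts a list of column names, so a single blank can hold the whole list
transactionsDf.select(["transactionId", "predError", "value", "f"]).show()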
Question 34
Which of the following describes characteristics of the Dataset API?
Correct Answer: D
Explanation The Dataset API is available in Scala, but it is not available in Python. Correct. The Dataset API uses fixed (strong) typing and is typically used for object-oriented programming. It is available when Spark is used with the Scala programming language, but not with Python. In Python, you use the DataFrame API, which is based on the Dataset API. The Dataset API does not provide compile-time type safety. No, the Dataset API does provide compile-time type safety; depending on the use case, that type safety is one of its advantages. The Dataset API does not support unstructured data. Wrong, the Dataset API supports structured and unstructured data. In Python, the Dataset API's schema is constructed via type hints. No, this is not applicable since the Dataset API is not available in Python. In Python, the Dataset API mainly resembles Pandas' DataFrame API. The Dataset API does not exist in Python, only in Scala and Java.
Question 35
The code block displayed below contains at least one error. The code block should return a DataFrame with only one column, result. That column should include all values in column value from DataFrame transactionsDf raised to the power of 5, and a null value for rows in which there is no value in column value. Find the error(s). Code block:

from pyspark.sql.functions import udf
from pyspark.sql import types as T

transactionsDf.createOrReplaceTempView('transactions')

def pow_5(x):
    return x**5

spark.udf.register(pow_5, 'power_5_udf', T.LongType())
spark.sql('SELECT power_5_udf(value) FROM transactions')
Correct Answer: D
Explanation Correct code block:

from pyspark.sql.functions import udf
from pyspark.sql import types as T

transactionsDf.createOrReplaceTempView('transactions')

def pow_5(x):
    if x:
        return x**5
    return x

spark.udf.register('power_5_udf', pow_5, T.LongType())
spark.sql('SELECT power_5_udf(value) AS result FROM transactions')

Here it is important to understand how the pow_5 function handles empty values. In the wrong code block above, pow_5 is unable to handle empty values and will throw an error, since Python's ** operator cannot deal with the null values Spark passes into pow_5. The order of arguments when registering the UDF with Spark via spark.udf.register also matters. In the code snippet in the question, the arguments for the SQL function name and the actual Python function are switched. You can read more about the arguments of spark.udf.register and see some examples of its usage in the documentation (link below). Finally, you should recognize that the original code block is missing an expression to rename the column created through the UDF. The renaming is done by SQL's AS result clause; omitting it, you end up with the column name power_5_udf(value) instead of result. More info: pyspark.sql.functions.udf - PySpark 3.1.1 documentation
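As a side note, a similar result could be obtained without registering a SQL UDF at all, by using pyspark.sql.functions.udf directly on the DataFrame. This is only a sketch, assuming transactionsDf exists and has a numeric column value:

from pyspark.sql import functions as F
from pyspark.sql import types as T

# Null-safe power function: returns None for null inputs instead of raising an error
@F.udf(returnType=T.LongType())
def pow_5(x):
    return x**5 if x is not None else None

# Assumes transactionsDf has a column named value; alias() provides the result column name
transactionsDf.select(pow_5("value").alias("result"))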