Which of the following describes the role of the cluster manager?
Correct Answer: C
Explanation
The cluster manager allocates resources to Spark applications and maintains the executor processes in client mode.
Correct. In client mode, the cluster manager is located on a node other than the client machine. From there it starts and ends executor processes on the cluster nodes as required by the Spark application running on the Spark driver.
The cluster manager allocates resources to Spark applications and maintains the executor processes in remote mode.
Wrong, there is no "remote" execution mode in Spark. The available execution modes are local, client, and cluster.
The cluster manager allocates resources to the DataFrame manager.
Wrong, there is no "DataFrame manager" in Spark.
The cluster manager schedules tasks on the cluster in client mode.
Wrong. In client mode, the Spark driver schedules tasks on the cluster, not the cluster manager.
The cluster manager schedules tasks on the cluster in local mode.
Wrong. In local mode, there is no cluster. The Spark application runs on a single machine, not on a cluster of machines.
Question 52
Which of the following code blocks creates a new DataFrame with two columns season and wind_speed_ms where column season is of data type string and column wind_speed_ms is of data type double?
Correct Answer: B
Explanation
spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
Correct. This command uses the SparkSession's createDataFrame method to create a new DataFrame. Notice how rows, columns, and column names are passed in here: The rows are specified as a Python list, and every entry in the list is a new row. Each row is a Python tuple (for example ("summer", 4.5)), and every column value is one entry in that tuple. The column names are specified as the second argument to createDataFrame(). The documentation (link below) states that "when schema is a list of column names, the type of each column will be inferred from data" (the first argument). Since values 4.5 and 7.5 are both Python floats, Spark correctly infers the double type for column wind_speed_ms. Given that all values in column season are strings, Spark infers the string type for that column. Find out more about SparkSession.createDataFrame() via the link below.
spark.newDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
No, the SparkSession does not have a newDataFrame method.
from pyspark.sql import types as T
spark.createDataFrame((("summer", 4.5), ("winter", 7.5)), T.StructType([T.StructField("season", T.CharType()), T.StructField("season", T.DoubleType())]))
No, pyspark.sql.types does not have a CharType type. See the link below for the available data types in Spark.
spark.createDataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})
No, this is not correct Spark syntax. If you considered this option to be correct, you may have some experience with Python's pandas package, in which this would be valid syntax. To create a Spark DataFrame from a pandas DataFrame, you can simply use spark.createDataFrame(pandasDf), where pandasDf is the pandas DataFrame. Find out more about Spark syntax options using the examples in the documentation for SparkSession.createDataFrame linked below.
spark.DataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})
No, the SparkSession (indicated by spark in the code above) does not have a DataFrame method.
More info: pyspark.sql.SparkSession.createDataFrame - PySpark 3.1.1 documentation and Data Types - Spark 3.1.2 Documentation
Static notebook | Dynamic notebook: See test 1
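For illustration, a minimal sketch of the correct option in action (assuming a SparkSession named spark, as in the question), with the inferred schema shown as comments:

df = spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
df.printSchema()
# root
#  |-- season: string (nullable = true)
#  |-- wind_speed_ms: double (nullable = true)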
Question 53
Which of the following code blocks returns about 150 randomly selected rows from the 1000-row DataFrame transactionsDf, assuming that any row can appear more than once in the returned DataFrame?
Correct Answer: E
Explanation
Answering this question correctly depends on whether you understand the arguments to the DataFrame.sample() method (link to the documentation below). Its signature is DataFrame.sample(withReplacement=None, fraction=None, seed=None).
The first argument, withReplacement, specifies whether a row can be drawn from the DataFrame multiple times. By default, this option is disabled in Spark, but we have to enable it here, since the question asks for rows to be able to appear more than once. So, we need to pass True for this argument.
About replacement: "Replacement" is easiest explained with the example of removing random items from a box. When you remove them "with replacement", it means that after you have taken an item out of the box, you put it back inside. So, essentially, if you randomly take 10 items out of a box with 100 items, there is a chance you take the same item twice or more. "Without replacement" means that you do not put the item back into the box after removing it. So, every time you remove an item from the box, there is one less item in the box and you can never take the same item twice.
The second argument to the sample() method is fraction. This refers to the fraction of rows that should be returned. In the question we are asked for 150 out of 1000 rows - a fraction of 0.15.
The last argument is a random seed. A random seed makes a randomized process repeatable: if you re-run the same sample() operation with the same random seed, you get the same rows returned from the sample() command. The question specifies no behavior around the random seed, so the varying random seeds in the answer options are only there to confuse you!
More info: pyspark.sql.DataFrame.sample - PySpark 3.1.1 documentation
Static notebook | Dynamic notebook: See test 1
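Putting this together, a minimal sketch of the matching call (assuming transactionsDf exists as described in the question; the seed value below is arbitrary, since the question does not constrain it):

# withReplacement=True lets a row be drawn more than once;
# fraction=0.15 returns roughly 150 of the 1000 rows.
sampledDf = transactionsDf.sample(withReplacement=True, fraction=0.15, seed=65)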
Question 54
Which of the following statements about broadcast variables is correct?
Correct Answer: C
Explanation
Broadcast variables are local to the worker node and not shared across the cluster.
This is wrong because broadcast variables are meant to be shared across the cluster. As such, they are never just local to the worker node, but available to all worker nodes.
Broadcast variables are commonly used for tables that do not fit into memory.
This is wrong because broadcast variables can only be broadcast because they are small and do fit into memory.
Broadcast variables are serialized with every single task.
This is wrong because they are cached on every machine in the cluster, precisely to avoid having to serialize them with every single task.
Broadcast variables are occasionally dynamically updated on a per-task basis.
This is wrong because broadcast variables are immutable - they are never updated.
More info: Spark - The Definitive Guide, Chapter 14
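To make the caching and immutability points concrete, here is a minimal sketch of creating and reading a broadcast variable in PySpark (the SparkSession named spark and the lookup dictionary are assumptions made purely for illustration):

# Broadcast a small lookup table once; each worker caches a read-only copy
# instead of receiving it serialized with every single task.
season_codes = spark.sparkContext.broadcast({"summer": 1, "winter": 2})

df = spark.createDataFrame([("summer",), ("winter",)], ["season"])
# Tasks read the cached value via .value; the broadcast variable itself cannot be updated.
codes = df.rdd.map(lambda row: season_codes.value[row["season"]]).collect()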
Question 55
The code block shown below should return a new 2-column DataFrame that shows one attribute from column attributes per row next to the associated itemName, for all suppliers in column supplier whose name includes Sports. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Sample of DataFrame itemsDf:
+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+
Code block:
itemsDf.__1__(__2__).select(__3__, __4__)
Correct Answer: E
Explanation
Output of correct code block:
+----------------------------------+------+
|itemName                          |col   |
+----------------------------------+------+
|Thick Coat for Walking in the Snow|blue  |
|Thick Coat for Walking in the Snow|winter|
|Thick Coat for Walking in the Snow|cozy  |
|Outdoors Backpack                 |green |
|Outdoors Backpack                 |summer|
|Outdoors Backpack                 |travel|
+----------------------------------+------+
The key to solving this question is knowing about Spark's explode operator. Using this operator, you can extract values from arrays into single rows. The following guidance steps through the answers systematically from the first to the last gap. Note that there are many ways to solve gap questions and to filter out wrong answers: you do not always have to start from the first gap, but can also exclude some answers based on obvious problems you see with them.
The answers to the first gap present you with two options: filter and where. These two are synonyms in PySpark, so using either of them is fine. The answer options to this gap therefore do not help us in selecting the right answer.
The second gap is more interesting. One answer option includes "Sports".isin(col("Supplier")). This construct does not work, since Python's string does not have an isin method. Another option contains col(supplier). Here, Python will try to interpret supplier as a variable. We have not set this variable, so this is not a viable answer. That leaves the answer options that include col("supplier").contains("Sports") and col("supplier").isin("Sports"). The question states that we are looking for suppliers whose name includes Sports, so we have to go with the contains operator here. We would use the isin operator if we wanted to filter for supplier names that match any entry in a list of supplier names.
Finally, we are left with two answers that both fill the third gap with "itemName" and fill the fourth gap with either explode("attributes") or "attributes". While both are correct Spark syntax, only explode("attributes") will help us achieve our goal. Specifically, the question asks for one attribute from column attributes per row - this is exactly what the explode() operator does. One answer option also includes array_explode(), which is not a valid operator in PySpark.
More info: pyspark.sql.functions.explode - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 3
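Putting the pieces together, a sketch of how the filled-in code block could look (assuming itemsDf is available as in the sample above, using filter, which is interchangeable with where, and appending .show(truncate=False) only to display the result):

from pyspark.sql.functions import col, explode

# Keep only rows whose supplier name contains "Sports", then emit one row
# per element of the attributes array next to the associated itemName.
itemsDf.filter(col("supplier").contains("Sports")).select("itemName", explode("attributes")).show(truncate=False)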