Free Access Databricks.Associate-Developer-Apache-Spark.v2022-05-05.q60 Practice Test (Page 3)

Question 6

The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before
2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.
Schema:
1.root
2. |-- itemId: integer (nullable = true)
3. |-- attributes: array (nullable = true)
4. | |-- element: string (containsNull = true)
5. |-- supplier: string (nullable = true)
Code block:
1.schema = StructType([
2. StructType("itemId", IntegerType(), True),
3. StructType("attributes", ArrayType(StringType(), True), True),
4. StructType("supplier", StringType(), True)
5.])
6.
7.spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)

A.The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.

B.Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.

C.The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.

D.Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.

E.Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.

Correct Answer: D

Explanation
Correct code block:
schema = StructType([
StructField("itemId", IntegerType(), True),
StructField("attributes", ArrayType(StringType(), True), True),
StructField("supplier", StringType(), True)
])
spark.read.options(modifiedBefore="2029-03-20T05:44:46").schema(schema).parquet(filePath) This question is more difficult than what you would encounter in the exam. In the exam, for this question type, only one error needs to be identified and not "one or multiple" as in the question.
Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.
Correct! Columns in the schema definition should use the StructField type. Building a schema from pyspark.sql.types, as here using classes like StructType and StructField, is one of multiple ways of expressing a schema in Spark. A StructType always contains a list of StructFields (see documentation linked below). So, nesting StructType and StructType as shown in the question is wrong.
The modification date threshold should be specified by a keyword argument like options(modifiedBefore="2029-03-20T05:44:46") and not two consecutive non-keyword arguments as in the original code block (see documentation linked below).
Spark cannot identify the file format correctly, because either it has to be specified by using the DataFrameReader.format(), as an argument to DataFrameReader.load(), or directly by calling, for example, DataFrameReader.parquet().
Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.
No. If StructField would be used for the columns instead of StructType (see above), the third argument specified whether the column is nullable. The original schema shows that columns should be nullable and this is specified correctly by the third argument being True in the schema in the code block.
It is correct, however, that the modification date threshold is specified incorrectly (see above).
The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.
Wrong. The attributes array is specified correctly, following the syntax for ArrayType (see linked documentation below). That Spark cannot identify the file format is correct, see correct answer above. In addition, the DataFrameReader is called correctly through the SparkSession spark.
Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.
Incorrect, the object types in the schema definition are correct and syntax of the call to Spark's DataFrameReader is correct.
The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.
False. The data type of the schema is StructType and an accepted data type for the DataFrameReader.schema() method. It is correct however that the modification date threshold is specified incorrectly (see correct answer above).

Question 7

Which of the following statements about the differences between actions and transformations is correct?

A.Actions are evaluated lazily, while transformations are not evaluated lazily.

B.Actions generate RDDs, while transformations do not.

C.Actions do not send results to the driver, while transformations do.

D.Actions can be queued for delayed execution, while transformations can only be processed immediately.

E.Actions can trigger Adaptive Query Execution, while transformation cannot.

Question 8

In which order should the code blocks shown below be run in order to assign articlesDf a DataFrame that lists all items in column attributes ordered by the number of times these items occur, from most to least often?
Sample of DataFrame articlesDf:
1.+------+-----------------------------+-------------------+
2.|itemId|attributes |supplier |
3.+------+-----------------------------+-------------------+
4.|1 |[blue, winter, cozy] |Sports Company Inc.|
5.|2 |[red, summer, fresh, cooling]|YetiX |
6.|3 |[green, summer, travel] |Sports Company Inc.|
7.+------+-----------------------------+-------------------+

A.1. articlesDf = articlesDf.groupby("col")
2. articlesDf = articlesDf.select(explode(col("attributes")))
3. articlesDf = articlesDf.orderBy("count").select("col")
4. articlesDf = articlesDf.sort("count",ascending=False).select("col")
5. articlesDf = articlesDf.groupby("col").count()

B.4, 5

C.2, 5, 3

D.5, 2

E.2, 3, 4

F.2, 5, 4

Question 9

Which of the following code blocks creates a new DataFrame with two columns season and wind_speed_ms where column season is of data type string and column wind_speed_ms is of data type double?

A.spark.DataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})

B.spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])

C.1. from pyspark.sql import types as T
2. spark.createDataFrame((("summer", 4.5), ("winter", 7.5)), T.StructType([T.StructField("season",

D.CharType()), T.StructField("season", T.DoubleType())]))

E.spark.newDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])

F.spark.createDataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})

Correct Answer: B

Explanation
spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"]) Correct. This command uses the Spark Session's createDataFrame method to create a new DataFrame. Notice how rows, columns, and column names are passed in here: The rows are specified as a Python list. Every entry in the list is a new row. Columns are specified as Python tuples (for example ("summer", 4.5)). Every column is one entry in the tuple.
The column names are specified as the second argument to createDataFrame(). The documentation (link below) shows that "when schema is a list of column names, the type of each column will be inferred from data" (the first argument). Since values 4.5 and 7.5 are both float variables, Spark will correctly infer the double type for column wind_speed_ms. Given that all values in column
"season" contain only strings, Spark will cast the column appropriately as string.
Find out more about SparkSession.createDataFrame() via the link below.
spark.newDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"]) No, the SparkSession does not have a newDataFrame method.
from pyspark.sql import types as T
spark.createDataFrame((("summer", 4.5), ("winter", 7.5)), T.StructType([T.StructField("season",
T.CharType()), T.StructField("season", T.DoubleType())]))
No. pyspark.sql.types does not have a CharType type. See link below for available data types in Spark.
spark.createDataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]}) No, this is not correct Spark syntax. If you have considered this option to be correct, you may have some experience with Python's pandas package, in which this would be correct syntax. To create a Spark DataFrame from a Pandas DataFrame, you can simply use spark.createDataFrame(pandasDf) where pandasDf is the Pandas DataFrame.
Find out more about Spark syntax options using the examples in the documentation for SparkSession.createDataFrame linked below.
spark.DataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]}) No, the Spark Session (indicated by spark in the code above) does not have a DataFrame method.
More info: pyspark.sql.SparkSession.createDataFrame - PySpark 3.1.1 documentation and Data Types - Spark 3.1.2 Documentation Static notebook | Dynamic notebook: See test 1

Question 10

Which of the following code blocks returns a new DataFrame with the same columns as DataFrame transactionsDf, except for columns predError and value which should be removed?

A.transactionsDf.drop(["predError", "value"])

B.transactionsDf.drop("predError", "value")

C.transactionsDf.drop(col("predError"), col("value"))

D.transactionsDf.drop(predError, value)

E.transactionsDf.drop("predError & value")

Other Version: 1802Databricks.Associate-Developer-Apache-Spark.v2022-07-05.q65

Latest Upload: 103OCEG.GRCP.v2025-09-11.q211; 102HP.HPE0-V27.v2025-09-11.q78; 118Oracle.1Z0-1057-23.v2025-09-10.q47; 150Google.Professional-Cloud-Network-Engineer.v2025-09-09.q179; 131SAP.C-S4EWM-2023.v2025-09-08.q83; 164TheSecOpsGroup.CNSP.v2025-09-08.q20; 222CFAInstitute.ESG-Investing.v2025-09-08.q173; 158PECB.ISO-IEC-27001-Lead-Implementer.v2025-09-06.q132; 147Salesforce.Data-Architect.v2025-09-05.q216; 142Adobe.AD0-E605.v2025-09-05.q50

Question 6

Question 7

Question 8

Question 9

Question 10

Download PDF File