Free Access Databricks.Associate-Developer-Apache-Spark.v2022-05-05.q60 Practice Test (Page 13)

Question 56

Which of the following code blocks reads in the JSON file stored at filePath, enforcing the schema expressed in JSON format in variable json_schema, shown in the code block below?
Code block:
json_schema = """
{"type": "struct",
 "fields": [
   {
     "name": "itemId",
     "type": "integer",
     "nullable": true,
     "metadata": {}
   },
   {
     "name": "supplier",
     "type": "string",
     "nullable": true,
     "metadata": {}
   }
 ]
}
"""

A.spark.read.json(filePath, schema=json_schema)

B.spark.read.schema(json_schema).json(filePath)

C.schema = StructType.fromJson(json.loads(json_schema))
spark.read.json(filePath, schema=schema)

D.spark.read.json(filePath, schema=schema_of_json(json_schema))

E.spark.read.json(filePath, schema=spark.read.json(json_schema))

Correct Answer: C

Explanation
Spark provides a way to digest JSON-formatted strings as schema. However, it is not trivial to use. Although slightly above exam difficulty, this question is beneficial to your exam preparation, since it helps you to familiarize yourself with the concept of enforcing schemas on data you are reading in - a topic within the scope of the exam.
The first answer that jumps out here is the one that uses spark.read.schema instead of spark.read.json. Looking at the documentation of spark.read.schema (linked below), we notice that the method expects a pyspark.sql.types.StructType or a str as its argument. While variable json_schema is a string, the documentation states that the str should be "a DDL-formatted string (For example col0 INT, col1 DOUBLE)". Variable json_schema is JSON, not DDL, so this answer option must be wrong.
With four potentially correct answers to go, we now look at the schema parameter of spark.read.json() (documentation linked below). Here, too, the schema parameter expects an input of type pyspark.sql.types.StructType or "a DDL-formatted string (For example col0 INT, col1 DOUBLE)". We already know that json_schema does not follow this format, which also eliminates the option that passes schema=json_schema directly. We should therefore focus on how to transform json_schema into a pyspark.sql.types.StructType.
The option that includes schema=spark.read.json(json_schema) is also a wrong pick, since spark.read.json returns a DataFrame, and not a pyspark.sql.types.StructType type.
Ruling out the option that uses schema_of_json(json_schema) is harder. The function's documentation (linked below) states that it "[p]arses a JSON string and infers its schema in DDL format". That is a different use case from ours: json_schema already is a schema definition, so there is nothing to "infer" from it. The example in the documentation makes the difference clear: passing the string '{a: 1}' to schema_of_json() yields the inferred DDL-format schema STRUCT<a: BIGINT>.
In our case, we may end up with the output schema of schema_of_json() describing the schema of the JSON schema, instead of using the schema itself. This is not the right answer option.
Now consider the StructType.fromJson() method. It expects a dictionary, which is why the JSON string is first parsed with json.loads(). It returns an object of type StructType, exactly the type that the schema parameter of spark.read.json expects.
Although we could have looked at the correct answer option earlier, this explanation is kept as exhaustive as necessary to teach you how to systematically eliminate wrong answer options.
More info:
- pyspark.sql.DataFrameReader.schema - PySpark 3.1.2 documentation
- pyspark.sql.DataFrameReader.json - PySpark 3.1.2 documentation
- pyspark.sql.functions.schema_of_json - PySpark 3.1.2 documentation

Question 57

The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)

A.1. save
2. mode
3. "ignore"
4. "compression"
5. path

B.1. store
2. with
3. "replacement"
4. "compression"
5. path

C.1. write
2. mode
3. "overwrite"
4. "compression"
5. save
(Correct)

D.1. save
2. mode
3. "replace"
4. "compression"
5. path

E.1. write
2. mode
3. "overwrite"
4. compression
5. parquet
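
Filling the blanks with the option marked (Correct) gives the chain sketched below. The wrapper function write_transactions and its parameter names are hypothetical, added only so the call can be shown in one piece. Whether the brotli codec is actually available depends on the Spark installation, so treat this as an illustration of the API shape rather than a guaranteed-to-run write.

```python
def write_transactions(transactions_df, store_dir):
    """Hypothetical wrapper around the completed code block from the question."""
    (
        transactions_df.write            # blank 1: DataFrameWriter entry point
        .format("parquet")
        .mode("overwrite")               # blanks 2 and 3: replace existing output
        .option("compression", "brotli") # blank 4: option name as a quoted string
        .save(store_dir)                 # blank 5: trigger the write
    )
```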

Question 58

Which of the following code blocks sorts DataFrame transactionsDf both by column storeId in ascending and by column productId in descending order, in this priority?

A.transactionsDf.sort("storeId", asc("productId"))

B.transactionsDf.sort(col(storeId)).desc(col(productId))

C.transactionsDf.order_by(col(storeId), desc(col(productId)))

D.transactionsDf.sort("storeId", desc("productId"))

E.transactionsDf.sort("storeId").sort(desc("productId"))
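
To make the requested ordering concrete, here is a plain-Python sketch that needs no Spark session. The rows are made-up (storeId, productId) pairs, not data from the question. In PySpark, the same ordering would be requested with transactionsDf.sort("storeId", desc("productId")), with desc imported from pyspark.sql.functions.

```python
# Made-up (storeId, productId) pairs, for illustration only.
rows = [(2, 1), (1, 3), (1, 7), (2, 5)]

# Primary key: storeId ascending. Secondary key: productId descending,
# expressed here by negating the numeric productId.
ordered = sorted(rows, key=lambda r: (r[0], -r[1]))

print(ordered)  # [(1, 7), (1, 3), (2, 5), (2, 1)]
```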

Question 59

The code block displayed below contains an error. The code block should configure Spark to split data in 20 parts when exchanging data between executors for joins or aggregations. Find the error.
Code block:
spark.conf.set(spark.sql.shuffle.partitions, 20)

A.The code block uses the wrong command for setting an option.

B.The code block sets the wrong option.

C.The code block expresses the option incorrectly.

D.The code block sets the incorrect number of parts.

E.The code block is missing a parameter.
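
For reference, spark.conf.set takes the option name as a string. Written without quotes, Python would try to look up an attribute named shuffle on the spark.sql method and raise an AttributeError. A minimal sketch of the quoted form follows; the wrapper function set_shuffle_partitions and its parameters are hypothetical names used for illustration.

```python
def set_shuffle_partitions(spark, num_partitions=20):
    """Hypothetical wrapper: set the number of shuffle partitions on a SparkSession."""
    # The option key is a quoted string; the value is the desired partition count.
    spark.conf.set("spark.sql.shuffle.partitions", num_partitions)
```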

Question 60

The code block displayed below contains an error. The code block should produce a DataFrame with color as the only column and three rows with color values of red, blue, and green, respectively.
Find the error.
Code block:
spark.createDataFrame([("red",), ("blue",), ("green",)], "color")

A.Instead of calling spark.createDataFrame, just DataFrame should be called.

B.The commas in the tuples with the colors should be eliminated.

C.The colors red, blue, and green should be expressed as a simple Python list, and not a list of tuples.

D.Instead of color, a data type should be specified.

E.The "color" expression needs to be wrapped in brackets, so it reads ["color"].
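
As a point of comparison, the sketch below passes the column names as a list, following the option that wraps "color" in brackets. A bare string in the schema position is interpreted as a type or DDL string rather than as a column name. The wrapper function make_color_df is a hypothetical name, and spark is assumed to be an existing SparkSession.

```python
def make_color_df(spark):
    """Hypothetical wrapper: build a one-column DataFrame of colors."""
    # Each row is a one-element tuple; the trailing comma is what makes it a tuple.
    rows = [("red",), ("blue",), ("green",)]
    # Column names go in a list, one name per column.
    return spark.createDataFrame(rows, ["color"])
```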
