Free Access Databricks.Associate-Developer-Apache-Spark.v2022-07-05.q65 Practice Test (Page 10)

Question 41

The code block displayed below contains an error. The code block should merge the rows of DataFrames transactionsDfMonday and transactionsDfTuesday into a new DataFrame, matching column names and inserting null values where column names do not appear in both DataFrames. Find the error.
Sample of DataFrame transactionsDfMonday:
1.+-------------+---------+-----+-------+---------+----+
2.|transactionId|predError|value|storeId|productId| f|
3.+-------------+---------+-----+-------+---------+----+
4.| 5| null| null| null| 2|null|
5.| 6| 3| 2| 25| 2|null|
6.+-------------+---------+-----+-------+---------+----+
Sample of DataFrame transactionsDfTuesday:
1.+-------+-------------+---------+-----+
2.|storeId|transactionId|productId|value|
3.+-------+-------------+---------+-----+
4.| 25| 1| 1| 4|
5.| 2| 2| 2| 7|
6.| 3| 4| 2| null|
7.| null| 5| 2| null|
8.+-------+-------------+---------+-----+
Code block:
sc.union([transactionsDfMonday, transactionsDfTuesday])

A.The DataFrames' RDDs need to be passed into the sc.union method instead of the DataFrame variable names.

B.Instead of union, the concat method should be used, making sure to not use its default arguments.

C.Instead of the Spark context, transactionDfMonday should be called with the join method instead of the union method, making sure to use its default arguments.

D.Instead of the Spark context, transactionDfMonday should be called with the union method.

E.Instead of the Spark context, transactionDfMonday should be called with the unionByName method instead of the union method, making sure to not use its default arguments.

Correct Answer: E

Explanation
Correct code block:
transactionsDfMonday.unionByName(transactionsDfTuesday, True)
Output of correct code block:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId| f|
+-------------+---------+-----+-------+---------+----+
| 5| null| null| null| 2|null|
| 6| 3| 2| 25| 2|null|
| 1| null| 4| 25| 1|null|
| 2| null| 7| 2| 2|null|
| 4| null| null| 3| 2|null|
| 5| null| null| null| 2|null|
+-------------+---------+-----+-------+---------+----+
For solving this question, you should be aware of the difference between the DataFrame.union() and DataFrame.unionByName() methods. The first one matches columns independent of their names, just by their order. The second one matches columns by their name (which is asked for in the question). It also has a useful optional argument, allowMissingColumns. This allows you to merge DataFrames that have different columns - just like in this example.
sc stands for SparkContext and is automatically provided when executing code on Databricks. While sc.union() allows you to join RDDs, it is not the right choice for joining DataFrames. A hint away from sc.union() is given where the question talks about joining "into a new DataFrame".
concat is a method in pyspark.sql.functions. It is great for consolidating values from different columns, but has no place when trying to join rows of multiple DataFrames.
Finally, the join method is a contender here. However, the default join defined for that method is an inner join which does not get us closer to the goal to match the two DataFrames as instructed, especially given that with the default arguments we cannot define a join condition.
More info:
- pyspark.sql.DataFrame.unionByName - PySpark 3.1.2 documentation
- pyspark.SparkContext.union - PySpark 3.1.2 documentation
- pyspark.sql.functions.concat - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 3

Question 42

Which of the following code blocks generally causes a great amount of network traffic?

A.DataFrame.select()

B.DataFrame.coalesce()

C.DataFrame.collect()

D.DataFrame.rdd.map()

E.DataFrame.count()

Question 43

The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__.__3__(__4__))

A.1. select
2. col("storeId")
3. cast
4. StringType

B.1. select
2. col("storeId")
3. as
4. StringType

C.1. cast
2. "storeId"
3. as
4. StringType()

D.1. select
2. col("storeId")
3. cast
4. StringType()

E.1. select
2. storeId
3. cast
4. StringType()

Question 44

Which of the following code blocks performs an inner join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively, excluding columns value and storeId from DataFrame transactionsDf and column attributes from DataFrame itemsDf?

A.transactionsDf.drop('value', 'storeId').join(itemsDf.select('attributes'), transactionsDf.productId==itemsDf.itemId)

B.1.transactionsDf.createOrReplaceTempView('transactionsDf')
2.itemsDf.createOrReplaceTempView('itemsDf')
3.
4.spark.sql("SELECT -value, -storeId FROM transactionsDf INNER JOIN itemsDf ON productId==itemId").drop("attributes")

C.transactionsDf.drop("value", "storeId").join(itemsDf.drop("attributes"),
"transactionsDf.productId==itemsDf.itemId")

D.1.transactionsDf \
2. .drop(col('value'), col('storeId')) \
3. .join(itemsDf.drop(col('attributes')), col('productId')==col('itemId'))

E.1.transactionsDf.createOrReplaceTempView('transactionsDf')
2.itemsDf.createOrReplaceTempView('itemsDf')
3.
4.statement = """
5.SELECT * FROM transactionsDf
6.INNER JOIN itemsDf
7.ON transactionsDf.productId==itemsDf.itemId
8."""
9.spark.sql(statement).drop("value", "storeId", "attributes")

Correct Answer: E

Explanation
This question offers you a wide variety of answers for a seemingly simple question. However, this variety reflects the variety of ways that one can express a join in PySpark. You need to understand some SQL syntax to get to the correct answer here.
transactionsDf.createOrReplaceTempView('transactionsDf')
itemsDf.createOrReplaceTempView('itemsDf')
statement = """
SELECT * FROM transactionsDf
INNER JOIN itemsDf
ON transactionsDf.productId==itemsDf.itemId
"""
spark.sql(statement).drop("value", "storeId", "attributes")
Correct - this answer uses SQL correctly to perform the inner join and afterwards drops the unwanted columns. This is totally fine. If you are unfamiliar with the triple-quote """ in Python: This allows you to express strings as multiple lines.
transactionsDf \
drop(col('value'), col('storeId')) \
join(itemsDf.drop(col('attributes')), col('productId')==col('itemId'))
No, this answer option is a trap, since DataFrame.drop() does not accept a list of Column objects. You could use transactionsDf.drop('value', 'storeId') instead.
transactionsDf.drop("value", "storeId").join(itemsDf.drop("attributes"),
"transactionsDf.productId==itemsDf.itemId")
Incorrect - Spark does not evaluate "transactionsDf.productId==itemsDf.itemId" as a valid join expression.
This would work if it would not be a string.
transactionsDf.drop('value', 'storeId').join(itemsDf.select('attributes'), transactionsDf.productId==itemsDf.itemId) Wrong, this statement incorrectly uses itemsDf.select instead of itemsDf.drop.
transactionsDf.createOrReplaceTempView('transactionsDf')
itemsDf.createOrReplaceTempView('itemsDf')
spark.sql("SELECT -value, -storeId FROM transactionsDf INNER JOIN itemsDf ON productId==itemId").drop("attributes") No, here the SQL expression syntax is incorrect. Simply specifying -columnName does not drop a column.
More info: pyspark.sql.DataFrame.join - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 3

Question 45

Which of the following describes tasks?

A.A task is a command sent from the driver to the executors in response to a transformation.

B.Tasks transform jobs into DAGs.

C.A task is a collection of slots.

D.A task is a collection of rows.

E.Tasks get assigned to the executors by the driver.

Other Version: 1510Databricks.Associate-Developer-Apache-Spark.v2022-05-05.q60

Latest Upload: 104OCEG.GRCP.v2025-09-11.q211; 103HP.HPE0-V27.v2025-09-11.q78; 118Oracle.1Z0-1057-23.v2025-09-10.q47; 150Google.Professional-Cloud-Network-Engineer.v2025-09-09.q179; 131SAP.C-S4EWM-2023.v2025-09-08.q83; 164TheSecOpsGroup.CNSP.v2025-09-08.q20; 223CFAInstitute.ESG-Investing.v2025-09-08.q173; 158PECB.ISO-IEC-27001-Lead-Implementer.v2025-09-06.q132; 147Salesforce.Data-Architect.v2025-09-05.q216; 142Adobe.AD0-E605.v2025-09-05.q50

Question 41

Question 42

Question 43

Question 44

Question 45

Download PDF File