Question 31

The code block displayed below contains an error. The code block should configure Spark so that DataFrames up to a size of 20 MB will be broadcast to all worker nodes when performing a join.
Find the error.
Code block:
  • Question 32

    In which order should the code blocks shown below be run to assign to articlesDf a DataFrame that lists all items in column attributes, ordered by the number of times these items occur, from most to least often?
    Sample of DataFrame articlesDf:
    +------+-----------------------------+-------------------+
    |itemId|attributes                   |supplier           |
    +------+-----------------------------+-------------------+
    |1     |[blue, winter, cozy]         |Sports Company Inc.|
    |2     |[red, summer, fresh, cooling]|YetiX              |
    |3     |[green, summer, travel]      |Sports Company Inc.|
    +------+-----------------------------+-------------------+
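To see what the correctly ordered code blocks have to achieve, here is a plain-Python analogue of the pipeline: flatten every attributes array (like explode), count each item (like groupBy().count()), and sort by that count in descending order. The function names are hypothetical, and items_by_frequency_spark is only a sketch of the equivalent PySpark chain, assuming pyspark is installed and articlesDf has the schema shown in the sample.

```python
from collections import Counter


def items_by_frequency(rows):
    """Plain-Python analogue: flatten all attribute lists, count each item,
    and return the items ordered from most to least frequent."""
    counts = Counter(item for attributes in rows for item in attributes)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return [item for item, _ in counts.most_common()]


def items_by_frequency_spark(articles_df):
    """Sketch of the same pipeline on a Spark DataFrame (requires pyspark)."""
    # Imported lazily so the plain-Python version runs without Spark installed.
    from pyspark.sql import functions as F

    return (
        articles_df.select(F.explode("attributes").alias("col"))  # one row per item
        .groupBy("col")
        .count()                       # adds a column named "count"
        .orderBy(F.desc("count"))      # most frequent first
        .select("col")
    )


sample = [
    ["blue", "winter", "cozy"],
    ["red", "summer", "fresh", "cooling"],
    ["green", "summer", "travel"],
]
print(items_by_frequency(sample))
```

In the sample, only summer occurs twice, so it is the only item with a fixed position. Spark does not guarantee any order among items with equal counts, whereas the Python version happens to keep first-seen order for ties.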
  • Question 33

    The code block shown below should return a DataFrame with columns transactionsId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.
    transactionsDf.__1__(__2__)
  • Question 34

    Which of the following describes characteristics of the Dataset API?
  • Question 35

    The code block displayed below contains at least one error. The code block should return a DataFrame with only one column, result. That column should include all values in column value from DataFrame transactionsDf raised to the power of 5, and a null value for rows in which there is no value in column value. Find the error(s).
    Code block:
    from pyspark.sql.functions import udf
    from pyspark.sql import types as T

    transactionsDf.createOrReplaceTempView('transactions')

    def pow_5(x):
        return x**5

    spark.udf.register(pow_5, 'power_5_udf', T.LongType())
    spark.sql('SELECT power_5_udf(value) FROM transactions')
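For comparison, here is a minimal sketch of one way to get the intended behavior. It assumes an active SparkSession named spark and that column value holds integers (hence LongType); the wrapper name power_5_result is hypothetical. Three points differ from the code block above: pow_5 passes nulls through (Spark hands SQL NULLs to Python UDFs as None, and None ** 5 raises a TypeError), spark.udf.register takes the UDF name first and the function second, and the selected column is aliased as result.

```python
def pow_5(x):
    """Raise x to the 5th power, passing nulls (None) through unchanged."""
    # SQL NULLs arrive in a Python UDF as None; None ** 5 would raise TypeError.
    if x is None:
        return None
    return x ** 5


def power_5_result(spark, transactions_df):
    """Hypothetical wrapper: register pow_5 as a SQL UDF and return a
    DataFrame with a single column named result."""
    # Imported here so pow_5 can be used and tested without pyspark installed.
    from pyspark.sql import types as T

    transactions_df.createOrReplaceTempView("transactions")
    # register(name, f, returnType): the UDF name comes first, then the function.
    spark.udf.register("power_5_udf", pow_5, T.LongType())
    return spark.sql("SELECT power_5_udf(value) AS result FROM transactions")


print([pow_5(v) for v in [2, None, 3]])  # [32, None, 243]
```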