Question 6

The code block displayed below contains one or more errors. The code block should load Parquet files at location filePath into a DataFrame, only loading those files that have been modified before 2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.
Schema:
root
 |-- itemId: integer (nullable = true)
 |-- attributes: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- supplier: string (nullable = true)
Code block:
schema = StructType([
    StructType("itemId", IntegerType(), True),
    StructType("attributes", ArrayType(StringType(), True), True),
    StructType("supplier", StringType(), True)
])

spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)
  • Question 7

    Which of the following statements about the differences between actions and transformations is correct?
  • Question 8

    In which order should the code blocks shown below be run to assign articlesDf a DataFrame that lists all items in column attributes, ordered by the number of times these items occur, from most to least often?
    Sample of DataFrame articlesDf:
    +------+-----------------------------+-------------------+
    |itemId|attributes                   |supplier           |
    +------+-----------------------------+-------------------+
    |1     |[blue, winter, cozy]         |Sports Company Inc.|
    |2     |[red, summer, fresh, cooling]|YetiX              |
    |3     |[green, summer, travel]      |Sports Company Inc.|
    +------+-----------------------------+-------------------+
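
To make the target of Question 8 concrete, the result can be sketched in plain Python on the three sample rows above. This is only an illustration of the expected output (one row per distinct item, most frequent first), not the PySpark code blocks the question asks to order; the names `articles`, `items`, and `ranked` are hypothetical.

```python
from collections import Counter

# Rows copied from the sample of articlesDf shown above.
articles = [
    (1, ["blue", "winter", "cozy"], "Sports Company Inc."),
    (2, ["red", "summer", "fresh", "cooling"], "YetiX"),
    (3, ["green", "summer", "travel"], "Sports Company Inc."),
]

# Flatten every attributes array into one entry per item.
items = [item for _, attributes, _ in articles for item in attributes]

# Count how often each item occurs and sort from most to least frequent.
ranked = Counter(items).most_common()

for item, count in ranked:
    print(item, count)
```

On this sample, "summer" is the only item that occurs more than once, so it should come first. The relative order of the items that tie is an artifact of this sketch and is not something the question specifies.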
  • Question 9

    Which of the following code blocks creates a new DataFrame with two columns season and wind_speed_ms where column season is of data type string and column wind_speed_ms is of data type double?
  • Question 10

    Which of the following code blocks returns a new DataFrame with the same columns as DataFrame transactionsDf, except for columns predError and value which should be removed?