  • Question 96

    You are investigating a production job failure caused by a data issue. The job runs on a job cluster. Which type of cluster do you need to start to investigate and analyze the data?
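
    A minimal sketch of one way to act on this, assuming the standard DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are set; the cluster name, runtime version, and node type below are illustrative. Job clusters terminate when the run ends, so ad-hoc investigation needs an all-purpose (interactive) cluster you can attach a notebook to.

        # Sketch: create an all-purpose cluster via the Clusters REST API (2.0).
        import os
        import requests

        host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
        token = os.environ["DATABRICKS_TOKEN"]

        resp = requests.post(
            f"{host}/api/2.0/clusters/create",
            headers={"Authorization": f"Bearer {token}"},
            json={
                "cluster_name": "incident-investigation",  # hypothetical name
                "spark_version": "13.3.x-scala2.12",       # any supported runtime
                "node_type_id": "i3.xlarge",               # node type for your cloud
                "num_workers": 2,
            },
        )
        resp.raise_for_status()
        print(resp.json()["cluster_id"])
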
  • Question 97

    A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and the executors.
    When evaluating the Ganglia metrics for this cluster, which indicator would signal a bottleneck caused by code executing on the driver?
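
    A minimal sketch illustrating the underlying idea: with 1 driver and 3 identically sized executors, work that runs only on the driver leaves overall cluster CPU utilization low (on the order of 25%) while the driver node alone is saturated. The example below contrasts driver-bound and distributed code; the DataFrame is synthetic.

        # Sketch: driver-bound vs. distributed aggregation in PySpark.
        from pyspark.sql import SparkSession, functions as F

        spark = SparkSession.builder.getOrCreate()
        df = spark.range(100_000_000)

        # Driver bottleneck: collect() pulls rows to the driver and the sum
        # runs single-threaded there while the executors sit idle.
        total_driver = sum(row["id"] for row in df.limit(1_000_000).collect())

        # Distributed alternative: the aggregation executes on the executors.
        total_dist = df.agg(F.sum("id")).first()[0]
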
  • Question 98

    A data ingestion task requires a 1 TB JSON dataset to be written out to Parquet with a target part-file size of
    512 MB. Because Parquet is being used instead of Delta Lake, built-in file-sizing features such as Auto Optimize and Auto Compaction cannot be used.
    Which strategy will yield the best performance without shuffling data?
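
    A minimal sketch of one shuffle-free strategy, with placeholder paths: capping spark.sql.files.maxPartitionBytes at 512 MB limits how much data each input partition reads, so each task writes roughly one part file per 512 MB of input without a repartition or coalesce (actual Parquet file sizes vary with encoding and compression).

        # Sketch: size input partitions so the write needs no shuffle.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()
        spark.conf.set("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))

        (spark.read.json("/mnt/raw/events/")       # hypothetical 1 TB JSON source
              .write.mode("overwrite")
              .parquet("/mnt/curated/events/"))    # hypothetical Parquet target
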
  • Question 99

    All records from an Apache Kafka producer are being ingested into a single Delta Lake table with the following schema:

    key BINARY, value BINARY, topic STRING, partition LONG, offset LONG, timestamp LONG

    There are 5 unique topics being ingested. Only the "registration" topic contains Personally Identifiable Information (PII). The company wishes to restrict access to PII. It also wishes to retain records containing PII in this table for only 14 days after initial ingestion, while retaining non-PII records indefinitely.
    Which of the following solutions meets the requirements?
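
    A minimal sketch of one candidate approach, with illustrative table and view names, assuming the Kafka timestamp column holds epoch milliseconds: expose non-PII rows through a view that users are granted access to instead of the base table, and run a scheduled job that deletes "registration" records past the 14-day window.

        # Sketch: restrict PII via a view and enforce 14-day PII retention.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()

        # Grant users access to this view rather than the underlying table.
        spark.sql("""
            CREATE VIEW IF NOT EXISTS kafka_events_no_pii AS
            SELECT * FROM kafka_events WHERE topic != 'registration'
        """)

        # Scheduled retention job: remove PII rows older than 14 days, then
        # VACUUM so the deleted rows' data files are physically removed.
        spark.sql("""
            DELETE FROM kafka_events
            WHERE topic = 'registration'
              AND timestamp < unix_timestamp(current_timestamp() - INTERVAL 14 DAYS) * 1000
        """)
        spark.sql("VACUUM kafka_events RETAIN 168 HOURS")
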
  • Question 100

    The data architect has mandated that all tables in the Lakehouse should be configured as external Delta Lake tables.
    Which approach will ensure that this requirement is met?
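
    A minimal sketch of how this is typically enforced, with a placeholder storage path: providing a LOCATION clause at creation time makes a Delta table external (unmanaged), so dropping the table leaves the underlying files in place.

        # Sketch: create an external Delta Lake table by specifying LOCATION.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()
        spark.sql("""
            CREATE TABLE IF NOT EXISTS sales
            USING DELTA
            LOCATION 'abfss://lake@account.dfs.core.windows.net/tables/sales'
        """)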