Question 31

You have a structured dataset in Snowflake containing customer information and purchase history. You aim to build a multi-class classification model to predict customer churn, categorizing customers into 'Low Risk', 'Medium Risk', and 'High Risk' of churning. After training the model, you want to evaluate its performance. Which of the following metrics and evaluation techniques, when used together, provide the MOST comprehensive understanding of the model's performance across all churn risk categories, especially when dealing with potential class imbalance?
  • Question 32

    You are a data scientist working with a Snowflake table named 'CUSTOMER TRANSACTIONS' that contains sensitive PII data, including customer names and email addresses. You need to create a representative sample of 1% of the data for model development, ensuring that the sample is anonymized and protects customer privacy. The sample must be reproducible for future model iterations.
    Which of the following steps are most appropriate using Snowpark for Python and SQL?
  • Question 33

    You are tasked with feature engineering a dataset containing customer transaction data stored in a Snowflake table named 'CUSTOMER TRANSACTIONS'. This table includes columns like 'CUSTOMER ID', 'TRANSACTION DATE, and 'TRANSACTION AMOUNT. You need to create a new feature representing the 'Recency' of the customer, which is the number of days since their last transaction. Using Snowpark Pandas, which of the following code snippets will correctly calculate the Recency feature as a new column in a Snowpark DataFrame?
  • Question 34

    You are using Snowflake Cortex to build a customer support chatbot that leverages LLMs to answer customer questions. You have a knowledge base stored in a Snowflake table. The following options describe different methods for using this knowledge base in conjunction with the LLM to generate responses. Which of the following approaches will likely result in the MOST accurate, relevant, and cost-effective responses from the LLM?
  • Question 35

    You're building a regression model using Snowpark Python to predict house prices. After initial training, you observe that the model consistently overestimates the prices of high-value houses and underestimates the prices of low-value houses. Given the options below, which optimization metric, along with code snippet to calculate it using Snowpark, would be most effective in addressing this specific issue?