Question 86

You are tasked with presenting a business case to stakeholders demonstrating the value of a new machine learning model that predicts customer churn. The model has been trained on data within Snowflake, and you have various metrics such as accuracy, precision, recall, and F1-score. You also have feature importance scores generated using a SHAP (SHapley Additive exPlanations) explainer. Which of the following visualization strategies, when combined, would MOST effectively communicate the model's performance and impact to a non-technical audience, while also providing sufficient detail for technical stakeholders?
  • Question 87

    You are developing a model to predict equipment failure in a factory using sensor data stored in Snowflake. The data is partitioned by 'EQUIPMENT_ID' and 'TIMESTAMP'. After initial model training and cross-validation using the following code snippet:

    You observe significant performance variations across different equipment groups when evaluating on out-of-sample data. Which of the following strategies could you employ to address this issue within the Snowflake environment to improve the model's generalization ability across all equipment?
  • Question 88

    A marketing analyst at 'NovaRetail' suspects that a new advertising campaign has increased the average purchase amount. They have historical purchase data in a Snowflake table called 'purchase_history'. To validate their hypothesis using the Central Limit Theorem (CLT), they perform the following steps: 1. Calculate the population mean (μ) of purchase amounts from the historical data. 2. Draw 500 random samples of size 50 from the table. 3. Calculate the sample mean (x̄) for each sample. Which of the following steps are essential for correctly applying the Central Limit Theorem to perform a z-test to determine whether the new advertising campaign has significantly increased the average purchase amount?
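For orientation, the z-test mentioned in this question can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the expected answer: the data are synthetic stand-ins for 'purchase_history', the function name `one_sided_z_test` is hypothetical, and the population standard deviation is assumed to be known from the historical data.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def one_sided_z_test(sample, pop_mean, pop_std):
    """Return (z, p_value) for the alternative 'true mean > pop_mean'."""
    n = len(sample)
    # By the CLT, the sample mean is approximately normal with this standard error.
    standard_error = pop_std / math.sqrt(n)
    z = (mean(sample) - pop_mean) / standard_error
    # Upper-tail probability under the standard normal distribution.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Synthetic, skewed "historical" purchase amounts (assumption, not real data).
rng = random.Random(42)
historical = [rng.expovariate(1 / 80.0) for _ in range(20000)]
mu = mean(historical)       # population mean
sigma = pstdev(historical)  # population standard deviation

# Hypothetical post-campaign sample of size 50.
post_campaign = [rng.expovariate(1 / 95.0) for _ in range(50)]
z, p = one_sided_z_test(post_campaign, mu, sigma)
print(f"z = {z:.3f}, one-sided p-value = {p:.4f}")
```

The sketch uses a one-sided test because the hypothesis is directional (the campaign increased the average). The sample here is drawn from post-campaign purchases, since a sample of pre-campaign data alone cannot show a change.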
  • Question 89

    You are developing a churn prediction model using Snowpark Python and Scikit-learn. After initial model training, you observe significant overfitting. Which of the following hyperparameter tuning strategies and code snippets, when implemented within a Snowflake Python UDF, would be MOST effective for addressing overfitting in a Ridge Regression model while keeping the model reproducible with minimal code?
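As a rough illustration of the technique this question refers to, the sketch below tunes the Ridge regularization strength `alpha` with a cross-validated grid search in Scikit-learn and fixes the random seeds so the result is repeatable. It runs locally on synthetic data. The Snowflake UDF wrapper is omitted, and the alpha grid, the `SEED` value, and the dataset are assumptions chosen only for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # fixed seed so data generation and CV folds are reproducible

# Synthetic data standing in for the real training set (assumption).
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=SEED)

# Scale features first: the Ridge penalty is sensitive to feature scale.
pipeline = make_pipeline(StandardScaler(), Ridge())

# Larger alpha means stronger L2 regularization, which counters overfitting.
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# Shuffled folds with a fixed seed give the same splits on every run.
cv = KFold(n_splits=5, shuffle=True, random_state=SEED)

search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="neg_mean_squared_error")
search.fit(X, y)

best_alpha = search.best_params_["ridge__alpha"]
print("best alpha:", best_alpha)
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the validation folds do not leak into the scaling statistics.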
  • Question 90

    You are training a regression model to predict house prices using a Snowflake dataset. The dataset contains various features, including 'number_of_bedrooms' and others. You want to use time-based partitioning for your training, validation, and holdout sets. However, you also need to ensure that the dataset is properly shuffled within each time partition to mitigate potential bias introduced by the order of data entry. Which of the following strategies is MOST EFFECTIVE and EFFICIENT for partitioning your data into train, validation, and holdout sets in Snowflake, while also ensuring random shuffling within each partition, and addressing potential data leakage issues?
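The idea of time-based partitioning with shuffling inside each partition can be sketched in plain Python as follows. This is an illustration of the concept only, not a Snowflake implementation and not the expected answer. The column name `sale_date`, the cutoff dates, and the function name `time_partition` are hypothetical, since the question does not name a date column.

```python
import random
from datetime import date


def time_partition(rows, train_end, valid_end, seed=0):
    """Split rows by date cutoffs, then shuffle each partition independently."""
    rng = random.Random(seed)  # seeded so the shuffle is repeatable
    parts = {"train": [], "validation": [], "holdout": []}
    for row in rows:
        # Assign by time only, so later records never land in an earlier split.
        if row["sale_date"] < train_end:
            parts["train"].append(row)
        elif row["sale_date"] < valid_end:
            parts["validation"].append(row)
        else:
            parts["holdout"].append(row)
    # Shuffle within each partition to remove any data-entry ordering.
    for part in parts.values():
        rng.shuffle(part)
    return parts


# Twelve hypothetical monthly records for one year.
rows = [{"sale_date": date(2023, m, 1), "price": 100000 + 1000 * m} for m in range(1, 13)]
parts = time_partition(rows, train_end=date(2023, 9, 1), valid_end=date(2023, 11, 1))
print({name: len(part) for name, part in parts.items()})
```

Because rows are assigned by date before any shuffling, the randomization changes only the order within a split and cannot move a later record into an earlier split.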