Question 1

You are exploring a large dataset of website user behavior in Snowflake to identify patterns and potential features for a machine learning model predicting user engagement. You want to create a visualization showing the distribution of 'session_duration' across different 'user_segments'. The 'user_segments' column contains categorical values such as 'New', 'Returning', and 'Power User'. Which Snowflake SQL query and subsequent data visualization technique would be most effective for this task?
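
For illustration, one way to support this kind of comparison is to compute per-segment summary statistics that can drive a box plot, a natural fit for comparing a numeric distribution across categories. The `user_sessions` table name below is an assumption; the column names follow the question.

```sql
-- Sketch: per-segment five-number summary for box plots.
-- Assumes a hypothetical table USER_SESSIONS with the columns
-- named in the question.
SELECT
    user_segments,
    MIN(session_duration)                     AS min_duration,
    APPROX_PERCENTILE(session_duration, 0.25) AS q1,
    APPROX_PERCENTILE(session_duration, 0.50) AS median,
    APPROX_PERCENTILE(session_duration, 0.75) AS q3,
    MAX(session_duration)                     AS max_duration
FROM user_sessions
GROUP BY user_segments;
```

One row per segment is enough to draw a box for each of 'New', 'Returning', and 'Power User' in a downstream visualization tool.
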
Question 2

You have deployed a fraud detection model in Snowflake using Snowpark and are monitoring its performance. You observe significant drift in the transaction data distribution compared to the data used during training. To address this, you want to implement a retraining strategy. Which of the following steps are MOST critical to automate the retraining process using Snowflake's features?
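
A common automation pattern here combines a stream (to detect new data) with a scheduled task (to trigger retraining). The sketch below assumes a `transactions` source table, an `ml_wh` warehouse, and a hypothetical Snowpark stored procedure `retrain_fraud_model_sp` that consumes the stream and retrains the model.

```sql
-- Sketch: track new transactions and retrain on a schedule when data arrives.
CREATE OR REPLACE STREAM transactions_stream ON TABLE transactions;

CREATE OR REPLACE TASK retrain_fraud_model
    WAREHOUSE = ml_wh                          -- assumed warehouse name
    SCHEDULE  = 'USING CRON 0 2 * * * UTC'     -- nightly at 02:00 UTC
WHEN SYSTEM$STREAM_HAS_DATA('TRANSACTIONS_STREAM')
AS
    CALL retrain_fraud_model_sp();             -- hypothetical Snowpark procedure

ALTER TASK retrain_fraud_model RESUME;         -- tasks are created suspended
```
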
Question 3

You're developing a model to predict customer churn using Snowflake. Your dataset is large and continuously growing. You need to implement partitioning strategies to optimize model training and inference performance. You are considering the following partitioning strategies:

1. Partitioning by 'customer_segment' (e.g., 'High-Value', 'Medium-Value', 'Low-Value').
2. Partitioning by 'signup_date' (e.g., monthly partitions).
3. Partitioning by 'region' (e.g., 'North America', 'Europe', 'Asia').

Which of the following statements accurately describe the potential benefits and drawbacks of these partitioning strategies within a Snowflake environment, specifically in the context of model training and inference?
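
Worth noting: Snowflake does not expose user-managed partitions; the closest native mechanism is a clustering key, which co-locates rows in micro-partitions so queries can prune. A minimal sketch, assuming a hypothetical `customers` table:

```sql
-- Sketch: cluster on signup_date so time-bounded training/inference queries
-- prune micro-partitions (strategy 2 above). Table name is hypothetical.
ALTER TABLE customers CLUSTER BY (signup_date);

-- Inspect how well the table is clustered on a candidate key.
SELECT SYSTEM$CLUSTERING_INFORMATION('customers', '(signup_date)');
```

Very low-cardinality keys (such as a three-value customer_segment) typically yield less pruning benefit than a date key, which bears on the trade-offs the question asks about.
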
Question 4

You are responsible for deploying a fraud detection model in Snowflake. The model needs to be validated rigorously before being put into production. Which of the following actions represent the MOST comprehensive approach to model validation within the Snowflake environment, focusing on both statistical performance and operational readiness, and using Snowflake features for validation?
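
On the statistical side, one validation step that can run entirely in SQL is scoring a holdout table and deriving confusion-matrix counts. The `holdout_transactions` table and the `predict_fraud` UDF below are hypothetical stand-ins.

```sql
-- Sketch: confusion-matrix counts on a holdout set, using a hypothetical
-- model UDF PREDICT_FRAUD and a 0.5 decision threshold.
WITH scored AS (
    SELECT
        is_fraud                                              AS actual,
        IFF(predict_fraud(amount, merchant_id) >= 0.5, 1, 0)  AS predicted
    FROM holdout_transactions
)
SELECT
    SUM(IFF(actual = 1 AND predicted = 1, 1, 0)) AS true_positives,
    SUM(IFF(actual = 0 AND predicted = 1, 1, 0)) AS false_positives,
    SUM(IFF(actual = 1 AND predicted = 0, 1, 0)) AS false_negatives,
    SUM(IFF(actual = 0 AND predicted = 0, 1, 0)) AS true_negatives
FROM scored;
```

Operational readiness (latency, access controls, warehouse sizing) would need separate checks beyond a query like this.
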
Question 5

You are building a fraud detection model for an e-commerce platform. One of the features is 'purchase_amount', which ranges from $1 to $10,000. The data has a skewed distribution with many small purchases and a few very large ones. You need to normalize this feature for your model, which uses gradient descent. Which normalization technique(s) would be most suitable in Snowflake, considering the data characteristics and the need to handle potential future outliers?
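
One frequently cited combination for skewed, outlier-prone amounts is a log transform followed by percentile-bounded (robust) min-max scaling, which keeps gradient descent well-conditioned and clips future outliers instead of letting them stretch the scale. A sketch, with `transactions` as an assumed table name:

```sql
-- Sketch: log-transform PURCHASE_AMOUNT, then scale between the 1st and 99th
-- percentiles, clamping to [0, 1] so unseen outliers cannot blow up the range.
WITH bounds AS (
    SELECT
        APPROX_PERCENTILE(LN(purchase_amount + 1), 0.01) AS lo,
        APPROX_PERCENTILE(LN(purchase_amount + 1), 0.99) AS hi
    FROM transactions
)
SELECT
    t.purchase_amount,
    LEAST(GREATEST(
        (LN(t.purchase_amount + 1) - b.lo) / NULLIF(b.hi - b.lo, 0), 0), 1)
        AS purchase_amount_scaled
FROM transactions t
CROSS JOIN bounds b;
```
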