Question 61

You are tasked with fine-tuning a Snowflake Cortex LLM using your own labeled dataset to improve its performance on a specific sentiment analysis task related to customer reviews. You have already created a Snowflake stage 'my_stage' and uploaded your labeled data in CSV format to this stage. The labeled data contains two columns: 'review_text' and 'sentiment' (values: 'positive', 'negative', 'neutral'). Which of the following SQL commands, or sequences of commands, is MOST appropriate to initiate the fine-tuning process using the 'SNOWFLAKE.ML.FINETUNE_LLM' function? Assume you have already set the necessary permissions for your role to access the model and stage.
  • Question 62

    A data scientist is building a linear regression model in Snowflake to predict customer churn based on structured data stored in a table named 'CUSTOMER_DATA'. The table includes features like 'CUSTOMER_ID', 'AGE', 'TENURE_MONTHS', 'NUM_PRODUCTS', and 'AVG_MONTHLY_SPEND'. The target variable is 'CHURNED' (1 for churned, 0 for active). After building the model, the data scientist wants to evaluate its performance using Mean Squared Error (MSE) on a held-out test set. Which of the following SQL queries, executed within Snowflake's stored procedure framework, is the MOST efficient and accurate way to calculate the MSE for the linear regression model predictions against the actual 'CHURNED' values in the 'CUSTOMER_DATA_TEST' table, assuming the linear regression model is named 'churn_model' and the predicted values are generated by the MODEL_APPLY() function?
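
    For reference, MSE is the mean of the squared differences between actual and predicted values. The following is a minimal Python sketch of that definition only, not of any Snowflake API; the function name and the sample values are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """Return the mean of squared differences between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical sample: actual CHURNED labels and model outputs for four rows.
actual = [1, 0, 0, 1]
predicted = [0.5, 0.5, 0.0, 1.0]

mse = mean_squared_error(actual, predicted)
print(mse)  # squared errors are 0.25, 0.25, 0, 0, so the mean is 0.125
```

    In SQL the same quantity is typically an average of a squared difference, for example AVG(POWER(actual - predicted, 2)) over the test rows.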
  • Question 63

    A data scientist is tasked with predicting house prices using Snowflake. They have a dataset stored in a Snowflake table called 'HOUSE_PRICES' with columns such as 'SQUARE_FOOTAGE', 'NUM_BEDROOMS', 'LOCATION_ID', and 'PRICE'. They choose a Random Forest Regressor model. Which of the following steps is MOST important to prevent overfitting and ensure good generalization performance on unseen data, and how can this be effectively implemented within a Snowflake-centric workflow?
  • Question 64

    You've deployed a fraud detection model in Snowflake using Snowpark. You are monitoring its performance and notice a significant decrease in recall, while precision remains high. This means the model is missing many fraudulent transactions. The training data was initially balanced, but you suspect that recent changes in user behavior have skewed the distribution of fraudulent vs. non-fraudulent transactions in production. Which of the following actions are MOST appropriate to address this issue and improve the model's performance, considering best practices for model retraining within the Snowflake ecosystem?
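
    The symptom described above (high precision, low recall) can be made concrete with a small calculation. This is a minimal Python sketch with made-up labels, where 1 means fraudulent and 0 means legitimate; the function name is hypothetical and nothing here is Snowflake-specific.

```python
def precision_recall(actual, predicted):
    """Compute precision and recall for binary labels (1 = positive class)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # fraud caught
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # fraud missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical production sample: four fraudulent transactions, one flagged.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0]

precision, recall = precision_recall(actual, predicted)
print(precision, recall)  # 1.0 0.25
```

    Every flagged transaction is truly fraudulent (precision 1.0), yet three of the four fraudulent transactions go undetected (recall 0.25).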
  • Question 65

    You have deployed a regression model in Snowflake as an external function using AWS Lambda. The external function takes several numerical features as input and returns a predicted value. You want to continuously monitor the model's performance in production and automatically retrain it when the performance degrades below a predefined threshold. Which of the following methods represent VALID approaches for calculating and monitoring model performance within the Snowflake environment and triggering the retraining process?
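
    The threshold check at the heart of such a monitoring loop can be sketched as follows. This is a minimal illustration in Python under assumed inputs: the window names, the (actual, predicted) pairs, and the threshold value are all hypothetical, and how the check is scheduled or how retraining is launched is left out.

```python
def window_mse(rows):
    """Mean squared error over a list of (actual, predicted) pairs."""
    return sum((a - p) ** 2 for a, p in rows) / len(rows)


def windows_needing_retrain(windows, threshold):
    """Return the names of monitoring windows whose MSE exceeds the threshold."""
    return [
        name
        for name, rows in windows.items()
        if rows and window_mse(rows) > threshold
    ]


# Hypothetical monitoring data: (actual, predicted) pairs grouped by week.
windows = {
    "week_1": [(10.0, 10.5), (20.0, 19.5)],  # MSE = 0.25
    "week_2": [(10.0, 12.0), (20.0, 17.0)],  # MSE = 6.5
}

flagged = windows_needing_retrain(windows, threshold=1.0)
print(flagged)  # ['week_2']
```

    Here only the second window crosses the assumed threshold of 1.0, so only it would trigger a retraining run.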