Question 76
You have built an external function to train a PyTorch model using SageMaker. The model training process requires a significant amount of CPU and memory. The training data is passed from Snowflake to the external function in batches. The external function code in AWS Lambda is as follows:

The Snowflake external function is defined as follows:

During testing, you encounter '500 Internal Server Error' from the external function consistently. Upon inspection of the Lambda logs, you find messages indicating 'PayloadTooLargeError'. What is the most likely cause and how do you mitigate it within the context of Snowflake and AWS Lambda?

The Snowflake external function is defined as follows:

During testing, you encounter '500 Internal Server Error' from the external function consistently. Upon inspection of the Lambda logs, you find messages indicating 'PayloadTooLargeError'. What is the most likely cause and how do you mitigate it within the context of Snowflake and AWS Lambda?
Question 77
You are using Snowflake ML to predict housing prices. You've created a Gradient Boosting Regressor model and want to understand how the 'location' feature (which is categorical, representing different neighborhoods) influences predictions. You generate a Partial Dependence Plot (PDP) for 'location'. The PDP shows significantly different predicted prices for each neighborhood. Which of the following actions would be MOST appropriate to further investigate and improve the model's interpretability and performance?
Question 78
You are a data scientist working for a retail company. You've been tasked with identifying fraudulent transactions. You have a Snowflake table named 'TRANSACTIONS' with columns 'TRANSACTION ID', 'AMOUNT', 'TRANSACTION DATE', 'CUSTOMER ID', and 'LOCATION'. You suspect outliers in transaction amounts might indicate fraud. Which of the following SQL queries is the MOST efficient and appropriate to identify potential outliers using the Interquartile Range (IQR) method, and incorporate necessary data type considerations for robust percentile calculations? Consider also the computational cost associated with each approach on a large dataset.


Question 79
You are a data scientist working for a retail company that stores its transaction data in Snowflake. You need to perform feature engineering on customer purchase history data to build a customer churn prediction model. Which of the following approaches best combines Snowflake's capabilities with a machine learning framework (like scikit-learn) for efficient feature engineering? Assume your data is stored in a table named 'CUSTOMER TRANSACTIONS' with columns like 'CUSTOMER ID, 'TRANSACTION DATE, 'AMOUNT, and 'PRODUCT CATEGORY.
Question 80
You are developing a regression model in Snowflake to predict housing prices. You've trained a model using Snowflake ML functions and now need to rigorously validate its performance. You have a separate validation dataset stored in a table named 'HOUSING VALIDATION'. Which of the following SQL statements, when executed in Snowflake, would accurately calculate the Root Mean Squared Error (RMSE) of your model's predictions against the actual prices in the validation dataset, assuming your model is named 'HOUSING PRICE MODEL' and the prediction function generated by CREATE SNOWFLAKE.ML.FORECAST is called PREDICT?


