Free Access Snowflake.DSA-C03.v2025-10-01.q105 Practice Test (Page 17)

Question 76

You have built an external function to train a PyTorch model using SageMaker. The model training process requires a significant amount of CPU and memory. The training data is passed from Snowflake to the external function in batches. The external function code in AWS Lambda is as follows:

The Snowflake external function is defined as follows:

During testing, you encounter '500 Internal Server Error' from the external function consistently. Upon inspection of the Lambda logs, you find messages indicating 'PayloadTooLargeError'. What is the most likely cause and how do you mitigate it within the context of Snowflake and AWS Lambda?

A.The size of the data being sent from Snowflake to the Lambda function exceeds the maximum payload size allowed by AWS API Gateway. Increase the maximum payload size limit in the API Gateway settings.

B.The Lambda function is timing out before the model training can complete. Increase the Lambda function's timeout setting to allow sufficient time for the training process.

C.The Snowflake external function definition is incorrect. Change the 'RETURNS VARIANT' clause to 'RETURNS VARCHAR', as the Lambda function returns a JSON string.

D.The size of the data being sent from Snowflake to the Lambda function exceeds the maximum payload size allowed by AWS API Gateway. Implement data partitioning in Snowflake and send smaller batches of data to the Lambda function, aggregating the results in a separate table.

E.The IAM role associated with the Lambda function lacks the necessary permissions to invoke the SageMaker training job. Grant the Lambda function's IAM role the appropriate SageMaker permissions.
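
To make the batching idea in option D concrete, here is a minimal Python sketch that greedily groups rows so that each serialized request body stays under a byte budget. The helper name `split_into_batches` and the 200-byte budget in the usage are hypothetical and chosen only for illustration; the real limit depends on how API Gateway and Lambda are configured. The `{"data": [[row_number, value], ...]}` shape mirrors the request body Snowflake sends to external functions. Separately, Snowflake's CREATE EXTERNAL FUNCTION accepts a MAX_BATCH_ROWS setting that caps the rows per request, which may be the simpler lever in practice.

```python
import json

def split_into_batches(rows, max_bytes):
    """Greedily group rows so each batch's JSON body stays within max_bytes.

    A single row that is larger than max_bytes on its own still gets a
    batch to itself; this sketch does not try to split individual rows.
    """
    batches, current = [], []
    for row in rows:
        candidate = current + [row]
        size = len(json.dumps({"data": candidate}).encode("utf-8"))
        if size > max_bytes and current:
            # Adding this row would overflow the budget: close the batch.
            batches.append(current)
            current = [row]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches

# Hypothetical usage: ten small rows, 200-byte budget per request body.
rows = [[i, "x" * 50] for i in range(10)]
batches = split_into_batches(rows, max_bytes=200)
```

Re-serializing the candidate batch for every row is quadratic in batch size, which is acceptable for a sketch but worth replacing with a running size estimate on large inputs.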

Question 77

You are using Snowflake ML to predict housing prices. You've created a Gradient Boosting Regressor model and want to understand how the 'location' feature (which is categorical, representing different neighborhoods) influences predictions. You generate a Partial Dependence Plot (PDP) for 'location'. The PDP shows significantly different predicted prices for each neighborhood. Which of the following actions would be MOST appropriate to further investigate and improve the model's interpretability and performance?

A.Remove the 'location' feature from the model, as categorical features are inherently difficult to interpret.

B.Use one-hot encoding for the 'location' feature and generate individual PDPs for each one-hot encoded column.

C.Replace the 'location' feature with a numerical feature representing the average house price in each neighborhood, calculated from historical data.

D.Generate ICE (Individual Conditional Expectation) plots alongside the PDP to assess the heterogeneity of the relationship between 'location' and predicted price.

E.Combine the PDP for 'location' with a two-way PDP showing the interaction between 'location' and 'square_footage'.
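
The relationship between ICE plots and a PDP can be sketched in plain Python: an ICE curve holds one row fixed and sweeps the feature of interest, and the PDP is the pointwise average of those curves. The model `toy_predict` and its numbers below are invented for illustration and are not a Snowflake ML API; the toy model deliberately makes the 'location' effect depend on 'square_footage', so the individual curves diverge even though the averaged PDP looks clean.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: predictions as `feature` is swept over `grid`."""
    return [
        [predict({**row, feature: value}) for value in grid]
        for row in rows
    ]

def partial_dependence(curves):
    """The PDP is the pointwise average of the ICE curves."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]

# Hypothetical toy model: the neighborhood premium scales with size,
# i.e. 'location' interacts with 'square_footage'.
def toy_predict(row):
    premium = {"north": 100.0, "south": 50.0}[row["location"]]
    return premium * row["square_footage"] / 1000.0

rows = [
    {"location": "north", "square_footage": 1000},
    {"location": "south", "square_footage": 3000},
]
grid = ["north", "south"]

curves = ice_curves(toy_predict, rows, "location", grid)
pdp = partial_dependence(curves)
```

Here the two ICE curves differ by a factor of three while the PDP reports only their average, which is the kind of heterogeneity a PDP alone can hide.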

Question 78

You are a data scientist working for a retail company. You've been tasked with identifying fraudulent transactions. You have a Snowflake table named 'TRANSACTIONS' with columns 'TRANSACTION_ID', 'AMOUNT', 'TRANSACTION_DATE', 'CUSTOMER_ID', and 'LOCATION'. You suspect outliers in transaction amounts might indicate fraud. Which of the following SQL queries is the MOST efficient and appropriate to identify potential outliers using the Interquartile Range (IQR) method, and incorporates the necessary data type considerations for robust percentile calculations? Consider also the computational cost associated with each approach on a large dataset.

A.Option A

B.Option B

C.Option C

D.Option D

E.Option E
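
As a reference for what the IQR method computes, here is a minimal Python sketch. It assumes the linearly interpolated percentile definition that SQL's PERCENTILE_CONT conventionally uses, casts amounts to float before interpolating, and applies the usual 1.5 x IQR fences. The function names and the sample amounts are hypothetical; this is not one of the answer options.

```python
def percentile_cont(sorted_values, p):
    """Linearly interpolated percentile of an ascending list, 0 <= p <= 1."""
    if not sorted_values:
        raise ValueError("percentile of an empty list is undefined")
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )

def iqr_outliers(amounts, k=1.5):
    """Return amounts outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Cast to float so integer or decimal inputs interpolate cleanly.
    values = sorted(float(a) for a in amounts)
    q1 = percentile_cont(values, 0.25)
    q3 = percentile_cont(values, 0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [a for a in amounts if a < low or a > high]

# Hypothetical transaction amounts with one obvious outlier.
outliers = iqr_outliers([10, 12, 11, 13, 12, 11, 500])
```

The quartiles are computed once and then reused for every row, which is also the cost-relevant point for a SQL version: computing the bounds in a single aggregate pass and joining them back avoids recomputing percentiles per row.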

Question 79

You are a data scientist working for a retail company that stores its transaction data in Snowflake. You need to perform feature engineering on customer purchase history data to build a customer churn prediction model. Which of the following approaches best combines Snowflake's capabilities with a machine learning framework (like scikit-learn) for efficient feature engineering? Assume your data is stored in a table named 'CUSTOMER_TRANSACTIONS' with columns such as 'CUSTOMER_ID', 'TRANSACTION_DATE', 'AMOUNT', and 'PRODUCT_CATEGORY'.

A.Extract all the data from 'CUSTOMER_TRANSACTIONS' into a Pandas DataFrame, perform feature engineering using Pandas and scikit-learn, and then load the processed data back into Snowflake.

B.Use Snowflake's SQL UDFs (User-Defined Functions) written in Python to perform feature engineering directly within Snowflake on smaller aggregated sets of data to optimize compute costs. Integrate these UDFs to query the entire 'CUSTOMER_TRANSACTIONS' table to build your features.

C.Create a Snowflake external function that calls a cloud-based (AWS, Azure, GCP) machine learning service for feature engineering, passing the raw transaction data for each customer and processing the aggregated data into features in Snowflake SQL.

D.Develop a custom Spark application to read data from Snowflake, perform feature engineering in Spark, and write the resulting features back to a new table in Snowflake, and avoid use of Snowflake SQL UDFs to minimize complexity.

E.Load a small subset of 'CUSTOMER_TRANSACTIONS' into an in-memory database like Redis, perform feature engineering using custom Python scripts interacting with Redis, and periodically sync the results back to Snowflake.
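
For orientation, the kind of per-customer features a churn model typically consumes can be sketched in plain Python. The column names follow the question; the function name `customer_features`, the chosen features, and the sample rows are assumptions made for illustration. The sketch shows only the aggregation logic, not where it should run.

```python
from collections import defaultdict
from datetime import date

def customer_features(transactions, as_of):
    """Aggregate raw transactions into one feature row per customer."""
    grouped = defaultdict(list)
    for txn in transactions:
        grouped[txn["CUSTOMER_ID"]].append(txn)

    features = {}
    for customer_id, txns in grouped.items():
        amounts = [t["AMOUNT"] for t in txns]
        last_date = max(t["TRANSACTION_DATE"] for t in txns)
        features[customer_id] = {
            "txn_count": len(txns),
            "total_amount": sum(amounts),
            "avg_amount": sum(amounts) / len(amounts),
            "distinct_categories": len({t["PRODUCT_CATEGORY"] for t in txns}),
            "days_since_last_txn": (as_of - last_date).days,
        }
    return features

# Hypothetical sample rows.
sample = [
    {"CUSTOMER_ID": 1, "TRANSACTION_DATE": date(2024, 1, 1),
     "AMOUNT": 10.0, "PRODUCT_CATEGORY": "A"},
    {"CUSTOMER_ID": 1, "TRANSACTION_DATE": date(2024, 1, 10),
     "AMOUNT": 30.0, "PRODUCT_CATEGORY": "B"},
    {"CUSTOMER_ID": 2, "TRANSACTION_DATE": date(2024, 1, 20),
     "AMOUNT": 5.0, "PRODUCT_CATEGORY": "A"},
]
result = customer_features(sample, as_of=date(2024, 1, 31))
```

Each feature here is a simple group-by aggregate, so the same logic maps directly onto a GROUP BY over 'CUSTOMER_ID', leaving only the much smaller feature table to hand to a framework such as scikit-learn.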

Question 80

You are developing a regression model in Snowflake to predict housing prices. You've trained a model using Snowflake ML functions and now need to rigorously validate its performance. You have a separate validation dataset stored in a table named 'HOUSING_VALIDATION'. Which of the following SQL statements, when executed in Snowflake, would accurately calculate the Root Mean Squared Error (RMSE) of your model's predictions against the actual prices in the validation dataset, assuming your model is named 'HOUSING_PRICE_MODEL' and the prediction function generated by CREATE SNOWFLAKE.ML.FORECAST is called PREDICT?
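
As a reminder of the quantity being asked for, RMSE is the square root of the mean of the squared differences between actual and predicted values. The Python sketch below shows that definition; the function name `rmse` and the sample numbers are hypothetical. In SQL the same shape is usually written as SQRT(AVG(POWER(actual - predicted, 2))), where the column names are placeholders.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    # Mean first, then square root; taking the root before averaging
    # would give the mean absolute error instead.
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical check: every prediction is off by exactly 2.
error = rmse([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 32.0, 38.0])
```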