A financial institution aims to detect fraudulent transactions using a Supervised Learning model deployed in Snowflake. They have a dataset with transaction details, including amount, timestamp, merchant category, and customer ID. The target variable is 'is_fraudulent' (0 or 1). They are considering different Supervised Learning algorithms. Which of the following algorithms would be MOST suitable for this fraud detection task, considering the need for interpretability, scalability, and the potential for imbalanced classes, and what specific strategies can be employed within Snowflake to handle the class imbalance?
Correct Answer: C
Decision Trees and Random Forests are well-suited for fraud detection due to their ability to handle non-linear relationships and provide interpretability. Class imbalance, where fraudulent transactions are much rarer than legitimate ones, is a common challenge in fraud detection. Oversampling the minority class or using techniques like SMOTE within Snowflake before training can improve the model's ability to detect the rare fraudulent class. KNN is not well-suited for high-dimensional data or imbalanced datasets. SVM can be computationally expensive and lacks interpretability. Linear Regression is inappropriate for a classification problem. Naive Bayes makes strong independence assumptions that may not hold in fraud detection scenarios.
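The oversampling idea mentioned above can be sketched in plain Python. This is a minimal illustration of random oversampling (not SMOTE, which synthesizes new points), and it is not Snowflake-specific: the function name `oversample_minority` and the toy transactions are assumptions made for this example only.

```python
import random
from collections import Counter


def oversample_minority(rows, label_key="is_fraudulent", seed=0):
    """Duplicate randomly chosen rows of the smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to top the smaller class up to the majority size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 8 legitimate transactions and 2 fraudulent ones.
transactions = (
    [{"amount": 20.0 + i, "is_fraudulent": 0} for i in range(8)]
    + [{"amount": 900.0 + i, "is_fraudulent": 1} for i in range(2)]
)
balanced = oversample_minority(transactions)
counts = Counter(row["is_fraudulent"] for row in balanced)
print(counts)
```

Oversampling should be applied to the training split only; resampling before the train/test split leaks duplicated rows into the evaluation data and inflates the measured performance.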
Question 57
You are tasked with training a complex machine learning model using scikit-learn and need to leverage Snowflake's data for training outside of Snowflake using an external function. The training data resides in a Snowflake table named 'CUSTOMER_DATA'. Due to data governance policies, you must ensure minimal data movement and secure communication. You choose to implement the external function using AWS Lambda. Which of the following steps are crucial to achieve secure and efficient model training outside of Snowflake?
Correct Answer: A,B,D
Options A, B, and D are correct. An API integration is essential for securely communicating with the external function via API Gateway, offering authentication and authorization. Granting usage privileges ensures only authorized roles can execute the external function. The external function serves as the bridge between Snowflake and the external service. Option C is incorrect because storing Snowflake credentials directly in Lambda environment variables is a security risk. Option E, while good practice, is not strictly related to configuring the external function for model training.
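To make the "bridge" role concrete, here is a hedged sketch of the Lambda side of an external function. It assumes API Gateway's Lambda proxy integration, where the request body arrives as a JSON string of the form {"data": [[row_number, arg1, ...], ...]} and Snowflake expects one [row_number, result] pair back per input row. The `score_row` helper is a hypothetical placeholder for whatever work the service actually does.

```python
import json


def score_row(features):
    """Hypothetical placeholder for the per-row work done outside Snowflake."""
    return sum(features)


def lambda_handler(event, context):
    """Handle one batch of rows sent by a Snowflake external function."""
    try:
        payload = json.loads(event["body"])
        # Each input row is [row_number, arg1, arg2, ...]; echo the row number back.
        results = [[row[0], score_row(row[1:])] for row in payload["data"]]
        return {"statusCode": 200, "body": json.dumps({"data": results})}
    except Exception as exc:
        # A non-200 status tells Snowflake that the batch failed.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}


# Local check with a hand-built event; no AWS resources are involved.
event = {"body": json.dumps({"data": [[0, 1.5, 2.5], [1, 10, 20]]})}
response = lambda_handler(event, None)
print(response["body"])
```

Note that the handler needs no Snowflake credentials: authentication is handled by the API integration and the IAM role it assumes, which is why storing credentials in Lambda environment variables is unnecessary as well as risky.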
Question 58
You are developing a Python stored procedure in Snowflake to predict sales for a retail company. You want to incorporate external data (e.g., weather forecasts) into your model. Which of the following methods are valid and efficient ways to access and use external data within your Snowflake Python stored procedure?
Correct Answer: A,B,C,E
Options A, B, C, and E are all valid methods. Calling external APIs (A) requires external network access to be configured first (a network rule and an external access integration). Loading data into tables (B) is a standard approach. Using external functions (C) allows the data to be pre-processed outside Snowflake. Option D is highly impractical for large or frequently updated datasets. Snowflake Pipes (E) are an effective way to continually ingest external data into Snowflake.
Question 59
You have successfully deployed a machine learning model in Snowflake using Snowpark and are generating predictions. You need to implement a robust error handling mechanism to ensure that if the model encounters an issue during prediction (e.g., missing feature, invalid data type), the process doesn't halt and the errors are logged appropriately. You are using a User-Defined Function (UDF) to call the model. Which of the following strategies, when used IN COMBINATION, provides the BEST error handling and monitoring capabilities in this scenario?
Correct Answer: A,D
The combination of A and D provides the best error handling and monitoring. A 'TRY...CATCH' block within the UDF allows for graceful handling of exceptions and prevents the entire process from failing. Logging errors to a separate Snowflake table allows for easy analysis and debugging. Returning a default value ensures that downstream applications don't encounter unexpected errors due to missing predictions. Snowflake's event tables capture a broader range of errors and audit logs, providing a comprehensive view of the UDF's execution. Option B is insufficient as it relies solely on post-mortem analysis. Option C is useful for performance profiling but doesn't address error handling directly. Option E introduces external dependencies and complexity when a native Snowflake solution is available, and it may add latency to the prediction process. It can also increase costs, because the external function that copies the logs out of Snowflake incurs additional charges.
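The catch, log, and return-a-default pattern can be sketched as follows for a Python handler, where the equivalent of 'TRY...CATCH' is try/except. The `predict` function, the logger name, and the choice of None as the default are all assumptions for illustration. Snowflake Python handlers can use the standard logging module, and those messages are captured in an event table when one is configured for the account.

```python
import logging

logger = logging.getLogger("prediction_udf")  # hypothetical logger name

DEFAULT_PREDICTION = None  # assumption: NULL signals "no prediction" downstream


def predict(features):
    """Hypothetical stand-in for the real model call."""
    return 2 * features["amount"] + features["age"]


def safe_predict(features):
    """Return a prediction, or the default value if the input is unusable."""
    try:
        return float(predict(features))
    except (KeyError, TypeError, ValueError) as exc:
        # Record the failure and keep going so one bad row does not stop the batch.
        logger.error("Prediction failed for input %r: %s", features, exc)
        return DEFAULT_PREDICTION


print(safe_predict({"amount": 100, "age": 40}))    # valid input
print(safe_predict({"amount": 100}))               # missing feature
print(safe_predict({"amount": "abc", "age": 40}))  # invalid data type
```

Catching a narrow set of exception types keeps genuinely unexpected failures visible; whether a default of NULL or a sentinel number is appropriate depends on what downstream consumers expect.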
Question 60
A financial services company wants to predict loan defaults. They have a table 'LOAN_APPLICATIONS' with columns 'application_id', 'applicant_income', 'applicant_age', and 'loan_amount'. You need to create several derived features to improve model performance. Which of the following derived features, when used in combination, would provide the MOST comprehensive view of an applicant's financial stability and ability to repay the loan? Select all that apply.
Correct Answer: A,C,E
The best combination provides diverse perspectives on financial stability. One of the selected features directly reflects the applicant's ability to cover the loan with their income. Another represents the loan burden relative to the applicant's age and can expose risk in younger, less established applicants. The third provides the most comprehensive view, because it includes existing debt obligations from external data. 'age_squared' and the other unselected feature are less directly informative about repayment ability. They could potentially capture non-linear relationships, but 'age_squared' is more likely to introduce overfitting. The debt-based feature relies on an external data source, making it a powerful, but potentially more complex, feature to implement.
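As a rough illustration of the three kinds of derived feature described here, the sketch below computes them from the table's columns. The feature names (`loan_to_income`, `loan_per_year_of_age`, `debt_to_income`) and the `existing_debt` input are hypothetical, since the original option names are not shown; the formulas are one plausible reading of the descriptions, not the exam's definitions.

```python
def derive_features(application, existing_debt=None):
    """Compute illustrative repayment-ability ratios for one loan application."""
    income = application["applicant_income"]
    age = application["applicant_age"]
    loan = application["loan_amount"]
    features = {
        # Loan size relative to income: can the income cover the loan?
        "loan_to_income": loan / income if income else None,
        # Loan burden relative to age: highlights younger applicants with large loans.
        "loan_per_year_of_age": loan / age if age else None,
    }
    # Assumption: existing debt comes from an external source and may be missing.
    if existing_debt is not None and income:
        features["debt_to_income"] = (existing_debt + loan) / income
    return features


application = {
    "application_id": 1,
    "applicant_income": 60000,
    "applicant_age": 30,
    "loan_amount": 15000,
}
features = derive_features(application, existing_debt=9000)
print(features)
```

Guarding against zero or missing income avoids division errors on incomplete applications; how such rows should be treated in training is a separate modeling decision.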