Question 96

You are working with a Snowflake table named 'CUSTOMER_DATA' containing customer information, including a 'PHONE_NUMBER' column. Due to data entry errors, some phone numbers are stored as NULL, while others are present but in inconsistent formats (e.g., with or without hyphens, parentheses, or country codes). You want to standardize the 'PHONE_NUMBER' column and replace missing values using Snowpark for Python. You have already created a Snowpark DataFrame called 'customer_df' representing the 'CUSTOMER_DATA' table. Which of the following approaches, used in combination, would be MOST efficient and reliable for both cleaning the existing data and handling future data ingestion, given the need for scalability?
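
To make the cleaning requirement concrete, here is a minimal plain-Python sketch of the normalization logic only. It is not one of the answer options. It assumes US-style 10-digit numbers with an optional leading country code of 1, and it uses a hypothetical placeholder value ('UNKNOWN') for missing or unparseable entries; the function name and the output format are illustrative choices, not part of the question. In Snowpark, logic like this would typically be registered as a UDF or expressed with built-in column functions such as `regexp_replace` and `coalesce`.

```python
import re
from typing import Optional

# Hypothetical placeholder for missing or unparseable phone numbers.
DEFAULT_PHONE = "UNKNOWN"


def standardize_phone(raw: Optional[str], default: str = DEFAULT_PHONE) -> str:
    """Return a phone number as NNN-NNN-NNNN, or `default` if it cannot be parsed.

    Assumption: numbers are US-style (10 digits, optionally prefixed by country code 1).
    """
    if raw is None:
        return default
    # Drop everything that is not a digit (hyphens, parentheses, spaces, '+', dots).
    digits = re.sub(r"\D", "", raw)
    # Strip a leading US country code if present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return default
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


if __name__ == "__main__":
    for value in ["(555) 123-4567", "+1 555.123.4567", None, "12345"]:
        print(repr(value), "->", standardize_phone(value))
```

Keeping the rule in a single function means the same logic can be applied both to the existing rows and to newly ingested data, which is the property the question asks about.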
  • Question 97

    You have a Snowflake table 'PRODUCT_PRICES' with columns 'PRODUCT_ID' (INTEGER) and 'PRICE' (VARCHAR). The 'PRICE' column sometimes contains values like '10.50 USD', '20.00 EUR', or 'Invalid Price'. You need to convert the 'PRICE' column to a NUMERIC(10,2) data type, removing currency symbols and handling invalid price strings by replacing them with NULL. Considering both data preparation and feature engineering, which combination of Snowpark SQL and Python code snippets achieves this accurately and efficiently, preparing the data for further analysis?
  • Question 98

    You are using the Snowflake Python connector from within a Jupyter Notebook running in VS Code to train a model. You have a Snowflake table named 'CUSTOMER_DATA' with columns 'ID', 'FEATURE_1', 'FEATURE_2', and 'TARGET'. You want to efficiently load the data into a Pandas DataFrame for model training, minimizing memory usage. Which of the following code snippets is the MOST efficient way to achieve this, assuming you only need the 'FEATURE_1', 'FEATURE_2', and 'TARGET' columns?
  • Question 99

    You're building a model to predict whether a user will click on an ad (binary classification: click or no-click) using Snowflake. The data is structured and includes features like user demographics, ad characteristics, and past user interactions. You've trained a logistic regression model using SNOWFLAKE.ML and are now evaluating its performance. You notice that while the overall accuracy is high (around 95%), the model performs poorly at predicting clicks (low recall for the 'click' class). Which of the following steps could you take to diagnose the issue and improve the model's ability to predict clicks, and how would you implement them using Snowflake SQL? SELECT ALL THAT APPLY.
  • Question 100

    You have a binary classification model deployed in Snowflake to predict customer churn. The model outputs a probability score between 0 and 1. You've calculated the following confusion matrix on a holdout set:

    |                 | Predicted Positive | Predicted Negative |
    |-----------------|--------------------|--------------------|
    | Actual Positive | 80                 | 20                 |
    | Actual Negative | 10                 | 90                 |

    What are the Precision, Recall, and Accuracy for this model, and what do these metrics tell you about the model's performance? SELECT statement given for true and false condition (True Positive, True Negative, False Positive, False Negative)
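
The three metrics asked for in Question 100 follow directly from the four cells of the confusion matrix. The short sketch below works through the arithmetic using the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), accuracy = (TP + TN) / total). The function name is illustrative, and the cell assignment assumes rows are actual classes and columns are predicted classes, as the matrix is laid out in the question.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "precision": tp / (tp + fp),   # of predicted positives, how many were right
        "recall": tp / (tp + fn),      # of actual positives, how many were found
        "accuracy": (tp + tn) / total, # overall share of correct predictions
    }


# Counts from the question: actual positive row = (80, 20), actual negative row = (10, 90).
metrics = classification_metrics(tp=80, fn=20, fp=10, tn=90)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

With these counts, precision is 80/90 (about 0.889), recall is 80/100 (0.80), and accuracy is 170/200 (0.85): the model is right most of the time when it predicts churn, but it misses one in five actual churners.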