Question 66

You are tasked with identifying fraudulent transactions from unstructured log data stored in Snowflake. The logs contain various fields, including timestamps, user IDs, and transaction details embedded within free-text descriptions. You plan to use a supervised learning approach, having labeled a subset of transactions as 'fraudulent' or 'not fraudulent.' Which of the following methods best describes the extraction and processing of this data for training a machine learning model within Snowflake?
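
As a rough illustration of the extraction step this question describes, the sketch below pulls structured fields out of a free-text log line with a regular expression. The log layout, the field names (`user=`, `amount=`), and the helper `parse_log_line` are all hypothetical; inside Snowflake the same idea would more likely be expressed with built-in regular-expression functions or a Python UDF. It is not one of the answer options.

```python
import re

# Hypothetical log layout: "<timestamp> user=<id> amount=<number> <free text>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"user=(?P<user_id>\w+)\s+"
    r"amount=(?P<amount>\d+(?:\.\d+)?)"
)


def parse_log_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["amount"] = float(fields["amount"])  # numeric feature for the model
    return fields


sample = "2024-03-01 14:22:05 user=u123 amount=250.75 note: card-not-present purchase"
parsed = parse_log_line(sample)
```

Rows parsed this way can then be joined to the labeled subset to form a training table.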
  • Question 67

    You are building a fraud detection model using transaction data stored in Snowflake. The dataset includes features like transaction amount, merchant category, location, and time. Due to regulatory requirements, you need to ensure personally identifiable information (PII) is handled securely and compliantly during the data collection and preprocessing phases. Which of the following combinations of Snowflake features and techniques would be MOST suitable for achieving this goal?
  • Question 68

    You are working on a customer churn prediction model and are using Snowpark Feature Store. One of your features is updated daily. You notice that your model's performance degrades over time, likely because stale feature values are being used during inference. You want to ensure that the model always uses the most up-to-date feature values. Which of the following strategies would be the MOST effective way to address this issue using Snowpark Feature Store and avoid model staleness during online inference?
  • Question 69

    A data engineer is tasked with removing duplicates from a table named 'USER_ACTIVITY' in Snowflake, which contains user activity logs. The table has columns: 'USER_ID', 'ACTIVITY_TIMESTAMP', 'ACTIVITY_TYPE', and 'DEVICE_ID'. The data engineer wants to remove duplicate rows, considering only the 'USER_ID', 'ACTIVITY_TYPE', and 'DEVICE_ID' columns. What is the most efficient and correct SQL query to achieve this while retaining only the earliest 'ACTIVITY_TIMESTAMP' for each unique combination of the specified columns?
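
For illustration only, one common way to express this kind of deduplication is a `ROW_NUMBER()` window partitioned by the three key columns and ordered by timestamp, keeping row number 1. In Snowflake this is often written with a `QUALIFY` clause; the sketch below uses an in-memory SQLite database as a local stand-in (an assumption, so the filter is written as a subquery instead), with made-up sample rows. It is not one of the answer options.

```python
import sqlite3

# In-memory SQLite stands in for Snowflake so the sketch runs locally (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE USER_ACTIVITY ("
    "USER_ID INTEGER, ACTIVITY_TIMESTAMP TEXT, ACTIVITY_TYPE TEXT, DEVICE_ID TEXT)"
)
rows = [  # made-up sample data
    (1, "2024-01-01 10:00:00", "login", "d1"),
    (1, "2024-01-01 09:00:00", "login", "d1"),   # earliest for (1, login, d1)
    (1, "2024-01-02 08:00:00", "logout", "d1"),
    (2, "2024-01-03 12:00:00", "login", "d2"),   # earliest for (2, login, d2)
    (2, "2024-01-03 12:30:00", "login", "d2"),
]
conn.executemany("INSERT INTO USER_ACTIVITY VALUES (?, ?, ?, ?)", rows)

# Number the rows within each key, earliest timestamp first, and keep only rn = 1.
DEDUP_SQL = """
SELECT USER_ID, ACTIVITY_TIMESTAMP, ACTIVITY_TYPE, DEVICE_ID
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY USER_ID, ACTIVITY_TYPE, DEVICE_ID
               ORDER BY ACTIVITY_TIMESTAMP ASC
           ) AS rn
    FROM USER_ACTIVITY
)
WHERE rn = 1
ORDER BY USER_ID, ACTIVITY_TYPE
"""
deduped = conn.execute(DEDUP_SQL).fetchall()
```

Ordering by 'ACTIVITY_TIMESTAMP' ascending is what makes row number 1 the earliest record in each group.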
  • Question 70

    You are working with a dataset containing customer reviews for various products. The dataset includes a 'REVIEW_TEXT' column with the raw review text and a 'PRODUCT_ID' column. You want to perform sentiment analysis on the reviews and create a new feature called 'SENTIMENT_SCORE' for each product. You plan to use a UDF to perform the sentiment analysis. Which of the following steps and SQL code snippets are essential for implementing this feature engineering task in Snowflake, ensuring optimal performance and scalability? Select all that apply:
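
To make the task concrete, here is a minimal sketch of the two logical steps involved: score each review, then aggregate the scores per product. The word lists and the functions `sentiment_score` and `product_sentiment` are hypothetical placeholders for whatever model the UDF would wrap; in Snowflake the scoring function would be registered as a UDF and the aggregation done with `GROUP BY`. It is not one of the answer options.

```python
from collections import defaultdict

# Toy lexicon standing in for a real sentiment model (assumption).
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "poor", "broken", "terrible"}


def sentiment_score(review_text):
    """Score one review in [-1.0, 1.0]; 0.0 when no lexicon word appears."""
    words = [w.strip(".,!?").lower() for w in review_text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def product_sentiment(rows):
    """Average the per-review scores for each product ID."""
    scores = defaultdict(list)
    for product_id, review_text in rows:
        scores[product_id].append(sentiment_score(review_text))
    return {pid: sum(vals) / len(vals) for pid, vals in scores.items()}


reviews = [  # made-up (PRODUCT_ID, REVIEW_TEXT) pairs
    ("P1", "Great product, love it!"),
    ("P1", "Arrived broken."),
    ("P2", "Terrible and poor quality."),
]
sentiment_by_product = product_sentiment(reviews)
```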