Free Access Snowflake.DSA-C03.v2025-10-01.q105 Practice Test (Page 15)

Question 66

You are tasked with identifying fraudulent transactions from unstructured log data stored in Snowflake. The logs contain various fields, including timestamps, user IDs, and transaction details embedded within free-text descriptions. You plan to use a supervised learning approach, having labeled a subset of transactions as 'fraudulent' or 'not fraudulent.' Which of the following methods best describes the extraction and processing of this data for training a machine learning model within Snowflake?

A.Use regular expressions within a Snowflake UDF to extract relevant information (e.g., amount, item description) from the log descriptions. Convert extracted data into numerical features using one-hot encoding within the UDF. Then, train a model using the extracted numerical features directly within Snowflake using SQL extensions for machine learning.

B.Extract the entire log description field and train a word embedding model (e.g., Word2Vec) on the entire dataset. Average the word vectors for each transaction's log description to create a document vector. Train a classification model (e.g., Random Forest) on these document vectors within Snowflake.

C.Use a combination of regular expressions and natural language processing (NLP) techniques within Snowflake UDFs to extract key features such as transaction amounts, product categories, and sentiment scores from the log descriptions. Then, combine these extracted features with other structured data (e.g., user demographics) and train a classification model using these features. The NLP steps include tokenization, stop word removal, and TF-IDF vectorization.

D.Export the entire log data to an external machine learning platform (e.g., AWS SageMaker) and perform feature extraction, NLP processing, and model training there. Import the trained model back into Snowflake as a UDF for prediction.

E.Treat the unstructured log description as a categorical feature and directly apply one-hot encoding within Snowflake, then train a classification model. Because of the high dimensionality, perform PCA for dimensionality reduction before training.
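
The extraction steps named in option C (regular expressions, tokenization, stop word removal, TF-IDF) can be sketched in plain Python to make them concrete. This is only an illustration under stated assumptions: the log line format, the regular expression, the stop word list, and every function name below are hypothetical, and inside Snowflake the same logic would typically sit in a Python UDF rather than run as a standalone script.

```python
import math
import re
from collections import Counter

# Hypothetical log format ("amount=250.00" or "amount: $19.99"); real logs will differ.
AMOUNT_RE = re.compile(r"amount[=:]\s*\$?(\d+(?:\.\d+)?)", re.IGNORECASE)
# Tiny illustrative stop word list, not a standard one.
STOP_WORDS = {"the", "a", "an", "for", "of", "to", "at", "on", "in", "and"}

def extract_amount(description):
    """Return the first transaction amount found in the text, or None."""
    match = AMOUNT_RE.search(description)
    return float(match.group(1)) if match else None

def tokenize(description):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    words = re.findall(r"[a-z]+", description.lower())
    return [w for w in words if w not in STOP_WORDS]

def tfidf(documents):
    """Compute a TF-IDF weight per token for each tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter(token for doc in documents for token in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append({
            token: (count / total) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return vectors

logs = [
    "user 17 paid amount=250.00 for a laptop at the store",
    "user 42 paid amount: $19.99 for a book",
]
amounts = [extract_amount(line) for line in logs]
tokens = [tokenize(line) for line in logs]
weights = tfidf(tokens)
```

Tokens that appear in every log line get a weight of zero under this unsmoothed TF-IDF variant, which is why the rarer words dominate the resulting feature vectors.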

Question 67

You are building a fraud detection model using transaction data stored in Snowflake. The dataset includes features like transaction amount, merchant category, location, and time. Due to regulatory requirements, you need to ensure personally identifiable information (PII) is handled securely and compliantly during the data collection and preprocessing phases. Which of the following combinations of Snowflake features and techniques would be MOST suitable for achieving this goal?

A.Use Snowflake's masking policies to redact PII columns before any data is accessed for model training. Ensure role-based access control is configured so that only authorized personnel can access the unmasked data for specific purposes.

B.Create a view that selects only the non-PII columns for model training. Grant access to this view to the data science team.

C.Encrypt the entire database containing the transaction data to protect PII from unauthorized access.

D.Use Snowflake's data sharing capabilities to share the transaction data with a third-party machine learning platform for model development, without any PII masking or redaction.

E.Apply differential privacy techniques on aggregated data derived from the transaction data, before using it for model training. Combine this with Snowflake's row access policies to restrict access to sensitive transaction records based on user roles and data attributes.
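
The role-based redaction idea behind option A can be illustrated with a small Python simulation. Snowflake masking policies themselves are defined in SQL and enforced by the platform; the sketch below only mimics the decision logic, and the role names, column names, and helper functions are all hypothetical.

```python
# Roles allowed to see unmasked PII; a hypothetical choice for this sketch.
AUTHORIZED_ROLES = {"COMPLIANCE_OFFICER"}

def mask_value(value, current_role):
    """Return the raw value for authorized roles, a redacted string otherwise."""
    if current_role in AUTHORIZED_ROLES:
        return value
    return "*" * len(value)

def apply_masking(row, pii_columns, current_role):
    """Copy a row, masking only the columns listed as PII."""
    return {
        column: mask_value(value, current_role) if column in pii_columns else value
        for column, value in row.items()
    }

# Invented sample record: one PII column and two model features.
row = {"email": "ann@example.com", "merchant_category": "grocery", "amount": 42.5}
training_view = apply_masking(row, {"email"}, "DATA_SCIENTIST")
audit_view = apply_masking(row, {"email"}, "COMPLIANCE_OFFICER")
```

The non-PII features pass through unchanged for every role, so model training can proceed on the masked rows while the raw values stay restricted.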

Question 68

You are working on a customer churn prediction model and are using Snowpark Feature Store. One of your features is updated daily. You notice that your model's performance degrades over time, likely due to stale feature values being used during inference. You want to ensure that the model always uses the most up-to-date feature values. Which of the following strategies would be the MOST effective way to address this issue using Snowpark Feature Store and avoid model staleness during online inference?

A.Configure the Feature Group containing this feature to automatically refresh every hour using a scheduled Snowpark Python function.

B.Implement a real-time feature retrieval service that directly queries the underlying Snowflake table containing the feature using Snowpark, bypassing the Feature Store.

C.Use the method on the Feature Store client during inference, ensuring that you always pass the current timestamp.

D.Define a custom User-Defined Function (UDF) in Snowflake that retrieves the 'customer_lifetime_value' from the Feature Store on demand whenever the model makes a prediction, and set feature_retrieval_mode='fresh'.

E.Configure with the attribute to manage data staleness and use the during inference, ensuring that the model always uses recent feature values.

Question 69

A data engineer is tasked with removing duplicates from a table named 'USER_ACTIVITY' in Snowflake, which contains user activity logs. The table has columns: 'USER_ID', 'ACTIVITY_TIMESTAMP', 'ACTIVITY_TYPE', and 'DEVICE_ID'. The data engineer wants to remove duplicate rows, considering only the 'USER_ID', 'ACTIVITY_TYPE', and 'DEVICE_ID' columns. What is the most efficient and correct SQL query to achieve this while retaining only the earliest 'ACTIVITY_TIMESTAMP' for each unique combination of the specified columns?

A.Option A

B.Option B

C.Option C

D.Option D

E.Option E
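
The deduplication rule stated in the question (one row per combination of USER_ID, ACTIVITY_TYPE, and DEVICE_ID, keeping the earliest ACTIVITY_TIMESTAMP) can be sketched in plain Python to make the expected result concrete. This does not reproduce any of the answer options; in Snowflake SQL the same rule is commonly expressed with ROW_NUMBER() partitioned by the three key columns and ordered by the timestamp. The sample rows and the function name are invented for illustration.

```python
def keep_earliest(rows):
    """Keep one row per (USER_ID, ACTIVITY_TYPE, DEVICE_ID): the earliest one."""
    earliest = {}
    for row in rows:
        key = (row["USER_ID"], row["ACTIVITY_TYPE"], row["DEVICE_ID"])
        kept = earliest.get(key)
        # ISO 8601 strings in the same format compare correctly as text.
        if kept is None or row["ACTIVITY_TIMESTAMP"] < kept["ACTIVITY_TIMESTAMP"]:
            earliest[key] = row
    return list(earliest.values())

# Invented sample data: the first two rows share the same key columns.
activity = [
    {"USER_ID": 1, "ACTIVITY_TYPE": "login", "DEVICE_ID": "d1",
     "ACTIVITY_TIMESTAMP": "2024-01-02T09:00:00"},
    {"USER_ID": 1, "ACTIVITY_TYPE": "login", "DEVICE_ID": "d1",
     "ACTIVITY_TIMESTAMP": "2024-01-01T08:00:00"},
    {"USER_ID": 2, "ACTIVITY_TYPE": "login", "DEVICE_ID": "d7",
     "ACTIVITY_TIMESTAMP": "2024-01-03T10:00:00"},
]
deduplicated = keep_earliest(activity)
```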

Question 70

You are working with a dataset containing customer reviews for various products. The dataset includes a 'REVIEW_TEXT' column with the raw review text and a 'PRODUCT_ID' column. You want to perform sentiment analysis on the reviews and create a new feature called 'SENTIMENT_SCORE' for each product. You plan to use a UDF to perform the sentiment analysis. Which of the following steps and SQL code snippets are essential for implementing this feature engineering task in Snowflake, ensuring optimal performance and scalability? Select all that apply:

A.Create a Python UDF that takes the 'REVIEW_TEXT' as input and returns a sentiment score (e.g., between -1 and 1). Then, use a 'CREATE OR REPLACE FUNCTION' statement to register the UDF.

B.Use the 'SNOWFLAKE.ML' package to train a sentiment analysis model directly within Snowflake, eliminating the need for a separate UDF.

C.Apply the sentiment analysis UDF to the 'REVIEW_TEXT' column within a 'SELECT' statement, grouping by 'PRODUCT_ID' and calculating the average 'SENTIMENT_SCORE' using

D.Cache the results of the sentiment analysis UDF in a temporary table to avoid recomputing the scores for the same reviews in subsequent queries. Use 'CREATE TEMPORARY TABLE' to create a temporary table.

E.Ensure the UDF is vectorized to process batches of reviews at once, improving performance. This can be achieved using a decorator on top of the Python function.
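
The feature described in the question (a per-review sentiment score between -1 and 1, averaged per product) can be sketched in plain Python. The word lists, function names, and sample reviews are assumptions made for illustration; a real UDF would rely on a proper sentiment library, and in Snowflake the scoring function would be registered as a UDF and the averaging done with a GROUP BY on 'PRODUCT_ID'.

```python
from collections import defaultdict

# Toy lexicon, purely illustrative; not a real sentiment model.
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "broken"}

def sentiment_score(review_text):
    """Return a score in [-1, 1] from positive and negative word counts."""
    words = review_text.lower().split()
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def average_sentiment_by_product(reviews):
    """Group (product_id, review_text) pairs and average the scores."""
    scores = defaultdict(list)
    for product_id, review_text in reviews:
        scores[product_id].append(sentiment_score(review_text))
    return {pid: sum(vals) / len(vals) for pid, vals in scores.items()}

# Invented sample reviews.
reviews = [
    ("P1", "great phone love it"),
    ("P1", "battery is bad"),
    ("P2", "arrived broken"),
]
product_sentiment = average_sentiment_by_product(reviews)
```

Scoring each review once and aggregating afterwards mirrors the split between the UDF call and the grouping step described in options A and C.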
