Free Access Snowflake.DSA-C03.v2025-10-01.q105 Practice Test (Page 12)

Question 51

You are tasked with preparing a Snowflake table named 'PRODUCT_REVIEWS' for sentiment analysis. This table contains columns such as 'REVIEW_ID', 'PRODUCT_ID', 'REVIEW_TEXT', 'RATING', and 'TIMESTAMP'. Your goal is to remove irrelevant fields to optimize model training. Which of the following options represent valid and effective strategies, using Snowpark SQL, for identifying and removing irrelevant or problematic fields from the 'PRODUCT_REVIEWS' table, considering both storage efficiency and model accuracy? Assume that the model needs only the review text, the review ID, and the rating.

A.Using 'ALTER TABLE ... DROP COLUMN' to directly remove the 'TIMESTAMP' column, which is deemed irrelevant for the sentiment analysis model. SQL: 'ALTER TABLE PRODUCT_REVIEWS DROP COLUMN TIMESTAMP;'

B.Creating a VIEW that selects only the 'REVIEW_TEXT', 'REVIEW_ID', and 'RATING' columns, effectively hiding the irrelevant columns from the model. SQL: 'CREATE OR REPLACE VIEW REVIEWS_FOR_ANALYSIS AS SELECT REVIEW_TEXT, REVIEW_ID, RATING FROM PRODUCT_REVIEWS;'

C.Creating a new table 'REVIEWS_CLEANED' containing only the relevant columns ('REVIEW_TEXT', 'REVIEW_ID', and 'RATING') using 'CREATE TABLE AS SELECT'. SQL: 'CREATE OR REPLACE TABLE REVIEWS_CLEANED AS SELECT REVIEW_TEXT, REVIEW_ID, RATING FROM PRODUCT_REVIEWS;'

D.Dropping rows with 'NULL' values in 'REVIEW_TEXT' and then dropping the 'PRODUCT_ID' and 'TIMESTAMP' columns using 'ALTER TABLE'. SQL: 'CREATE OR REPLACE TABLE PRODUCT_REVIEWS AS SELECT * FROM PRODUCT_REVIEWS WHERE REVIEW_TEXT IS NOT NULL; ALTER TABLE PRODUCT_REVIEWS DROP COLUMN PRODUCT_ID; ALTER TABLE PRODUCT_REVIEWS DROP COLUMN TIMESTAMP;'

E.All of the above.
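To make the effect of these statements on the data concrete, here is a minimal pure-Python sketch of the row filter from option D combined with the column projection from option C: rows whose REVIEW_TEXT is NULL (None) are dropped, and only REVIEW_ID, REVIEW_TEXT, and RATING are kept. The helper name `clean_reviews` and the sample rows are hypothetical; in Snowflake the same logic would run as SQL or Snowpark, not as a Python loop.

```python
from typing import Any, Dict, List

# Columns the sentiment model needs, per the question's assumption.
KEEP_COLUMNS = ("REVIEW_ID", "REVIEW_TEXT", "RATING")


def clean_reviews(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop rows with a missing REVIEW_TEXT and keep only KEEP_COLUMNS.

    Roughly equivalent to:
      CREATE OR REPLACE TABLE REVIEWS_CLEANED AS
      SELECT REVIEW_ID, REVIEW_TEXT, RATING
      FROM PRODUCT_REVIEWS
      WHERE REVIEW_TEXT IS NOT NULL;
    """
    return [
        {col: row[col] for col in KEEP_COLUMNS}
        for row in rows
        if row.get("REVIEW_TEXT") is not None
    ]


if __name__ == "__main__":
    sample = [  # hypothetical rows
        {"REVIEW_ID": 1, "PRODUCT_ID": 10, "REVIEW_TEXT": "Great!", "RATING": 5, "TIMESTAMP": "2024-01-01"},
        {"REVIEW_ID": 2, "PRODUCT_ID": 11, "REVIEW_TEXT": None, "RATING": 3, "TIMESTAMP": "2024-01-02"},
    ]
    print(clean_reviews(sample))
```

Writing the cleaned result to a new table (as in option C) leaves the source table untouched, whereas options A and D modify 'PRODUCT_REVIEWS' in place.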

Question 52

You are tasked with building a data science pipeline in Snowflake to predict customer churn. You have trained a scikit-learn model and want to deploy it using a Python UDTF for real-time predictions. The model expects a specific feature vector format. You've defined a UDTF named 'PREDICT_CHURN' that loads the model and makes predictions. However, when you call the UDTF with data from a table, you encounter inconsistent prediction results across different rows, even when the input features seem identical. Which of the following are the most likely reasons for this behavior, and how would you address them?

A.The scikit-learn model was not properly serialized and deserialized within the UDTF. Ensure the model is saved using 'joblib' or 'pickle' with appropriate settings for cross-platform compatibility and loaded correctly within the UDTF's 'process' method. Verify serialization/deserialization by testing it independently from Snowflake first.

B.The issue is related to the immutability of the Snowflake execution environment for UDTFs. To resolve this, cache the loaded model instance within the UDTF's constructor and reuse it for subsequent predictions. Using a global variable is also acceptable.

C.The input feature data types in the table do not match the data types expected by the scikit-learn model. Cast the input columns to the correct data types (e.g., FLOAT, INT) before passing them to the UDTF. Use explicit casting functions such as 'TO_DOUBLE' and 'INTEGER' in your SQL query.

D.The UDTF is not partitioning data correctly. Ensure the UDTF call uses the 'PARTITION BY' clause in your SQL query, based on a relevant dimension (e.g., 'customer_id'), to prevent state inconsistencies across partitions. This will isolate the impact of any statefulness within the function.

E.There may be an error in the model, where the 'predict' method produces different outputs for the same inputs. Retraining the model will resolve the issue.
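The model-caching idea in option B and the explicit casting in option C can be combined in one handler class. Below is a minimal sketch, assuming two hypothetical numeric features (`tenure`, `monthly_charges`) and a deterministic stub in place of the real scikit-learn artifact; the class follows the usual shape of a Snowflake Python UDTF handler (a `process` method that yields output tuples), but registration and artifact loading (e.g. with `joblib`) are deliberately left out.

```python
from typing import Iterator, List, Sequence, Tuple


class _StubModel:
    """Hypothetical stand-in for a deserialized scikit-learn model.

    A real handler would load the artifact once (for example with joblib)
    from the function's imports; the stub keeps this sketch runnable.
    """

    def predict(self, rows: Sequence[Sequence[float]]) -> List[int]:
        # Deterministic toy rule: churn if tenure (first feature) < 12.
        return [int(row[0] < 12.0) for row in rows]


def load_model() -> _StubModel:
    # Hypothetical loader; invoked once per handler instance, not per row.
    return _StubModel()


class PredictChurn:
    """Sketch of a Python UDTF handler class for PREDICT_CHURN."""

    def __init__(self) -> None:
        # Load the model once and reuse it for every row this instance sees.
        self.model = load_model()

    def process(self, tenure, monthly_charges) -> Iterator[Tuple[int]]:
        # Cast explicitly so the model always receives floats, whatever
        # numeric type arrives (e.g. int or decimal.Decimal for NUMBER).
        features = [float(tenure), float(monthly_charges)]
        yield (self.model.predict([features])[0],)


if __name__ == "__main__":
    handler = PredictChurn()
    print(list(handler.process(5, 70.5)), list(handler.process(24, 30)))
```

Because the model is loaded in the constructor and the inputs are normalized to one type before `predict`, identical feature values always reach the model in the same form.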

Question 53

A data scientist is tasked with identifying customer segments for a new marketing campaign using transaction data stored in Snowflake. The transaction data includes features like transaction amount, frequency, recency, and product category. Which unsupervised learning algorithm would be MOST appropriate for this task, considering scalability and Snowflake's data processing capabilities, and what preprocessing steps are crucial before applying the algorithm?

A.K-Means clustering, after standardizing numerical features (transaction amount, frequency, recency) and using one-hot encoding for product category. This is highly scalable within Snowflake using UDFs and SQL.

B.Hierarchical clustering, using the complete linkage method and Euclidean distance. No preprocessing is necessary, as hierarchical clustering can handle raw data.

C.DBSCAN, using raw data without any scaling or encoding. The algorithm's density-based nature will automatically handle the varying scales of the features.

D.Principal Component Analysis (PCA) followed by K-Means. This reduces dimensionality and then clusters, improving the visualization of the clusters.

E.K-Means clustering, after applying min-max scaling to numerical features and converting categorical features to numerical representation. The optimal 'k' (number of clusters) should be determined using the elbow method or silhouette analysis.
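The preprocessing-then-cluster pipeline described in options A and E can be sketched in plain Python to show the mechanics. This is an illustrative, in-memory Lloyd's algorithm using z-score standardization (as in option A; option E's min-max scaling would be a drop-in replacement for `standardize`) plus one-hot encoding of the category. All function names and the toy data are hypothetical; at Snowflake scale this work would be pushed into Snowpark or a library rather than a Python loop.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = List[float]


def standardize(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each numeric column (population standard deviation)."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        # Fall back to 1.0 for constant columns to avoid dividing by zero.
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        out.append([(x - mean) / std for x in col])
    return out


def one_hot(values: Sequence[str]) -> List[List[float]]:
    """Return one 0/1 column per distinct category (sorted for stability)."""
    cats = sorted(set(values))
    return [[1.0 if v == c else 0.0 for v in values] for c in cats]


def build_features(amount: Sequence[float], frequency: Sequence[float],
                   recency: Sequence[float], category: Sequence[str]) -> List[Point]:
    """Combine scaled numeric columns and one-hot columns into row vectors."""
    cols = standardize([amount, frequency, recency]) + one_hot(category)
    return [list(row) for row in zip(*cols)]


def kmeans(points: List[Point], k: int, iters: int = 50,
           seed: int = 0) -> Tuple[List[Point], float]:
    """Plain Lloyd's algorithm; returns centroids and inertia (within-cluster SSE)."""
    rng = random.Random(seed)
    centroids = [p[:] for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to its cluster mean; keep empty clusters in place.
        new_centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
    return centroids, inertia


if __name__ == "__main__":
    # Hypothetical toy data: amount, frequency, recency, category.
    X = build_features([10, 12, 200, 210], [1, 2, 20, 22], [30, 28, 2, 1], ["x", "x", "y", "y"])
    for k in range(1, 4):
        print(k, round(kmeans(X, k)[1], 3))  # inertia per k, for an elbow plot
```

Printing or plotting the inertia for k = 1, 2, 3, ... and looking for the bend in the curve is the elbow method mentioned in option E; silhouette analysis instead scores each k by how well separated the resulting clusters are.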

Question 54

You have a Snowflake Model Registry set up and are managing multiple versions of a machine learning model. You want to programmatically retrieve a specific version of the model and load it for inference within a Snowflake Snowpark Python UDF. Assume your registry name is 'my_registry', the model name is 'credit_risk_model', and you want to retrieve version 'v2'. How would you achieve this using Snowpark Python?

A.Option A

B.Option B

C.Option C

D.Option D

E.Option E
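The original answer options for this question are not reproduced here. For orientation, the following sketch shows the retrieval pattern in general terms. It assumes the call chain of the `snowflake-ml-python` `Registry` API (`get_model(name)` returning a model reference, `.version(name)` returning a model version, and `.load()` returning the underlying Python object); verify these against the library version you use. The wrapper name `load_model_version` is hypothetical, and treating 'my_registry' as a schema name is an assumption.

```python
from typing import Any


def load_model_version(registry: Any, model_name: str, version_name: str) -> Any:
    """Fetch one model version from a registry object and load it for inference.

    Assumed call chain (snowflake-ml-python Registry API):
      registry.get_model(name)  -> model reference
      .version(name)            -> model version
      .load()                   -> the underlying Python model object
    """
    model_version = registry.get_model(model_name).version(version_name)
    return model_version.load()


# Hypothetical usage inside a Snowpark session (not executed here):
#   from snowflake.ml.registry import Registry
#   reg = Registry(session=session, database_name="MY_DB", schema_name="my_registry")
#   model = load_model_version(reg, "credit_risk_model", "v2")
```

Keeping the registry as a parameter makes the lookup easy to test with a fake object. Model version objects in that library also expose a `run` method that executes inference inside Snowflake without loading the model client-side, which may be preferable to loading the object inside a UDF.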

Question 55

You are tasked with developing a Snowpark Python function to identify and remove near-duplicate text entries from a table named 'PRODUCT_DESCRIPTIONS'. The table contains a 'PRODUCT_ID' (INT) column and a 'DESCRIPTION' (STRING) column. Near-duplicates are defined as descriptions with a Jaccard similarity score greater than 0.9. You need to implement this using Snowpark and UDFs. Which of the following approaches is the most efficient, secure, and correct to implement?

A.Define a Python UDF that calculates the Jaccard similarity between all pairs of descriptions in the table. Use a cross join to compare all rows, then filter based on the Jaccard similarity threshold. Finally, delete the near-duplicate rows based on a chosen tie-breaker (e.g., smallest PRODUCT_ID).

B.Define a Python UDF that calculates the Jaccard similarity. Create a new table, 'PRODUCT_DESCRIPTIONS_NO_DUPES', and insert the distinct descriptions based on the similarity score. For rows in the original table with similar descriptions, the row with the lowest PRODUCT_ID must be the one inserted into the new table.

C.Define a Python UDF that calculates the Jaccard similarity. Use 'GROUP BY' to group descriptions by 'PRODUCT_ID'. Apply the UDF to this grouped data to remove duplicates with a similarity score greater than the threshold.

D.Define a Python UDF to calculate Jaccard similarity. Create a temporary table with a ROW_NUMBER() column partitioned by a hash of the DESCRIPTION column. Calculate the Jaccard similarity between descriptions within each partition. Filter and remove near-duplicates based on a tie-breaker (smallest PRODUCT_ID).

E.Use the function directly in a SQL query without a UDF. Partition the data by 'PRODUCT_ID' and remove near duplicates where the approximate Jaccard index is above 0.9.
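For reference, the similarity test shared by these options is small. Here is a minimal pure-Python sketch, assuming lower-cased whitespace word tokens as the set representation (the question does not specify tokenization) and the "keep the smallest PRODUCT_ID" tie-breaker used in options A and D. Function names are hypothetical; in a Snowpark deployment, `jaccard` would be the body of the Python UDF.

```python
from typing import FrozenSet, List, Tuple


def shingles(text: str) -> FrozenSet[str]:
    """Lower-cased word tokens; a simple tokenization choice for illustration."""
    return frozenset(text.lower().split())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """|A intersect B| / |A union B|, defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(rows: List[Tuple[int, str]], threshold: float = 0.9) -> List[Tuple[int, str]]:
    """Keep one row per near-duplicate group, preferring the smallest PRODUCT_ID.

    Rows are visited in PRODUCT_ID order; a row is dropped if its Jaccard
    similarity to any already-kept row exceeds `threshold`. This compares
    each row against every kept row, so it is O(n^2) in the worst case.
    """
    kept: List[Tuple[int, str, FrozenSet[str]]] = []
    for pid, desc in sorted(rows):
        s = shingles(desc)
        if all(jaccard(s, ks) <= threshold for _, _, ks in kept):
            kept.append((pid, desc, s))
    return [(pid, desc) for pid, desc, _ in kept]


if __name__ == "__main__":
    rows = [  # hypothetical (PRODUCT_ID, DESCRIPTION) pairs
        (3, "a b c d e f g h i j k"),
        (1, "a b c d e f g h i j"),
        (2, "totally different text"),
    ]
    print(dedupe(rows))  # product 3 is dropped: its Jaccard with product 1 is 10/11
```

Comparing only within buckets (partitions) cuts the number of pairs, but the bucket key must place near-duplicates together; an exact hash of the full description only groups identical strings.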
