Free Access Snowflake.DSA-C03.v2025-10-01.q105 Practice Test (Page 22)

Question 101

You are tasked with building a data pipeline using Snowpark Python to process customer feedback data stored in a Snowflake table called 'FEEDBACK_DATA'. This table contains free-text feedback, and you need to clean and prepare this data for sentiment analysis. Specifically, you need to remove stop words, perform stemming, and handle missing values. Which of the following code snippets and strategies, potentially used in conjunction, provide the most effective and performant solution for this task within the Snowpark environment?

A.Use a Python UDF that utilizes the NLTK library to remove stop words and perform stemming on the feedback text. Handle missing values by replacing them with an empty string using the '.fillna("")' method on the Snowpark DataFrame after applying the UDF.

B.Leverage Snowflake's built-in string functions within SQL to remove common stop words based on a predefined list. Use a Snowpark DataFrame to execute this SQL transformation. For stemming, research and deploy a Java UDF implementing stemming algorithms, then chain it within a Snowpark transformation pipeline. Replace missing values with the string 'N/A' during the DataFrame construction using 'na.fill('N/A')'.

C.Utilize Snowpark's 'call_function' with a Java UDF pre-loaded into Snowflake, which removes stop words and performs stemming with libraries like Lucene. Missing values can be handled with SQL's 'NVL' function during the initial data extraction into a Snowpark DataFrame.

D.Load the 'FEEDBACK_DATA' table into a Pandas DataFrame, perform stop word removal and stemming using libraries like spaCy or NLTK, and handle missing values using Pandas' 'fillna()' method. Then, convert the cleaned Pandas DataFrame back into a Snowpark DataFrame. After this step, vectorize the text column in the DataFrame.

E.Implement all data cleaning tasks within a single SQL stored procedure, including removing stop words using REPLACE functions, stemming using a custom lookup table, and handling NULL values using COALESCE. Call this stored procedure from Snowpark for Python.
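
The three cleaning steps named in the question (missing values, stop words, stemming) can be sketched in plain Python as below. This is only an illustration of the logic, not any option's actual implementation: `STOP_WORDS`, `SUFFIXES`, `naive_stem`, and `clean_feedback` are hypothetical names, the stop-word list is a tiny sample, and the suffix stripping is a crude stand-in for a real stemmer such as the ones in NLTK. Punctuation handling is left out.

```python
# Tiny sample list; a real pipeline would use a full stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "it", "to", "of"}
# Suffixes checked in order; this is NOT a real stemming algorithm.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_feedback(text):
    """Lowercase, drop stop words, stem; missing input becomes ''."""
    if text is None:
        return ""
    tokens = text.lower().split()
    return " ".join(naive_stem(t) for t in tokens if t not in STOP_WORDS)
```

A function of this shape is what a Python UDF handler for the feedback column might wrap; whether a UDF, SQL, or a Java UDF performs best is the point the options debate.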

Question 102

You have a table 'PRODUCT_SALES' in Snowflake with columns: 'PRODUCT_ID' (INT), 'SALE_DATE' (DATE), 'SALES_AMOUNT' (FLOAT), and 'PROMOTION_FLAG' (BOOLEAN). You need to perform the following data preparation steps using the Snowpark SQL API:

A.Handling missing 'SALES_AMOUNT' values by imputing them with the average 'SALES_AMOUNT' for the same 'PRODUCT_ID' during the previous month. If there's no data for the previous month, use the overall average for that

B.Converting 'SALE_DATE' to a quarterly representation (e.g., '2023-Q1').

C.Creating a new feature representing the percentage change in 'SALES_AMOUNT' compared to the previous day for the same 'PRODUCT_ID'. Handle the first day of each 'PRODUCT_ID' by setting 'SALES_GROWTH' to 0.

D.Creating a feature that returns 1 if 'PROMOTION_FLAG' is True and 'SALES_AMOUNT' > 1000, and 0 otherwise.

E.All of the above.
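
The logic of steps B, C, and D can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not Snowpark code: `quarter_label`, `sales_growth`, and `promo_high_sale` are hypothetical helper names, and `sales_growth` assumes its input is one product's amounts already ordered by consecutive 'SALE_DATE'. In SQL, the previous-day comparison would typically be expressed with the `LAG` window function partitioned by 'PRODUCT_ID'.

```python
from datetime import date


def quarter_label(d):
    """Format a date as 'YYYY-Qn' (step B)."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def sales_growth(amounts):
    """Percentage change versus the previous entry (step C); first entry is 0."""
    if not amounts:
        return []
    growth = [0.0]
    for prev, cur in zip(amounts, amounts[1:]):
        # Assumption: a previous amount of 0 yields 0 instead of dividing by zero.
        growth.append((cur - prev) / prev * 100 if prev else 0.0)
    return growth


def promo_high_sale(promotion_flag, sales_amount):
    """Return 1 when the flag is set and the amount exceeds 1000 (step D)."""
    return 1 if promotion_flag and sales_amount > 1000 else 0
```

For example, `quarter_label(date(2023, 2, 14))` gives `'2023-Q1'`, and `sales_growth([100.0, 150.0, 75.0])` gives `[0.0, 50.0, -50.0]`.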

Question 103

You have trained a complex machine learning model using Snowpark for Python and are now preparing it for production deployment using Snowpark Container Services. You have containerized the model and pushed it to a Snowflake-managed registry. However, you need to ensure that only authorized users can access and deploy this model. Which of the following actions MUST you take to secure your model in the Snowflake Model Registry, ensuring appropriate access control, and minimizing the risk of unauthorized deployment or modification?

A.Grant the 'USAGE' privilege on the stage where the model files are stored to all users who need to deploy the model.

B.Grant the 'READ' privilege on the container registry to all users who need to deploy the model. Create a custom role with the 'APPLY MASKING POLICY' privilege and grant this role to the deployment team.

C.Grant the 'USAGE' privilege on the database and schema containing the model registry, grant the 'READ' privilege on the registry itself, and grant the 'EXECUTE TASK' privilege to the deployment team for the deployment task.

D.Create a custom role, grant the 'USAGE' privilege on the database and schema containing the model registry, grant the 'READ' privilege on the registry, and then grant this custom role to only those users authorized to deploy the model. Consider masking sensitive model parameters using masking policies.

E.Store the model outside the Snowflake-managed registry and use external authentication to control access.

Question 104

You are tasked with building a Python stored procedure in Snowflake to train a Gradient Boosting Machine (GBM) model using XGBoost. The procedure takes a sample of data from a large table, trains the model, and stores the model in a Snowflake stage. During testing, you notice that the procedure sometimes exceeds the memory limits imposed by Snowflake, causing it to fail. Which of the following techniques can you implement within the Python stored procedure to minimize memory consumption during model training?

A.Convert the Pandas DataFrame used for training to a Dask DataFrame and utilize Dask's distributed processing capabilities to train the XGBoost model in parallel across multiple Snowflake virtual warehouses.

B.Use the 'hist' tree method in XGBoost, enable gradient-based sampling ('goss'), and carefully tune the 'max_depth' and related parameters to reduce memory usage during tree construction. Convert all features to numerical if possible.

C.Reduce the sample size of the training data and increase the number of boosting rounds to compensate for the smaller sample. Use the 'predict_proba' method to avoid storing probabilities for all classes.

D.Write the training data to a temporary table in Snowflake, then use Snowflake's external functions to train the XGBoost model on a separate compute cluster outside of Snowflake. Then upload the model to a Snowflake stage.

E.Implement XGBoost's 'early stopping' functionality with a validation set to prevent overfitting. If the stored procedure exceeds the memory limits, the model cannot be saved. Always use a larger virtual warehouse.

Question 105

You are building a data science pipeline in Snowflake to predict customer churn. The pipeline includes a Python UDF that uses a pre-trained scikit-learn model stored as a binary file in a Snowflake stage. The UDF needs to load this model for prediction. You've encountered an issue where the UDF intermittently fails, seemingly related to resource limits when multiple concurrent queries invoke the UDF. Which of the following strategies would best optimize the UDF for concurrency and resource efficiency, minimizing the risk of failure?

A.Load the scikit-learn model inside the UDF function on every invocation to ensure the latest version is used.

B.Load the scikit-learn model outside the UDF function in the global scope of the module so that all invocations share the same loaded model instance. Use 'context.getExecutionContext()' to track execution, making sure it is thread safe.

C.Utilize Snowflake's session-level caching by storing the loaded model in 'session.get('model')' to be reused across multiple UDF calls within the same session. Reload the model if 'session.get('model')' is None.

D.Implement a global, lazy-loaded cache for the scikit-learn model within the UDF's module. The model is loaded only once during the first invocation and shared across subsequent calls. Protect the loading process with a lock to prevent race conditions in concurrent environments.

E.Increase the memory allocated to the Snowflake warehouse to accommodate multiple UDF invocations.
