Free Access Snowflake.DSA-C03.v2025-10-01.q105 Practice Test (Page 3)

Question 6

You are performing exploratory data analysis on a dataset containing customer transaction data in Snowflake. The dataset has a column named 'transaction_amount' and a column named 'customer_segment'. You want to analyze the distribution of transaction amounts for each customer segment using Snowflake's statistical functions. Which of the following approaches would BEST achieve this, providing insights into the central tendency and spread of the data?

A.Option A

B.Option B

C.Option C

D.Option D

E.Option E
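
As an illustration of what "central tendency and spread per segment" means in practice, here is a minimal Python sketch that groups transaction amounts by segment and computes the mean, median, sample standard deviation, and quartiles. The sample rows and the function name `summarize_by_segment` are assumptions made for illustration. In Snowflake, a comparable summary would typically come from aggregates such as AVG, MEDIAN, STDDEV, and PERCENTILE_CONT combined with GROUP BY on 'customer_segment'.

```python
import statistics
from collections import defaultdict

# Hypothetical sample rows: (customer_segment, transaction_amount)
rows = [
    ("retail", 10.0), ("retail", 20.0), ("retail", 30.0), ("retail", 40.0),
    ("enterprise", 100.0), ("enterprise", 300.0), ("enterprise", 500.0),
]

def summarize_by_segment(rows):
    """Return count, mean, median, sample stddev, and quartiles per segment."""
    groups = defaultdict(list)
    for segment, amount in rows:
        if amount is not None:  # skip missing amounts, as SQL aggregates do
            groups[segment].append(amount)

    summary = {}
    for segment, amounts in groups.items():
        if len(amounts) > 1:
            # "inclusive" uses linear interpolation between data points
            q1, _, q3 = statistics.quantiles(amounts, n=4, method="inclusive")
            stddev = statistics.stdev(amounts)
        else:
            # A single value has no spread to measure
            q1 = q3 = amounts[0]
            stddev = None
        summary[segment] = {
            "count": len(amounts),
            "mean": statistics.mean(amounts),
            "median": statistics.median(amounts),
            "stddev": stddev,
            "q1": q1,
            "q3": q3,
        }
    return summary

summary = summarize_by_segment(rows)
```

Reporting the mean and median together helps reveal skew, while the standard deviation and the interquartile range (q3 minus q1) describe spread.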

Question 7

You are developing a regression model in Snowflake using Snowpark to predict house prices based on features like square footage, number of bedrooms, and location. After training the model, you need to evaluate its performance. Which of the following Snowflake SQL queries, used in conjunction with the model's predictions stored in a table named 'PREDICTED_PRICES', would be the most efficient way to calculate the Root Mean Squared Error (RMSE) using Snowflake's built-in functions, given that the actual prices are stored in the 'ACTUAL_PRICES' table?

A.Option A

B.Option B

C.Option C

D.Option D

E.Option E
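
For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values. The Python sketch below shows that arithmetic on small hypothetical lists; the function name `rmse` and the sample values are assumptions. In SQL, the same calculation is commonly written as the square root of the average squared difference (for example, SQRT(AVG(POWER(actual - predicted, 2)))) over the two tables joined on a shared key, which the question does not name.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical prices: every prediction is off by exactly 2
actual_prices = [1.0, 2.0, 3.0, 4.0]
predicted_prices = [3.0, 4.0, 5.0, 6.0]
error = rmse(actual_prices, predicted_prices)
```

Because the errors are squared before averaging, large individual misses weigh more heavily than they would under mean absolute error.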

Question 8

You are developing a real-time fraud detection system using Snowflake and an external function. The system involves scoring incoming transactions against a pre-trained TensorFlow model hosted on Google Cloud AI Platform Prediction. The transaction data resides in a Snowflake stream. The goal is to minimize latency and cost. Which of the following strategies are most effective to optimize the interaction between Snowflake and the Google Cloud AI Platform Prediction service via an external function, considering both performance and cost?

A.Invoke the external function for each individual transaction in the Snowflake stream, sending the transaction data as a single request to the Google Cloud AI Platform Prediction service.

B.Implement a caching mechanism within the external function (e.g., using Redis on Google Cloud) to store frequently accessed model predictions, thereby reducing the number of calls to the Google Cloud AI Platform Prediction service. This requires managing cache invalidation.

C.Batch multiple transactions from the Snowflake stream into a single request to the external function. The external function then sends the batched transactions to the Google Cloud AI Platform Prediction service in a single request. This increases throughput but might introduce latency.

D.Use a Snowflake pipe to automatically ingest the data from the stream, and then trigger a scheduled task that periodically invokes a stored procedure to train the model externally.

E.Implement asynchronous invocation of the external function from Snowflake using Snowflake's task functionality. This allows Snowflake to continue processing transactions without waiting for the response from the Google Cloud AI Platform Prediction service, but requires careful monitoring and handling of asynchronous results.
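
To make the batching idea in the options concrete, the following Python sketch shows a remote-service handler that receives several rows in one request and scores them with a single downstream call. It assumes the Snowflake external function payload shape, in which the JSON body carries a "data" array whose rows start with a row number and the response returns one result per row number. The `score_batch` function is a hypothetical stand-in for the call to the remote prediction service, and its threshold rule is made up for illustration.

```python
import json

def score_batch(feature_rows):
    """Hypothetical stand-in for ONE batched call to the remote model.

    A real handler would send all feature rows to the prediction service
    in a single request; here a made-up threshold rule flags large amounts.
    """
    return [1.0 if features[0] > 1000 else 0.0 for features in feature_rows]

def handle_request(body: str) -> str:
    """Score every row of a batched request and echo back the row numbers."""
    payload = json.loads(body)
    rows = payload["data"]                   # each row: [row_number, arg1, ...]
    row_numbers = [row[0] for row in rows]
    feature_rows = [row[1:] for row in rows]
    scores = score_batch(feature_rows)       # one downstream call per batch
    results = [[number, score] for number, score in zip(row_numbers, scores)]
    return json.dumps({"data": results})

# Hypothetical batch of two transactions (row number, amount)
request_body = json.dumps({"data": [[0, 50.0], [1, 2500.0]]})
response = json.loads(handle_request(request_body))
```

Scoring the whole batch in one downstream call amortizes network and per-request overhead across rows, at the price of waiting for the batch to fill.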

Question 9

A data scientist needs to analyze website session data stored in a Snowflake table named 'WEB_SESSIONS'. The table contains columns like 'SESSION_ID', 'USER_ID', 'PAGE_VIEWS', 'TIME_SPENT_SECONDS', and 'TIMESTAMP'. They want to identify potential bot traffic by analyzing the correlation between 'PAGE_VIEWS' and 'TIME_SPENT_SECONDS'. Which of the following Snowflake SQL queries is the MOST efficient and statistically sound way to calculate the Pearson correlation coefficient between these two columns, handling potential NULL values appropriately?

A.Option A

B.Option B

C.Option C

D.Option D

E.Option E
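
As a reference for what the question asks, the Python sketch below computes the Pearson correlation coefficient over only those pairs where both values are present, which is also how Snowflake's CORR aggregate treats NULLs. The function name `pearson_ignore_nulls` and the sample data are assumptions for illustration.

```python
import math

def pearson_ignore_nulls(xs, ys):
    """Pearson correlation over pairs where neither value is None.

    Returns None when fewer than two complete pairs remain or when
    either variable is constant (the coefficient is undefined).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sessions; the third session is missing its page-view count
page_views = [1, 2, None, 4]
time_spent_seconds = [10, 20, 30, 40]
r = pearson_ignore_nulls(page_views, time_spent_seconds)
```

Dropping incomplete pairs, rather than substituting zero for a missing value, avoids distorting the coefficient with values that were never observed.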

Question 10

A data scientist is tasked with predicting customer churn for a telecommunications company using Snowflake. The dataset contains call detail records (CDRs), customer demographic information, and service usage data. Initial analysis reveals a high degree of multicollinearity between several features, specifically 'total_day_minutes', 'total_eve_minutes', and 'total_night_minutes'. Additionally, the 'state' feature has a large number of distinct values. Which of the following feature engineering techniques would be MOST effective in addressing these issues to improve model performance, considering efficient execution within Snowflake?

A.Apply Principal Component Analysis (PCA) to reduce the dimensionality of the CDR features ('total_day_minutes', 'total_eve_minutes', 'total_night_minutes') and use one-hot encoding for the 'state' feature.

B.Create interaction features by multiplying 'total_day_minutes' with 'customer_service_calls' and applying a target encoding to the 'state' feature.

C.Use a variance threshold to remove highly correlated CDR features and create a feature representing the geographical region (e.g., 'Northeast', 'Southwest') based on the 'state' feature using a custom UDF.

D.Calculate the Variance Inflation Factor (VIF) for each CDR feature and drop the feature with the highest VIF. Apply frequency encoding to the 'state' feature.

E.Apply min-max scaling to the CDR features to normalize them and use label encoding for the 'state' feature. Train a decision tree model, as it is robust to multicollinearity.
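
One of the encodings named in the options, frequency encoding, replaces each category with the share of rows in which it appears, so a high-cardinality column such as 'state' becomes a single numeric feature instead of many one-hot columns. The Python sketch below illustrates the idea on hypothetical values; the function name `frequency_encode` is an assumption. In SQL, a similar result could be obtained with a windowed count per state divided by the total row count.

```python
from collections import Counter

def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    if not values:
        return []
    counts = Counter(values)
    total = len(values)
    return [counts[value] / total for value in values]

# Hypothetical 'state' column with four rows
states = ["CA", "NY", "CA", "TX"]
encoded = frequency_encode(states)
```

Categories that occur equally often receive the same encoded value, which is a known limitation of this encoding.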
