Question 101

You are tasked with building a data pipeline using Snowpark Python to process customer feedback data stored in a Snowflake table called 'FEEDBACK_DATA'. This table contains free-text feedback, and you need to clean and prepare this data for sentiment analysis. Specifically, you need to remove stop words, perform stemming, and handle missing values. Which of the following code snippets and strategies, potentially used in conjunction, provide the most effective and performant solution for this task within the Snowpark environment?
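For orientation, the three cleaning steps named in the question (missing values, stop words, stemming) can be sketched in plain Python. This is a minimal illustration and not the expected answer: the stop-word list, the suffix-stripping stemmer, and the function names are all assumptions made up for this sketch, and registering such a function as a Snowpark UDF is a separate step not shown here. A real pipeline would more likely use an established stemmer such as the Porter stemmer.

```python
import re
from typing import Optional

# Tiny illustrative stop-word list (assumption); real pipelines use a fuller list.
STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "was", "and", "or", "to", "of", "it", "i"}
)

# Suffixes tried in order by the toy stemmer below (assumption, not Porter).
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_feedback(text: Optional[str]) -> str:
    """Lowercase, tokenize, drop stop words, and stem one feedback string."""
    if text is None:
        # Missing feedback (SQL NULL arrives as None) becomes an empty string.
        return ""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(naive_stem(t) for t in tokens if t not in STOP_WORDS)
```

Because the function takes one string and returns one string, it has the shape of a scalar UDF handler, which is what makes row-by-row cleaning inside Snowflake possible without pulling the table to the client.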
  • Question 102

    You have a table 'PRODUCT_SALES' in Snowflake with columns: 'PRODUCT' (INT), 'SALE_DATE' (DATE), 'SALES_AMOUNT' (FLOAT), and 'PROMOTION_FLAG' (BOOLEAN). You need to perform the following data preparation steps using the Snowpark SQL API:
  • Question 103

    You have trained a complex machine learning model using Snowpark for Python and are now preparing it for production deployment using Snowpark Container Services. You have containerized the model and pushed it to a Snowflake-managed registry. However, you need to ensure that only authorized users can access and deploy this model. Which of the following actions MUST you take to secure your model in the Snowflake Model Registry, ensuring appropriate access control, and minimizing the risk of unauthorized deployment or modification?
  • Question 104

    You are tasked with building a Python stored procedure in Snowflake to train a Gradient Boosting Machine (GBM) model using XGBoost. The procedure takes a sample of data from a large table, trains the model, and stores the model in a Snowflake stage. During testing, you notice that the procedure sometimes exceeds the memory limits imposed by Snowflake, causing it to fail. Which of the following techniques can you implement within the Python stored procedure to minimize memory consumption during model training?
  • Question 105

    You are building a data science pipeline in Snowflake to predict customer churn. The pipeline includes a Python UDF that uses a pre-trained scikit-learn model stored as a binary file in a Snowflake stage. The UDF needs to load this model for prediction. You've encountered an issue where the UDF intermittently fails, seemingly related to resource limits when multiple concurrent queries invoke the UDF. Which of the following strategies would best optimize the UDF for concurrency and resource efficiency, minimizing the risk of failure?