
Building Scalable Feature Engineering Pipelines with Snowpark Python and Scikit-Learn

In a typical machine learning (ML) workflow, data science teams develop features on subsets of data. To take models into production, engineering teams must then re-code those feature pipelines so that they are scalable and reliable. Teams end up maintaining separate pipelines for training and inference, which duplicates work and increases complexity.

Using Snowpark, data scientists and data and ML engineering teams can standardize feature definitions and scale the processing of ML pipelines for both training and inference.

Watch this webinar to see how to:

  • Use Snowpark Python DataFrames for fast and efficient feature extraction
  • Optimize compute-intensive (aggregations) and memory-intensive (encoding) transformations in Snowflake
  • Streamline the ML workflow by running training and inference in Snowflake with libraries from the Anaconda integration
  • Scale and automate feature engineering for production ML
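The core idea above is to define feature logic once and reuse it for both training and inference, instead of maintaining two pipelines. The sketch below illustrates that pattern with plain pandas and scikit-learn running locally. It is not Snowpark code and does not show how the webinar pushes work into Snowflake. The toy data, column names, and the `build_features` helper are hypothetical. The aggregation step stands in for the compute-intensive transformations and the one-hot encoding for the memory-intensive ones.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_features(orders: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature step: aggregate raw orders to one row per customer.

    The same function is called at training and at inference time,
    so the feature definition exists in exactly one place.
    """
    return orders.groupby("customer_id", as_index=False).agg(
        region=("region", "first"),
        avg_amount=("amount", "mean"),
        n_orders=("amount", "count"),
    )


# Encoding and scaling are fitted once, inside the same pipeline as the model.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["avg_amount", "n_orders"]),
    # handle_unknown="ignore" lets inference data contain unseen regions.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
])  # remaining columns (customer_id) are dropped by default
model = Pipeline([("features", preprocess), ("clf", LogisticRegression())])

# --- Training (hypothetical toy data) ---
train_orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3, 4],
    "region": ["east", "east", "west", "east", "east", "east", "west"],
    "amount": [10.0, 30.0, 5.0, 50.0, 70.0, 20.0, 8.0],
})
labels = pd.Series({1: 0, 2: 1, 3: 0, 4: 1}, name="churned")

X_train = build_features(train_orders)
y_train = labels.loc[X_train["customer_id"]].to_numpy()
model.fit(X_train, y_train)

# --- Inference: same feature function, same fitted pipeline ---
new_orders = pd.DataFrame({
    "customer_id": [10, 10, 11],
    "region": ["west", "west", "south"],  # "south" was never seen in training
    "amount": [6.0, 4.0, 40.0],
})
X_new = build_features(new_orders)
predictions = model.predict(X_new)
print(predictions)
```

Bundling the encoder, the scaler, and the model in one fitted `Pipeline` means inference cannot accidentally apply different encodings than training did. The webinar's approach moves the aggregation step into Snowflake through Snowpark Python DataFrames so it scales beyond a single machine.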