Extending Snowflake’s External Functions with Serverless—Adding Driving Times from Mapbox to SQL

December 10, 2020 Tamas Foldi

Data engineers love to use SQL to solve all kinds of data problems. For this and more, Snowflake is a perfect partner. Snowflake’s support for standard SQL and several SQL variations, combined with JavaScript stored procedures, has helped me solve complex data challenges. But sometimes you might have the need for custom code. 

As an example, a few weeks ago, a client reached out wanting to know driving distances and times between its top customers and their warehouses. While Snowflake has HAVERSINE calculation and rich spatial data types, driving distance isn’t a simple geometric calculation: it requires a complex analysis of the road network and expected speeds along each path, and is usually provided by a sophisticated third-party service. I couldn’t use simple Snowflake JavaScript user-defined functions (UDFs) because those can’t call unsafe operations such as network calls. 

This is where Snowflake’s External Functions came in handy. External Functions is a new capability that enables you to write and call your own custom applications or call third-party applications that reside outside of Snowflake. These applications, referred to as remote services, can be written using any HTTP server stack, including cloud serverless compute services, such as AWS Lambda, with any programming language.  

Although the steps to set up an external function are well documented, they can be time consuming because you need to create roles, functions, and relations between your cloud platform and Snowflake instance. You also have to make sure only Snowflake can call your functions, and that they do so in a secure and auditable way. Hence, to make things simpler, I created a Serverless plugin to deploy the required resources for external functions to AWS and Snowflake.

Serverless to the Rescue

Serverless Framework is an open source cloud automation tool for deploying Lambda functions. It helps developers and data engineers focus on the code while hiding the deployment complexities. The following section describes a Serverless plugin that automates deployment of external functions to AWS and Snowflake. This plugin is available for free on github

The high-level process to deploy a Snowflake external function includes four steps: 

1. Create a new Serverless project from a Snowflake template. 

2. Modify the handler code. 

3. Add AWS and Snowflake configuration (such as credentials or region). 

4. Deploy the external function. 

Everything is automated. There’s no need to log into the AWS Management Console or Snowflake UI. 

For example, follow these steps to create a simple hello world function:

1. Create your project template: 

serverless create --template-url https://github.com/starschema/snowflake-aws-external-function -p hello-function 
cd hello-function 
npm install serverless-snowflake-external-function-plugin 

2. Set up the Snowflake and AWS connections. For example, you could change the following lines in Serverless.yml to your actual account information: 

custom:
  snowflake:
    role: ACCOUNTADMIN
    account: ""
    username: ""
    password: ${env:SNOWFLAKE_PASSWORD}
    warehouse: ""
    database: ""
    schema: PUBLIC

The actual code for the function is located in handler.js:

'use strict';

module.exports.hello = async event => {
  const body = JSON.parse(event.body);

  return {
    statusCode: 200,
    body: JSON.stringify(
      {
        data: body.data.map((row) => [row[0], 'hello from lambda'])
      }
    )
  };

  // Use this code if you don't use the http event with the LAMBDA-PROXY integration
  // return { message: 'Go Serverless v1.0! Your function executed successfully!', event };
};

AWS access and secret keys are kept as environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY). 

3. Deploy the function to AWS and Snowflake with an sls deploy command. 

$ sls deploy 
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Snowflake: Adding permission to Snowflake for invoking API Gateway
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
........
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service hello-function.zip file to S3 (5.54 MB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
..............................
Serverless: Stack update finished...
Service Information
service: hello-function
stage: dev
region: us-west-2
stack: hello-function-dev
resources: 11
api keys:
  None
endpoints:
  POST - https://xsxpsh5f78.execute-api.us-west-2.amazonaws.com/dev/hello
functions:
  hello: hello-function-dev-hello
layers:
  None
Snowflake: Deploying Snowflake API and Functions
Snowflake: Integration HELLO_FUNCTION_DEV successfully created.
Serverless: Run the "serverless" command to setup monitoring, troubleshooting and testing. 

If all went well, you should be able to call your function from the Snowflake console: 

tfoldi#LOAD_WH@TEST_DB.PUBLIC>select hello();
+---------------------+
| HELLO()             |
|---------------------|
| "hello from lambda" |
+---------------------+

Adding the Mapbox API 

This CALCDISTANCE function example uses the Mapbox Directions API, Mapbox’s turn-by-turn navigation REST API. Follow these steps:

1. Register to get an access token. This enables you to retrieve driving distance with a simple node/javascript application. 

2. Add the following JavaScript REST API call to your handler.js and modify the function signature in serverless.yml. This sample Mapbox implementation is just a web call:

const calcDistance = (accessToken, profile, coordinateA, coordinateB) => {
        const mapboxApiURL = `https://api.mapbox.com/directions/v5/mapbox/${profile}/${coordinateA};${coordina teB}?access_token=${accessToken}`
        return axios.get(mapboxApiURL)
            .then(response => {
                    if (response.data && (!response.data.routes || !response.data.routes.length)) {
                        return {
                            errorCode: 5,
                            message: "No route found"
                        }
                    }
                    const routes = response.data.routes[0]
                    return {
                        distance: routes.distance, // metres duration: routes.duration, // seconds } }) }module.exports.calcDistance = async event => { 
                        const body = JSON.parse(event.body);
                        return await Promise.all(
                            body.data.map((row) => calcDistance(process.env.MAPBOX_ACCESS_TOKEN, row[1], row[2], row[3]))
                        ).then((ret) => {
                            return {
                                statusCode: 200,
                                body: JSON.stringify({
                                    data: ret.map((v, idx) => [idx, v])
                                }),
                            }
                        })
                    };

To map this handler to the external function, change the function signature in serverless.yml as shown here: 

functions:
  calc_distance:
    handler: handler.calcDistance

    snowflake:
      argument_signature: (with_profile varchar,coorda varchar,coordb varchar)
      data_type: variant

    events:
      - http:
          path: calc_distance
          method: post 
          authorizer: aws_iam

This definition creates a function variant CALC_DISTANCE (with_profile varchar,coorda varchar,coordb varchar), executing the handler.calcDistance code. Note that while JavaScript UDFs cannot access external web services, external functions can securely call remote endpoints such as web APIs or other external applications. To test your newly deployed function, log into Snowflake and test the function with a simple SELECT statement to determine if driving or cycling is faster.

select calc_distance('driving','-77.058797,38.814374','-77.051435,38.854217');

+------------------------------------------------------------------------
| CALC_DISTANCE('DRIVING','-77.058797,38.814374','-77.051435,38.854217') 
|------------------------------------------------------------------------
| {                                                                     
|   "distance": 5909.54,                                         
|   "duration": 826.267                                               
| }                                                                      
+------------------------------------------------------------------------
select calc_distance('cycling','-77.058797,38.814374','-77.051435,38.854217');
+------------------------------------------------------------------------+
| CALC_DISTANCE('CYCLING','-77.058797,38.814374','-77.051435,38.854217') |
|------------------------------------------------------------------------|
| {                                                                      |
|   "distance": 5474.6,                                                  |
|   "duration": 1411.8                                                   |
| }                                                                      |
+------------------------------------------------------------------------+
1 Row(s) produced. Time Elapsed: 6.275s

The UDF in the example works as expected, returning results with the driving distance and duration in meters and seconds respectively, showing that the bicycle route is shorter but takes a bit more time. These results come from pure SQL, directly from Snowflake.

The full source code of this Mapbox Snowflake function is available on github, here:
https://github.com/tfoldi/snowflake-mapbox-functions

Summary

External Functions is a functionality in Snowflake that enables you to write and call custom applications as user-defined functions from within Snowflake. Unlike traditional UDFs, however, external functions are executed on your own cloud infrastructure, while enabling you to securely access your Snowflake data. By using Serverless’s plugin to deploy your external functions to AWS and Snowflake, you can drastically simplify the deployment process and take advantage of the ability to call custom code and third-party APIs outside of Snowflake.

The post Extending Snowflake’s External Functions with Serverless—Adding Driving Times from Mapbox to SQL appeared first on Snowflake.

Previous Article
Data Lake Export Public Preview Is Now Available on Snowflake
Data Lake Export Public Preview Is Now Available on Snowflake

Public preview of the data lake export feature is now available. Snowflake announced a private preview of d...

Next Article
Data Vault Automation with erwin and Snowflake: Building and Automating a Scalable Data Warehouse Based on Data Vault 2.0
Data Vault Automation with erwin and Snowflake: Building and Automating a Scalable Data Warehouse Based on Data Vault 2.0

The Data Vault methodology can be applied to almost any data store and populated by almost any ETL or ELT d...

×

Subscribe to email updates from the Snowflake Blog

You're subscribed!
Error - something went wrong!