This is the 3rd blog in our series on Snowflake Resource Optimization.
In parts 1 and 2 of this blog series, we showed you how Snowflake’s unique architecture allows for a virtually unlimited number of compute resources to be accessed near-instantaneously. We also provided best practices for administering these compute resources to optimize performance and reduce credit consumption. In this blog post, I’ll show you the best practices our internal Snowflake team uses to manage our own usage of Snowflake.
My name is Tamir Rozenberg and I support one of the world’s largest Snowflake deployments – it just so happens to be the one we have here internally at Snowflake. Prior to Snowflake, I managed Instacart’s Snowflake platform. In the 4 years I’ve been overseeing Snowflake resources, I’ve developed several strategies that have helped me implement highly efficient and performant deployments. I’ll share those strategies with you in this post.
When we talk about the multi-cluster compute layer of Snowflake, we’re referring to the virtual warehouses that execute queries on the centralized storage layer. Below is the architectural diagram that shows the 3 components of the Snowflake platform.
Snowflake Virtual Warehouses
A virtual warehouse, often referred to simply as a “warehouse,” is a cluster of compute resources in Snowflake. A warehouse provides the required resources, such as CPU, memory, and temporary storage, to perform SELECT, UPDATE, DELETE, and INSERT commands.
Snowflake’s Management Flexibility
Snowflake provides powerful features and control capabilities that can help you determine the best approach for warehouse management based on your organization’s use cases. Each organization can enforce its own strategy, and the strategy can be completely different for each organization.
The information below provides a few recommendations for how to set up your warehouses based on your use cases.
Understanding Workload Management
Snowflake’s unique architecture makes it possible to run multiple workloads concurrently, without performance impact, by using more than one concurrently running warehouse. Each workload, for example, an ETL workload and an analytics workload, can have its own isolated resources even while each is operating on the same databases and tables. That means the ETL workload does not impact the performance of the analytics workload and vice versa.
To get most of the value out of Snowflake virtual warehouses, it is important to use the correct warehouse for the work you intend to do, and it’s also important to not mix multiple workloads in a single warehouse.
For example, a heavy batch operation workload should not use the warehouses designated for ad hoc queries and analytics.
Analyzing Your Workloads
It is good practice to classify your workloads before you choose a warehouse size and configuration.
Here are a few common types of workloads:
- Ad-hoc Analytics: Self-service Analytics or ad-hoc heavy queries
- Data Loading: A COPY command that constantly loads data from an external data source
- Data transformations: A series of commands to transform your raw data into a more consumable format
- Reporting: Dashboards and other reports refreshed on a schedule, or on-demand by executives
- Applications: End-user applications that are displaying data based on query results
- Batch: A massive batch transformation that runs frequently
I used the following table to help me map various workloads to an appropriate warehouse size (XL, L, M, or S) based on the following criteria:
- Frequency: How often the workload runs
- Concurrency: How many processes will run simultaneously
- Scan Size: What’s the average size of data scan the query will perform
- Copy Files: How many files will be copied simultaneously
- SLA Minutes: The number of minutes stipulated by the SLA
Here is an example of how I would assign various workloads to warehouses based on these categories.
|Workload Description||Workload Type||Frequency||Concurrency||Scan Size||Copy Files||SLA Minutes||Warehouse Name||Warehouse Size|
|Data Transformation||Batch||Hourly||1||1 TB||10||transform_wh||XL|
|Third-Party Data Ingestion||Loader||Hourly||2||2||5||load_wh||XS (see this tip)|
|Tableau||Ad-Hoc Analytics||Ad hoc||8||2 TB||1||analytics_wh||XL (see this tip)|
|Alerts, Monitoring…||Batch||Hourly||2||500 GB||1||app_wh||M|
|Console||Applications||On demand||16||1 TB||0.1||application_wh||XL|
|Snowflake UI||Ad-Hoc Analytics||Ad hoc||8||3 TB||1||analytics_wh||XL|
|Amazon S3 Loader||Loader||Hourly||5||5||5||load_wh||XS|
|Database, Machine Learning, or Python Connector||Batch||Hourly||2||5 TB||5||transform_wh||XL|
From the table above, we can consolidate multiple workloads under one warehouse. When using this strategy, try to categorize your workloads to warehouses and combine all identical workloads to the same warehouse until you maximize warehouse utilization. The “Warehouse Utilization” section below provides more information.
Centralize Requests for Resources
As a best practice, I recommend delegating the responsibility of scaling warehouses up and down to just a few members of your team. Create a dedicated role with permissions to operate on all warehouses and grant that role to a limited number of engineers. This will allow you to control your policy and strategy for warehouse utilization.
Below are some specific things my team and I do when we setup new warehouses. These are useful if you are migrating from another data platform and need to configure your Snowflake resources but also for adding new workloads. Remember to focus on the business problem you’re trying to tackle, identify the workloads, then decide how to implement that into action with Snowflake resources.
Configuring a Warehouse
Configuring a warehouse is done by the warehouse administrators.
- Size: If you are not sure what is the best size for the warehouse, start with a smaller size and slowly increase it based on the workload’s performance and its SLA.
- Maximum clusters: Similar to size, start with a small number and increase it based on workload activities (available only in enterprise edition or above).
- Scaling policy: Usually this is set to Standard but if queuing is not an issue for the workload, set it to Economy, which will conserve credits by trying to keep running clusters fully loaded (available only in enterprise edition or above).
- Auto-suspend: This specifies the number of seconds of inactivity after which a warehouse is automatically suspended. Note that the default value is 600 seconds, meaning the warehouse suspends automatically after 10 minutes of inactivity. If your workload runs infrequently, to reduce your cost, set the value to 60 seconds (one minute), as shown in the following example. Setting the value to NULL is not recommended unless your query workloads require a continually running warehouse. Setting the value to NULL can result in a significant consumption of credits (and corresponding charges), particularly for larger warehouses.
Alter warehouse warehouse_name set AUTO_SUSPEND=60
Auto-resume: Rarely disable this option; if you do, the warehouse will resume only when you explicitly run the command alter warehouse warehouse_name resume. If the option is enabled, the warehouse will resume automatically when a new query is submitted.
Setting the Warehouses’ Timeout
For each warehouse, set the applicable timeout based on the workload that is expected. For example, set the parameter STATEMENT_TIMEOUT_IN_SECONDS to 1,000 seconds if the expected time for the warehouse to complete the workload is less than 1,000 seconds. Depending on your use case consider also setting up the parameter STATEMENT_QUEUED_TIMEOUT_IN_SECONDS for warehouses that process ad-hoc queries workload.
Setting the correct timeout will prevent mixing different workloads into one warehouse.
The following screenshot shows the output of warehouse parameters.
Warehouse Naming Conventions
Try to provide a relatively short but descriptive name for each warehouse. For example loader_wh or tableau_wh are great examples of good warehouse names that easily can be mapped to a workload.
A poor approach would be using the size of the warehouse as the name, for example, XSMALL_wh. Such a name can be very confusing because the size of the warehouse is configurable and can be changed over time.
Setting Permissions on a Warehouse
Setting the right permissions on warehouses can help with centralized management and can provide better control and governance. Using permissions correctly can reduce the administration tasks.
Snowflake provides a set of permissions that can be assigned to roles to simplify warehouse management in a multi-warehouse environment:
- USAGE: Grants the role the ability to use the warehouse and to run commands using the warehouse.
- MONITOR: Grants the role the ability to see details within an object (for example, queries and usage within a warehouse). It is best practice to provide this ability to the role so the workload using the warehouse can monitor the activities in the warehouse and the role will be able to see the warehouse load over time using the Snowflake UI.
- OPERATE: Grants the role the ability to start, stop, suspend, or resume a virtual warehouse. This is a sensitive permission. It is highly recommended to grant this permission to ONLY the warehouse administration role.
Setting a Default Warehouse for users
To provide better control and governance of warehouse usage, a best practice is to define a default warehouse for each user. This can prevent users from using the wrong warehouse for their workloads. Here is an example:
create user looker password='abc123' default_role = myrole must_change_password = true default_warehouse = analytics_wh;
For more information, see these tips.
Monitoring Best Practices
- Now that you have virtual warehouse resources running on Snowflake, let’s explore the various ways to monitor and manage them.
- We provided 11 tips for optimizing your resources in our 2nd blog post.
- Let me show you three more ways to manage your resource consumption.
Queuing queries in a warehouse is the first indication that a warehouse is overutilized. In some cases, this can be due to a design decision and it is OK to have queued processes. However, for workloads with tight SLAs, for example, console queries or BI tools, it’s probably not desirable to have a long queue.
You can monitor queue size by using the ACCOUNT_USAGE schema’s warehouse_load_history view, or you can look at real-time queue size by running a command such as the following, which will show all the current warehouse parameters and the queue size:
SHOW WAREHOUSES LIKE '%TEST'
Snowflake also provides detailed graphs showing warehouse load over time. Our documentation has more details on how to use these charts.
Identifying Costly Queries
If you followed the best practices above and you still want to find the query on a single warehouse that consumes most of your credits, you can query account_usage to get more insight about costly queries or queries that require tuning. Below you can find a few examples for queries you can use:
Here’s an example that breaks down by day and username of long-running queries that run frequently:
SELECT Date_trunc('day', start_time), user_name, Substr(query_text, 1, 100) AS query, Count(*), Sum(total_elapsed_time), Max(query_id) FROM snowflake.account_usage.query_history WHERE warehouse_name = 'WAREHOUSE_NAME' AND Date_trunc('day', start_time) >= CURRENT_DATE() - 7 GROUP BY Date_trunc('day', start_time), user_name, Substr(query_text, 1, 100) ORDER BY Date_trunc('day', start_time) DESC, Sum(total_elapsed_time) DESC
Here’s an example to identify a query that consumed lots of credits:
SELECT Query_ID,USER_NAME,warehouse_name,warehouse_type,start_time,end_time ,ROUND(total_elapsed_sec,2) total_elapsed_sec ,ROUND(Credit_Unit *total_elapsed_sec / 60/60, 2) total_credit FROM ( select Query_ID,USER_NAME,warehouse_name ,warehouse_type ,start_time,end_time ,total_elapsed_time/1000 total_elapsed_sec ,CASE WHEN warehouse_size = 'X-Small' THEN 1 WHEN warehouse_size = 'Small' THEN 2 WHEN warehouse_size = 'Medium' THEN 4 WHEN warehouse_size = 'Large' THEN 8 WHEN warehouse_size = 'X-Large' THEN 16 WHEN warehouse_size = '2X-Large' THEN 32 WHEN warehouse_size = '3X-Large' THEN 64 WHEN warehouse_size = '4X-Large' THEN 128 ELSE 1 END Credit_Unit from SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY where ( error_message IS NULL or error_code = 604 ) and warehouse_size IS NOT NULL and START_TIME >= DATEADD(DAY, -30, CURRENT_TIMESTAMP() ) ) ORDER BY total_credit desc
Using the QUERY_TAG Session Parameter
QUERY_TAG is an optional string that can be used to tag queries and other SQL statements executed within a session. Using it is helpful when multiple applications are sharing the same user ID and warehouse. If you use it wisely and consistently, you can identify workload trends such as performance issues and cost issues. As a best practice, tag your session with a unique identifier for your workload.
Then, all the workload activities will be recorded with the tag you specified. This can help you identify issues with certain elements of the data pipeline.
Here’s an example of using this parameter:
ALTER SESSION SET QUERY_TAG='My workload tag'
As shown in the following figure, you can search for specific tags using the Snowflake History tab. This can help you identify the slowest part of your pipeline.
In the 4 years I’ve been managing Snowflake deployments, I’ve used these strategies to great success and have seen significant results, such as up to a ⅓ reduction in credit consumption and an improvement in user experience and better SLAs.
While any specific situation can differ, following the guidance above can result in better performance and response time for ad hoc queries, minimize queuing, and reduce costs. Snowflake provides lots of flexibility to control and manage performance and cost for many use cases such as data lakes, data warehouses, machine learning, and data science. Use this flexibility to match your needs and you’ll have a highly efficient and cost-effective solution for your organization.