Snowflake Cost Optimization: Rightsizing Your Warehouses

Snowflake cost optimization demystified! A quick and easy method to gain more value out of every Snowflake dollar spent.
Celine Meulemans

In this article we’ll discuss a simple technique to drastically reduce the credit spend of a growing Snowflake project.

This is the first article in a series of posts dealing with cost optimization in Snowflake, the data cloud.

One of the platform’s most important differentiators is the ability to resize processing clusters (warehouses, in Snowflake terms) with just a click of the mouse. New teams often start by building a single-node, extra-small (XS) warehouse and run every kind of data workload on it.

After a while, both long- and short-running queries typically execute on that very same warehouse. Now keep the cost model in mind: Snowflake bills by the second, but every time a warehouse starts, it bills a minimum of one minute.
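To make the cost model concrete, here is a minimal sketch in Python of per-second billing with a one-minute minimum. The function name billed_credits is hypothetical, and the 4 credits per hour rate is the one quoted for the Medium warehouse in this article. It only models a single run that starts from a suspended warehouse.

```python
def billed_credits(run_seconds: float, credits_per_hour: float) -> float:
    """Credits billed for one warehouse run that starts from a suspended state."""
    # Per-second billing, but never less than 60 seconds per start.
    billed_seconds = max(run_seconds, 60)
    return billed_seconds * credits_per_hour / 3600


# On a 4-credit/hour warehouse, a 5-second run costs as much as a full minute.
short_run = billed_credits(5, 4)
full_minute = billed_credits(60, 4)
two_minutes = billed_credits(120, 4)

print(short_run, full_minute, two_minutes)
```

Any run shorter than a minute is charged as a full minute, which is why the rest of this article focuses on what happens inside that first minute.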

Let’s try to find a way to make sure that our queries run for a maximum of one minute.

Calculating the query length long tail

To get a first understanding of the query duration distribution per warehouse, we need to look at the data. In our demo setup, we have a warehouse in place called LOAD. It is optimistically sized as a Medium warehouse and consumes 4 credits an hour.

Historically, all of our workloads have been running on this LOAD warehouse, and our assumption is that we can split data loading workloads from transformation and analysis workloads. Let’s investigate this assumption and optimize our setup.

Snowflake exposes a number of statistics related to the inner workings of this warehouse through the shared SNOWFLAKE database. Find this information in the QUERY_HISTORY view in the ACCOUNT_USAGE schema.

To quickly analyze the information in this view, we’ll use the Snowsight interface for Snowflake. It comes by default with any Snowflake deployment but is hidden in plain sight under the preview app button.

This analyst interface to Snowflake allows us to execute queries and visualize the results right away.

This query builds buckets with a width of 1 second, which we can use to build our long-tail histogram:

-- EXECUTION_TIME is in milliseconds; each bin is 1,000 ms (1 second) wide.
SELECT
    FLOOR(EXECUTION_TIME / 1000.00) * 1000 AS BIN_FLOOR,
    COUNT(EXECUTION_TIME) AS QUERY_COUNT
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE WAREHOUSE_NAME = 'LOAD'
GROUP BY 1;
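For readers who want to see the bucketing logic outside of SQL, the same one-second binning can be sketched in Python. The durations below are made-up illustration values, not results from the demo account.

```python
from collections import Counter

# Hypothetical EXECUTION_TIME values in milliseconds (made up for illustration).
execution_times_ms = [120, 850, 999, 1000, 1500, 2300, 61000]

bin_width_ms = 1000  # one-second buckets, as in the SQL query

# Floor each duration to the start of its one-second bucket, then count per bucket.
histogram = Counter(
    (t // bin_width_ms) * bin_width_ms for t in execution_times_ms
)

for bin_floor in sorted(histogram):
    print(bin_floor, histogram[bin_floor])
```

Each key is the lower edge of a bucket in milliseconds, so the key 0 holds every query that finished in under one second.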

Visualize the results immediately by selecting the Chart button in the middle of the screen.

By default, Snowsight applies an extra layer of aggregation in its charts, which would distort the output of our analysis here. So let’s make sure to set that to none.

Finally, we get to the histogram that shows us how long queries take. The gray bar is the cutoff line for the one-minute minimum charge.

It turns out that the majority of queries run blazing fast (thanks, Snowflake!) and that we have considerable opportunities to optimize.

The action

Intuitively, we have two actions to take:

1. Downscale the current warehouse to an XS. This will roughly quadruple the processing time of the fastest queries, but they will still finish within the one-minute minimum charge.
2. Shift the workloads that require faster processing to a new warehouse.
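The reasoning behind both actions can be checked with a little arithmetic, sketched here in Python. It assumes that an XS warehouse is billed at 1 credit per hour (Snowflake's standard rate) against the 4 credits per hour of the Medium, and that a query takes about four times longer on the XS, as estimated above. The query durations are hypothetical, and real queries will not scale perfectly linearly.

```python
def billed_credits(run_seconds, credits_per_hour):
    """Credits for one run from a cold start, with the one-minute minimum."""
    return max(run_seconds, 60) * credits_per_hour / 3600


MEDIUM_CREDITS_PER_HOUR = 4  # from the article
XSMALL_CREDITS_PER_HOUR = 1  # standard XS rate
SLOWDOWN = 4                 # assumed: XS takes about 4x as long as Medium

# A fast query: 5 seconds on Medium (hypothetical), about 20 seconds on XS.
fast_medium = billed_credits(5, MEDIUM_CREDITS_PER_HOUR)
fast_xsmall = billed_credits(5 * SLOWDOWN, XSMALL_CREDITS_PER_HOUR)

# A slow query: 120 seconds on Medium (hypothetical), about 480 seconds on XS.
slow_medium = billed_credits(120, MEDIUM_CREDITS_PER_HOUR)
slow_xsmall = billed_credits(120 * SLOWDOWN, XSMALL_CREDITS_PER_HOUR)

print(fast_medium, fast_xsmall)  # XS is about 4x cheaper for the fast query
print(slow_medium, slow_xsmall)  # same credits, but XS takes 4x as long
```

Under these assumptions the fast query costs about a quarter as much on the XS, because both runs fall under the one-minute minimum. The slow query costs the same on either size but finishes four times sooner on the Medium, which is the case for moving such workloads to a separate, larger warehouse.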

Action one is fairly straightforward and can be done with one simple statement:

ALTER WAREHOUSE "LOAD" SET WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60;

Action two takes a bit more planning.

We start by isolating the queries that require more processing power. This particular Snowflake instance doesn’t apply query tags, so we need to investigate the actual queries that take over half a minute to process.

SELECT QUERY_TEXT, EXECUTION_TIME
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE WAREHOUSE_NAME = 'LOAD'
  AND EXECUTION_TIME > 30000 -- longer than 30 seconds, in milliseconds
ORDER BY EXECUTION_TIME DESC;

Now, we need a warehouse to host those queries. Let’s create a new medium warehouse named NEW_WAREHOUSE by running this query:

CREATE WAREHOUSE NEW_WAREHOUSE WITH WAREHOUSE_SIZE = 'MEDIUM' WAREHOUSE_TYPE = 'STANDARD' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;

Analyze the queries that are reported back. Change the clients that execute those queries, and point those clients to the new warehouse we’ve just created.

Conclusion

Far fewer queries will use the new warehouse, so it will be started less often. The queries executed on the original warehouse will take slightly more time to execute, but they will stay well within the one-minute minimum Snowflake charge.

The method proposed in this article is quite a simplification of reality, though often with a material impact on cost. Multiple queries can run at the same time, and a query run on a warehouse that is already active doesn’t call for another one-minute minimum charge. A more advanced method of cost management would take more parameters into account, such as parallel processing capabilities and warehouse sizes.
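As a rough illustration of that caveat, the sketch below groups queries into warehouse sessions and applies the one-minute minimum once per session instead of once per query. It is a simplification under stated assumptions: the function billed_seconds is hypothetical, query times are given as (start, end) pairs in seconds, the warehouse is assumed to resume instantly, and it is assumed to suspend exactly auto_suspend seconds after its last query finishes.

```python
def billed_seconds(queries, auto_suspend):
    """Total billed warehouse seconds for a list of (start, end) query times."""
    total = 0
    session_start = session_end = None
    for start, end in sorted(queries):
        if session_start is None:
            session_start, session_end = start, end
        elif start <= session_end + auto_suspend:
            # Warehouse is still running: extend the session, no new minimum.
            session_end = max(session_end, end)
        else:
            # Warehouse had suspended: close the old session, start a new one.
            total += max(session_end + auto_suspend - session_start, 60)
            session_start, session_end = start, end
    if session_start is not None:
        total += max(session_end + auto_suspend - session_start, 60)
    return total


# Three short queries close together share one session of 100 seconds.
clustered = billed_seconds([(0, 5), (10, 20), (30, 40)], auto_suspend=60)

# Two short queries far apart each start the warehouse: 65 seconds each.
spread_out = billed_seconds([(0, 5), (1000, 1005)], auto_suspend=60)

print(clustered, spread_out)
```

In this model the idle time before auto-suspend is billed as well, so how queries cluster in time affects spend as much as how long each one runs.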
