S01E04 Governing Data Transformation in Snowflake (with dbt)

Snowflake Ecosystem Podcast

In episode 4, Hope Watson (dbt Labs) and Joris Van den Borre (Tropos.io) discuss governing data transformations in a modular, cloud-first data platform.

About this episode

If you’ve been planning to roll out Snowflake, you may have noticed that organizing data transformations looks totally different compared to previous technology iterations. dbt seems to be the default choice when it comes to governing business rules in modern data platforms. Hope Watson (dbt Labs) and Joris Van den Borre (Tropos.io) spent half an hour together discussing the rise and ubiquity of the combo.

Key takeaways from the session

The Do’s

✅ Be smart in how you deliver. Get rid of overhead in your data integration practice by taking proven practices from software engineering and applying them to a data context. One of those is “continuous integration”, a practice we often use to watch over the quality of deliverables so the pace of delivery can stay high. Writing code instead of using low-code principles is, counterintuitively, often a more reliable and productive way to speed up the time-to-market for new data products.

✅ Keep your ecosystem efficient. Data transformations are a “spiky” workload, so keep a fit-for-purpose focus for every component in your tech ecosystem. The ecosystem is rich, and a smart mix-and-match between components keeps the total cost of ownership in check whilst making the most of your Snowflake budget.

✅ From a process perspective, it makes sense to consolidate responsibilities to transform, test and document data. But no one likes to do that, right? And if it happens, it often happens at the very last moment or as part of a technical debt reduction effort. We’re now at a point where ideal responsibilities for an engineering team can be matched with a way of working such as the one dbt proposes.

The Don’ts

❌ Don’t try and reinvent the wheel. Open source is great for experimenting, innovating and validating use cases. However, when projects become really successful, it’s often the innovators who become the helpdesk. Make sure there’s a stable support model, and hence a company, behind the open source software that made your project successful.

❌ Don’t underestimate SQL, the programming language for databases. Really, don’t. It might be hard to scale across teams, regions or projects, but templating engines such as dbt do a great job of managing bits and pieces of complex business logic. By sticking to SQL, teams can remove degrees of freedom that other programming languages offer but that often aren’t strictly necessary to deliver business value. This reduces the complexity of managing your platform, and makes Snowflake a perfect outsourcing partner.

❌ Don’t build processes from scratch. None of them. Yes, it might be enticing to go full-blown from the start on managing your code, checking your quality, going to production, scaling your platform… But it has been done before, and the learnings are out there. Make sure you can just copy and paste the bare minimum, and preferably get some guardrails in place from the start.
