Case Study

Help our government act right during Covid-19 through trusted data

Virologists, government, and law enforcement coordinate the Covid-19 crisis response based on trusted data.

Challenge

Consistent communication of critical numbers

Data observability, the operational side of running high-quality data pipelines, is of growing importance to our client. As a key player in the healthcare policy system, they play an important role in gathering, processing, and distributing data that describes the spread of Covid-19.

With several hundred data suppliers providing updates at least daily, our client faced increasingly complex data pipeline quality challenges.

To make sure all internal and external clients trust the validity of frequently updated data products, they had to consistently track key statistics for each individual dataset:

Data consistency for events sourced from multiple systems;
Data freshness for data providers that are continuously evolving;
Anomaly detection in time series of reported incidents.

Multiplied by the number of datasets and the hourly refresh rate, this was a daunting task. A structured approach to data observability was essential.
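To make the three statistics above concrete, here is a minimal Python sketch of what each check could look like for a single dataset. It is an illustration only, not the client's implementation or any tool's API: the function names, the relative tolerance, and the z-score threshold of 3 are all assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_consistent(count_a: int, count_b: int, tolerance: float = 0.0) -> bool:
    """Return True if two systems report counts within a relative tolerance."""
    largest = max(count_a, count_b)
    if largest == 0:
        return True  # both systems report zero events
    return abs(count_a - count_b) / largest <= tolerance


def is_fresh(last_update: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True if the dataset was updated within the allowed window."""
    return now - last_update <= max_age


def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` standard deviations
    from the mean of `history` (a simple z-score rule, chosen for illustration)."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]  # any change from a constant series stands out
    return abs(latest - mean(history)) / spread > threshold
```

Running checks like these for every dataset on every refresh is what turns a manual review into something a scheduler can own.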

Approach

Orchestrate automated data quality checks.

We wanted to make sure the data engineering team would never be the bottleneck in detecting data quality issues. To that end, we wanted to:

Enable business stakeholders to verify the overall as well as the individual status of their datasets;
Ensure we had a proactive alerting system in place for quality issues, relieving data analysts of the job of continuously validating status.

Furthermore, by making data quality monitoring a business process, the data engineering team was relieved of a manual, repetitive task.

In the original setup, the project had typical data tests. Most were written in plain SQL and managed reactively: analysts frequently had to run the SQL by hand and alert stakeholders when known issues were detected.
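The move from hand-run SQL tests to automated ones can be sketched as follows. This is a hypothetical example, not the project's code: the `reports` table, its columns, and the check names are invented for illustration, and an in-memory SQLite database stands in for the real data store. Each check is a query that counts offending rows, and any non-zero count is returned as a failure that an alerting step could pick up.

```python
import sqlite3

# Hypothetical checks: each query returns the number of offending rows.
CHECKS = {
    "missing_region": "SELECT COUNT(*) FROM reports WHERE region IS NULL",
    "negative_cases": "SELECT COUNT(*) FROM reports WHERE cases < 0",
}


def run_checks(conn: sqlite3.Connection, checks: dict[str, str]) -> dict[str, int]:
    """Run every check and return failed check names with their offending row counts."""
    failures = {}
    for name, query in checks.items():
        (bad_rows,) = conn.execute(query).fetchone()
        if bad_rows > 0:
            failures[name] = bad_rows
    return failures


def demo() -> dict[str, int]:
    """Build a tiny sample table (made-up rows) and run the checks against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reports (region TEXT, cases INTEGER)")
    conn.executemany(
        "INSERT INTO reports VALUES (?, ?)",
        [("north", 12), (None, 7), ("south", -3)],
    )
    return run_checks(conn, CHECKS)
```

Calling `run_checks` from a scheduled job, rather than from an analyst's console, is the difference between a reactive and a proactive process.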

Together with our consultants, the team:

Defined, developed, and operationalized custom metrics;
Deployed the automated scanning tool in a production data pipeline;
Set up the cloud environment on AWS to execute the scanning tasks.

Results

On-call services dropped by 100%; data has validity stamps before it is analyzed.

A Soda Data project was deployed in less than two weeks.

Business stakeholders have access to data stream quality dashboards. They can define their own custom monitors and opt in to proactive alerts.

By enabling a new monitoring process, we have been able to shift from a “do I trust this number?” mindset to an “I trust this number, unless...” mindset.

