Without any doubt, we’re at a tipping point today when it comes to data warehousing. Almost three decades of meticulously crafting data models to match business demand and technological capabilities have been challenged by 2 major waves of innovation:
- The Hadoop Era, when our industry learned how to break data processing free of the database server boundaries.
- The Cloud era, where our industry learned to prioritize end-user experience and cost management over resource planning.
These technological advances can have significant benefits for end-users depending on data warehouses to track and report on business processes. When combined, insights from both waves can be applied to one of the staples of data analytics, the data warehouse.
Many organizations started developing their data analytics programs well before the Hadoop era. Often business users in those organizations report the data warehouses to be slow, hard to maintain, and not able to keep up with the pace of business change.
A quick win in the data cloud
As business complexity grows, so does business feedback on the usability of the data warehouse. One of the quick wins is to propose a lift-and-shift (take on-prem developments and migrate them to the cloud without major changes) of the data warehouse situation as-is and benefit from public cloud capabilities.
The Data Cloud is a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Snowflake’s platform is the engine that powers and provides access to the Data Cloud. So this platform has been our favorite target for data warehouse migrations over the past few years.
Even though we’re experienced with all 5 major cloud data warehouse technologies, Snowflake is our go-to for data warehouse migrations because of:
- Full standard SQL support (hello, data analyst!)
- Support for semi-structured data (hello, data lake!)
- Blazing fast performance and concurrent usage through autoscaling (hello, dashboard!)
- No restrictions on data modeling (hello, data architect!)
But enough about that, here’s how we typically migrate the data warehouse.
Plan and architect the journey
Your data warehouse will likely consist of a large number of interconnected tables; isolated in groups to support up the various phases in the data lifecycle.
Here, you can start to eliminate tables that have pure technical use, but not functional one. Save some time down the road and drop supporting tables such as aggregation tables as you scan through your database. When you have unlimited processing power, you shouldn’t precalculate after all.
Keep your data modeling techniques for now. Data vault? Perfect. Kimball? Will do. Inmon? OK! Super-subset, Anchor, or any other technique? That’s just fine. After all, our lift-and-shift strategy is helping us to lower TCO and improve user satisfaction of the data warehouse, and changing modeling techniques don’t have a direct impact.
Now is a good time to define your success criteria as well. How are you going to define progress? What is a good time to release the revamped data warehouse to the first group of analysts? This will become increasingly important, so take your time to define your strategy.
Migrating a data warehouse to the Data Cloud implies many other aspects, such as
- Possibly rethinking your staging strategy
- Rethinking your ETL strategy
- Orchestrating data flows
- Designing authorization and authentication
- Rethinking the way end-users interact with your data
Future blogs will cover those aspects in-depth.
Now that we know which tables to keep, we can start extracting them from the Oracle data warehouse, and migrate them to Snowflake. This can be a fairly straightforward phase, as Snowflake supports many of the table definition
Pay attention to optimizations such as indexes. Cloud doesn’t need those.
Likely your warehouse will be of size, so extracting – reviewing – applying libraries of SQL code won’t be a reasonable job for a human to do.
You have a few options here
- There are a few great schema conversion tools out there on the market. The one we particularly love happens to be one of the cheapest, and extends the schema conversion with data modeling capabilities.
- We built an in-house data warehouse migration tool to convert schemas between Snowflake and Oracle. Our is different because it supports migrating continuous change on the source system.
No matter what option you choose, you want to make this a bulk job. Snowflake makes it particularly easy for us to do this, as they support many of the exotic SQL dialects that databases such as Oracle maintain. Great!
Lifting-and-shifting views and materialized views
Neither views nor materialized views should pose any issue. After all, both are just sql-based constructs based on tables that you should have migrated without any issue by now.
Views and secure views can be migrated with mostly the exact same syntax as Oracle (or most of the other data warehouse technologies) applies.
One point of attention though, and that’s the upcoming item.
Dealing with In-database scripting
Stored procedures are chunks of scripting code, often developed in a proprietary programming language, meant to perform a sequence of tasks within the database itself. They’re loved by some, and frowned upon by others.
So here’s where likely a large part of the time will be spent: in re-coding PL/SQL to one of the supported scripting languages in Snowflake. Our advice? To analyze, eliminate, prioritize, and plan re-coding these stored procedures. It’s a good moment for some housekeeping in your database anyhow, many of these chunks of code may be redundant anyhow.
Be efficient and implement an automated data testing strategy to keep your developers focused on migrating, not fixing the past.
Moving data into the data cloud
Once the foundations are in place, it’s time to migrate the data over. Pick a strategy that will work for your lift-and-shift, but use the opportunity to think about how you’ll replenish your data cloud once your legacy data warehouse has been decommissioned. You’ll have the option to pick one of many strategies to help you automate this:
Use your current ETL tool to push data over to Snowflake. It comes with a word of caution as many ETL products don’t fully support cloud workloads yet.
Use a commercial, off-the-shelf data replication technology such as Attunity or Fivetran to cover the initial data migration. It’ll give you the benefit of having a commercial organization behind the technology, in case anything goes wrong.
Tap into organization-wide efforts to manage data at scale. Streaming architectures are becoming increasingly popular in some industries. Snowflake has a well-developed Kafka consumer connector available, which we have successfully deployed in the past. Getting data into Kafka is fairly straightforward by using an open-source framework called Debezium.
Migrating data workloads to the public cloud has a significant impact on the user experience for data products. Our developer productivity and tool stack satisfaction is considerably higher than the legacy ones.
Migrating an existing Oracle data warehouse might seem like a daunting task, but it shouldn’t be one. Many of the integration points of your current data warehouse can remain practically unchanged. After all, Snowflake supports the full SQL instruction set and it interprets much of the Oracle SQL dialect out-of-the-box.
From our experience, a data warehouse migration is not an end, but just the beginning of a wider data analytics program. One where more analysts can be enabled with less effort and more data products can be released in less time thanks to the automation features offered by the public cloud.
Snowflake delivers the Data Cloud — a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Wherever data or users live, Snowflake delivers a single and seamless experience across multiple public clouds. Join Snowflake customers, partners, and data providers already taking their businesses to new frontiers in the Data Cloud. Snowflake.com.