Processing unstructured data in Snowflake

Snowflake recently introduced support for unstructured data. Here's how we used this highly requested feature for the first time.

Sharing reports on Snowflake with your clients and colleagues can be a rewarding part of your project. It allows you to go over the important findings of your project and summarise the process. This report is often even better when it can be accompanied by clear images and visualisations. Thanks to a new feature in Snowflake, you can now share your images with you clients and colleagues the same way you would share data.

Structured vs unstructured data

Structured (i.e., CSV…) and semi-structured data (i.e., JSON, Avro, Parquet…) are already strongly supported in Snowflake. These can be easily imported into databases and tables to be presented and used for many transformations. However, it is a little more complicated for unstructured data. The best example for unstructured data would be images.

Introducing file support for unstructured data in Snowflake

So, imagine you have finished your reports and summaries of one of your projects in Snowflake. You have created multiple tables and views you would like to share with your colleagues or partners. For visualisation, you have also created some amazing plots and images you would also like to share, together with your table and views if possible.

Introducing file support for unstructured data. This new feature allows you to create pre-signed URLs of staged files. A pre-signed URL is a link that allows you to view your unstructured file (i.e., image) in your web browser by just pasting it in your search bar. So, now you can easily add your pre-signed URL to your report table and your colleague will be able to open the image without any hassle.

How does it work

To access your images they have to be staged in Snowflake. So, you start by creating a stage with server-side encryption only. This can be an internal or external stage (e.g., S3 bucket).

Creating a stage

An internal stage can easily be created by running the next code. Here, you just define the name of your stage (i.e., images_stage) and the type of encryption, which has to be server-side encryption.

create or replace stage images_stage
encryption = ( type = 'SNOWFLAKE_SSE' );

An external stage is a little more complicated since you have to either define your credentials, or you can use a storage integration for your stage. First we have an example that uses an existing storage integration for the stage.

create or replace stage my_ext_stage
url = 's3://load/files/'
storage_integration = myint;

You can also directly use your credentials to create a server-side encrypted stage. Here, you need to define your AWS S3 credentials and again your server-side encryption.

create or replace stage my_ext_stage
url = ' s3://load/files/ '
credentials = ( aws_key_id = '1a2b3c' aws_secret_key = '4x5y6z' )
encryption = ( type = 'AWS_SSE_S3' );

Accessing your stage

Now, once your stage is created, you can access your stage by running the next query. This returns a list of files that are present in your stage. For this query you just need the name of your stage.

list @s3_demodata_stage;

Getting the pre-signed URL

To get the pre-signed URL, we first need the relative file path. This can be accessed by running the following query using the stage name and the absolute file path. The absolute file path is the file path you got by running the ‘list‘ query in the previous step.

select get_relative_path(@s3_demodata_stage, 's3://862475166714-snowflake-demo-data/unstructured/img/jeremy-yap-jn-HaGWe4yw-unsplash.jpg');

This relative path can then be used to get the pre-signed URL by running the following query. Here you need to enter your stage name and the relative path. You also need to define an expiration time, which has to be a value equal to or greater than 900 seconds. This is the length of time in seconds you want your pre-signed URL to be active. Meaning, the URL will expire after the defined length of time and the image will not be accessible anymore.

Attention, it is important to remove the backslash at the beginning of the relative path before you paste this in the query for the pre-signed URL. Otherwise, the pre-signed URL will not be created correctly and it will not work.

select get_presigned_url(@s3_demodata_stage, 'unstructured/img/jeremy-yap-jn-HaGWe4yw-unsplash.jpg', 3600);

You have now created your pre-signed URL that you can send to colleagues or add to reports. This pre-signed URL will now allow you to view your image by pasting it in your web browser.

Create a view to store your image information

We already spoke about adding your images to your tables and reports. But, to present the pre-signed URL in a clear manner, especially when you have multiple images, you can also create a view with some important information. You can do this by creating a table with columns that are relevant for the image. Here we have the url of the file, the image format, the size, some tags that describe the image, the dominant color and the relative path. These columns can be whatever you deem important for your case.

  create table images_table
  (   file_url string,
      image_format string,
      size number,
      tags array,
      dominant_color string,
      relative_path string);

This table can then be filled manually by just adding the values. Or it can be filled by using a JSON file that contains the metadata extracted from your image. This extraction can be done using the API of your cloud storage.

Once your table is filled with the information you need, you can create a view that contains the columns from your table you want, and you can also add your pre-signed URL. This way you have a description of the image and your pre-signed URL in the same place. So you create a view by just selecting whatever you need from your table.

create view image_catalog as
(select
    size,
    get_presigned_url(@images_stage, relative_path) as presigned_url,
    tags
  from
    images_table);

You have now created a view that includes some important information about your image and your pre-signed URL. This view can now be shared or added to your report however you want.

Summary

So, to summarise, this new feature allows you to add images in Snowflake! It is very useful when you want to present or share your reports on Snowflake with the possibility of accompanying the reports with pictures. This allows for much better and more visual reports. You can just add them to the tables of your reports and share them like that. But you can also create specific views to have a clear overview of your images. There are many options!

Picture of Selin Kanatli

Selin Kanatli

Analytics Engineer

Related articles

One of the most considerable challenges for a data platform owner today is upgrading their data platform infrastructure. We found a way to automate the conversion of several legacy technologies to Snowflake by autoconverting them to dbt projects. Here’s how we did it.

How We Accelerate Hadoop-to-Snowflake Migrations

In just 6 weeks, Jacob had the opportunity to learn and grow through a series of courses designed to equip him with the skills and knowledge necessary to succeed in the data industry.

Revisiting my 6 weeks onboarding training

If you’re working in a hands-on data role using Snowflake, Databricks, or Bigquery, chances are you’ve encountered dbt as a companion technology. 🎉 On April 3rd, 2023, dbt Labs announced that Tropos.io became one of the 5 premier partners worldwide.

Exclusive! We Are Excited To Be A Dbt Premier Partner in 2023

Scroll to Top