It is now Friday morning, officially the end of Snowflake Summit 2019. Though the event has come and gone in its physical form, the excitement and optimism we feel as a company remain strong. That optimism is built on the trust customers have placed in us based on the results they’ve achieved. We couldn’t be more excited for what lies ahead. But first, let’s review this week’s product announcements.
At Snowflake, we are committed to putting customers first. During his keynote, Christian Kleinerman, Vice President of Products, mentioned the term “relentless innovation.” This is core to what Snowflake stands for as we are constantly evolving and refining the product to help organizations solve their most urgent business and technology challenges. Our mission is to enable organizations to become more data-driven, using all of their data to draw out deep insights to make better, faster business decisions.
Below is a list of the products that are available now and those currently in preview. There are four main themes: Global Snowflake, Core Data Warehouse, Data Pipelines, and Data Exchange.
Microsoft Azure Government
Microsoft Azure Government delivers a dedicated cloud, restricted to U.S. government agencies and their partners, and operated by screened U.S. citizens. Now, with Snowflake’s availability on Microsoft Azure Government, federal customers can enjoy the benefits of the Snowflake data warehouse as a service on a dedicated instance of Azure that is restricted to U.S. government agencies and their partners. For more information, click here.
Materialized Views
Snowflake now offers a modern approach to Materialized Views (MVs) that addresses the pain points of traditional implementations. Snowflake MVs:
- Ensure optimal speed (no slowdowns)
- Deliver query results through MVs that are always current and consistent with the main data table
- Provide exceptional ease-of-use through a maintenance service that continuously runs and updates MVs in the background

For more information, please read here.
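As a quick sketch of how this looks in practice (the table and column names below are hypothetical), an MV is defined with standard DDL and then queried like any other table, while the maintenance service keeps it in sync with the base table:

```sql
-- Hypothetical example: precompute daily revenue per store.
-- The background maintenance service keeps the view consistent with
-- the base table; no manual refresh is required.
CREATE MATERIALIZED VIEW daily_store_revenue AS
  SELECT store_id,
         TO_DATE(sold_at) AS sale_date,
         SUM(amount)      AS revenue
  FROM   sales
  GROUP  BY store_id, TO_DATE(sold_at);

-- Queried like an ordinary table, but served from precomputed results.
SELECT * FROM daily_store_revenue WHERE sale_date = CURRENT_DATE();
```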
We are enhancing the product’s performance, security, and breadth of capabilities so that customers can use Snowflake for an even broader set of data warehouse workloads. CONNECT BY and recursive common table expressions (CTEs) are hierarchical SQL query constructs offered by on-premises data warehouse solutions, and they are table stakes for large enterprises. These features further solidify our position as the Enterprise Data Warehouse built for the cloud, and they make migration from legacy on-premises data warehouse products practical.
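For illustration (the employees table here is hypothetical), the same manager-to-report hierarchy can be walked with either construct:

```sql
-- Hypothetical org chart: employees(id, name, manager_id).
-- Recursive CTE: start at the top, expand one level per iteration.
WITH RECURSIVE org AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM   employees
    WHERE  manager_id IS NULL            -- anchor: the top of the tree
    UNION ALL
    SELECT e.id, e.name, e.manager_id, org.depth + 1
    FROM   employees e
    JOIN   org ON e.manager_id = org.id  -- recursive step
)
SELECT name, depth FROM org;

-- Equivalent traversal using Oracle-style CONNECT BY syntax:
SELECT name, LEVEL AS depth
FROM   employees
START WITH manager_id IS NULL
CONNECT BY PRIOR id = manager_id;
```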
Previews and Pre-announcements
Google Cloud Platform
We announced a strategic partnership with Google Cloud Platform that will enable customers to use Snowflake alongside Google Cloud’s comprehensive set of advanced analytics and machine learning solutions to derive meaningful insights from various data sources. Snowflake on Google Cloud is set to launch in preview in Fall 2019, with general availability scheduled for early 2020.
Snowflake Database Replication
Global Snowflake is a core theme behind Snowflake’s product strategy for becoming our customers’ global cloud data solution across regions and cloud providers. Snowflake Database Replication enables customers to replicate databases and keep them synchronized across multiple accounts in different regions and/or cloud providers. Changes can be synchronized to a different region or cloud provider, ensuring data durability and availability at all times.
Snowflake Database Replication and Failover occur in real time, and recovery time does not depend on data size. For more information, click here.
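Roughly, the workflow looks like the following sketch (the account, region, and database identifiers are hypothetical):

```sql
-- On the primary account: allow the database to replicate to a
-- secondary account, which may live in another region or cloud.
ALTER DATABASE sales_db
  ENABLE REPLICATION TO ACCOUNTS azure_eastus2.myorg_secondary;

-- On the secondary account: create a local replica and refresh it
-- to pull the latest changes from the primary.
CREATE DATABASE sales_db AS REPLICA OF aws_us_east_1.myorg_primary.sales_db;
ALTER DATABASE sales_db REFRESH;
```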
In February 2019, Snowflake acquired Numeracy, a company that built a unique and compelling SQL query editor. The editor supports additional functionality such as SQL autocomplete, query and worksheet sharing, in-worksheet visualizations, and rapid catalog browsing and search.
We will be bringing these features to a new version of Worksheets for all Snowflake customers.
Okta Provisioning for Snowflake allows customers to automatically externalize user and role management through Okta and Active Directory (AD). For example, when a user is terminated, they are automatically deactivated in Snowflake. Conversely, when a user is added to an AD group, they are automatically granted a role in Snowflake. This is important for two main reasons:
- Customers using Okta to manage users and groups in various SaaS applications (such as Salesforce, Slack, Dropbox, and others) expect the same experience from Snowflake.
- Customers want to manage Snowflake users and roles through Active Directory (AD). Through this integration, customers can use Okta as an intermediary tool to read users and groups from AD then provision them into Snowflake.
Data Pipelines: Auto-Ingest
AWS and Azure emit event notifications whenever an object is created in cloud storage. Auto-Ingest layers these notifications over Snowflake’s ingest service, which can then automatically detect files arriving in a stage and load them into the appropriate tables. This matters because it reduces latency: data is ingested and transformed as it arrives rather than on a batch schedule. Read more here and here.
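A minimal sketch of an auto-ingesting pipe (the stage, table, and file format here are hypothetical):

```sql
-- AUTO_INGEST = TRUE tells the pipe to load files as soon as a
-- cloud-storage event notification (e.g., S3 via SQS, or Azure
-- Event Grid) announces their arrival in the stage.
CREATE PIPE clicks_pipe AUTO_INGEST = TRUE AS
  COPY INTO clickstream
  FROM @clicks_stage
  FILE_FORMAT = (TYPE = 'JSON');
```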
Streams and Tasks
The Streams and Tasks feature is fundamental to building end-to-end data pipelines and orchestration in Snowflake. Customers can use Snowpipe or their ELT provider of choice, but that approach only loads data into Snowflake. Streams and Tasks provides a task-scheduling mechanism so customers no longer have to resort to external jobs for their most common Snowflake SQL scheduling needs. The feature also lets customers connect staging tables to downstream target tables with regularly scheduled logic that picks up new data from the staging table and transforms it into the shape required by the target table.
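The staging-to-target pattern described above can be sketched as follows (table, warehouse, and task names are hypothetical):

```sql
-- A stream records change data (new rows) on the staging table.
CREATE STREAM orders_stream ON TABLE orders_staging;

-- A task runs on a schedule, but only when the stream has data,
-- moving new rows into the shape required by the target table.
CREATE TASK load_orders
  WAREHOUSE = transform_wh
  SCHEDULE  = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  INSERT INTO orders (id, customer_id, total)
  SELECT id, customer_id, total
  FROM   orders_stream;

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK load_orders RESUME;
```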
Snowflake Connector for Kafka
Apache Kafka is a platform for building pipelines to handle continuous streams of records. This connector makes it fast and easy to reliably publish these records to your Snowflake instance for storage and analysis. Learn more about data pipelines here.
External Tables
External tables reference data files stored in a cloud storage data lake (for example, AWS S3, Google Cloud Storage, or Microsoft Azure). They store file-level metadata about the data files, such as the file path, a version identifier, and partitioning information. This makes it possible to query data stored in data lake files as if it were located inside a database.
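As a sketch (stage path and table name are hypothetical), an external table is defined over files in an external stage and then queried like a normal table:

```sql
-- Only metadata lives in Snowflake; the Parquet files stay in the
-- data lake and are read at query time. AUTO_REFRESH keeps the
-- file-level metadata in sync as new files land.
CREATE EXTERNAL TABLE events_ext
  WITH LOCATION = @lake_stage/events/
  FILE_FORMAT  = (TYPE = PARQUET)
  AUTO_REFRESH = TRUE;

-- Queried as if it were a regular database table.
SELECT COUNT(*) FROM events_ext;
```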
Hive Metastore integration
With Hive metastore integration, customers can now integrate a Hive metastore with Snowflake using external tables. The Hive connector in Snowflake listens to metastore events and transmits them to Snowflake to keep the external tables synchronized with the Hive metastore. This allows users to manage their tables in Hive while querying them from Snowflake.
Credential-less external stages
Credential-less external stages let customers avoid passing secret keys or access tokens for storage accounts. They can be created on cloud storage accounts from AWS, Azure, and GCP. Additionally, account admins can restrict which cloud storage locations external stages may use, helping prevent data exfiltration.
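On AWS, for example, this is done by delegating access to an IAM role through a storage integration, so no keys or tokens are stored in Snowflake or handled by users (the role ARN and bucket below are hypothetical):

```sql
-- The integration names the IAM role Snowflake may assume and the
-- only storage locations stages are allowed to reference.
CREATE STORAGE INTEGRATION lake_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = S3
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-access'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-data-lake/exports/');

-- The stage references the integration instead of embedding credentials.
CREATE STAGE lake_stage
  URL = 's3://my-data-lake/exports/'
  STORAGE_INTEGRATION = lake_int;
```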
The Snowflake Data Exchange is currently available in private preview, with public preview set to launch later this year.
The Snowflake Data Exchange is a free-to-join marketplace that enables Snowflake users to connect with data providers to seamlessly discover, access, and generate insights from each other’s data. Unlike traditional data transfer done through APIs or by extracting data to cloud storage, the Snowflake Data Exchange improves the control and security of exchanging data.
Snowflake customers will be able to easily access the Data Exchange from their Snowflake account and search a data catalog to discover and securely access real-time data that they can join with their existing data sets in Snowflake. Customers of the Data Exchange will not incur any data storage fees, as the data remains securely stored in the provider’s Snowflake account.
Data providers can share live, public, or private data sets in a fully governed way and promote their data services to 1,500+ Snowflake customers, creating new revenue streams. Providers also get insights into the types of data their consumers access and use.
We are already working on making the next Summit even better. See you in Las Vegas at Snowflake Summit in 2020!