Highlights:

  • The startup was founded in 2019 and has now raised roughly USD 100 million from investors including Lightspeed, Kleiner Perkins, and Redpoint.
  • The company said that in its current version, the managed distributed database service uses Amazon Web Services (AWS) S3.

New York-based startup Materialize announced the early availability of its distributed streaming database, which is designed to make real-time data immediately usable across applications, business functions, and other data products.

The company released the first version of its namesake software two years ago as a single binary that ingests data from Kafka and lets customers use standard SQL to query and join streaming data.
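To make that workflow concrete, the sketch below shows its general shape through the standard PostgreSQL driver psycopg2: a Kafka topic is declared as a source, and an ordinary SQL aggregation over it is kept continuously up to date. All names, addresses, and the exact CREATE SOURCE syntax are illustrative assumptions modeled on the early single-binary releases and may differ in current versions.

```python
# Illustrative sketch only: object names, broker address, and CREATE SOURCE
# syntax are assumptions and vary between Materialize versions.
import psycopg2

# 6875 is the commonly documented default port; credentials are placeholders.
conn = psycopg2.connect("postgresql://materialize@localhost:6875/materialize")
conn.autocommit = True

with conn.cursor() as cur:
    # Ingest a Kafka topic as a streaming source.
    cur.execute("""
        CREATE SOURCE orders
        FROM KAFKA BROKER 'localhost:9092' TOPIC 'orders'
        FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://localhost:8081';
    """)
    # Aggregate the stream with ordinary SQL; the result is maintained
    # continuously as new Kafka messages arrive.
    cur.execute("""
        CREATE MATERIALIZED VIEW revenue_by_region AS
        SELECT region, SUM(amount) AS revenue
        FROM orders
        GROUP BY region;
    """)
```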

The startup was founded in 2019 and has now raised roughly USD 100 million from investors including Lightspeed, Kleiner Perkins, and Redpoint. It says it has integrated a scalable storage layer into its software and now offers the product via a database-as-a-service (DBaaS) model. The updated product is currently available only to existing users; the company has not said when it will open to the general public.

A distributed database is one that is deployed across multiple clusters and data centers yet still functions as a single logical database.

The definition of a streaming database

According to Materialize, a streaming database collects data from multiple sources as it is generated and processes that data to answer queries. Carl Olofson, research vice president at IDC, said Materialize simplifies the process of linking the database to one or more data streams.

Olofson said, “Streaming database is a bit of a misnomer since the database itself doesn’t stream, but it executes quickly enough to be able to capture streaming data as it arrives.”

The launch comes at a time when organizations, looking to stay resilient amid economic headwinds and geopolitical uncertainty, want to analyze ever more data, which has led to a spike in online analytical processing (OLAP) queries. The company claims its database supports such queries at lower cost than databases built on batch processing.

Seth Wiesman, director of field engineering at Materialize, attributed the lower cost to two computational frameworks within the database: Timely Dataflow, which manages and executes data-parallel dataflow computations, and Differential Dataflow, a data-parallel programming framework built to efficiently process and respond to changes in massive volumes of data.
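As a rough illustration of the idea behind Differential Dataflow (a toy sketch, not the Rust-based engine Materialize actually uses), data can be modeled as a collection of (record, diff) updates, where each diff records how many copies of a record were added or removed:

```python
# Toy model of a "collection": state is a multiset, and each input batch is a
# list of (record, diff) changes rather than a full snapshot of the data.
from collections import Counter

def apply_updates(collection, updates):
    """Fold a batch of (record, diff) changes into the current collection."""
    for record, diff in updates:
        collection[record] += diff
        if collection[record] == 0:
            del collection[record]          # fully retracted records disappear
    return collection

orders = Counter()
apply_updates(orders, [("order-1", +1), ("order-2", +1)])   # two inserts
apply_updates(orders, [("order-1", -1), ("order-3", +1)])   # one delete, one insert
print(orders)   # Counter({'order-2': 1, 'order-3': 1})
```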

Lower latency and costs than batch processing

Typically, a batch processing system scans all the data that has been loaded into it in order to answer a query, which is computationally expensive and slows down query response.

Materialize, by contrast, uses these frameworks to run a query (or “view” in database terminology) once, cache the result as a materialized view, and then detect incremental changes to the user’s dataset and update the query result, rather than re-analyzing the complete dataset, Wiesman said.
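A minimal sketch of that strategy (incremental view maintenance in general, not Materialize’s actual implementation): the “view” is computed once over the full dataset, cached, and then adjusted from each change instead of being recomputed:

```python
# Toy incremental view maintenance: the cached aggregate is updated from
# deltas, so each change touches only the delta, not the whole dataset.
class RunningSumView:
    def __init__(self, rows):
        # Initial batch-style computation over the full dataset.
        self.total = sum(rows)

    def on_change(self, inserted=0.0, deleted=0.0):
        # Apply only the incremental change to the cached result.
        self.total += inserted - deleted
        return self.total

view = RunningSumView([10.0, 25.0, 5.0])   # cached result: 40.0
print(view.on_change(inserted=7.5))        # 47.5, without re-reading old rows
print(view.on_change(deleted=25.0))        # 22.5
```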

The company says that when users create tables, sources, and materialized views and feed data into them, the DBaaS version of Materialize records and stores that data, making both snapshots and update streams instantly available to any compute resources subscribing to the service.

Wiesman said, “Enterprise users may either query the results for fast, high-concurrency reads or subscribe to changes for pure event-driven architectures.”
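The subscription side of that could look roughly like the following from Python. The SUBSCRIBE statement (formerly TAIL), the connection string, and the revenue_by_region view carried over from the earlier sketch are all assumptions that depend on the Materialize version in use:

```python
# Hedged sketch: stream incremental changes from a (hypothetical) materialized
# view instead of polling it. Statement names and row shape may differ by version.
import psycopg2

conn = psycopg2.connect("postgresql://materialize@localhost:6875/materialize")
with conn.cursor() as cur:
    cur.execute("BEGIN")
    cur.execute("DECLARE c CURSOR FOR SUBSCRIBE revenue_by_region")
    while True:
        cur.execute("FETCH ALL c")   # returns the next batch of changes
        for row in cur:
            print(row)               # e.g. (timestamp, diff, region, revenue)
```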

The company said that in its current version, the managed distributed database service uses Amazon Web Services (AWS) S3. It added that support for native object stores across major cloud providers would be available soon.

Support for PostgreSQL

Further, the business claims that the Materialize interface offers full ANSI SQL support and is compatible with PostgreSQL.
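Because the interface is described as PostgreSQL-compatible, an unmodified Postgres driver should be able to read query results directly. The host, credentials, and the revenue_by_region view below are illustrative assumptions carried over from the earlier sketches:

```python
# Query a (hypothetical) materialized view with a plain PostgreSQL driver;
# no Materialize-specific client library is involved.
import psycopg2

conn = psycopg2.connect(host="materialize.example.com", port=6875,
                        user="app", dbname="materialize", sslmode="require")
with conn.cursor() as cur:
    cur.execute("SELECT region, revenue FROM revenue_by_region ORDER BY revenue DESC")
    for region, revenue in cur.fetchall():
        print(region, revenue)
```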

Materialize’s DBaaS comes with a dataflow engine that requires little to no functional programming, making it a significant improvement over a generic data system.

It noted that business users could model a SQL query as a dataflow that accepts a stream of captured change data, processes that data with a predetermined set of transformations, and then presents the results.

Redis, the most popular data system for streaming data capture, puts a programming burden on the enterprise user because it has no schema or query language, Olofson claimed.

“There are two products to look at as potential competitors: SingleStore (which is a memory-optimized relational database used for streaming data capture, among other things) and CockroachDB,” Olofson said. He added that Hazelcast is another competitor, as its in-memory data-sharing platform has been adding querying capabilities to its feature list.

Materialize said it uses the Snowflake pricing model, wherein companies buy credits to pay for software based on how much they use it. Wiesman said that the price of credits is based on where users are located.