Highlights

  • The data streaming market is forecast to grow at a compound annual growth rate (CAGR) of 26.5% over the five years from 2021 to 2026.
  • More than 90% of participants in a 2018 Confluent survey considered Kafka mission-critical to their data infrastructure.

The demand to process data in real time is increasing. Historically, enterprises adopting the streaming data paradigm were driven by use cases such as application monitoring, log aggregation, and data transformation (ETL).

Enterprises such as Netflix were early to embrace the streaming data paradigm. Today, there are more drivers for growing adoption. According to a 2019 survey by Lightbend, Streaming Data and the Future Tech Stack, new capabilities in artificial intelligence (AI) and machine learning (ML), the integration of multiple data streams, and analytics are starting to rival these historical use cases.

The streaming analytics market is expected to grow from USD 15.4 billion in 2021 to USD 50.1 billion in 2026, at a compound annual growth rate (CAGR) of 26.5% over the forecast period, according to MarketsandMarkets research.
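As a quick sanity check on the forecast above, the growth rate implied by the two market-size figures can be recomputed with the standard CAGR formula. This is a minimal sketch; the function and variable names are illustrative and do not come from the research report.

```python
# Recompute the CAGR implied by the two market-size figures quoted above.
start_value = 15.4   # USD billion, 2021
end_value = 50.1     # USD billion, 2026
years = 2026 - 2021  # five-year forecast period


def cagr(start: float, end: float, periods: int) -> float:
    """Compound annual growth rate as a fraction (0.25 means 25%)."""
    return (end / start) ** (1 / periods) - 1


implied = cagr(start_value, end_value, years)
print(f"Implied CAGR: {implied:.1%}")
```

The result lands within a fraction of a percentage point of the quoted 26.5%; the small difference is presumably due to rounding in the published market-size figures.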

Then as now, there has been a de facto standard for streaming data: Apache Kafka. Kafka and Confluent, the company that commercializes it, are an ongoing success story, with Confluent confidentially filing for an IPO in 2021.

In 2018, more than 90% of Confluent survey participants considered Kafka mission-critical to their data infrastructure, and questions on Stack Overflow grew by over 50% during the same year. As successful as Confluent may be, and as widely adopted as Kafka may be, the fact remains: Kafka’s foundations were laid in 2008.

Multiple data streaming alternatives, each with a particular focus and approach, have emerged in recent years. Apache Pulsar is one of them. In 2021, Pulsar ranked among the top 5 Apache Software Foundation projects and overtook Apache Kafka in monthly active contributors.

Core developers of Apache Pulsar and Apache BookKeeper founded a company named StreamNative, which has published a report of performance benchmarks comparing Apache Pulsar and Apache Kafka. StreamNative provides a fully managed Pulsar-as-a-service cloud and allows enterprises to “access data as real-time event streams.”

Comparison between Pulsar and Kafka:

StreamNative is not the first company built around Pulsar. Streamlio was an earlier company founded by original Pulsar creators, and Splunk acquired it in 2019. Two of Streamlio’s founders, Sijie Guo and Matteo Merli, currently serve as StreamNative’s CEO and CTO.

Addison Higham, StreamNative’s chief architect and head of cloud engineering, said that the company focuses on a bottom-up, community-driven approach that emphasizes technical development, documentation, and training. Pulsar is used at Tencent, Verizon, Intuit, and Flipkart, with the latter two also being StreamNative customers.

In 2021, StreamNative grew substantially. It raised USD 23.7 million in Series A funding and grew its workforce from 30 to more than 60 across North America, EMEA, and Asia. Revenue grew sixfold and adoption threefold, with AWS Marketplace integration, SQL support, and other updates acting as catalysts. Its community doubled in size, and Pulsar passed the 10,000-star mark on GitHub.

Higham also said that how Pulsar compares to Kafka is a question they are asked often. The latest published Pulsar vs. Kafka benchmark was run in 2020, and much has changed since then. StreamNative’s engineering team therefore conducted a new benchmark study using the Linux Foundation’s OpenMessaging Benchmark.

According to StreamNative’s benchmarks, Pulsar achieved 2.5 times the maximum throughput of Kafka. Pulsar also delivered consistent single-digit publish latency (in milliseconds) that is 100 times lower than Kafka’s at P99.99. Low publish latency is crucial because it allows systems to hand messages off to the message bus quickly.
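To make the P99.99 figure concrete: it is the latency below which 99.99% of publish operations complete, so it exposes rare slow outliers that an average or even a P99 would hide. The sketch below uses the nearest-rank method on synthetic latencies invented purely for illustration; they are not measurements from the benchmark, and the helper name is hypothetical.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile; p is a string such as "99.99" to avoid float rounding."""
    ordered = sorted(samples)
    rank = math.ceil(Fraction(p) * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Synthetic data (illustrative only): 9,998 fast publishes and 2 slow outliers.
latencies_ms = [5] * 9_998 + [500] * 2

p99 = percentile(latencies_ms, "99")
p9999 = percentile(latencies_ms, "99.99")
print(f"P99 = {p99} ms, P99.99 = {p9999} ms")
```

With this synthetic data the P99 looks healthy while the P99.99 is dominated by the two outliers, which is why tail percentiles are commonly reported for messaging systems.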

With a historical read rate that is 1.5 times faster than Kafka’s, applications using Pulsar as their messaging solution can, according to StreamNative, catch up after an unexpected interruption in half the time. This benchmark, like any benchmark and particularly those coming from vendors, should be seen as indicative.

StreamNative also noted that the study concentrates entirely on comparing technical performance. While crucial, that is not all that matters when evaluating the alternatives, as Higham pointed out. Other third parties have also published Pulsar vs. Kafka comparisons.

Higham also stated that Pulsar and Kafka can perform similarly in many situations. Management and developer experience is another area where StreamNative sees Pulsar as differentiated.

Pulsar’s architecture and positioning

Higham mentioned Pulsar’s heritage as a messaging platform that later evolved to address streaming and events. This is reflected in Pulsar’s API, and Higham believes it eases adoption among developers. While Pulsar is not directly compatible with Kafka, a feature called protocol handlers allows it to interoperate with other systems’ APIs, with a Kafka implementation featured prominently.

Higham stated that StreamNative regularly talks to enterprises that have deployed Kafka and finds that they have a huge sprawl of hundreds or even thousands of Kafka clusters, almost one per application, which is not very cost-efficient. Pulsar’s built-in multi-tenancy is designed to let even high-value workloads share infrastructure securely at scale, Higham added, while also highlighting features like geo-replication.
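One visible expression of Pulsar’s multi-tenancy is its topic naming scheme, in which every topic lives under a tenant and a namespace, in the form persistent://tenant/namespace/topic. The sketch below only parses such names to show how several teams could be separated inside one shared cluster; the tenant, namespace, and topic names are hypothetical, and the helper is not part of any Pulsar client library.

```python
from collections import defaultdict
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(full_name: str) -> TopicName:
    """Split a fully qualified Pulsar topic name into its parts."""
    domain, sep, rest = full_name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unexpected topic domain in {full_name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {full_name!r}")
    return TopicName(domain, *parts)


# Hypothetical topics from three teams sharing one cluster.
topics = [
    "persistent://payments/prod/orders",
    "persistent://payments/prod/refunds",
    "persistent://analytics/staging/clickstream",
    "non-persistent://monitoring/prod/heartbeats",
]

by_tenant = defaultdict(list)
for name in topics:
    parsed = parse_topic(name)
    by_tenant[parsed.tenant].append(f"{parsed.namespace}/{parsed.topic}")

for tenant, names in sorted(by_tenant.items()):
    print(tenant, names)
```

Because the tenant and namespace are part of every topic name, policies such as permissions and quotas can be attached at those levels instead of provisioning a separate cluster per application.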

Pulsar also provides SQL access to streaming data through Trino, and data transformation through Pulsar Functions, which can be written in languages like Go, Java, and Python. The latest Pulsar release at the time of writing is 2.9.1; when version 2.8 was launched, the Pulsar team published a technical blog post detailing Pulsar’s architecture, and we refer interested readers to it.
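To give a feel for what a single-message transformation in a Pulsar Function might look like, here is a minimal Python sketch. It assumes the language-native style in which a function is a plain process(input) callable that receives one message and returns the transformed message; the event fields are hypothetical, and deployment and configuration are deliberately left out.

```python
import json


def process(input):
    """Single-message transformation: clean up one JSON event and pass it on."""
    event = json.loads(input)
    # Normalize the (hypothetical) user field and drop a debug-only field.
    event["user"] = event.get("user", "").strip().lower()
    event.pop("debug", None)
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    # Local usage example; no Pulsar cluster is involved here.
    print(process('{"user": "  Alice ", "debug": true, "amount": 12}'))
```

Run locally, the example prints {"amount": 12, "user": "alice"}. Keeping the function free of client-library imports makes it easy to test in isolation before it is attached to input and output topics.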

StreamNative says that its protocol handler mechanism provides a clear migration path from Kafka and integrates with other systems and protocols like RocketMQ, AMQP, and MQTT. Higham also mentioned that support for the Kafka API is coming soon to StreamNative Cloud.

StreamNative Cloud is StreamNative’s primary revenue generator. In addition to supporting a managed cloud offering, StreamNative enhances Apache Pulsar with security and integration features, including integrations with platforms like Flink, Spark, and Delta Lake.

Comparing Pulsar with other offerings in the space, such as Apache Flink or Spark Streaming, Higham stated that Pulsar is not trying to become one of those streaming compute engines.

Instead, Higham said, the focus is on “a great integration story of building [the] best of breed connector that’s very flexible, ease of use and the simple 80% use cases of single message transformation.” Pulsar has more in common with Redpanda, as both focus on solving some of the same essential pain points. Still, some of those pain points lie not only in the implementation but also in the underlying protocol, Higham claims.