Cloud-Native, Distributed Messaging and Streaming
Apache® Pulsar™ is an open-source, distributed messaging and streaming platform built for the cloud.
What is Apache Pulsar?
Apache Pulsar is an all-in-one messaging and streaming platform. Messages can be consumed and acknowledged individually or consumed as streams with less than 5 ms of latency. Its layered architecture allows rapid scaling across hundreds of nodes, without data reshuffling. Its features include multi-tenancy with resource separation and access control, geo-replication across regions, tiered storage, and support for six official client languages. It supports up to one million unique topics and is designed to simplify your application architecture. Pulsar is a Top 10 Apache Software Foundation project and has a vibrant and passionate community and user base spanning small companies and large enterprises.
Pulsar Features
Automatic Load Balancing
Add or remove nodes and let Pulsar load balance topic bundles automatically. Hot spotted topic bundles are automatically split and evenly distributed across the brokers.
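The idea of splitting a hot bundle can be sketched with a toy model. This is not Pulsar's implementation. It assumes a bundle is a contiguous range of a 32-bit hash space, uses CRC32 as a stand-in hash, and places bundles round-robin instead of using real load data. All function names are hypothetical.

```python
import zlib

HASH_SPACE = 2 ** 32  # assumed size of the hash space for this sketch


def topic_hash(topic):
    """Map a topic name to a point in the hash space (CRC32 is an assumption)."""
    return zlib.crc32(topic.encode("utf-8"))


def find_bundle(bundles, topic):
    """Return the (start, end) range that owns the topic; end is exclusive."""
    point = topic_hash(topic)
    for start, end in bundles:
        if start <= point < end:
            return (start, end)
    raise ValueError("bundles do not cover the hash space")


def split_bundle(bundles, bundle):
    """Replace one bundle with its two halves so they can be placed separately."""
    start, end = bundle
    middle = (start + end) // 2
    index = bundles.index(bundle)
    return bundles[:index] + [(start, middle), (middle, end)] + bundles[index + 1:]


def assign(bundles, brokers):
    """Spread bundles over brokers round-robin (a stand-in for real load data)."""
    return {bundle: brokers[i % len(brokers)] for i, bundle in enumerate(bundles)}


if __name__ == "__main__":
    bundles = [(0, HASH_SPACE)]
    hot = find_bundle(bundles, "orders")   # pretend this bundle is overloaded
    bundles = split_bundle(bundles, hot)
    print(assign(bundles, ["broker-1", "broker-2"]))
```

After the split, each half covers a smaller slice of topics and can be served by a different broker, which is the effect the feature describes.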
Serverless functions
Write and deploy functions natively using Pulsar Functions. Process messages using Java, Go, or Python without deploying fully-fledged applications. Kubernetes runtime is bundled.
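As a minimal sketch of what such a function can look like in Python: this assumes the "native" style, in which the runtime calls a module-level `process(input)` once per message and publishes the return value to the configured output topic. The file name `exclaim.py` is hypothetical.

```python
# exclaim.py: sketch of a native Python function for Pulsar Functions.
# Assumption: the runtime invokes process(input) once per incoming message
# and publishes whatever it returns to the output topic, if one is set.


def process(input):
    """Append an exclamation mark to each incoming message."""
    return "{}!".format(input)


if __name__ == "__main__":
    # Local check without a Pulsar cluster: call the function directly.
    for message in ["hello", "pulsar"]:
        print(process(message))
```

Because the function is plain Python with no framework imports, it can be unit tested locally before it is deployed.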
Rapid Horizontal Scalability
Pulsar scales horizontally to handle increased load. Its unique design and separate storage layer let it absorb sudden surges in traffic by scaling out in seconds.
Durable Low-latency Messaging and Streaming
Acknowledge messages individually (RabbitMQ style) or cumulatively per partition (offset-like). This enables use cases such as distributed work queues or order-preserving data streams at very large scale (hundreds of nodes) and low latency (<5 ms). Message durability is achieved using Apache BookKeeper as the storage layer, together with cloud-based tiered storage.
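The difference between the two acknowledgment styles can be illustrated with a small in-memory model. This is a sketch of the semantics only, not the broker's bookkeeping. The class `AckTracker` and its integer positions are hypothetical. The method names mirror `acknowledge` and `acknowledge_cumulative` in the Python client, but nothing here talks to a cluster.

```python
class AckTracker:
    """Toy model of one subscription's acknowledgment state on one partition."""

    def __init__(self):
        self.cumulative_upto = -1   # every position <= this value is acknowledged
        self.individual = set()     # positions acknowledged one by one beyond that

    def acknowledge(self, position):
        """Individual ack: mark a single message as done."""
        if position > self.cumulative_upto:
            self.individual.add(position)

    def acknowledge_cumulative(self, position):
        """Cumulative ack: mark this message and everything before it as done."""
        if position > self.cumulative_upto:
            self.cumulative_upto = position
            self.individual = {p for p in self.individual if p > position}

    def is_acknowledged(self, position):
        return position <= self.cumulative_upto or position in self.individual

    def pending(self, last_position):
        """Positions up to last_position that are still unacknowledged."""
        return [p for p in range(last_position + 1) if not self.is_acknowledged(p)]


if __name__ == "__main__":
    work_queue = AckTracker()
    work_queue.acknowledge(2)            # workers finish out of order
    work_queue.acknowledge(0)
    print(work_queue.pending(4))         # [1, 3, 4]

    stream = AckTracker()
    stream.acknowledge_cumulative(2)     # offset-like: everything up to 2 is done
    print(stream.pending(4))             # [3, 4]
```

Individual acknowledgment suits work queues where tasks complete out of order. Cumulative acknowledgment suits ordered streams where a single position summarizes progress.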
Seamless Geo-Replication
Protect against complete zone outages using replication across different geographic regions. Flexible and configurable replication strategies across distant Pulsar clusters. Uniquely supports automatic client failover to healthy clusters.
Multi-tenancy
Maintain one cluster for your entire organization using tenants. Access control across data and actions using tenant policies. Isolate specific brokers to a tenant when maximum noisy neighbor protection is needed.
Official multi-language support
Officially maintained Pulsar Clients for Java, Go, Python, C++, Node.js, and C#.
Official 3rd party integrations
Pulsar has officially maintained connectors for popular third-party systems: MySQL, Elasticsearch, Cassandra, and more. Connectors stream data in (source) or out (sink).
Supports up to 1M topics
REST Admin API for provisioning, administration, tools and monitoring. Can be deployed on bare metal, Kubernetes, Amazon Web Services (AWS), and DataCenter Operating System (DC/OS).
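As a hedged illustration of how the REST Admin API can be addressed, the sketch below only builds a URL and sends no request. It assumes the v2 path layout `/admin/v2/persistent/{tenant}/{namespace}/{topic}/stats` and a broker web service at `http://localhost:8080`; verify both against your deployment. The helper name `topic_stats_url` is hypothetical.

```python
from urllib.parse import quote


def topic_stats_url(base_url, tenant, namespace, topic):
    """Build the assumed stats URL for a persistent topic."""
    # Percent-encode each path segment so unusual names cannot break the path.
    parts = [quote(part, safe="") for part in (tenant, namespace, topic)]
    return "{}/admin/v2/persistent/{}/{}/{}/stats".format(base_url.rstrip("/"), *parts)


if __name__ == "__main__":
    # Assumed local endpoint and the conventional public/default namespace.
    print(topic_stats_url("http://localhost:8080", "public", "default", "my-topic"))
```

A monitoring script could pass the resulting URL to any HTTP client, adding whatever authentication the cluster requires.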
Pulsar Users
Running in production at scale, with millions of messages per second across millions of topics, Pulsar is now used by thousands of companies for real-time workloads.
Among the features we considered were tiered storage, as we planned to have unlimited retention (for event sourcing that matters a lot), flexible subscription model (we use exclusive at the moment, however we want to try per-key subscription), authorization via different methods including certificates and JWT (JSON Web Token), and an easy way to get it up and running.
Kirill Merkushev | Vivy