We are glad to present the new 2.1.0-incubating release of Pulsar. This release is the culmination of two months of work and brings multiple new features and improvements to Pulsar.
In Pulsar 2.1 you'll see:
- Pulsar IO connector framework and a set of built-in connectors
- PIP-17: Tiered Storage
- Pulsar Stateful Functions
- Go Client
- Avro and Protobuf Schemas
We'll provide a brief summary of these features in the section below.
In Pulsar 2.0, we introduced Pulsar Functions, a lightweight, serverless-inspired compute framework that provides the easiest possible way to implement application-specific in-stream processing logic of any complexity. Many developers love Pulsar Functions because they require minimal boilerplate and are easy to reason about.
In Pulsar 2.1, we continued to follow this "simplicity first" principle in developing Pulsar. We built the Pulsar IO (input/output) connector framework on top of Pulsar Functions to simplify getting data in and out of Apache Pulsar. You don't need to write a single line of code: all you need to do is prepare a configuration file for the system you want to connect to, and use the Pulsar admin CLI to submit the connector to Pulsar. Pulsar takes care of everything else, such as fault tolerance and rebalancing.
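As a sketch, a sink configuration might look like the following. The field names below follow the Cassandra connector examples from the documentation, but treat them as illustrative; each connector defines its own configuration keys:

```yaml
# Illustrative Cassandra sink configuration (cassandra-sink.yml);
# each connector documents its own expected keys.
configs:
  roots: "localhost:9042"
  keyspace: "pulsar_test_keyspace"
  columnFamily: "pulsar_test_table"
  keyname: "key"
  columnName: "col"
```

You would then submit it with the admin CLI, with a command along the lines of `pulsar-admin sink create --sink-config-file cassandra-sink.yml ...` (exact subcommand names and flags may differ by version; see the Pulsar IO documentation).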
Six built-in connectors ship in the 2.1 release:
- Aerospike Connector
- Cassandra Connector
- Kafka Connector
- Kinesis Connector
- RabbitMQ Connector
- Twitter Firehose Connector
More connectors will be coming in future releases. If you are interested in contributing a connector to Pulsar, check out the guide on Developing Connectors. It is as simple as writing a Pulsar function.
One of the advantages of Apache Pulsar is its segment-oriented storage, built on Apache BookKeeper. You can store a topic backlog as large as you want. When the cluster starts to run out of space, you simply add another storage node, and the system automatically picks up the new node and starts using it, without rebalancing partitions. However, this can start to get expensive after a while.
Pulsar mitigates this cost/size trade-off by providing Tiered Storage. Tiered Storage turns your Pulsar topics into truly infinite streams by offloading older segments to long-term storage, such as AWS S3, GCS, or HDFS, which is designed for storing cold data. To the end user, there is no perceivable difference between consuming streams whose data is stored in BookKeeper and streams whose data is stored in long-term storage. All the underlying offloading mechanisms and metadata management are transparent to applications.
Only S3 is supported in 2.1. More offloaders (such as Google GCS, Azure Blob Storage, and HDFS) are coming in future releases.
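As a sketch, enabling S3 offload is a matter of broker configuration along these lines. The property names below are a best-effort illustration; consult the Tiered Storage documentation for the exact keys in your version:

```conf
# broker.conf (illustrative; check the docs for exact property names)
managedLedgerOffloadDriver=S3
s3ManagedLedgerOffloadRegion=us-west-2
s3ManagedLedgerOffloadBucket=pulsar-topic-offload
```

Once configured, offload can be triggered per topic from the admin CLI (for example with `pulsar-admin topics offload`), moving segments older than a given backlog threshold to the bucket.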
If you are interested in this feature, you can check out more details here.
The greatest challenge that stream processing engines face is managing state, and Pulsar Functions is no exception. Since the goal of Pulsar Functions is to simplify developing stream-native processing logic, we also want to give functions an easier way to manage their state. This release introduces a set of state APIs that Pulsar Functions can use to store state, backed by the table service in Apache BookKeeper.
The state API is released as a developer-preview feature in the Pulsar Functions Java SDK. We would like to collect feedback so we can improve it in future releases.
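To give a feel for the counter-style state API, here is a self-contained sketch of a word-count function. Note that `InMemoryState` below is a stand-in written for this example, not an SDK type; in the real SDK the counter operations are exposed through the function's context and persisted in BookKeeper's table service:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the SDK's state store: in real Pulsar Functions the
// counter operations live on the function context and are backed by
// BookKeeper's table service rather than an in-process map.
class InMemoryState {
    private final Map<String, Long> counters = new HashMap<>();

    void incrCounter(String key, long amount) {
        counters.merge(key, amount, Long::sum);
    }

    long getCounter(String key) {
        return counters.getOrDefault(key, 0L);
    }
}

public class WordCountSketch {
    // A word-count "function": for each word in a message, bump a counter.
    static void process(String input, InMemoryState state) {
        for (String word : input.split("\\s+")) {
            state.incrCounter(word, 1);
        }
    }

    public static void main(String[] args) {
        InMemoryState state = new InMemoryState();
        process("hello pulsar hello", state);
        System.out.println(state.getCounter("hello"));  // 2
        System.out.println(state.getCounter("pulsar")); // 1
    }
}
```

Because the state lives outside the function process, counters survive restarts and rescheduling, which is what makes stateful stream logic practical.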
Pulsar 2.0 introduced native support for schemas in Pulsar. This means you can declare how message data is structured and have Pulsar enforce that producers publish only valid data to a topic. Pulsar 2.0 supported only JSON schemas; this release adds support for Avro and Protobuf.
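For example, an Avro schema for a hypothetical user record (the record and field names here are illustrative) looks like this:

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
```

With such a schema attached to a topic, Pulsar rejects messages whose data does not match the declared record, so consumers can rely on every message being well-formed.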
This release also includes a new Go client. Follow the instructions to try it out in your Go applications!