Spark Streaming Pulsar receiver

The Spark Streaming recevier for Pulsar is a custom receiver that enables Apache Spark Streaming to receive data from Pulsar.

An application can receive data in Resilient Distributed Dataset (RDD) format via the Spark Streaming Pulsar receiver and can process it in a variety of ways.


To use the receiver, include a dependency for the pulsar-spark library in your Java configuration.


If you’re using Maven, add this to your pom.xml:

<!-- in your <properties> block -->

<!-- in your <dependencies> block -->


If you’re using Gradle, add this to your build.gradle file:

def pulsarVersion = "1.19.0-incubating"

dependencies {
    compile group: 'org.apache.pulsar', name: 'pulsar-spark', version: pulsarVersion


Pass an instance of SparkStreamingPulsarReceiver to the receiverStream method in JavaStreamingContext:

SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("pulsar-spark");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

ClientConfiguration clientConf = new ClientConfiguration();
ConsumerConfiguration consConf = new ConsumerConfiguration();
String url = "pulsar://localhost:6650/";
String topic = "persistent://sample/standalone/ns1/topic1";
String subs = "sub1";

JavaReceiverInputDStream<byte[]> msgs = jssc
        .receiverStream(new SparkStreamingPulsarReceiver(clientConf, consConf, url, topic, subs));


You can find a complete example here. In this example, the number of messages which contain the string “Pulsar” in received messages is counted.