ZooKeeper and BookKeeper administration
Pulsar relies on two external systems for essential tasks:
- ZooKeeper is responsible for a wide variety of configuration-related and coordination-related tasks.
- BookKeeper is responsible for persistent storage of message data.
ZooKeeper and BookKeeper are both open-source Apache projects. This diagram illustrates the role of ZooKeeper and BookKeeper in a Pulsar cluster:
Each Pulsar cluster consists of one or more message brokers. Each broker relies on an ensemble of bookies.
ZooKeeper
Each Pulsar instance relies on two separate ZooKeeper quorums:
- Local ZooKeeper operates at the cluster level and provides cluster-specific configuration management and coordination. Each Pulsar cluster needs to have a dedicated ZooKeeper cluster.
- Configuration Store operates at the instance level and provides configuration management for the entire system (and thus across clusters). The configuration store quorum can run on an independent cluster of machines or on the same machines that local ZooKeeper uses.
Deploy local ZooKeeper
ZooKeeper manages a variety of essential coordination-related and configuration-related tasks for Pulsar.
To deploy a Pulsar instance, you need to stand up one local ZooKeeper cluster per Pulsar cluster.
To begin, add all ZooKeeper servers to the quorum configuration specified in the conf/zookeeper.conf file. Add a server.N line for each node in the cluster to the configuration, where N is the number of the ZooKeeper node. The following is an example of a three-node cluster:
server.1=zk1.us-west.example.com:2888:3888
server.2=zk2.us-west.example.com:2888:3888
server.3=zk3.us-west.example.com:2888:3888
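For larger quorums, the server.N lines can be generated rather than typed by hand. The following is a minimal sketch, not part of Pulsar: the function name zk_server_lines is hypothetical, and it assumes that hosts are numbered from 1 in the order you pass them and that every node uses the same two ports.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Pulsar or ZooKeeper).
# Print one server.N line per host, numbering hosts from 1 in the order given.
# Usage: zk_server_lines PEER_PORT ELECTION_PORT HOST...
zk_server_lines() {
  peer_port=$1
  election_port=$2
  shift 2
  n=1
  for host in "$@"; do
    printf 'server.%d=%s:%s:%s\n' "$n" "$host" "$peer_port" "$election_port"
    n=$((n + 1))
  done
}

# Example: review the output before adding it to conf/zookeeper.conf.
zk_server_lines 2888 3888 \
  zk1.us-west.example.com \
  zk2.us-west.example.com \
  zk3.us-west.example.com
```

Printing to standard output instead of editing the file directly keeps the sketch harmless: you can inspect the lines first and redirect them into the configuration only when they look right.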
On each host, you need to specify the node ID in that node's myid file, which is in the data/zookeeper folder of each server by default (you can change the file location via the dataDir parameter). The ID must match the N in that host's server.N line.
For detailed information on myid and more, see the Multi-server setup guide in the ZooKeeper documentation.
On a ZooKeeper server at zk1.us-west.example.com, for example, you can set the myid value like this:
mkdir -p data/zookeeper
echo 1 > data/zookeeper/myid
On zk2.us-west.example.com the command is echo 2 > data/zookeeper/myid, and so on.
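Because each host needs a different ID, one way to avoid mistakes is to derive the ID from the host's position in the server list. The following is a rough sketch under stated assumptions: write_myid is a hypothetical helper, and the host list must be passed in the same order as the server.N lines in the configuration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Pulsar or ZooKeeper).
# Write the myid file for THIS_HOST, using its 1-based position in the host list.
# Usage: write_myid DATA_DIR THIS_HOST HOST...
write_myid() {
  data_dir=$1
  this_host=$2
  shift 2
  n=1
  for host in "$@"; do
    if [ "$host" = "$this_host" ]; then
      # Create the data directory if needed, then store the ID.
      mkdir -p "$data_dir" && echo "$n" > "$data_dir/myid"
      return $?
    fi
    n=$((n + 1))
  done
  # Refuse to guess an ID for a host that is not part of the quorum.
  echo "host $this_host is not in the server list" >&2
  return 1
}

# Example (run on each ZooKeeper host; assumes `hostname -f` prints the
# same name that appears in the server.N lines):
# write_myid data/zookeeper "$(hostname -f)" \
#   zk1.us-west.example.com zk2.us-west.example.com zk3.us-west.example.com
```

The function exits with a non-zero status when the host is not in the list, so a deployment script can stop before ZooKeeper starts with a missing or wrong ID.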
Once you add each server to the zookeeper.conf configuration and each server has the appropriate myid entry, you can start ZooKeeper on all hosts (in the background, using nohup) with the pulsar-daemon CLI tool:
bin/pulsar-daemon start zookeeper
Deploy configuration store
The ZooKeeper cluster configured and started up in the section above is a local ZooKeeper cluster that you can use to manage a single Pulsar cluster. In addition to a local cluster, however, a full Pulsar instance also requires a configuration store for handling some instance-level configuration and coordination tasks.
If you deploy a single-cluster instance, you do not need a separate cluster for the configuration store. If, however, you deploy a multi-cluster instance, you need to stand up a separate ZooKeeper cluster for configuration tasks.
Single-cluster Pulsar instance
If your Pulsar instance consists of just one cluster, you can deploy the configuration store on the same machines as the local ZooKeeper quorum, as long as it runs on different TCP ports.
To deploy a ZooKeeper configuration store in a single-cluster instance, add the same ZooKeeper servers that the local quorum uses to the conf/global_zookeeper.conf file, using the same method as for local ZooKeeper, but make sure to use a different client port (2181 is the default for ZooKeeper). The following is an example for a three-node ZooKeeper cluster that uses client port 2184, with 2185 and 2186 in the server lines:
clientPort=2184
server.1=zk1.us-west.example.com:2185:2186
server.2=zk2.us-west.example.com:2185:2186
server.3=zk3.us-west.example.com:2185:2186
As before, create the myid file for each server, this time at data/global-zookeeper/myid.
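With two quorums on the same machines, it is easy to end up with a myid file that does not correspond to any server line. The following is a small sanity check, offered as a sketch only: check_myid is a hypothetical helper, and it assumes the configuration file uses plain server.N= lines like the examples in this section.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Pulsar or ZooKeeper).
# Succeed only if the ID stored in MYID_FILE has a matching server.N line
# in CONF_FILE.
# Usage: check_myid CONF_FILE MYID_FILE
check_myid() {
  conf_file=$1
  myid_file=$2
  id=$(cat "$myid_file") || return 1
  # Reject empty or non-numeric IDs before using the value in a pattern.
  case $id in
    ''|*[!0-9]*) echo "myid must be a number, got '$id'" >&2; return 1 ;;
  esac
  if grep -q "^server\.$id=" "$conf_file"; then
    echo "myid $id matches a server.$id entry"
  else
    echo "no server.$id entry in $conf_file" >&2
    return 1
  fi
}

# Example (run from the Pulsar installation directory on each host):
# check_myid conf/global_zookeeper.conf data/global-zookeeper/myid
# check_myid conf/zookeeper.conf data/zookeeper/myid
```

The check only confirms that an entry with that ID exists. It does not verify that the entry's hostname is the machine you are running on, so treat it as a first line of defense rather than a full validation.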