Deploying a Pulsar cluster on AWS using Terraform and Ansible


For instructions on deploying a single Pulsar cluster manually rather than using Terraform and Ansible, see Deploying a Pulsar cluster on bare metal. For instructions on manually deploying a multi-cluster Pulsar instance, see Deploying a Pulsar instance on bare metal.

One of the easiest ways to get a Pulsar cluster running on Amazon Web Services (AWS) is to use the Terraform infrastructure provisioning tool and the Ansible server automation tool. Terraform creates the resources necessary to run the Pulsar cluster (EC2 instances, networking and security infrastructure, and so on), while Ansible installs and runs Pulsar on the provisioned resources.

Requirements and setup

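To install a Pulsar cluster on AWS using Terraform and Ansible, you’ll need an AWS account and the aws command-line tool, Python and pip (used to install Ansible), and the terraform-inventory tool (used to build the Ansible inventory from Terraform state).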

You’ll also need to make sure that you’re currently logged into your AWS account via the aws tool:

$ aws configure

Installation

You can install Ansible on Linux or macOS using pip.

$ pip install ansible

You can install Terraform by following the installation instructions in the official Terraform documentation.

You’ll also need to have the Terraform and Ansible configurations for Pulsar locally on your machine. They’re contained in Pulsar’s GitHub repository, which you can fetch using Git:

$ git clone https://github.com/apache/incubator-pulsar
$ cd incubator-pulsar/deployment/terraform-ansible/aws
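Before moving on, it can help to confirm that the command-line tools this guide invokes are actually on your PATH. The following is a minimal sketch, not part of the Pulsar repository: the `REQUIRED_TOOLS` list and the helper names are assumptions made for illustration, so adjust them to match your own setup.

```python
import shutil

# Tools invoked by the commands in this guide (an assumed list; edit as needed).
REQUIRED_TOOLS = ["aws", "git", "terraform", "ansible-playbook", "terraform-inventory"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def report(tools=REQUIRED_TOOLS):
    """Summarize the preflight check as a single line of text."""
    missing = missing_tools(tools)
    if missing:
        return "Missing from PATH: " + ", ".join(missing)
    return "All required tools were found on PATH."


if __name__ == "__main__":
    print(report())
```

A missing terraform-inventory is worth catching early, because the `which terraform-inventory` substitution used in the Ansible commands expands to an empty string when the tool is not installed.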

SSH setup

In order to create the necessary AWS resources using Terraform, you’ll need to create an SSH key. To create a private SSH key in ~/.ssh/id_rsa and a public key in ~/.ssh/id_rsa.pub:

$ ssh-keygen -t rsa

Do not enter a passphrase (hit Enter when prompted instead). To verify that a key has been created:

$ ls ~/.ssh
id_rsa               id_rsa.pub

Creating AWS resources using Terraform

To get started building AWS resources with Terraform, you’ll need to install all Terraform dependencies:

$ terraform init
# This will create a .terraform folder

Once you’ve done that, you can apply the default Terraform configuration:

$ terraform apply

You should then see this prompt:

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:

Type yes and hit Enter. Applying the configuration could take several minutes. When it’s finished, you should see Apply complete! along with some other information, including the number of resources created.

Applying a non-default configuration

You can apply a non-default Terraform configuration by changing the values in the terraform.tfvars file. The following variables are available:

| Variable name | Description | Default |
| --- | --- | --- |
| public_key_path | The path of the public key that you’ve generated | ~/.ssh/id_rsa.pub |
| region | The AWS region in which the Pulsar cluster will run | us-west-2 |
| availability_zone | The AWS availability zone in which the Pulsar cluster will run | us-west-2a |
| aws_ami | The Amazon Machine Image (AMI) that will be used by the cluster | ami-9fa343e7 |
| num_zookeeper_nodes | The number of ZooKeeper nodes in the ZooKeeper cluster | 3 |
| num_pulsar_brokers | The number of Pulsar brokers and BookKeeper bookies that will run in the cluster | 3 |
| base_cidr_block | The root CIDR that will be used by network assets for the cluster | 10.0.0.0/16 |
| instance_types | The EC2 instance types to be used. This variable is a map with two keys: zookeeper for the ZooKeeper instances and pulsar for the Pulsar brokers and BookKeeper bookies | t2.small (ZooKeeper) and i3.xlarge (Pulsar/BookKeeper) |
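When you override several of these variables at once, a few of them have to stay consistent with each other: the availability zone must belong to the chosen region, and the base CIDR block must be a valid network. AMI IDs are also region-specific, so changing region generally means changing aws_ami too. The sketch below checks the first two points plus the node counts before you run terraform apply. The function `check_overrides` is a hypothetical helper written for this guide, not something shipped with Pulsar or Terraform.

```python
import ipaddress


def check_overrides(region, availability_zone, base_cidr_block,
                    num_zookeeper_nodes, num_pulsar_brokers):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []

    # AWS zone names are the region name plus a suffix, e.g. us-west-2a.
    if availability_zone == region or not availability_zone.startswith(region):
        problems.append(
            f"availability_zone {availability_zone!r} is not in region {region!r}"
        )

    # ip_network raises ValueError for malformed CIDRs or ones with host bits set.
    try:
        ipaddress.ip_network(base_cidr_block)
    except ValueError as err:
        problems.append(f"base_cidr_block is not a valid CIDR: {err}")

    for name, count in (("num_zookeeper_nodes", num_zookeeper_nodes),
                        ("num_pulsar_brokers", num_pulsar_brokers)):
        if count < 1:
            problems.append(f"{name} must be at least 1, got {count}")

    return problems


if __name__ == "__main__":
    # The default values from the table above pass all checks.
    print(check_overrides("us-west-2", "us-west-2a", "10.0.0.0/16", 3, 3))
```

This only catches obvious inconsistencies. It does not verify that the AMI exists in the region or that the instance types are available there.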

What is installed

When you run the Ansible playbook, the following AWS resources will be used:

By default, all EC2 instances for the cluster run in the us-west-2 region. You can change this with the region variable.

Fetching your Pulsar connection URL

When you apply the Terraform configuration by running terraform apply, Terraform will output a value for the pulsar_service_url. It should look something like this:

pulsar://pulsar-elb-1800761694.us-west-2.elb.amazonaws.com:6650

You can fetch that value at any time by running terraform output pulsar_service_url or by parsing the terraform.tfstate file (which is JSON, even though the filename doesn’t reflect that):

$ jq '.modules[0].outputs.pulsar_service_url.value' terraform.tfstate
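If you want the connection URL inside a script rather than on the command line, the same lookup can be done with Python’s standard library. This is a minimal sketch that assumes the state file has the layout implied by the jq filter above (outputs nested under the first entry of modules); the function name `read_service_url` is made up for this example. The state file layout depends on the Terraform version, so terraform output is likely the more robust option when it is available.

```python
import json


def read_service_url(state_path="terraform.tfstate"):
    """Return the pulsar_service_url output stored in a Terraform state file."""
    with open(state_path, encoding="utf-8") as handle:
        state = json.load(handle)
    # Mirrors the jq filter: .modules[0].outputs.pulsar_service_url.value
    return state["modules"][0]["outputs"]["pulsar_service_url"]["value"]
```

Calling `read_service_url()` from the directory that contains terraform.tfstate returns the URL as a string, which you can pass directly to a Pulsar client.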

Destroying your cluster

At any point, you can destroy all AWS resources associated with your cluster using Terraform’s destroy command:

$ terraform destroy

Running the Pulsar playbook

Once you’ve created the necessary AWS resources using Terraform, you can install and run Pulsar on the Terraform-created EC2 instances using Ansible. To do so, use this command:

$ ansible-playbook \
  --user='ec2-user' \
  --inventory=`which terraform-inventory` \
  ../deploy-pulsar.yaml

If you’ve created a private SSH key at a location different from ~/.ssh/id_rsa, you can specify the different location using the --private-key flag:

$ ansible-playbook \
  --user='ec2-user' \
  --inventory=`which terraform-inventory` \
  --private-key="~/.ssh/some-non-default-key" \
  ../deploy-pulsar.yaml

Accessing the cluster

You can now access your running Pulsar cluster using the unique Pulsar connection URL for your cluster, which you can obtain using the instructions above.

For a quick demonstration of accessing the cluster, we can use the Python client for Pulsar and the Python shell. First, install the Pulsar Python module using pip:

$ pip install pulsar-client

Now, open up the Python shell using the python command:

$ python

Once in the shell, run the following:

>>> import pulsar
>>> # Make sure to use your own connection URL
>>> client = pulsar.Client('pulsar://pulsar-elb-1800761694.us-west-2.elb.amazonaws.com:6650')
>>> producer = client.create_producer('persistent://sample/local/ns1/test-topic')
>>> # Under Python 3 the payload must be bytes, not str
>>> producer.send(b'Hello world')
>>> client.close()

If all of these commands are successful, your cluster can now be used by Pulsar clients!
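Producing a message only exercises half of the path. To check that messages can also be read back, you could subscribe to the same topic and consume what you publish. The sketch below is one way to do that with the Python client; the function names, the subscription name verify-sub, and the 10-second timeout are arbitrary choices for this example. It relies only on the client’s subscribe, create_producer, send, receive, and acknowledge calls.

```python
def round_trip(client, topic, payload=b"Hello world", timeout_millis=10000):
    """Publish one message to `topic` and return the data read back from it."""
    # Subscribe first so the subscription exists before the message is sent.
    consumer = client.subscribe(topic, "verify-sub")
    producer = client.create_producer(topic)
    producer.send(payload)
    # receive() raises an exception if nothing arrives within the timeout.
    msg = consumer.receive(timeout_millis=timeout_millis)
    consumer.acknowledge(msg)
    return msg.data()


def main(service_url):
    """Run the round trip against a real cluster; requires pulsar-client."""
    import pulsar  # imported here so round_trip stays usable without the package

    client = pulsar.Client(service_url)
    try:
        data = round_trip(client, "persistent://sample/local/ns1/test-topic")
        print("Received:", data)
    finally:
        client.close()
```

To try it, call `main()` with your own connection URL. If the topic already holds unacknowledged messages for this subscription, the data returned may be an older message rather than the one just sent.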