Thursday, May 21, 2020

Ways to change property values for Java Spring application in EKS (Kubernetes in AWS)


Issue to solve

When deploying microservices to EKS, we might need to pass different values to the same microservice for different environments.  For example, we might need to use different MSK instances or different encryption keys.  The traditional way (i.e. putting the values in an application.properties or YAML file) will not work, since the values in these files are fixed before the Docker image is built. Once the microservice is deployed to the cloud as a container, you would have to log into the container to change the values in these files and likely restart the service for the new values to take effect.

When the microservice is deployed through Terraform code, we have to pass these values through Terraform.


Solution for passing values from terraform to container

One way to pass property values from Terraform to the microservice is to use the following 'args' in kubernetes_deployment (spec/template/spec/container/args):

args = [
  "--kafka.bootstrapServers=${var.msk_bootstrap_brokers_tls}",
  "--kafka.zooKeeper_hosts=${var.msk_zookeeper_hosts}",
  "--kafka.topic=mytopic",
  "--kafka.use_ssl=true",
  "--aws_encryption_key=${var.aws_encryption_key}"
]

Another way is to use 'env' in the kubernetes_deployment (spec/template/spec/container/env):

env {
  name  = "kafka.hosts"
  value = "${var.msk_bootstrap_brokers_tls}"
}

env {
  name = "POD_IP"
  value_from {
    field_ref {
      field_path = "status.podIP"
    }
  }
}

These property values are then passed to the container through its ENTRYPOINT arguments or as environment variables.


Solution for Java Spring code to get and use these values

One way to do so is to use the @Value annotation:

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration  // or @Component
@Slf4j
public class MyConfig {

    private final String statsdNodeIP;
    private final String kafkaServers;

    public MyConfig(@Value("${NODE_IP:localhost}") String statsdNodeIP,   // default value "localhost"
                    @Value("${kafka.bootstrapServers}") String kafkaServers) {
        this.statsdNodeIP = statsdNodeIP;
        this.kafkaServers = kafkaServers;
    }

    @Bean
    public XXXX getXXXX() {
        // build and return the bean (XXXX is a placeholder type) from the injected values
        return new XXXX(kafkaServers, statsdNodeIP);
    }
}
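
For these values to be resolvable at all, the container's command line has to reach Spring Boot's property sources. Nothing special is required beyond the usual entry point; a minimal sketch (the class name is illustrative):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class MyServiceApplication {
    public static void main(String[] args) {
        // args holds whatever the container ENTRYPOINT/args passed in,
        // e.g. "--kafka.topic=mytopic"; Spring Boot converts each "--key=value"
        // argument into a property that @Value("${kafka.topic}") can resolve.
        SpringApplication.run(MyServiceApplication.class, args);
    }
}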


Order of property value override (highest precedence first; a short sketch follows the list below)
  1. Command line arguments.
  2. Java System properties (System.getProperties()).
  3. OS environment variables.
  4. @PropertySource annotations on your @Configuration classes.
  5. Application properties outside of your packaged jar (application.properties including YAML and profile variants).
  6. Application properties packaged inside your jar (application.properties including YAML and profile variants).
  7. Default properties (specified using SpringApplication.setDefaultProperties).
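
As a small, hedged illustration of items 1 and 7 (the property name is only an example), defaults set in code are overridden by a command line argument:

import java.util.Properties;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class PrecedenceExample {
    public static void main(String[] args) {
        SpringApplication app = new SpringApplication(PrecedenceExample.class);

        // Lowest precedence (item 7): defaults set in code.
        Properties defaults = new Properties();
        defaults.setProperty("kafka.topic", "default-topic");
        app.setDefaultProperties(defaults);

        // Highest precedence (item 1): starting the app with
        // "--kafka.topic=mytopic" overrides the default above, so
        // @Value("${kafka.topic}") resolves to "mytopic".
        app.run(args);
    }
}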

Reference

https://docs.spring.io/spring-boot/docs/1.0.1.RELEASE/reference/html/boot-features-external-config.html

Saturday, May 2, 2020

Pulsar



Advantages over Kafka

Segment-centric storage

While both Kafka and Pulsar store data in segments, Kafka has to keep all segments of a topic partition on the same broker, whereas Pulsar can spread segments across different bookies.

Total number of topics

Millions in Pulsar vs. thousands in Kafka.  In Apache Pulsar, messages from different topics are aggregated, sorted, and stored in large files and then indexed. This approach limits the proliferation of small files that leads to performance problems as the number of topics increases.

Unbounded topic partition storage

Topic partition storage in bookies is unbounded, while the capacity of a Kafka topic partition is limited to the capacity of the smallest node that hosts it.


Installation/Deployment:

·      Instructions to deploy it to k8s with helm (TLS disabled):


Make sure the KUBECONFIG env variable is set.
kubectl create ns pulsar

git clone https://github.com/apache/pulsar
cd pulsar/deployment/kubernetes/helm
helm upgrade --install pulsar pulsar -n pulsar


·      Instructions to deploy it to k8s without helm (I did not use this approach):


Deploy ZooKeeper
You must deploy ZooKeeper as the first Pulsar component, as it is a dependency for the others.
$ kubectl apply -f zookeeper.yaml
Wait until all three ZooKeeper server pods are up and have the status Running. You can check on the status of the ZooKeeper pods at any time:

$ kubectl get pods -l component=zookeeper
NAME      READY     STATUS             RESTARTS   AGE
zk-0      1/1       Running            0          18m
zk-1      1/1       Running            0          17m
zk-2      0/1       Running            6          15m

This step may take several minutes, as Kubernetes needs to download the Docker image on the VMs.
Initialize cluster metadata
Once ZooKeeper is running, you need to initialize the metadata for the Pulsar cluster in ZooKeeper. This includes system metadata for BookKeeper and Pulsar more broadly. There is a Kubernetes job in the cluster-metadata.yaml file that you only need to run once:
$ kubectl apply -f cluster-metadata.yaml
For the sake of reference, that job runs the following command on an ephemeral pod:
$ bin/pulsar initialize-cluster-metadata \
  --cluster local \
  --zookeeper zookeeper \
  --configuration-store zookeeper \
  --web-service-url http://broker.default.svc.cluster.local:8080/ \
  --broker-service-url pulsar://broker.default.svc.cluster.local:6650/
Deploy the rest of the components
Once cluster metadata has been successfully initialized, you can then deploy the bookies, brokers, monitoring stack (Prometheus, Grafana, and the Pulsar dashboard), and Pulsar cluster proxy:
$ kubectl apply -f bookie.yaml
$ kubectl apply -f broker.yaml
$ kubectl apply -f proxy.yaml
$ kubectl apply -f monitoring.yaml
$ kubectl apply -f admin.yaml

You can check on the status of the pods for these components either in the Kubernetes Dashboard or using kubectl:

LT-2018-9999:dev jzeng$ kubectl get svc -n pulsar
NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)                       AGE
pulsar-bookie           ClusterIP      None             <none>                                                                    3181/TCP,8000/TCP             15h
pulsar-broker           ClusterIP      None             <none>                                                                    8080/TCP,6650/TCP             15h
pulsar-grafana          LoadBalancer   172.20.16.149    a9001d17f89b111ea8c2c0ad10985927-76992348.us-east-1.elb.amazonaws.com     3000:30268/TCP                15h
pulsar-prometheus       ClusterIP      None             <none>                                                                    9090/TCP                      15h
pulsar-proxy            LoadBalancer   172.20.67.229    a9004c2c789b111ea8c2c0ad10985927-2021214576.us-east-1.elb.amazonaws.com   80:31993/TCP,6650:31219/TCP   15h
pulsar-pulsar-manager   LoadBalancer   172.20.197.175   a900309c689b111ea8c2c0ad10985927-511715019.us-east-1.elb.amazonaws.com    9527:30601/TCP                15h
pulsar-recovery         ClusterIP      None             <none>                                                                    8000/TCP                      15h
pulsar-toolset          ClusterIP      None             <none>                                                                    <none>                        15h
pulsar-zookeeper        ClusterIP      None             <none>                                                                    2888/TCP,3888/TCP,2181/TCP    15h


LT-2018-9999:dev jzeng$ kubectl get pod -n pulsar
NAME                                     READY   STATUS                       RESTARTS   AGE
pulsar-bookie-0                          1/1     Running                      0          17h
pulsar-bookie-1                          1/1     Running                      0          17h
pulsar-bookie-2                          1/1     Running                      0          17h
pulsar-bookie-3                          1/1     Running                      0          17h
pulsar-bookie-init-r7fc4                 0/1     Completed                    0          17h
pulsar-broker-0                          1/1     Running                      1          17h
pulsar-broker-1                          1/1     Running                      0          17h
pulsar-broker-2                          1/1     Running                      1          17h
pulsar-grafana-968c86947-bfd7z           0/1     CreateContainerConfigError   0          17h
pulsar-prometheus-6589db9fd9-vlf4t       1/1     Running                      0          17h
pulsar-proxy-0                           1/1     Running                      0          17h
pulsar-proxy-1                           1/1     Running                      0          17h
pulsar-proxy-2                           1/1     Running                      0          17h
pulsar-pulsar-init-psgmc                 0/1     Completed                    0          17h
pulsar-pulsar-manager-555b657d5c-98p74   1/1     Running                      0          17h
pulsar-recovery-0                        1/1     Running                      0          17h
pulsar-toolset-0                         1/1     Running                      0          17h
pulsar-zookeeper-0                       1/1     Running                      0          17h
pulsar-zookeeper-1                       1/1     Running                      0          17h
pulsar-zookeeper-2                       1/1     Running                      0          17h

Set up tenants and namespaces
Once all of the components are up and running, you'll need to create at least one Pulsar tenant and at least one namespace.
This step is not strictly required if Pulsar authentication and authorization is turned on, though it allows you to change policies for each of the namespaces later.
You can create tenants and namespaces (and perform any other administrative tasks) using the pulsar-admin tool that is installed in the pulsar-toolset-0 pod of your newly created Pulsar cluster. One easy way to perform administrative tasks is to create an alias for the pulsar-admin tool installed on that pod.
$ alias pulsar-admin='kubectl exec -it pulsar-toolset-0 -n pulsar -- bin/pulsar-admin'
Now, any time you run pulsar-admin, you will be running commands from that pod.

There is a built-in cluster created:

LT-2018-0060:dev jzeng$ pulsar-admin clusters list
"pulsar"
LT-2018-0060:dev jzeng$ pulsar-admin clusters get pulsar
{
  "serviceUrl" : "http://pulsar-broker.pulsar.svc.cluster.local:8080/",
  "serviceUrlTls" : "https://pulsar-broker.pulsar.svc.cluster.local:8443/",
  "brokerServiceUrl" : "pulsar://pulsar-broker.pulsar.svc.cluster.local:6650/",
  "brokerServiceUrlTls" : "pulsar+ssl://pulsar-broker.pulsar.svc.cluster.local:6651/"
}

(serviceUrl is needed when using Pulsar Manager UI)

This command will create a tenant called ten (note that --allowed-clusters must name an existing cluster; for the helm deployment above that is pulsar, while the manual deployment in the docs uses local):

$ pulsar-admin tenants create ten \
  --admin-roles admin \
  --allowed-clusters pulsar

This command will create a ns namespace under the ten tenant:
$ pulsar-admin namespaces create ten/ns
To verify that everything has gone as planned:
$ pulsar-admin tenants list
public
ten
$ pulsar-admin namespaces list ten
ten/ns
Now that you have a namespace and tenant set up, you can move on to experimenting with your Pulsar cluster from within the cluster or connecting to the cluster using a Pulsar client.

Experimenting with your cluster
Now that a tenant and namespace have been created, you can begin experimenting with your running Pulsar cluster. Using the same toolset pod via an alias, as in the section above, you can use pulsar-perf to create a test producer that publishes 10,000 messages a second on a topic (the examples below use the default public/default namespace).
First, create an alias to use the pulsar-perf tool via that pod:
$ alias pulsar-perf='kubectl exec -it pulsar-toolset-0 -n pulsar -- bin/pulsar-perf'
Now, produce messages:
$ pulsar-perf produce persistent://public/default/my-topic --rate 10000
Similarly, you can start a consumer to subscribe to and receive all the messages on that topic:
$ pulsar-perf consume persistent://public/default/my-topic --subscriber-name my-subscription-name
You can also view stats for the topic using the pulsar-admin tool:
$ pulsar-admin persistent stats persistent://public/default/my-topic
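
The same stats are also available programmatically through the Pulsar admin Java API. Below is a minimal sketch, assuming the broker web service URL shown earlier and the Pulsar 2.x TopicStats fields; adjust both for your cluster and client version:

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TopicStats;

public class TopicStatsExample {
    public static void main(String[] args) throws Exception {
        // The web service URL is an assumption; use your broker or proxy HTTP endpoint.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://pulsar-broker.pulsar.svc.cluster.local:8080")
                .build();

        // Fetch current stats for the topic used in the pulsar-perf examples above.
        TopicStats stats = admin.topics().getStats("persistent://public/default/my-topic");
        System.out.println("msgRateIn=" + stats.msgRateIn + ", msgRateOut=" + stats.msgRateOut);

        admin.close();
    }
}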

Pulsar Manager UI
You can access the Pulsar Manager UI through the external address of the pulsar-pulsar-manager LoadBalancer service (port 9527) shown in the 'kubectl get svc -n pulsar' output above; as noted earlier, the serviceUrl from 'pulsar-admin clusters get pulsar' is needed when setting up the UI.

Monitoring
The default monitoring stack for Pulsar on Kubernetes consists of Prometheus, Grafana, and the Pulsar dashboard.
If you deployed the cluster to Minikube, the following monitoring ports are mapped at the minikube VM:
·       Prometheus port: 30003
·       Grafana port: 30004
·       Dashboard port: 30005
You can use minikube ip to find the IP address of the minikube VM, and then use the mapped ports to access the corresponding services. For example, you can access the Pulsar dashboard at http://$(minikube ip):30005.
Prometheus
All Pulsar metrics in Kubernetes are collected by a Prometheus instance running inside the cluster. Typically, there is no need to access Prometheus directly. Instead, you can use the Grafana interface that displays the data stored in Prometheus.
Grafana
In your Kubernetes cluster, you can use Grafana to view dashboards for Pulsar namespaces (message rates, latency, and storage), JVM stats, ZooKeeper, and BookKeeper. You can access the pod serving Grafana using kubectl's port-forward command:
$ kubectl port-forward $(kubectl get pods -l component=grafana -o jsonpath='{.items[*].metadata.name}') 3000
You can then access the dashboard in your web browser at localhost:3000.
Pulsar dashboard
While Grafana and Prometheus are used to provide graphs with historical data, Pulsar dashboard reports more detailed current data for individual topics.
For example, you can have sortable tables showing all namespaces, topics, and broker stats, with details on the IP address for consumers, how long they've been connected, and much more.
You can access the pod serving the Pulsar dashboard using kubectl's port-forward command:
$ kubectl port-forward \
  $(kubectl get pods -l component=dashboard -o jsonpath='{.items[*].metadata.name}') 8080:80
You can then access the dashboard in your web browser at localhost:8080.
Client connections
If you deployed the cluster to Minikube, the proxy ports are mapped at the minikube VM:
·       Http port: 30001
·       Pulsar binary protocol port: 30002
You can use minikube ip to find the IP address of the minikube VM, and then use the mapped ports to access the corresponding services. For example, the Pulsar web service URL will be at http://$(minikube ip):30001.
Once your Pulsar cluster is running on Kubernetes, you can connect to it using a Pulsar client. You can fetch the address of the Pulsar proxy using kubectl; in the helm deployment above the service is pulsar-proxy in the pulsar namespace, and on AWS the load balancer exposes a hostname rather than an IP:
$ kubectl get service pulsar-proxy -n pulsar --output=jsonpath='{.status.loadBalancer.ingress[*].hostname}'
If the proxy address were, for example, 35.12.13.198 (or an ELB hostname), you could connect to Pulsar using pulsar://35.12.13.198:6650.
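
For example, here is a minimal Java client sketch, assuming the ten/ns namespace created earlier; the service URL is a placeholder for your proxy's external address or hostname:

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class PulsarSmokeTest {
    public static void main(String[] args) throws Exception {
        // The service URL is an assumption; point it at your proxy's external address.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://35.12.13.198:6650")
                .build();

        // Subscribe first so the message produced below is retained for this subscription.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://ten/ns/my-topic")
                .subscriptionName("my-subscription-name")
                .subscribe();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("persistent://ten/ns/my-topic")
                .create();
        producer.send("hello pulsar");

        Message<String> msg = consumer.receive();
        System.out.println("Received: " + msg.getValue());
        consumer.acknowledge(msg);

        producer.close();
        consumer.close();
        client.close();
    }
}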
You can find client documentation for:
·       Java
·       Python
·       C++