Saturday, December 15, 2018

Prometheus



Features
Prometheus's main features are:
  • a multi-dimensional data model with time series data identified by metric name and key/value pairs
  • flexible query language to leverage this dimensionality
  • no reliance on distributed storage; single server nodes are autonomous
  • time series collection happens via a pull model over HTTP
  • pushing time series is supported via an intermediary gateway
  • targets are discovered via service discovery or static configuration
  • multiple modes of graphing and dashboarding support

Metric Types:

Counter: a monotonically increasing counter (resets to zero on restart)
Gauge: a value that can go up and down
Histogram: samples observations and counts them in configurable (cumulative) buckets; also provides a sum of all observed values
Summary: samples observations and provides a count and sum of all observed values
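The semantics of these four types can be sketched in plain Python (no prometheus_client needed; the class and bucket values here are illustrative, not a real client library):

```python
# Minimal sketch of the four Prometheus metric types' semantics.
import bisect

class Counter:
    """Monotonically increasing; only inc() is allowed."""
    def __init__(self):
        self.value = 0.0
    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount

class Gauge:
    """A value that can go up and down."""
    def __init__(self):
        self.value = 0.0
    def set(self, v):
        self.value = v
    def inc(self, amount=1.0):
        self.value += amount
    def dec(self, amount=1.0):
        self.value -= amount

class Histogram:
    """Counts observations into cumulative buckets, plus count and sum."""
    def __init__(self, buckets):
        self.bounds = sorted(buckets)                       # upper bounds (le)
        self.bucket_counts = [0] * (len(self.bounds) + 1)   # last slot = +Inf
        self.count, self.sum = 0, 0.0
    def observe(self, v):
        self.count += 1
        self.sum += v
        # cumulative: every bucket whose bound is >= v is incremented
        for i in range(bisect.bisect_left(self.bounds, v), len(self.bucket_counts)):
            self.bucket_counts[i] += 1

class Summary:
    """Tracks count and sum of observations (quantiles omitted here)."""
    def __init__(self):
        self.count, self.sum = 0, 0.0
    def observe(self, v):
        self.count += 1
        self.sum += v
```

For example, `Histogram([0.1, 0.5, 1]).observe(0.3)` increments the `le=0.5`, `le=1` and `+Inf` buckets, matching the cumulative bucket output shown later in these notes.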

Instrument either services or libraries

Service Instrumentation

Three types of services:

Online-serving systems: RED (requests, errors, duration)
Offline-processing systems: USE (utilization, saturation, errors)
Batch jobs: see Pushgateway
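As a sketch, RED-style queries against the nginx metrics declared later in these notes (metric names taken from that section) might look like:

```
# Request rate per second over the last 5 minutes
sum(rate(nginx_http_requests_total[5m]))

# Error ratio (5xx responses)
sum(rate(nginx_http_requests_total{status=~"5.."}[5m]))
  / sum(rate(nginx_http_requests_total[5m]))

# 90th percentile request duration
histogram_quantile(0.9,
  sum(rate(nginx_http_request_duration_seconds_bucket[5m])) by (le))
```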

Library Instrumentation

Services are what you care about at a high level. Within each of your services there are libraries that you can think of as mini services.

Exposition:
The process of making metrics available to Prometheus is known as exposition.
Pushgateway:
A metrics cache for batch jobs. It remembers only the last push for each batch job; Prometheus then scrapes these metrics from it.

Download it from the Prometheus download page. It is an exporter that listens on port 9091 by default.
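A batch job pushes by sending text-format metrics in an HTTP PUT/POST to /metrics/job/&lt;jobname&gt; on the gateway. A stdlib-only sketch that builds such a request (the job name and metric below are made up for illustration; nothing is actually sent):

```python
# Sketch: build a Pushgateway request for a hypothetical batch job.
# The gateway accepts Prometheus text exposition format at /metrics/job/<jobname>.
from urllib.parse import quote

def pushgateway_request(gateway, job, metric, help_text, value):
    """Return (url, body) for a PUT to the Pushgateway; nothing is sent here."""
    url = f"http://{gateway}/metrics/job/{quote(job, safe='')}"
    body = (
        f"# HELP {metric} {help_text}\n"
        f"# TYPE {metric} gauge\n"
        f"{metric} {value}\n"
    )
    return url, body

url, body = pushgateway_request(
    "localhost:9091", "nightly_backup",      # hypothetical job name
    "backup_last_success_timestamp_seconds", # hypothetical metric
    "Unix time of the last successful backup", 1544661662)
```

With prometheus_client installed, its push_to_gateway() helper does the push for you instead of hand-building the request.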

Graphite bridge:

Sample Python code to expose Prometheus metrics:

import http.server
from prometheus_client import start_http_server, Counter, Gauge, Summary, Histogram

# Total number of HTTP requests served
REQUESTS = Counter('request_total', 'total HTTP requests')
# Example gauge; incremented on every request below
g = Gauge('my_inprogress_requests', 'description of my gauge')
g.set(1.1)

HISTOGRAM = Histogram('request_latency_histogram', 'histogram for the request time',
                      buckets=[0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50])

LATENCY = Summary('request_latency', 'Time for a request')

class MyHandler(http.server.BaseHTTPRequestHandler):
    @LATENCY.time()       # observe request duration in the summary
    @HISTOGRAM.time()     # and in the histogram
    def do_GET(self):
        REQUESTS.inc()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")
        g.inc()

if __name__ == "__main__":
    start_http_server(18000)  # metrics exposed on port 18000
    server = http.server.HTTPServer(('localhost', 18001), MyHandler)
    server.serve_forever()

Visit http://localhost:18001 to get “Hello World”; the metrics are then exposed at http://localhost:18000/metrics:
# HELP request_total total HTTP requests
# TYPE request_total counter
request_total 14.0
# TYPE request_created gauge
request_created 1.544661662956961e+09
# HELP my_inprogress_requests description of my gauge
# TYPE my_inprogress_requests gauge
my_inprogress_requests 15.1
# HELP request_latency_histogram histogram for the request time
# TYPE request_latency_histogram histogram
request_latency_histogram_bucket{le="0.0001"} 0.0
request_latency_histogram_bucket{le="0.0005"} 14.0
request_latency_histogram_bucket{le="0.001"} 14.0
request_latency_histogram_bucket{le="0.005"} 14.0
request_latency_histogram_bucket{le="0.01"} 14.0
request_latency_histogram_bucket{le="0.05"} 14.0
request_latency_histogram_bucket{le="0.1"} 14.0
request_latency_histogram_bucket{le="0.5"} 14.0
request_latency_histogram_bucket{le="1.0"} 14.0
request_latency_histogram_bucket{le="5.0"} 14.0
request_latency_histogram_bucket{le="10.0"} 14.0
request_latency_histogram_bucket{le="50.0"} 14.0
request_latency_histogram_bucket{le="+Inf"} 14.0
request_latency_histogram_count 14.0
request_latency_histogram_sum 0.0022318799999991867
# TYPE request_latency_histogram_created gauge
request_latency_histogram_created 1.544661662957037e+09
# HELP request_latency Time for a request
# TYPE request_latency summary
request_latency_count 14.0
request_latency_sum 0.0024406689999976194
# TYPE request_latency_created gauge
request_latency_created 1.544661662957119e+09



Prometheus metric library for Nginx written in Lua:

A Lua library that can be used with Nginx to keep track of metrics and expose them on a separate web page to be pulled by Prometheus.

Installation:

Install an nginx package with Lua support (libnginx-mod-http-lua on newer Debian versions, or nginx-extras on older ones). (Note: I did not do this, since I use OpenResty.)

The library file, prometheus.lua, needs to be available in LUA_PATH. If this is the only Lua library you use, you can just point lua_package_path to the directory with this git repo checked out (see example below).
OpenResty users will find this library in opm; it is also available via luarocks. (Note: I did not do this for OpenResty.)


nginx-lua-prometheus source code:




Prometheus nginx monitoring sample config:



Enable a Prometheus counter, gauge, and histogram in the nginx.conf file:

  lua_package_path "site/lualib/?.lua;/etc/nginx/ssl/?.lua;;";
  # prometheus exporter settings
  lua_shared_dict prometheus_metrics 10M;
  init_by_lua_block {
        prometheus = require("prometheus").init("prometheus_metrics");
        metric_requests = prometheus:counter("nginx_http_requests_total", "Number of HTTP requests", {"nginx_port", "method", "endpoint", "status"});
        metric_latency = prometheus:histogram("nginx_http_request_duration_seconds", "HTTP request latency", {"nginx_port", "method", "endpoint", "status"});
        metric_connections = prometheus:gauge("nginx_http_connections", "Number of HTTP connections", {"nginx_port", "state"});
  }
  log_by_lua_block {
        metric_requests:inc(1, {ngx.var.server_port, ngx.var.request_method, ngx.var.uri, ngx.var.status});
        metric_latency:observe(tonumber(ngx.var.request_time), {ngx.var.server_port, ngx.var.request_method, ngx.var.uri, ngx.var.status});
  }
  server {
    server_name http_metrics;
    listen 9000;
    access_log   /var/log/pan/directory-sync-service/nginx.access.log  main;

   location /metrics {
      content_by_lua '
        metric_connections:set(ngx.var.connections_reading, {"reading"})
        metric_connections:set(ngx.var.connections_waiting, {"waiting"})
        metric_connections:set(ngx.var.connections_writing, {"writing"})
        prometheus:collect()
      ';
    }
  }

The above config will generate an error until the following changes are made: the gauge was declared with two labels ({"nginx_port", "state"}), but :set is called with only one, so the label counts must match:

                metric_connections:set(ngx.var.connections_reading, {ngx.var.server_port, "reading"});
                metric_connections:set(ngx.var.connections_waiting, {ngx.var.server_port, "waiting"});
                metric_connections:set(ngx.var.connections_writing, {ngx.var.server_port, "writing"});


Prometheus docker container:

SJCMACJ15JHTD8:docker jzeng$ docker pull prom/prometheus
SJCMACJ15JHTD8:docker jzeng$ docker run --rm --add-host sv3-dsappweb1-devr1.ds.pan.local:10.105.50.23 -p 9090:9090 -d --name prometheus bc2b9d813555
(Note: ‘add-host’ may not be needed if an IP address is used in prometheus.yml.)
SJCMACJ15JHTD8:prometheus jzeng$ docker exec -it 3cd8e7a2ddc6 /bin/sh

Add another job to prometheus.yml:

/etc/prometheus $ vi prometheus.yml

  - job_name: 'ds_metrics'
    metrics_path: "/metrics"

    static_configs:
    - targets: ['10.105.50.23:9000']

10.105.50.23 is the IP of ‘sv3-dsappweb1-devr1.ds.pan.local’.

Reload the changes:

/bin $ kill -HUP {pid_of_prometheus}

Check Prometheus logs:

SJCMACJ15JHTD8:~ jzeng$ docker logs dbee6bb15ed2

Access the Prometheus web UI (port 9090):

Check ‘Status/Targets’ to make sure ‘ds_metrics’ is UP.

Access to Prometheus metrics:



Installation of Grafana:


Log in with the default credentials admin/admin.


HeatMap:

The Heatmap format is suitable for displaying histogram-type metrics on a Heatmap panel. Under the hood, it converts the cumulative histogram to a regular one and sorts the series by bucket bound.

The query for displaying histogram to HeatMap:

sum(rate(nginx_http_request_duration_seconds_bucket{instance=~"$INSTANCE"}[10m])) by (le)

Format: Heatmap
Legend format: {{le}}
Data format: Time series buckets
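The cumulative-to-regular conversion mentioned above can be sketched in Python (bucket bounds and counts below are taken from the exposition sample earlier in these notes, for illustration):

```python
# Sketch: convert cumulative Prometheus histogram buckets (le -> count)
# into per-bucket counts, as the heatmap rendering does under the hood.
def cumulative_to_regular(buckets):
    """buckets: list of (upper_bound, cumulative_count) pairs, any order."""
    ordered = sorted(buckets, key=lambda b: b[0])  # sort by bucket bound
    regular, previous = [], 0
    for bound, cumulative in ordered:
        regular.append((bound, cumulative - previous))
        previous = cumulative
    return regular

# Counts from the request_latency_histogram output shown earlier
cumulative = [(0.0001, 0.0), (0.0005, 14.0), (0.001, 14.0), (float("inf"), 14.0)]
```

Here all 14 observations fall into the (0.0001, 0.0005] bucket, so the regular series is 14 for that bucket and 0 everywhere else.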

Sample query:

Get the 5xx error count for certain REST APIs, grouped by endpoint:

sum(nginx_http_requests_total{instance=~"$INSTANCE",endpoint!="service/directory/v1/health",endpoint!="c",endpoint=~"/suscription|/agent/status|/service/directory/v1/.*|/directory-sync-service/v1/.*",status=~"5.."}) by (endpoint)

Google Cloud Stackdriver:

Stackdriver Kubernetes Monitoring integrates metrics, logs, events, and metadata from your Kubernetes environment and from your Prometheus instrumentation, to help you understand, in real time, your application’s behavior in production, no matter your role and where your Kubernetes deployments run.

