Saturday, December 15, 2018

Prometheus



Features
Prometheus's main features are:
  • a multi-dimensional data model with time series data identified by metric name and key/value pairs
  • flexible query language to leverage this dimensionality
  • no reliance on distributed storage; single server nodes are autonomous
  • time series collection happens via a pull model over HTTP
  • pushing time series is supported via an intermediary gateway
  • targets are discovered via service discovery or static configuration
  • multiple modes of graphing and dashboarding support

Metric Types:

Counter: a monotonically increasing counter (resets to zero on restart)
Gauge: a value that can go up and down
Histogram: samples observations and counts them in configurable (cumulative) buckets; also provides a sum of all observed values
Summary: samples observations and provides a count and sum of all observed values
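The semantics of these four types can be sketched in plain Python (no prometheus_client needed; the class and bucket values here are illustrative, not a real client library):

```python
# Minimal sketch of the four Prometheus metric types' semantics.
import bisect

class Counter:
    """Monotonically increasing; only inc() is allowed."""
    def __init__(self):
        self.value = 0.0
    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount

class Gauge:
    """A value that can go up and down."""
    def __init__(self):
        self.value = 0.0
    def set(self, v):
        self.value = v
    def inc(self, amount=1.0):
        self.value += amount
    def dec(self, amount=1.0):
        self.value -= amount

class Histogram:
    """Counts observations into cumulative buckets, plus count and sum."""
    def __init__(self, buckets):
        self.bounds = sorted(buckets)                       # upper bounds (le)
        self.bucket_counts = [0] * (len(self.bounds) + 1)   # last slot = +Inf
        self.count, self.sum = 0, 0.0
    def observe(self, v):
        self.count += 1
        self.sum += v
        # cumulative: every bucket whose bound is >= v is incremented
        for i in range(bisect.bisect_left(self.bounds, v), len(self.bucket_counts)):
            self.bucket_counts[i] += 1

class Summary:
    """Tracks count and sum of observations (quantiles omitted here)."""
    def __init__(self):
        self.count, self.sum = 0, 0.0
    def observe(self, v):
        self.count += 1
        self.sum += v
```

For example, `Histogram([0.1, 0.5, 1]).observe(0.3)` increments the `le=0.5`, `le=1` and `+Inf` buckets, matching the cumulative bucket output shown later in these notes.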

Instrument either services or libraries

Service Instrumentation

Three types of services:

Online-serving systems: RED (requests, errors, duration)
Offline-processing systems: USE (utilization, saturation, errors)
Batch jobs: see Pushgateway
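As a sketch, RED-style queries against the nginx metrics declared later in these notes (metric names taken from that section) might look like:

```
# Request rate per second over the last 5 minutes
sum(rate(nginx_http_requests_total[5m]))

# Error ratio (5xx responses)
sum(rate(nginx_http_requests_total{status=~"5.."}[5m]))
  / sum(rate(nginx_http_requests_total[5m]))

# 90th percentile request duration
histogram_quantile(0.9,
  sum(rate(nginx_http_request_duration_seconds_bucket[5m])) by (le))
```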

Library Instrumentation

Services are what you care about at a high level. Within each of your services there are libraries that you can think of as mini services.

Exposition:
The process of making metrics available to Prometheus is known as exposition.
Pushgateway:
A metrics cache for batch jobs. It remembers only the last push for each batch job; Prometheus then scrapes these metrics from it.

Download it from the Prometheus download page. It is an exporter that listens on port 9091 by default.
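A batch job pushes by sending text-format metrics in an HTTP PUT/POST to /metrics/job/&lt;jobname&gt; on the gateway. A stdlib-only sketch that builds such a request (the job name and metric below are made up for illustration; nothing is actually sent):

```python
# Sketch: build a Pushgateway request for a hypothetical batch job.
# The gateway accepts Prometheus text exposition format at /metrics/job/<jobname>.
from urllib.parse import quote

def pushgateway_request(gateway, job, metric, help_text, value):
    """Return (url, body) for a PUT to the Pushgateway; nothing is sent here."""
    url = f"http://{gateway}/metrics/job/{quote(job, safe='')}"
    body = (
        f"# HELP {metric} {help_text}\n"
        f"# TYPE {metric} gauge\n"
        f"{metric} {value}\n"
    )
    return url, body

url, body = pushgateway_request(
    "localhost:9091", "nightly_backup",      # hypothetical job name
    "backup_last_success_timestamp_seconds", # hypothetical metric
    "Unix time of the last successful backup", 1544661662)
```

With prometheus_client installed, its push_to_gateway() helper does the push for you instead of hand-building the request.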

Graphite bridge:

Sample Python code to expose Prometheus metrics:

import http.server
from prometheus_client import start_http_server, Counter, Gauge, Summary, Histogram

# Total number of HTTP requests served
REQUESTS = Counter('request_total', 'total HTTP requests')
# Example gauge; incremented on every request below
g = Gauge('my_inprogress_requests', 'description of my gauge')
g.set(1.1)

HISTOGRAM = Histogram('request_latency_histogram', 'histogram for the request time',
                      buckets=[0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50])

LATENCY = Summary('request_latency', 'Time for a request')

class MyHandler(http.server.BaseHTTPRequestHandler):
    @LATENCY.time()       # observe request duration in the summary
    @HISTOGRAM.time()     # and in the histogram
    def do_GET(self):
        REQUESTS.inc()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello World")
        g.inc()

if __name__ == "__main__":
    start_http_server(18000)  # metrics exposed on port 18000
    server = http.server.HTTPServer(('localhost', 18001), MyHandler)
    server.serve_forever()

Visit http://localhost:18001 to get “Hello World”; the metrics are then exposed at http://localhost:18000/metrics:
# HELP request_total total HTTP requests
# TYPE request_total counter
request_total 14.0
# TYPE request_created gauge
request_created 1.544661662956961e+09
# HELP my_inprogress_requests description of my gauge
# TYPE my_inprogress_requests gauge
my_inprogress_requests 15.1
# HELP request_latency_histogram histogram for the request time
# TYPE request_latency_histogram histogram
request_latency_histogram_bucket{le="0.0001"} 0.0
request_latency_histogram_bucket{le="0.0005"} 14.0
request_latency_histogram_bucket{le="0.001"} 14.0
request_latency_histogram_bucket{le="0.005"} 14.0
request_latency_histogram_bucket{le="0.01"} 14.0
request_latency_histogram_bucket{le="0.05"} 14.0
request_latency_histogram_bucket{le="0.1"} 14.0
request_latency_histogram_bucket{le="0.5"} 14.0
request_latency_histogram_bucket{le="1.0"} 14.0
request_latency_histogram_bucket{le="5.0"} 14.0
request_latency_histogram_bucket{le="10.0"} 14.0
request_latency_histogram_bucket{le="50.0"} 14.0
request_latency_histogram_bucket{le="+Inf"} 14.0
request_latency_histogram_count 14.0
request_latency_histogram_sum 0.0022318799999991867
# TYPE request_latency_histogram_created gauge
request_latency_histogram_created 1.544661662957037e+09
# HELP request_latency Time for a request
# TYPE request_latency summary
request_latency_count 14.0
request_latency_sum 0.0024406689999976194
# TYPE request_latency_created gauge
request_latency_created 1.544661662957119e+09



Prometheus metric library for Nginx written in Lua:

A Lua library that can be used with Nginx to keep track of metrics and expose them on a separate web page to be pulled by Prometheus.

Installation:

Install an nginx package with Lua support (libnginx-mod-http-lua on newer Debian versions, or nginx-extras on older ones). (Note: I did not do this, since I use OpenResty.)

The library file, prometheus.lua, needs to be available in LUA_PATH. If this is the only Lua library you use, you can just point lua_package_path to the directory with this git repo checked out (see example below).
OpenResty users will find this library in opm; it is also available via luarocks. (Note: I did not do this for OpenResty.)


nginx-lua-prometheus source code:




Prometheus nginx monitoring sample config:



Enable a Prometheus counter, gauge, and histogram in the nginx.conf file:

  lua_package_path "site/lualib/?.lua;/etc/nginx/ssl/?.lua;;";
  # prometheus exporter settings
  lua_shared_dict prometheus_metrics 10M;
  init_by_lua_block {
        prometheus = require("prometheus").init("prometheus_metrics");
        metric_requests = prometheus:counter("nginx_http_requests_total", "Number of HTTP requests", {"nginx_port", "method", "endpoint", "status"});
        metric_latency = prometheus:histogram("nginx_http_request_duration_seconds", "HTTP request latency", {"nginx_port", "method", "endpoint", "status"});
        metric_connections = prometheus:gauge("nginx_http_connections", "Number of HTTP connections", {"nginx_port", "state"});
  }
  log_by_lua_block {
        metric_requests:inc(1, {ngx.var.server_port, ngx.var.request_method, ngx.var.uri, ngx.var.status});
        metric_latency:observe(tonumber(ngx.var.request_time), {ngx.var.server_port, ngx.var.request_method, ngx.var.uri, ngx.var.status});
  }
  server {
    server_name http_metrics;
    listen 9000;
    access_log   /var/log/pan/directory-sync-service/nginx.access.log  main;

   location /metrics {
      content_by_lua '
        metric_connections:set(ngx.var.connections_reading, {"reading"})
        metric_connections:set(ngx.var.connections_waiting, {"waiting"})
        metric_connections:set(ngx.var.connections_writing, {"writing"})
        prometheus:collect()
      ';
    }
  }

The above config will generate an error until the following changes are made: the gauge was declared with two labels ({"nginx_port", "state"}), but :set is called with only one, so the label counts must match:

                metric_connections:set(ngx.var.connections_reading, {ngx.var.server_port, "reading"});
                metric_connections:set(ngx.var.connections_waiting, {ngx.var.server_port, "waiting"});
                metric_connections:set(ngx.var.connections_writing, {ngx.var.server_port, "writing"});


Prometheus docker container:

SJCMACJ15JHTD8:docker jzeng$ docker pull prom/prometheus
SJCMACJ15JHTD8:docker jzeng$ docker run --rm --add-host sv3-dsappweb1-devr1.ds.pan.local:10.105.50.23 -p 9090:9090 -d --name prometheus bc2b9d813555
(Note: ‘add-host’ may not be needed if an IP address is used in prometheus.yml.)
SJCMACJ15JHTD8:prometheus jzeng$ docker exec -it 3cd8e7a2ddc6 /bin/sh

Add another job to prometheus.yml:

/etc/prometheus $ vi prometheus.yml

  - job_name: 'ds_metrics'
    metrics_path: "/metrics"

    static_configs:
    - targets: ['10.105.50.23:9000']

10.105.50.23 is the IP of ‘sv3-dsappweb1-devr1.ds.pan.local’.

Reload the changes:

/bin $ kill -HUP {pid_of_prometheus}

Check Prometheus logs:

SJCMACJ15JHTD8:~ jzeng$ docker logs dbee6bb15ed2

Access the Prometheus web UI (port 9090):

Check ‘Status/Targets’ to make sure ‘ds_metrics’ is UP.

Access to Prometheus metrics:



Installation of Grafana:


Log in with the default credentials admin/admin.


HeatMap:

The Heatmap format is suitable for displaying histogram-type metrics on a Heatmap panel. Under the hood, it converts the cumulative histogram to a regular one and sorts the series by bucket bound.

The query for displaying histogram to HeatMap:

sum(rate(nginx_http_request_duration_seconds_bucket{instance=~"$INSTANCE"}[10m])) by (le)

Format: Heatmap
Legend format: {{le}}
Data format: Time series buckets
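The cumulative-to-regular conversion mentioned above can be sketched in Python (bucket bounds and counts below are taken from the exposition sample earlier in these notes, for illustration):

```python
# Sketch: convert cumulative Prometheus histogram buckets (le -> count)
# into per-bucket counts, as the heatmap rendering does under the hood.
def cumulative_to_regular(buckets):
    """buckets: list of (upper_bound, cumulative_count) pairs, any order."""
    ordered = sorted(buckets, key=lambda b: b[0])  # sort by bucket bound
    regular, previous = [], 0
    for bound, cumulative in ordered:
        regular.append((bound, cumulative - previous))
        previous = cumulative
    return regular

# Counts from the request_latency_histogram output shown earlier
cumulative = [(0.0001, 0.0), (0.0005, 14.0), (0.001, 14.0), (float("inf"), 14.0)]
```

Here all 14 observations fall into the (0.0001, 0.0005] bucket, so the regular series is 14 for that bucket and 0 everywhere else.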

Sample query:

Get the 5xx error count for certain REST APIs, grouped by endpoint:

sum(nginx_http_requests_total{instance=~"$INSTANCE",endpoint!="service/directory/v1/health",endpoint!="c",endpoint=~"/suscription|/agent/status|/service/directory/v1/.*|/directory-sync-service/v1/.*",status=~"5.."}) by (endpoint)

Google Cloud Stackdriver:

Stackdriver Kubernetes Monitoring integrates metrics, logs, events, and metadata from your Kubernetes environment and from your Prometheus instrumentation, to help you understand, in real time, your application’s behavior in production, no matter your role and where your Kubernetes deployments run.

