Thursday, April 15, 2021

PromQL and Grafana

PromQL syntax is not very intuitive and is hard to remember.  If you use it every day, that is no problem.  But if you use it only occasionally, you may have to relearn it every time.  In my case, I had completely forgotten the syntax even though I used it two years ago.

So I decided to document the parts I need, so that I will not have to learn them again next time.

1. Filter in metrics

Without a filter, you get far more than you need.  For example, a query with no filter returns graphs for every container, which makes it hard to find the container or pod you are trying to monitor.  By applying a filter in the "Metrics" field, you display metrics only for the ones you are interested in.

I will go through an example to explain some key syntax for filters:

{namespace="your_name_space",pod=~"main-api.+",container="main-api-container"}

  • If you want an exact match, use = and put the value inside the double quotes.  This is the easiest case.
  • If you want a wildcard match, "*" on its own will not work.  Instead, you need a regex, and you write =~ (equals sign followed by a tilde) in place of =.  The regex must match the whole label value, so "main-api.+" means "main-api" followed by one or more characters.  This syntax is weird, but that is how PromQL works.
  • If you have multiple filter conditions, 'and' and '&&' will not work.  Instead, separate them with a comma (,), which acts as a logical 'and'.
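
To make these three rules concrete, here is a minimal Python sketch that mimics how a selector picks series.  It is only an illustration, not how Prometheus is implemented: the `matches` helper and the sample label sets are made up, and Python's `re` module is not identical to the regex engine Prometheus uses.  It does reflect the fact that PromQL regex matchers are fully anchored, so the pattern has to match the whole label value.

```python
import re

# Hypothetical selector mirroring the example above:
# "=" is an exact match, "=~" is a fully anchored regex match.
SELECTOR = [
    ("namespace", "=", "your_name_space"),
    ("pod", "=~", "main-api.+"),
    ("container", "=", "main-api-container"),
]

def matches(labels, selector=SELECTOR):
    """Return True only if every matcher passes (the comma acts as AND)."""
    for name, op, value in selector:
        actual = labels.get(name, "")  # a missing label is treated as an empty string
        if op == "=":
            ok = actual == value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True

# Made-up label sets for three series.
good = {"namespace": "your_name_space", "pod": "main-api-7d9f", "container": "main-api-container"}
bare = dict(good, pod="main-api")         # ".+" needs at least one more character
other = dict(good, container="sidecar")   # one failing matcher rejects the series

print(matches(good), matches(bare), matches(other))
```

Only the first series passes all three matchers.  The second fails because nothing follows "main-api", and the third fails because a single non-matching condition is enough to exclude a series.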

2. Legend name

By default, the legend name is very long because it includes the metric name with all of its labels, so it is hard to tell the series apart.

The fix is similar in spirit to filtering: you keep only the fields you need.  Instead of key-value pairs as in the filter, you put the field (label) name between "{{" and "}}".

For example, if you want to focus on the pod name, just put {{pod}} into the 'Legend' field.  That will display only the pod name in the legend.  You can list multiple fields separated by commas, such as {{pod}}, {{container}}.  Here the comma is just literal text that shows up in the legend, unlike the comma in filters, which means 'and'.
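
The substitution can be pictured with a small Python sketch.  This is my own illustration of the idea, not Grafana's code: `render_legend` is a hypothetical helper, the label values are made up, and rendering an unknown label as an empty string is an assumption.

```python
import re

def render_legend(template, labels):
    """Replace each {{name}} with the label value; all other text is kept as-is."""
    def substitute(match):
        # Assumption: an unknown label renders as an empty string.
        return labels.get(match.group(1), "")
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

# Made-up labels for one series.
labels = {
    "namespace": "your_name_space",
    "pod": "main-api-7d9f",
    "container": "main-api-container",
}

print(render_legend("{{pod}}", labels))
print(render_legend("{{pod}}, {{container}}", labels))
```

The comma and the space in the second template pass through untouched, which is why the separator in the 'Legend' field can be any text you like.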

3. Sample PromQL 

For CPU usage per container, as a percentage of one CPU core:

rate(container_cpu_usage_seconds_total{namespace="my_name_space",pod=~"main-.+",container="main-api-container"}[100s]) * 100
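
To see what the number in this query means, here is a rough Python sketch of the arithmetic.  It is a deliberate simplification: the real rate() also handles counter resets and extrapolates to the edges of the window, which this ignores.  The `simple_rate` helper and the sample values are made up for illustration.

```python
def simple_rate(samples):
    """Per-second increase of a counter between the first and last sample.

    samples: list of (timestamp_seconds, counter_value) inside the window.
    Simplified: no counter-reset handling and no extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)

# Made-up CPU-seconds counter values over a 100-second window.
samples = [(0, 120.0), (50, 132.5), (100, 145.0)]

cores = simple_rate(samples)  # CPU seconds consumed per second
print(cores * 100)            # percentage of one core
```

The container used 25 CPU seconds in 100 seconds of wall-clock time, so the sketch prints 25.0, meaning a quarter of one core.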

For network traffic received per pod, in bytes per second (no "* 100" here, since this is not a percentage):

rate(container_network_receive_bytes_total{namespace="my_name_space",pod=~"main.+"}[100s])

For memory usage per container:

container_memory_working_set_bytes{namespace="my_name_space",pod=~"detector-.+",container="detector-container"}

For network usage (bytes received per second):

rate(container_network_receive_bytes_total{namespace="mynamespace",pod=~"main-.+"}[100s])