Tuesday, October 1, 2019

Flow metrics with Prometheus and Grafana

The Grafana dashboard above shows real-time network traffic flow metrics. This article describes how to define and collect flow metrics using the Prometheus time series database and build Grafana dashboards using those metrics.
Prometheus exporter describes an application that runs on the sFlow-RT analytics platform and converts real-time streaming telemetry from industry standard sFlow agents into metrics that Prometheus can scrape. Host, Docker, Swarm and Kubernetes monitoring describes how to deploy agents on popular container orchestration platforms.

The latest version of the Prometheus exporter application adds flow export.
global:
  scrape_interval:     15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: 'sflow-rt-metrics'
    metrics_path: /prometheus/metrics/ALL/ALL/txt
    static_configs:
      - targets: ['10.0.0.70:8008']
  - job_name: 'sflow-rt-src-dst-bps'
    metrics_path: /app/prometheus/scripts/export.js/flows/ALL/txt
    static_configs:
      - targets: ['10.0.0.70:8008']
    params:
      metric: ['ip_src_dst_bps']
      key: ['ipsource','ipdestination']
      label: ['src','dst']
      value: ['bytes']
      scale: ['8']
      minValue: ['1000']
      maxFlows: ['100']
  - job_name: 'sflow-rt-countries-bps'
    metrics_path: /app/prometheus/scripts/export.js/flows/ALL/txt
    static_configs:
      - targets: ['10.0.0.70:8008']
    params:
      metric: ['ip_countries_bps']
      key: ['null:[country:ipsource]:unknown','null:[country:ipdestination]:unknown']
      label: ['src','dst']
      value: ['bytes']
      scale: ['8']
      aggMode: ['sum']
      minValue: ['1000']
      maxFlows: ['100']
The above prometheus.yml file extends the previous example to add two additional scrape jobs, sflow-rt-src-dst-bps and sflow-rt-countries-bps, that return flow metrics. Defining flows describes the attributes and settings available to build a flow definition. The metric: setting names the Prometheus metric and the label: setting is used to map corresponding sFlow-RT flow keys into Prometheus labels.
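
To make the role of the params: block concrete, the following Python sketch builds the kind of URL a scrape job would request. It assumes the usual Prometheus behavior of sending each list element as a repeated query parameter over plain HTTP; scrape_url and SRC_DST_PARAMS are hypothetical names used only for illustration, and the parameter order Prometheus actually sends may differ.

```python
from urllib.parse import urlencode


def scrape_url(target, metrics_path, params):
    """Build the URL a scrape job would request for one target (illustrative)."""
    # doseq=True repeats a parameter once per list element,
    # e.g. key=ipsource&key=ipdestination
    query = urlencode(params, doseq=True)
    return f"http://{target}{metrics_path}?{query}"


# Values copied from the sflow-rt-src-dst-bps job above
SRC_DST_PARAMS = {
    "metric": ["ip_src_dst_bps"],
    "key": ["ipsource", "ipdestination"],
    "label": ["src", "dst"],
    "value": ["bytes"],
    "scale": ["8"],
    "minValue": ["1000"],
    "maxFlows": ["100"],
}

if __name__ == "__main__":
    print(scrape_url("10.0.0.70:8008",
                     "/app/prometheus/scripts/export.js/flows/ALL/txt",
                     SRC_DST_PARAMS))
```

Pasting the printed URL into curl or a browser is a quick way to see what Prometheus would receive from a job before adding it to prometheus.yml.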

Updated 19 October 2019: native support for Prometheus export was added to sFlow-RT, and the sflow-rt-metrics job was modified to reflect the new API.
The first step in building a Grafana dashboard panel to display flow data is to construct a query:
topk(10, sum(ip_src_dst_bps) by (src))
In this case, the query sums the flows by source address and returns the top 10 values for each interval in the graph.

The query for the Top Source Countries chart is a little more complex:
topk(10,sum(ip_countries_bps{src!="unknown"}) by (src))
In this case unknown source country values (the value set in the prometheus.yml file to use when a country lookup fails on an ipsource address) are excluded in the query.
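
As a rough illustration of what these queries compute at a single point in time, the Python sketch below sums sample values by their src label, optionally drops excluded labels, and keeps the largest totals. It is not how Prometheus evaluates PromQL; top_sources is a hypothetical helper and the sample values are made up.

```python
from collections import defaultdict


def top_sources(samples, k=10, exclude=()):
    """Sum values by the 'src' label and return the k largest (src, total) pairs."""
    totals = defaultdict(float)
    for labels, value in samples:
        src = labels["src"]
        if src in exclude:
            # Mirrors the {src!="unknown"} matcher in the second query
            continue
        totals[src] += value
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up samples shaped like ip_countries_bps series (labels, bits per second)
samples = [
    ({"src": "US", "dst": "DE"}, 4000.0),
    ({"src": "US", "dst": "FR"}, 1000.0),
    ({"src": "unknown", "dst": "US"}, 9000.0),
    ({"src": "DE", "dst": "US"}, 2500.0),
]

if __name__ == "__main__":
    print(top_sources(samples, k=2, exclude={"unknown"}))
```

Without the exclusion, the unknown bucket would rank first in this data, which is why the country query filters it out.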
In the visualization settings, set Null value to null as zero and Tooltip Mode to Single, label the Left Y axis, and disable Legend Show.
Finally, give the chart a title.
The Prometheus exporter application on sFlow-RT (accessible on port 8008) has a REST API explorer, above, that can be used to experiment with flow settings before configuring a Prometheus scraper job. When testing the settings, the first query will not return any data since the flow hasn't been programmed. Click the Execute button a second time to see data. Also consider using the sflow/flow-trend application as a way to gain familiarity with sFlow-RT's flow analytics engine.
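
The same "ask twice" behavior can be scripted when testing a flow definition outside the API explorer. The sketch below is a minimal example that requests a URL and retries once if the body comes back empty; fetch_flows and http_get are hypothetical names, and the number of attempts and the delay between them are assumptions rather than values documented for sFlow-RT.

```python
import time
from urllib.request import urlopen


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def fetch_flows(url, fetch=http_get, attempts=2, delay=1.0):
    """Request url up to `attempts` times and return the first non-empty body.

    The first request may return nothing because it only creates the flow
    definition; a later request can then return data.
    """
    body = ""
    for attempt in range(attempts):
        body = fetch(url)
        if body.strip():
            break
        if attempt < attempts - 1:
            # Assumed pause to let the new flow accumulate some data
            time.sleep(delay)
    return body
```

The fetch argument is injectable so the retry logic can be exercised without a running sFlow-RT instance.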

Update 2/19/2020: The following pre-built Grafana dashboards are available: sFlow-RT Countries and Networks, sFlow-RT Network Interfaces, and sFlow-RT Health.

21 comments:

  1. Hi Peter,

    1.I have installed prometheus on sflow-rt in server A
    2. i have installed Grafana in server A
    3. i have opened the API section in the sflow-rt. i have tried the get/post in the metrics and prometheus sections.
    i have received some data when i execute it.

    i tried to import the data through data source option in the Grafana. but, none appeared. i am receiving an error, bad gateway .
    i do not know why>?
    its silly to ask , if you dont mind can you pinpoint my ignorance.

    Replies
    1. Can you query the sFlow-RT data in Prometheus? The example requires the geo.country System Property to be set, i.e.
      geo.country=resources/config/GeoLite2-Country.mmdb

      The dashboard described in this article is available as a pre-built Grafana dashboard, sFlow-RT Countries and Networks, along with sFlow-RT Network Interfaces

  2. This comment has been removed by the author.

  3. Hi Dr.Peter,
    Yes, i could able to get the Geo MAP. Its running smoothly with this command.
    ./start.sh -Dgeo.country=resources/config/GeoLite2-Country.mmdb -Dgeo.asn=resources/config/GeoLite2-ASN.mmdb

    I Verified that the metrics are available using cURL: when i run this command, it displays some metrics data.

    $ curl http://A.B.C.D:8008/prometheus/metrics/ALL/ALL/txt
    sflow_ifinucastpkts{agent="10.0.0.30",datasource="2",host="server",ifname="enp3s0"} 9.44
    sflow_ifoutdiscards{agent="10.0.0.30",datasource="2",host="server",ifname="enp3s0"} 0
    ........................... etc..,
    Previously, i didn’t know about this pre-built dashboards existed.
    https://grafana.com/grafana/dashboards/11146
    https://grafana.com/grafana/dashboards/11201
    So, i have installed the grafana .deb file directly and unpacked it. i am able to access http://0.0.0.0:3000. its working very well.
    I tried adding the prometheus data source to grafana through this URL: http://server-ip:9090
    It shows HTTP bad gateway.
    --------------------------------------------------------------------------------------
    Later, i followed your instructions and ran the docker commands. I have received the given below outputs on my screen.
    docker run -d -p 6343:6343/udp -p 8008:8008 sflow/prometheus
    docker pull sflow/prometheus
    Using default tag: latest
    latest: Pulling from sflow/prometheus
    Digest: sha256:9cf91fedxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    Status: Image is up to date for sflow/prometheus:latest
    docker.io/sflow/prometheus:latest


    I guess, i couldn’t run the pre-built Grafana dashboard commands through docker.
    I have stopped the ./start.sh daemon. Right after that i ran docker run -d -p 6343:6343/udp -p 8008:8008 sflow/prometheus

    i could not able to execute the docker command.its not running. in my system i have installed docker. its running.

    I really appreciate your time on this matter!!!

    Replies
    1. What error is being reported when you try and run the Docker image? You should be able to run the pre-built Grafana dashboards using the grafana/grafana Docker image, see Prometheus exporter

    2. Addition to the previous thread:
      java -version
      openjdk version "11.0.5" 2019-10-15
      OpenJDK Runtime Environment (build 11.0.5+10-post-Raspbian-1deb10u1)
      OpenJDK Server VM (build 11.0.5+10-post-Raspbian-1deb10u1, mixed mode)

  4. Hi Dr.Peter,

    1.docker -v

    Docker version 19.03.5, build 633a0ea

    ---------------------------------------------------------------------------

    2. I ran this command while my sflow-rt web portal (http://192.168.20.2:8008/html/index.html#) is active.

    -> docker run -d -p 6343:6343/udp -p 8008:8008 sflow/prometheus

    i received the following error.

    Output:

    69d1b146feffa62a0150183fb7df9d6f6f487711e6e36660c650549eadb478ec

    docker: Error response from daemon: driver failed programming external connectivity on endpoint gracious_bose (08b71d58817e6153bb1c278ae4c3bbdefdfdda38e36010c34037805807e1df7b): Error starting userland proxy: listen tcp 0.0.0.0:8008: bind: address already in use.

    --------------------------------------------------------------------------

    3. I ran this command while my sflow-rt web portal (http://192.168.20.2:8008/html/index.html#) is inactive.

    docker run -d -p 6343:6343/udp -p 8008:8008 sflow/prometheus

    i received the following status

    Output:

    0dfecde22c3e4de3054f95337d9fbe1962a315d1fc8f7e2e74c379a937daf515

    ---------------------------------------------------------------------------

    4. I followed the instructions from :https://blog.sflow.com/2019/04/prometheus-exporter.html
    I Verified that the metrics are available using cURL: when i run this command, it displays some metrics data.

    $ curl http://192.168.20.2:8008/prometheus/metrics/ALL/ALL/txt
    sflow_ifinucastpkts{agent="192.168.20.2",datasource="2",host="server",ifname="enp3s0"} 9.44
    sflow_ifoutdiscards{agent="192.168.20.2",datasource="2",host="server",ifname="enp3s0"} 0

    --------------------------------------------------------------------------

    5.At present, this is how i run my sFlow-rt with this command:

    ./start.sh -Dgeo.country=resources/config/GeoLite2-Country.mmdb

    Its running smoothly.
    --------------------------------------------------------------------------

    What am i missing ?

    Replies
    1. The docker sflow/prometheus image contains Java, sFlow-RT and the prometheus app. You can't run a second copy of sFlow-RT (using ./start.sh) since this will clash on tcp port 8008 and udp port 6343.
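
A quick way to confirm this kind of clash before starting a second process is to try binding the port. The Python sketch below is one possible check; port_in_use is a hypothetical helper, and a bind failure can also have other causes (for example, insufficient privileges on low ports), so treat a True result as a hint rather than proof.

```python
import socket


def port_in_use(port, host="0.0.0.0", kind=socket.SOCK_STREAM):
    """Return True if binding host:port fails, which usually means it is taken."""
    with socket.socket(socket.AF_INET, kind) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False


if __name__ == "__main__":
    # Ports named in the reply above: tcp 8008 (REST API) and udp 6343 (sFlow)
    print("tcp 8008 in use:", port_in_use(8008))
    print("udp 6343 in use:", port_in_use(6343, kind=socket.SOCK_DGRAM))
```

If either check reports True while ./start.sh is running, stopping that instance first should avoid the "address already in use" error shown earlier in the thread.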

  5. Dr.Peter,

    If i could change the TCP/UDP ports using mapping scripts on the server, will it work? just an assumption

    do you have any other suggestions/solutions to overcome the prometheus issues?

    Replies
    1. You should only be running one copy of sFlow-RT, you can run it natively using ./start.sh, or under docker using sflow/sflow-rt. In either of these cases you need to install the sFlow-RT prometheus app.

      The sflow/prometheus docker image was built from the sflow/sflow-rt image to include the prometheus app and related settings (like enabling geolocation). You can still include your own scripts and set features by following the instructions for the sflow/sflow-rt image.

  6. Dr.Peter,
    i understand that only one instance can be run. more than one instance of sflow-rt cant run on the same server.

    I have installed prometheus app in the sflow-rt.
    when i access the promethues app window.i can able to see the below following options.

    1.Analyzer
    2.Flows
    3.Metrics
    4.Metrics

    For instance: Get sFlow-RT performance metrics
    i tried "GET" and executed the scripts. eventually, i received some metrics instantly.
    but, why i cant get the graph in the prometheus.

    I really appreciate and thank for your time on this matter.

    Replies
    1. It's helpful to look at the Prometheus scrape status page to see if the scrape tasks are succeeding.

  7. Prometheus web link from sFlow-RT
    http://10.0.0.0:8008/app/prometheus/api/index.html
    Request URL
    curl -X GET "http://10.0.0.0:8008/app/prometheus/scripts/../../../prometheus/metrics/ALL/ALL/txt" -H "accept: text/plain"
    Server response
    Code: 200
    Response body
    sflow_eth_internalmacrxerrors{agent="10.10.10.10",datasource="44",ifindex="44",ifspeed="10G",iftype="ethernetCsmacd",ifadminstatus="up",ifoperstatus="up"} 0.0
    sflow_eth_symbolerrors{agent="20.20.20.20",datasource="44",ifindex="44",ifspeed="10G",iftype="ethernetCsmacd",ifadminstatus="up",ifoperstatus="up"} 0.0
    sflow_ifinunknownprotos{agent="40.40.40.40",datasource="44",ifindex="44",ifspeed="10G",iftype="ethernetCsmacd",ifadminstatus="up",ifoperstatus="up"} 0.0
    sflow_eth_latecollisions{agent="40.40.40.40",datasource="44",ifindex="44",ifspeed="10G",iftype="ethernetCsmacd",ifadminstatus="up",ifoperstatus="up"} 0.0


    This is the overall scrape status received from the Prometheus scrape page by executing fore-mentioned commands.

    Between, i have installed the docker grafana with the default port 3000.
    when i try to get the WEB UI, i could not able to connect to it.

    i have verified, 3000 TCP port is functioning on my server.
    i have configured grafana as per this instruction page:
    https://grafana.com/docs/grafana/latest/installation/configuration/


    what am i missing here Dr.Peter, seems like there is a glitch on the executions. i failed to figure that out. but, i dont want to give up on this thread.

    i appreciate your valuable time on this matter.

    Wish you a happy New-year!

  8. Dr.Peter,
    I have restarted the server. now, the grafana docker is up and WEB UI is accessible.

    but, the above mentioned problem is existed. couldnt add the data source in
    prometheus.

  9. Dear Peter,

    We have configured Prometheus according to this guide https://grafana.com/grafana/dashboards/11146 , but the measurements are wrong in Grafana. We are getting traffic in Tbps, but it should be in Mbps or Gbps. We use Huawei sflow for exporting to sFlow-RT. How many sflow agents are supported? We use just three. Total network throughput is about 20Gbps.

    Replies
    1. It sounds like there might be an issue with the sFlow from the Huawei routers. Are you running the latest firmware? Can you share the sFlow configuration? Large flow detection recommends settings. Try running the sflow-test tool - it compares interface counters and packet samples to verify the accuracy of an sFlow agent.

  10. Hi, I am stuck with a maybe very trivial problem.

    I am using the sflow/prometheus docker image. I am collecting flows from HPe comware switches.

    Unfortunately, I am unable to retrieve any metrics data.

    If I do a "curl http://A.B.C.D:8008/prometheus/metrics/ALL/ALL/txt" no metrics are displayed; curl returns nothing.

    I am missing something ?

    Best regards

    Replies
    1. Do you see any agents on the top left sFlow Agents gauge on the sFlow-RT status page? Check the sFlow Agents, sFlow Bytes, and sFlow Packets gauges to confirm that sFlow is being received. You need to configure the switches to send sFlow to sFlow-RT on port 6343.

      Check the URL /agents/json to see if any errors are being reported.

    2. The sflow agents of our switches are working. sFlow-RT correctly reports the 2 agents, and the gauges are displaying Bytes and Packets correctly.

      The /agents/json query reports some "sFlowDatagramsLost", but it's very minor compared to the number of flows received.

      Thanks

    3. Can you check that you are receiving counter samples? You can check the sFlowCounterSamples values in the /agents/json query. If you aren't receiving counter samples, you need to configure sFlow polling on the switches.
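
The counter-sample check can also be scripted. The Python sketch below assumes, without confirmation from this thread, that the /agents/json response parses to an object keyed by agent address whose entries carry an sFlowCounterSamples field; agents_missing_counters is a hypothetical helper and the sample data is made up.

```python
def agents_missing_counters(agents):
    """Return agent addresses whose sFlowCounterSamples value is zero or absent.

    `agents` is assumed to be the parsed /agents/json response, shaped as
    {agent_address: {"sFlowCounterSamples": <int>, ...}, ...}.
    """
    return sorted(
        address
        for address, stats in agents.items()
        if not stats.get("sFlowCounterSamples", 0)
    )


if __name__ == "__main__":
    # Made-up example: the second agent sends flow samples but no counters
    example = {
        "10.0.0.30": {"sFlowCounterSamples": 1200, "sFlowFlowSamples": 5400},
        "10.0.0.31": {"sFlowCounterSamples": 0, "sFlowFlowSamples": 8100},
    }
    print(agents_missing_counters(example))
```

Any address in the result is a candidate for missing counter polling configuration on the switch.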

    4. This is it, I didn't even know that I have this functionality on my switches. I was only sending flows and not the counter samples.

      On HPe comware, I added these lines to my interface configuration:

      sflow counter collector 1
      sflow counter interval 15

      All is working well now.

      Thanks for pointing the obvious !
