Grafana: Top N values

Aus Wiki-WebPerfect
Wechseln zu: Navigation, Suche

Top N values

Telegraf configuration

Because there is no Perfmon-Counter for the total size of Cluster Shared Volumes (CSV), I wrote a PowerShell CMDLET.

 [[inputs.exec]]
   commands = ['''powershell.exe -NoProfile -Command "Get-Volume | Where-Object {$_.FileSystem -eq 'CSVFS'} | select FileSystemLabel, AllocationUnitSize, Size, SizeRemaining, @{N='SizeUsed';E={$_.Size - $_.SizeRemaining}} | ConvertTo-Json"''']
   name_override = "cluster_csv"
   data_format = "json"
   data_type = "float"
   tag_keys = ["FileSystemLabel"] 

Measurement = cluster_csv
Tags = FileSystemLabel


InfluxQL in Grafana

Size = Total size of the CSV
SizeUsed = Used size of the CSV
FileSystemLabel = Is the name of the CSV

InfluxQL:

SELECT top("UsedSpace (%)","FileSystemLabel",5) AS "UsedSpace (%)" FROM (
  SELECT (100 / mean("Size")) * mean("SizeUsed") AS "UsedSpace (%)"
  FROM "$rp"."cluster_csv" 
  WHERE $timeFilter 
  GROUP BY "FileSystemLabel"
) 

Description of the Query:
$rp: is a Grafana template variable to select InfluxDB retention policy. If you use this in as a template variable you can change the retention policy for the whole dashboard (you don't have to change each panel). First Query: Grafana selects and calculates the percentage of used space per CSV in the timerange of the dashboard.
Second Query: Grafana selects the top 5 CSV's based on the value of "UsedSpace (%)" per "FileSystemLabel" (CSV-Name).
Conclusion: With this query you have a table of the 5 most used space per CSV's.

Grafana Panel settings:
FORMAT AS = Table
Visualization = Table
Transform = Organize fields = Hiding "Time" and renaming FileSystemLabel to CSV Overrides = Fields with name = "UsedSpace (%)" -> Unit = "Percent (0-100)", Decimals = 1, Cell Display mode = "Gradient gauge"



Top N values for multiple instances

You only have to use the following InfluxQL if you have multiple "instance" and you have to sum the instance for each node before select the top. In this example each host has two Network interfaces and we want to know the utilization of the whole host and not per each NIC.

Telegraf configuration

Bytes Received/sec = Received Bytes/sec for each NIC
Bytes Sent/sec = Sent Bytes/sec for each NIC
Current Bandwidth = Current Bandwidth for each NIC

  [[inputs.win_perf_counters.object]]
    ObjectName = "Network Interface"
    Instances = ["*"]
    Counters = [
      "Bytes Received/sec",
      "Bytes Sent/sec",
      "Current Bandwidth",
    ]
    Measurement = "win_net" 


InfluxQL in Grafana

  SELECT top("Average","host",5) AS "Average" FROM (
    SELECT sum("Average") AS "Average" FROM (
      SELECT ((100 / (mean("Current_Bandwidth") / 8)) * (mean("Bytes_Sent_persec") + mean("Bytes_Received_persec"))) AS "Average"
      FROM "$rp"."win_net" 
      WHERE $timeFilter AND "instance" !~ /^Hyper-V.+/
      GROUP BY "host", "instance"
    ) GROUP BY "host"
  ) 

Description of the Query:
First Query: Convert the "Current_Bandwidth" from Bits/sec to Bytes/sec and calculate the utilization with the used Bandwidth (received & sent) in percentage. Filter instance that are not equal "Hyper-V..." and group this by "host" and "instance".
Second Query: Sum the utilization of the two NIC's per each host.
Third Query: Selects the top 5 hosts based on the utilization of his NICs.

Grafana Panel settings: It like Grafana settings from above.


Flux in Grafana (top 5)

//define variables
bucket = "telegraf"
instancefilter = /^Hyper-V.+/
top = 5


//calculate the average of "Current_Bandwidth" for each host and NIC
mean_Current_Bandwidth = from(bucket: bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => 
    r._measurement == "win_net" and 
    r.instance !~ instancefilter and 
    r._field == "Current_Bandwidth"
  )
  |> group(columns: ["host", "instance"])
  |> mean()

//calculate the average of "Bytes_Received_persec" for each host and NIC
mean_Bytes_Received_persec = from(bucket: bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => 
    r._measurement == "win_net" and 
    r.instance !~ instancefilter and 
    r._field == "Bytes_Received_persec"
  )
  |> group(columns: ["host", "instance"])
  |> mean()

//calculate the average of "Bytes_Sent_persec" for each host and NIC
mean_Bytes_Sent_persec = from(bucket: bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => 
    r._measurement == "win_net" and 
    r.instance !~ instancefilter and 
    r._field == "Bytes_Sent_persec"
  )
  |> group(columns: ["host", "instance"])
  |> mean()

//joins both datastreams, calculate the sum of sent and received traffic
join_total = join(
    tables: {Bytes_Received_persec:mean_Bytes_Received_persec, Bytes_Sent_persec:mean_Bytes_Sent_persec},
    //because of the mean calculation in the streams above, there is no "_time" instead you have to use "_start" & "_stop" for the join
    on: ["_start", "_stop", "host", "instance"]
  )
  |> map(fn: (r) => ({ r with
    _value:
      if not exists r._value_Bytes_Received_persec or r._value_Bytes_Received_persec == 0.0 then 0.0
      else if not exists r._value_Bytes_Sent_persec or r._value_Bytes_Sent_persec == 0.0 then 0.0
      else float(v: r._value_Bytes_Received_persec + r._value_Bytes_Sent_persec)
  }))


//joins datastream and the other join -> Workaround: Currently only Joins of max two datastreams is possible
join(
    tables: {Current_Bandwidth:mean_Current_Bandwidth, Bytes_Total:join_total},
    //because of the mean calculation in the streams above, there is no "_time" instead you have to use "_start" & "_stop" for the join
    on: ["_start", "_stop", "host", "instance"]
  )
  |> map(fn: (r) => ({ r with
    //create a new field
    calculated_Network_Load_percent: 
      if not exists r._value_Current_Bandwidth or r._value_Current_Bandwidth == 0.0 then 0.0 //check if the value does not exists
      else if not exists r._value_Bytes_Total or r._value_Bytes_Total == 0.0 then 0.0 //check if the value does not exists
      else float(v: (100.0 / (r._value_Current_Bandwidth / 8.0)) * r._value_Bytes_Total) //calculate the network load in percent
  }))
  //select top n hosts with the highest average of both NICs
  |> highestAverage(n:top, column: "calculated_Network_Load_percent", groupColumns: ["host"])