Traffic Health
There are times when Kiali’s default thresholds for traffic health do not work well for a particular situation. For example, at times 404 response codes are expected. Kiali has the ability to set powerful, fine-grained overrides for health configuration.
Default Configuration
By default Kiali uses the traffic rate configuration shown below. Application errors have minimal tolerance while client errors have a higher tolerance reflecting that some level of client errors is often normal (e.g. 404 Not Found):
- For http protocol 4xx are client errors and 5xx codes are application errors.
- For grpc protocol all 1-16 are errors (0 is success).
So, for example, if the rate of application errors is >= 0.1% Kiali will show Degraded
health and if > 10% will show Failure
health.
# ...
health_config:
rate:
- namespace: ".*"
kind: ".*"
name: ".*"
tolerance:
- code: "^5\\d\\d$"
direction: ".*"
protocol: "http"
degraded: 0
failure: 10
- code: "^4\\d\\d$"
direction: ".*"
protocol: "http"
degraded: 10
failure: 20
- code: "^[1-9]$|^1[0-6]$"
direction: ".*"
protocol: "grpc"
degraded: 0
failure: 10
# ...
Custom Configuration
Custom health configuration is specified in the Kiali CR. To see the supported configuration syntax for health_config
see the Kiali CR Reference.
Kiali applies the first matching rate configuration (namespace, kind, etc) and calculates the status for each tolerance. The reported health will be the status with highest priority (see below).
Rate Option | Definition | Default | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
namespace | Matching Namespaces (regex) | .* (match all) | |||||||||||||||
kind | Matching Resource Types (workload|app|service) (regex) | .* (match all) | |||||||||||||||
name | Matching Resource Names (regex) | .* (match all) | |||||||||||||||
tolerance | Array of tolerances to apply. | ||||||||||||||||
Tolerance Option | Definition | Default |
---|---|---|
code | Matching Response Status Codes (regex) [1] | required |
direction | Matching Request Directions (inbound|outbound) (regex) | .* (match all) |
protocol | Matching Request Protocols (http|grpc) (regex) | .* (match all) |
degraded | Degraded Threshold(% matching requests >= value) | 0 |
failure | Failure Threshold (% matching requests >= value) | 0 |
[1] The status code typically depends on the request protocol. The special code -, a single dash, is used for requests that don’t receive a response, and therefore no response code.
Kiali reports traffic health with the following top-down status priority :
Priority | Rule (value=% matching requests) | Status |
---|---|---|
1 | value >= FAILURE threshold | FAILURE |
2 | value >= DEGRADED threshold AND value < FAILURE threshold | DEGRADED |
3 | value > 0 AND value < DEGRADED threshold | HEALTHY |
4 | value = 0 | HEALTHY |
5 | No traffic | No Health Information |
Examples
These examples use the repo https://github.com/kiali/demos/tree/master/error-rates.
In this repo we can see 2 namespaces: alpha and beta (Demo design).
Alpha |
Where nodes return the responses (You can configure responses here):
App (alpha/beta) | Code | Rate |
---|---|---|
x-server | 200 | 9 |
x-server | 404 | 1 |
y-server | 200 | 9 |
y-server | 500 | 1 |
z-server | 200 | 8 |
z-server | 201 | 1 |
z-server | 201 | 1 |
The applied traffic rate configuration is:
# ...
health_config:
rate:
- namespace: "alpha"
tolerance:
- code: "404"
failure: 10
protocol: "http"
- code: "[45]\\d[^\\D4]"
protocol: "http"
- namespace: "beta"
tolerance:
- code: "[4]\\d\\d"
degraded: 30
failure: 40
protocol: "http"
- code: "[5]\\d\\d"
protocol: "http"
# ...
After Kiali adds default configuration we have the following (Debug Info Kiali):
{
"healthConfig": {
"rate": [
{
"namespace": "/alpha/",
"kind": "/.*/",
"name": "/.*/",
"tolerance": [
{
"code": "/404/",
"degraded": 0,
"failure": 10,
"protocol": "/http/",
"direction": "/.*/"
},
{
"code": "/[45]\\d[^\\D4]/",
"degraded": 0,
"failure": 0,
"protocol": "/http/",
"direction": "/.*/"
}
]
},
{
"namespace": "/beta/",
"kind": "/.*/",
"name": "/.*/",
"tolerance": [
{
"code": "/[4]\\d\\d/",
"degraded": 30,
"failure": 40,
"protocol": "/http/",
"direction": "/.*/"
},
{
"code": "/[5]\\d\\d/",
"degraded": 0,
"failure": 0,
"protocol": "/http/",
"direction": "/.*/"
}
]
},
{
"namespace": "/.*/",
"kind": "/.*/",
"name": "/.*/",
"tolerance": [
{
"code": "/^5\\d\\d$/",
"degraded": 0,
"failure": 10,
"protocol": "/http/",
"direction": "/.*/"
},
{
"code": "/^4\\d\\d$/",
"degraded": 10,
"failure": 20,
"protocol": "/http/",
"direction": "/.*/"
},
{
"code": "/^[1-9]$|^1[0-6]$/",
"degraded": 0,
"failure": 10,
"protocol": "/grpc/",
"direction": "/.*/"
}
]
}
]
}
}
What are we applying?
-
For namespace alpha, all resources
-
Protocol http if % requests with error code 404 are >= 10 then FAILURE, if they are > 0 then DEGRADED
-
Protocol http if % requests with others error codes are> 0 then FAILURE.
-
For namespace beta, all resources
-
Protocol http if % requests with error code 4xx are >= 40 then FAILURE, if they are >= 30 then DEGRADED
-
Protocol http if % requests with error code 5xx are > 0 then FAILURE
-
For other namespaces Kiali will apply the defaults.
-
Protocol http if % requests with error code 5xx are >= 20 then FAILURE, if they are >= 0.1 then DEGRADED
-
Protocol grpc if % requests with error code match /^[1-9]$|^1[0-6]$/ are >= 20 then FAILURE, if they are >= 0.1 then DEGRADED
Alpha | Beta |