Zabbix Monitoring - Part 2 - Paging on call...

Now that Zabbix is up and monitoring your systems, let's start thinking about how it's going to send its alerts, plus some self-healing capabilities. If you haven't read part one, you can read up on it here. Let's face it, your engineer is human and a human's attention span can easily fail in under 5 minutes (writing from experience!).

It's super tedious to watch the Problems page in Zabbix 24/7 and wait for issues to pop up, and then there are multiple ways to correct an issue. Some are more efficient than others, so it's necessary to automate procedures and standardize actionable alerts.

Zabbix gives us the option to forward alerts to email, mobile text, and a plethora of integrations with 3rd-party applications like Slack, HipChat, and PagerDuty.

Actions
Actions are procedures that Zabbix can execute when an event with the corresponding severity level arises. An action can either execute a custom script (Python or Bash) or perform a Zabbix-related task.

Create a Slack Channel and Webhook Integration
This is a highly valuable integration, especially if your ops teams use Slack religiously. No one can stare at Zabbix all day and wait for alerts to happen. Yes, you can use traditional modes of messaging (email and SMS), but these are cumbersome and take additional steps to configure. Usually you will need to add components to make the system work.

With Slack, all you need is the ability to send a POST request to the Slack webhook, for example with curl. You can create a script using Bash or Python, since these come pre-installed on most RHEL and Debian distros. I will show both an example curl command and a Python script.

Let's go here https://my.slack.com/services/new/incoming-webhook and review the official documentation from Slack themselves.

Create an incoming webhook. This will allow Zabbix to forward alerts to Slack.
Your webhook should look something like mine below. (I replaced the real values with X's, since I don't want anyone sending therollbak a bunch of fake alerts. We love our sleep.)
https://hooks.slack.com/services/XXXXXX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

To test your webhook you can execute a POST request using the curl command to your newly created webhook.

curl -X POST --data-urlencode "payload={\"channel\": \"#alerts\", \"username\": \"therollbak-bot\", \"text\": \"This is posted to #alerts and comes from a bot named therollbak-bot.\", \"icon_emoji\": \":alien:\"}" https://hooks.slack.com/services/XXXXXX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

If the POST went through correctly, the test message shows up in your #alerts channel, posted by a bot named therollbak-bot.
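
If curl isn't handy, the same test can be sketched in Python using only the standard library. This is a minimal sketch, not the script used later in this post: the function names build_payload and send_test_message are made up for illustration, and WEBHOOK_URL is a placeholder you would replace with your own webhook.

```python
import json
import urllib.request

# Placeholder: replace with your own incoming-webhook URL.
WEBHOOK_URL = "https://hooks.slack.com/services/XXXXXX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"


def build_payload(channel, username, text, icon_emoji=":alien:"):
    """Return the JSON body for the webhook, encoded as bytes."""
    payload = {
        "channel": channel,
        "username": username,
        "text": text,
        "icon_emoji": icon_emoji,
    }
    return json.dumps(payload).encode("utf-8")


def send_test_message(url, body, timeout=10):
    """POST the JSON body to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

To try it, call send_test_message(WEBHOOK_URL, build_payload("#alerts", "therollbak-bot", "Test from Python")). With a real webhook URL, a 200 status should mean Slack accepted the message.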

Creating Zabbix Actions
We will create a script in the alert scripts directory (/lib/zabbix/alertscripts/ on my server; the location is set by AlertScriptsPath in zabbix_server.conf, so check yours). We will later configure Zabbix to use this alert script. Please note that Zabbix executes the script with positional arguments. This is important, especially if you want Zabbix to alert a specific team based on the severity level, or do something else based on the hostname.

Actions are great since you can use Zabbix as a self-healing tool. Say, for instance, a service is in a bad state. You can have Zabbix attempt to correct the service for you!
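
To make the self-healing idea concrete, here is a rough sketch of a script an action could run to bring a service back. It is not part of the Slack setup in this post, and it assumes a systemd host where systemctl is available. The function names is_active and heal are made up for illustration, and the run parameter only exists so the logic can be tried without touching a real service.

```python
import subprocess


def is_active(service, run=subprocess.run):
    """Return True if `systemctl is-active` reports the service as running."""
    result = run(["systemctl", "is-active", "--quiet", service])
    return result.returncode == 0


def heal(service, run=subprocess.run):
    """Restart the service only if it is not active; report what happened."""
    if is_active(service, run):
        return "already running"
    result = run(["systemctl", "restart", service])
    return "restarted" if result.returncode == 0 else "restart failed"
```

Calling heal("httpd") on the affected host would restart httpd only when it is down. Checking first keeps the script from bouncing a healthy service if the alert fires late.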

The positional arguments I will be using are the channel to send the alert to, the subject of the message, and a brief message describing the alert.

And just for kicks, and because it's always good to keep my programming knowledge fresh, I wrote it in Python. You can also write it in Bash if you prefer.

https://github.com/pafable/zabbix-slack.git

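#!/usr/bin/env python
"""Zabbix alert script: forward an alert to a Slack incoming webhook."""
import sys

import requests

# Replace the X's with your actual webhook.
URL = 'https://hooks.slack.com/services/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
UN = 'ZABBIX'

# Zabbix passes these in as positional arguments.
SLACK = '#' + sys.argv[1]   # channel name, given without the leading '#'
SUBJECT = sys.argv[2]       # alert subject, e.g. PROBLEM or RECOVERY
MSG = sys.argv[3]           # alert message body

# Pick an icon and attachment color based on the subject.
if SUBJECT == 'RECOVERY':
    EMOJI = ':heavy_check_mark:'
    COLOR = '#2de52d'
elif SUBJECT == 'OK':
    EMOJI = ':smile:'
    COLOR = '#2de52d'
elif SUBJECT == 'PROBLEM':
    EMOJI = ':fire:'
    COLOR = '#ff0000'
else:
    EMOJI = ':ghost:'
    COLOR = '#ffee00'

data = {
    "channel": SLACK,
    "username": UN,
    "icon_emoji": EMOJI,
    "attachments": [{"color": COLOR, "text": MSG}],
}

# POST the payload as JSON and fail loudly if Slack rejects it.
response = requests.post(URL, json=data, timeout=10)
response.raise_for_status()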

As you can see above, Zabbix passes in three arguments: the Slack channel to send the alert to, the subject of the message, and finally the message itself. The script then makes a POST request to the Slack webhook.

Configuring Media Type and Actions
Now let's transition to the Zabbix UI and configure the actual actions.

First we will create a new media type. Go to Administration > Media types and click on the "Create media type" button on the top right.

Fill in the name (I named mine Zabbix-Slack), choose Script for the type, and enter the name of the script on the server (mine is called slack.py).

For the Script parameters, add these three entries in this order.

{ALERT.SENDTO}
{ALERT.SUBJECT}
{ALERT.MESSAGE}

This will tell Zabbix how to handle slack.py and what arguments to supply it.
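
The three script parameters arrive in the script as sys.argv[1] through sys.argv[3], in the order listed. The following sketch shows one way to make that mapping explicit and to fail early when an argument is missing. The names Alert and parse_alert_args are made up for illustration. Unlike slack.py, which always prepends '#', this variant also accepts a channel that already starts with '#'.

```python
from collections import namedtuple

# One named field per Zabbix script parameter.
Alert = namedtuple("Alert", ["channel", "subject", "message"])


def parse_alert_args(argv):
    """Map the positional arguments onto named fields.

    argv[1] comes from {ALERT.SENDTO}, argv[2] from {ALERT.SUBJECT},
    and argv[3] from {ALERT.MESSAGE}.
    """
    if len(argv) != 4:
        raise SystemExit("usage: slack.py <channel> <subject> <message>")
    channel = argv[1] if argv[1].startswith("#") else "#" + argv[1]
    return Alert(channel, argv[2], argv[3])
```

In a script you would call parse_alert_args(sys.argv) and then read alert.channel, alert.subject, and alert.message instead of indexing into sys.argv directly.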

Almost done, now we can configure the Actions in the UI. Navigate to Configuration > Actions and click on the "Create action" button on the top right.

Fill in the name; this can be whatever you want (mine is named Zabbix-Slack). For type of calculation, choose And. This is important since we want this action to execute only if all of the conditions are met.

My conditions are: the host must not be in maintenance, and the trigger severity must be greater than Average.
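
The And calculation can be pictured as a small function. Zabbix evaluates the conditions itself, so this is only a model of the logic and not something you need to deploy. The name should_alert is made up for illustration, and the dictionary lists the standard Zabbix trigger severities with their numeric values.

```python
# Zabbix trigger severities and their numeric values.
SEVERITY = {
    "Not classified": 0,
    "Information": 1,
    "Warning": 2,
    "Average": 3,
    "High": 4,
    "Disaster": 5,
}


def should_alert(in_maintenance, severity):
    """And calculation: every condition must hold for the action to run."""
    not_in_maintenance = not in_maintenance
    above_average = SEVERITY[severity] > SEVERITY["Average"]
    return not_in_maintenance and above_average
```

With these two conditions, only High and Disaster triggers on hosts outside maintenance would reach Slack.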

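Click on the Operations tab and set the Default operation step duration to 1 minute. For the Default subject, you can use "This Host is Going Bananas!!!!" or something generic like PROBLEM. :) Keep in mind that the script only applies the fire icon and red color when the subject is exactly PROBLEM; any other subject falls through to the ghost icon and yellow.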

In the default message, enter the following:

Problem started at {EVENT.TIME} on {EVENT.DATE}
Problem name: {TRIGGER.NAME}
Host: {HOST.NAME}
Severity: {TRIGGER.SEVERITY}

Original problem ID: {EVENT.ID}
{TRIGGER.URL}

This will be displayed on the Slack channel.
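
Because the message uses "Key: value" lines, a script can also read fields back out of it, for example to route by severity as mentioned earlier. The sketch below assumes the default message format above. The names parse_fields and pick_channel and the #oncall channel are made up for illustration.

```python
def parse_fields(message):
    """Collect the 'Key: value' lines of the alert message into a dict."""
    fields = {}
    for line in message.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines without a 'Key: value' shape
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical routing table; the channel name is an example only.
CHANNEL_BY_SEVERITY = {"Disaster": "#oncall", "High": "#oncall"}


def pick_channel(message, default="#alerts"):
    """Choose a Slack channel from the Severity line, else use the default."""
    severity = parse_fields(message).get("Severity")
    return CHANNEL_BY_SEVERITY.get(severity, default)
```

In slack.py, the result of pick_channel(MSG) could replace the channel taken from the first argument, at the cost of tying the script to this particular message layout.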

In the operations, add the Zabbix-Slack user and set "Send only to" to Zabbix-Slack.

In the Recovery operations, add a message to inform the poor soul who is on call that the host has recovered and is okay. If you set the recovery subject to RECOVERY or OK, the script above colors the message green.

Now let's test it! I turned off httpd on the Zabbix server and my alert worked!

You don't have to style the message output like mine, but I think this layout allows people to quickly assess the severity of an issue. If you do want to change the styling, edit the Python script and replace the hex values and emoji names with the colors and icons you want.
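
If you expect to tweak the styling often, one option is to move the icons and colors into a lookup table so each subject is a single line to edit. This is a sketch of that alternative, not what slack.py does today. STYLES, DEFAULT_STYLE, and style_for are made-up names, and the values are the ones from the script.

```python
# Subject -> (emoji, attachment color), using the values from slack.py.
STYLES = {
    "RECOVERY": (":heavy_check_mark:", "#2de52d"),
    "OK": (":smile:", "#2de52d"),
    "PROBLEM": (":fire:", "#ff0000"),
}

# Used for any subject that is not listed above.
DEFAULT_STYLE = (":ghost:", "#ffee00")


def style_for(subject):
    """Return the (emoji, color) pair for a subject, or the default pair."""
    return STYLES.get(subject, DEFAULT_STYLE)
```

In the script, EMOJI, COLOR = style_for(SUBJECT) would then stand in for the if/elif chain.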
