WDC is a client application for WatchDog (wd), a tool that monitors and health-checks servers through plug-in scripts.
WDC listens to a remote alert server that broadcasts alert messages. Each received alert is logged to WDC/logs/ in the user's home directory. WDC also runs a local HTTP server on port 8080 with an endpoint /{id} that returns the details of an alert given its ID. The same details are available from the command line through the inspect command of wdc.
go install github.com/opxyc/wd-client@latest
Usage:
wdc [command]
Available Commands:
help Help about any command
inspect Inspect an alert using it's ID
listen Start listening to incoming alerts
$ wdc listen
By default, wdc connects to localhost:40080 at endpoint /ws/connect.
Usage:
wdc listen [flags]
Flags:
-r, --addr string network address of alert server (default "localhost:40080")
-e, --end-point string websocket connection endpoint of alert server (default "/ws/connect")
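As a rough illustration of how the two flags fit together, the sketch below joins an address and an endpoint into a single websocket URL. The `ws` scheme and the helper name `wsURL` are assumptions made for this example; the way wdc builds its connection URL internally is not documented here.

```go
package main

import (
	"fmt"
	"net/url"
)

// wsURL joins the --addr and --end-point values into one URL.
// Assumption: the alert server is reached over plain "ws" (not "wss").
func wsURL(addr, endpoint string) string {
	u := url.URL{Scheme: "ws", Host: addr, Path: endpoint}
	return u.String()
}

func main() {
	// The defaults listed in the flags above.
	fmt.Println(wsURL("localhost:40080", "/ws/connect"))
}
```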
$ wdc inspect ID
be@mC:~/Desktop/wdc$ wdc inspect PgKt664SLUxceGiuFhTGMn
ID PgKt664SLUxceGiuFhTGMn
Time 2021-Oct-27 23:31:03
Hostname dr-pg-l02
Task Name mount-point-arc-check
Message Mount point /arc > 90%
Status OK (actions executed successfully)
Command output:
(mount-point-arc-check.out) current utilization 91.2%
(mount-point-arc-check.delete-archives.out) deleting archives
(mount-point-arc-check.delete-archives.out) current utilization 63.8%
You can also get the same info by visiting localhost:8080/{id}.
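The same lookup can be scripted. The following is a minimal sketch that requests localhost:8080/{id} and prints whatever the server returns. The helper names `alertURL` and `fetchAlert` are hypothetical, and since the response format is not described here, the body is printed as-is rather than decoded.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strings"
	"time"
)

// alertURL builds the lookup URL for an alert ID on the local WDC server.
func alertURL(base, id string) string {
	return strings.TrimRight(base, "/") + "/" + url.PathEscape(id)
}

// fetchAlert performs the GET request and returns the raw response body.
func fetchAlert(base, id string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(alertURL(base, id))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	return string(body), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: fetchalert ID")
		os.Exit(2)
	}
	// Assumes wdc is already running so the server on port 8080 is up.
	out, err := fetchAlert("http://localhost:8080", os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```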
WDC writes to a new log file every day, so UserHome/WDC/logs contains log files named in the format yyyy-month-dd. Each alert is logged in the format below (shown with an example).
// 2021/10/27 18:37:45
// rJeqKsiMECBUpCGG54YhgL 👈 log id
// 2021-Oct-27 13:40:04 👈 the time alert was generated
// mC 👈 hostname
// cpu-usage-gt-90 👈 task name
// cpu usage on > 90%. take action immediately 👈 short msg
// (cpu-usage-gt-90.err) exit status 1 👈 long msg (combined output of the
// (cpu-usage-gt-90.out) current CPU usage is 15 cmd executed at wd client running on backend server)
// (cpu-usage-gt-90.stop-backup.err) exec: "bkpc stop":
// could not stop backup process; unresponding thread
// 1 👈 status code: 0 or 1
// ENDOFrJeqKsiMECBUpCGG54YhgL 👈 end of one alert
If the terms task name, message, cmd, etc. are not clear, head to WD.
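Because every record ends with ENDOF followed by its own ID, a log file in this layout can be split back into alerts with a small parser. The sketch below assumes that the lines in the file carry neither the leading `//` nor the 👈 annotations used in the example, and that each record has a log timestamp line, five single-line fields, any number of long-message lines, and a status line. The `Alert` type and its field names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Alert is one record from a WDC log file (field names are illustrative).
type Alert struct {
	LoggedAt string   // timestamp line written by the logger
	ID       string   // log id
	Time     string   // time the alert was generated
	Hostname string
	TaskName string
	Message  string   // short msg
	Output   []string // long msg, one entry per line
	Status   string   // status code: "0" or "1"
}

// ParseAlerts splits log text into records terminated by "ENDOF<id>".
func ParseAlerts(log string) ([]Alert, error) {
	var alerts []Alert
	var rec []string // lines of the record currently being read

	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		// rec[1] is the ID, so the terminator is known once it has been read.
		if len(rec) >= 2 && line == "ENDOF"+rec[1] {
			n := len(rec)
			if n < 7 {
				return nil, fmt.Errorf("alert %s: only %d lines", rec[1], n)
			}
			alerts = append(alerts, Alert{
				LoggedAt: rec[0], ID: rec[1], Time: rec[2],
				Hostname: rec[3], TaskName: rec[4], Message: rec[5],
				Output: rec[6 : n-1], Status: rec[n-1],
			})
			rec = nil
			continue
		}
		rec = append(rec, line)
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	if len(rec) > 0 {
		return nil, fmt.Errorf("unterminated alert record")
	}
	return alerts, nil
}

func main() {
	sample := `2021/10/27 18:37:45
rJeqKsiMECBUpCGG54YhgL
2021-Oct-27 13:40:04
mC
cpu-usage-gt-90
cpu usage on > 90%. take action immediately
(cpu-usage-gt-90.err) exit status 1
(cpu-usage-gt-90.out) current CPU usage is 15
1
ENDOFrJeqKsiMECBUpCGG54YhgL
`
	alerts, err := ParseAlerts(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, a := range alerts {
		fmt.Printf("%s %s %s status=%s\n", a.ID, a.Hostname, a.TaskName, a.Status)
	}
}
```

Keying the terminator on the record's own ID, rather than on a fixed line count, lets the long message span any number of lines.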
