We want to monitor liberty to make sure everything is working properly, and to be alerted when it is not.

Internal monitoring

Both the hardware and the software can check and report on various conditions. We should use this capability and send mail whenever a reading is out of range. Things to monitor this way might include:

  • Load average
  • Free disk space
  • Temperature and fan speed
  • Hard disk health (SMART)
  • RAID status
  • Unusual firewall/service activity
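As a rough illustration, the first two checks could be scripted along these lines. This is a minimal sketch, assuming Python 3 on liberty and a cron job to run it. The thresholds and mount points are hypothetical placeholders. Temperature, SMART, RAID, and firewall checks would each need their own tools (smartctl from smartmontools for SMART, for instance) and are not shown.

```python
#!/usr/bin/env python3
# Hypothetical cron job for liberty: report load average and free disk space
# only when they are out of range. Cron mails any output to MAILTO.
import os
import shutil

# Hypothetical thresholds and mount points; tune for the real hardware.
MAX_LOAD_1MIN = 4.0
MIN_FREE_FRACTION = 0.10              # complain when under 10% of a filesystem is free
FILESYSTEMS = ["/", "/var", "/home"]


def check_load(load1, limit=MAX_LOAD_1MIN):
    """Return a problem description, or None if the 1-minute load is acceptable."""
    if load1 > limit:
        return f"load average {load1:.2f} exceeds {limit:.2f}"
    return None


def check_disk(path, total, free, min_fraction=MIN_FREE_FRACTION):
    """Return a problem description, or None if enough space is free on path."""
    if total > 0 and free / total < min_fraction:
        return f"{path}: only {free / total:.1%} free"
    return None


def gather_problems():
    """Run each check against the live system and collect any problems found."""
    problems = []
    load1, _, _ = os.getloadavg()     # Unix only
    msg = check_load(load1)
    if msg:
        problems.append(msg)
    for path in FILESYSTEMS:
        if not os.path.isdir(path):   # skip mount points this host lacks
            continue
        usage = shutil.disk_usage(path)
        msg = check_disk(path, usage.total, usage.free)
        if msg:
            problems.append(msg)
    return problems


if __name__ == "__main__":
    # Silence means "all is well"; any printed line becomes an alert mail.
    for line in gather_problems():
        print(line)
```

The script prints nothing when all is well. Rather than talking to a mail server itself, it relies on cron mailing any output of a job to the address in MAILTO, so a crontab entry such as `*/15 * * * * /usr/local/sbin/check-liberty` (hypothetical path and schedule) under a `MAILTO=` line turns each printed problem into an alert mail.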

External monitoring

We also want some monitoring done externally, that is, running on computers other than liberty. For one thing, the only way to be sure everything is working is to check it from outside the system. More importantly, a really serious problem (kernel crash, power loss, network outage, etc.) will prevent liberty from notifying us in the first place.

Things we might want to monitor this way:

  • ping response
  • TCP response (simple "Can I connect?" probes)
  • SMTP banner (connect, make sure you get the proper identification)
  • HTTP page request (make sure we can fetch some known URL)
  • SMTP mail flow (will mail forwarded through the system make it to the other end?)
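Several of the probes above are easy to script from a monitoring host. The following is a minimal sketch in Python 3 covering the TCP, SMTP banner, and HTTP checks. The expected banner name and the port list are hypothetical placeholders. Ping is left to the system ping command, since ICMP echo usually needs raw-socket privileges, and end-to-end mail flow is not attempted.

```python
#!/usr/bin/env python3
# Hypothetical external probes of liberty, meant to run on a host other than liberty.
import socket
import sys
import urllib.request

EXPECTED_SMTP_NAME = "liberty.example.org"   # hypothetical name the SMTP banner should carry
CHECK_URL = "http://www.gnhlug.org/"          # a known URL that should always be fetchable
TCP_PORTS = {"SSH": 22, "SMTP": 25, "HTTP": 80, "HTTPS": 443}


def tcp_probe(host, port, timeout=10.0):
    """Simple "can I connect?" probe: True if a TCP connection succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                           # refused, unreachable, timed out, DNS failure
        return False


def smtp_banner(host, port=25, timeout=10.0):
    """Connect to an SMTP server and return the first line of its greeting."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as f:
            return f.readline().decode("ascii", "replace").rstrip("\r\n")


def banner_ok(line, expected_name=EXPECTED_SMTP_NAME):
    """A proper greeting starts with reply code 220 and names the expected host."""
    return line.startswith("220") and expected_name in line


def http_ok(url=CHECK_URL, timeout=15.0):
    """True if the URL can be fetched with an HTTP 200 response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:                           # URLError and HTTPError both subclass OSError
        return False


def run_probes(host):
    """Run every probe against host; return a list of failure descriptions."""
    failures = [f"{name} (port {port}) not answering"
                for name, port in TCP_PORTS.items()
                if not tcp_probe(host, port)]
    try:
        if not banner_ok(smtp_banner(host)):
            failures.append("SMTP banner missing or unexpected")
    except OSError as exc:
        failures.append(f"SMTP banner check failed: {exc}")
    if not http_ok():
        failures.append(f"could not fetch {CHECK_URL}")
    return failures


if __name__ == "__main__":
    # Usage: probe.py HOST [HOST ...]; prints nothing when every probe passes.
    for target in sys.argv[1:]:
        for problem in run_probes(target):
            print(f"{target}: {problem}")
```

It would be run as `probe.py HOST` (hypothetical file name), for example from cron on the monitoring host, so that whatever it prints, one line per failed probe, is mailed out like any other cron output. A bare TCP connect only proves that something is listening, which is why the SMTP and HTTP checks go one step further and look at what the service actually says.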

Monitoring hosts

If you are doing any external monitoring, please record your IP address, your host name, and what you are monitoring:

  • 192.0.2.69 - somebox.example.com
    • ping
    • TCP probes of HTTP, SSH, SMTP
    • requesting www.gnhlug.org home page

  • Bill's Intermapper on adelphia
    • ping
    • TCP probes of HTTP, SSH, SMTP, HTTPS, DNS

  • Cole's Nagios from 64.34.179.90 and 64.34.182.198 (approximately every 15 minutes)
    • ping
    • TCP probes of HTTP
    • (more to come)
    • -- ColeTuininga - 01 May 2006

Comments

What should happen when one of us discovers something wrong?

-- BruceDawson - 02 May 2006

Panic? ;-) Seriously, I would expect the person to troubleshoot it if they can, and notify someone who can if they can't. Use the -sysadmin list to keep everyone informed. The one issue I see is problems requiring either physical access or MV assistance; other than calling you, Bruce, our options there are limited.

-- BenScott - 21 May 2006

Topic revision: r5 - 2006-05-20 - BenScott
 

All content is Copyright © 1999-2024 by, and the property of, the contributing authors.