What is a Dead Man's Switch?
A dead man's switch is a monitoring pattern that alerts you when a process stops running. If your job doesn't "check in," you know something's wrong—even if there are no error messages.
The concept
The term "dead man's switch" comes from industrial machinery and trains, where an operator must continuously hold a lever or button. If they let go (become incapacitated), the machine stops or triggers an alarm.
In software monitoring, the concept is reversed: your process must actively signal that it's alive. If the signal stops, an alert fires. This is also called heartbeat monitoring, check-in monitoring, or cron monitoring.
How it works
- You configure a monitor expecting a "ping" every X minutes/hours
- Your scheduled job sends an HTTP request when it completes successfully
- If the monitor doesn't receive a ping within the expected window, it alerts you
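The flow above can be sketched from the monitor's side. This is a simplified illustration of the decision the service makes (not updog.watch's actual implementation): record the time of the last ping, and alert once the expected interval plus a grace period has elapsed without a new one.

```python
from datetime import datetime, timedelta

def should_alert(last_ping, interval, grace, now):
    """True once interval + grace has elapsed since the last ping."""
    return now > last_ping + interval + grace

# Hourly job with a 10-minute grace period
now = datetime(2024, 1, 1, 12, 0)
hourly, grace = timedelta(hours=1), timedelta(minutes=10)

print(should_alert(now - timedelta(minutes=30), hourly, grace, now))  # False: within window
print(should_alert(now - timedelta(minutes=75), hourly, grace, now))  # True: 75 > 70
```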
Why dead man's switches matter
Silent failures
Many failures are silent. A cron job that doesn't run produces no errors—it just doesn't happen. Without active monitoring, you might not notice for days or weeks.
No network endpoint
Traditional HTTP monitoring checks if a URL responds. But cron jobs, background workers, and batch processes often have no URL to check. Heartbeat monitoring solves this.
Completion verification
You can verify that a job not only ran but completed successfully. Place the ping at the end of your script: if the job crashes partway through, no ping is sent.
Common use cases
Cron jobs
Scheduled tasks like database backups, report generation, data cleanup, and sync jobs. If cron stops running or a job fails, you need to know.
# Backup script with heartbeat
0 2 * * * /scripts/backup.sh && curl -fsS https://updog.watch/ping/TOKEN
Background workers
Queue processors (Sidekiq, Celery, etc.) that should continuously process jobs. If a worker dies, pending jobs pile up unprocessed.
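One common pattern for long-running workers (a sketch, not the Sidekiq or Celery API) is to ping from inside the worker loop itself, so a crashed or wedged worker stops pinging and the alert fires. The `fetch_job` and `handle_job` callables here are hypothetical placeholders for your queue integration; `ping` is injectable so the loop can be exercised without network access.

```python
import time
import urllib.request

PING_URL = "https://updog.watch/ping/TOKEN"  # your monitor's ping URL
HEARTBEAT_EVERY = 60  # seconds between heartbeat pings

def worker_loop(fetch_job, handle_job, ping=None, max_iterations=None):
    """Process queued jobs, pinging the monitor while the loop is alive."""
    if ping is None:
        ping = lambda: urllib.request.urlopen(PING_URL, timeout=10)
    last_ping = None
    done = 0
    while max_iterations is None or done < max_iterations:
        job = fetch_job()  # e.g. pop from a queue; None if empty
        if job is not None:
            handle_job(job)
        # Heartbeat at most once per HEARTBEAT_EVERY seconds; if this
        # loop dies, pings stop and the monitor raises the alarm.
        if last_ping is None or time.monotonic() - last_ping >= HEARTBEAT_EVERY:
            ping()
            last_ping = time.monotonic()
        done += 1
```

Note the heartbeat proves the loop is spinning, not that every job succeeded; pair it with per-job error handling.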
Data pipelines
ETL jobs, data imports, and scheduled API syncs. These often run on fixed schedules, and a silent failure leaves downstream data stale.
Scheduled Kubernetes jobs
Kubernetes CronJobs can fail silently if the cluster has issues or resource limits are hit. Heartbeat monitoring provides external verification.
Implementation patterns
Basic: Ping on success
#!/bin/bash
# Only ping if the job succeeds
python /app/daily_report.py && curl -fsS https://updog.watch/ping/TOKEN
Advanced: Start and finish pings
#!/bin/bash
# Ping when starting
curl -fsS https://updog.watch/ping/TOKEN/start
# Do the work
python /app/long_running_job.py
status=$?
# Ping when done (only if the job succeeded)
if [ "$status" -eq 0 ]; then
  curl -fsS https://updog.watch/ping/TOKEN
fi
In application code
# Python example
import logging
import requests

log = logging.getLogger(__name__)

def run_scheduled_task():
    try:
        # Do the work
        process_data()
        # Signal success
        requests.get("https://updog.watch/ping/TOKEN", timeout=10)
    except Exception as e:
        # Don't ping on failure - the missing ping triggers the alert
        log.error(f"Task failed: {e}")
Configuring grace periods
Set your expected check-in interval based on your schedule, plus a grace period for variability:
| Job Schedule | Expected Interval | Suggested Grace |
|---|---|---|
| Every minute | 1 minute | +1-2 minutes |
| Every hour | 1 hour | +5-10 minutes |
| Daily | 24 hours | +30-60 minutes |
| Weekly | 7 days | +1-4 hours |
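In code, the table's rule is simply interval plus grace. A small sketch (the specific values mirror the upper end of the suggestions above):

```python
from datetime import datetime, timedelta

# Expected interval and suggested grace period per schedule
SCHEDULES = {
    "every-minute": (timedelta(minutes=1), timedelta(minutes=2)),
    "hourly":       (timedelta(hours=1),   timedelta(minutes=10)),
    "daily":        (timedelta(hours=24),  timedelta(minutes=60)),
    "weekly":       (timedelta(days=7),    timedelta(hours=4)),
}

def alert_deadline(schedule, last_ping):
    """Time after which a missing ping should trigger an alert."""
    interval, grace = SCHEDULES[schedule]
    return last_ping + interval + grace

# A daily job that last pinged at 02:00 alerts after 03:00 the next day
print(alert_deadline("daily", datetime(2024, 1, 1, 2, 0)))
```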
Monitor your cron jobs
Set up dead man's switch monitoring in minutes. Get alerted when scheduled jobs fail to run.
- Simple HTTP ping API
- Configurable grace periods
- Slack, email, SMS alerts
- No code changes required