What is a Dead Man's Switch?

A dead man's switch is a monitoring pattern that alerts you when a process stops checking in. If your job doesn't send its expected signal on time, you know something is wrong, even if no error message was ever produced.

The concept

The term "dead man's switch" comes from trains and industrial machinery, where an operator must continuously hold a lever or button. If the operator lets go, for example because they have become incapacitated, the machine stops or triggers an alarm.

In software monitoring, the same principle applies: your process must actively and regularly signal that it's alive, and if the signals stop, an alert fires. This is also called heartbeat monitoring, check-in monitoring, or cron monitoring.

How it works

  1. You configure a monitor expecting a "ping" every X minutes/hours
  2. Your scheduled job sends an HTTP request when it completes successfully
  3. If the monitor doesn't receive a ping within the expected window, it alerts you
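
To make the three steps concrete, here is a minimal in-memory sketch of the monitor's side of the exchange. It is an illustration under simplifying assumptions, not how updog.watch works internally: `CheckInMonitor` and its methods are hypothetical names, times are passed in explicitly so the logic is easy to test, and a real service would persist ping times and evaluate deadlines on a timer.

```python
from datetime import datetime, timedelta
from typing import Optional


class CheckInMonitor:
    """Hypothetical in-memory dead man's switch for one job (not a real library API)."""

    def __init__(self, period: timedelta, grace: timedelta) -> None:
        self.period = period  # step 1: how often a ping is expected
        self.grace = grace    # slack allowed before alerting
        self.last_ping: Optional[datetime] = None

    def ping(self, now: datetime) -> None:
        # Step 2: the job checked in; remember when.
        self.last_ping = now

    def deadline(self) -> Optional[datetime]:
        # Latest time the next ping may arrive; None until the first ping
        # (assumption: a brand-new monitor waits for its first check-in).
        if self.last_ping is None:
            return None
        return self.last_ping + self.period + self.grace

    def is_overdue(self, now: datetime) -> bool:
        # Step 3: alert once the deadline has passed with no new ping.
        deadline = self.deadline()
        return deadline is not None and now > deadline


t0 = datetime(2024, 1, 1, 2, 0)
monitor = CheckInMonitor(period=timedelta(hours=1), grace=timedelta(minutes=10))
monitor.ping(t0)
print(monitor.is_overdue(t0 + timedelta(minutes=65)))  # False
print(monitor.is_overdue(t0 + timedelta(minutes=71)))  # True
```

Each new ping pushes the deadline forward, so a job that keeps checking in on schedule never trips the alert.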

Why dead man's switches matter

Silent failures

Many failures are silent. A cron job that doesn't run produces no errors—it just doesn't happen. Without active monitoring, you might not notice for days or weeks.

No network endpoint

Traditional HTTP monitoring checks whether a URL responds. But cron jobs, background workers, and batch processes often have no URL to check. Heartbeat monitoring reverses the direction: the job contacts the monitor, so nothing needs to expose or listen on an endpoint.

Completion verification

You can verify not only that a job ran, but that it completed successfully. Place the ping at the end of your script: if the job crashes partway through, no ping is sent.

Common use cases

Cron jobs

Scheduled tasks like database backups, report generation, data cleanup, and sync jobs. If cron stops running or a job fails, you need to know.

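# Backup script with heartbeat
# -m 10 stops the ping from hanging; -o /dev/null discards any response body so cron doesn't mail it
0 2 * * * /scripts/backup.sh && curl -fsS -m 10 -o /dev/null https://updog.watch/ping/TOKEN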

Background workers

Queue processors (Sidekiq, Celery, etc.) that should continuously process jobs. If a worker dies, pending jobs pile up unprocessed.
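
A worker that runs continuously has no natural "end of job" to ping from. One possible approach, sketched below under stated assumptions, is to call a heartbeat from every iteration of the worker's loop and let a small throttle decide when to actually send a ping. `ThrottledHeartbeat` and `send_ping` are hypothetical; in practice `send_ping` might call `requests.get` on your ping URL with a short timeout, and `interval_s` should be comfortably shorter than the period configured on the monitor.

```python
import time
from typing import Callable, Optional


class ThrottledHeartbeat:
    """Hypothetical helper: beat() may be called often, but pings at most once per interval."""

    def __init__(
        self,
        send_ping: Callable[[], None],
        interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.send_ping = send_ping    # e.g. a function that GETs your ping URL (assumption)
        self.interval_s = interval_s  # minimum seconds between pings
        self.clock = clock            # injectable so the throttle can be tested
        self._last_sent: Optional[float] = None

    def beat(self) -> bool:
        """Send a ping if interval_s has elapsed since the last one; return True if sent."""
        now = self.clock()
        if self._last_sent is None or now - self._last_sent >= self.interval_s:
            self.send_ping()
            self._last_sent = now
            return True
        return False


# In a real worker (hypothetical queue API):
#     while True:
#         job = queue.poll(timeout=5)
#         if job is not None:
#             handle(job)
#         heartbeat.beat()

# Usage with a fake clock: beats at t = 0, 10, 59, 60, 61, 130 seconds.
clock_value = [0.0]
sent = []
hb = ThrottledHeartbeat(lambda: sent.append(clock_value[0]), interval_s=60,
                        clock=lambda: clock_value[0])
for t in (0, 10, 59, 60, 61, 130):
    clock_value[0] = t
    hb.beat()
print(sent)  # [0, 60, 130]
```

Calling `beat()` on every loop iteration, including polls that find the queue empty, means an idle but healthy worker keeps checking in, while a worker that crashes or hangs stops pinging and trips the alert. Because the ping runs inline, `send_ping` should use a short timeout and handle its own errors so a slow monitor can't stall the worker.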

Data pipelines

ETL jobs, data imports, and scheduled API syncs. These usually run on a schedule, and a failed or skipped run leaves downstream data stale.

Scheduled Kubernetes jobs

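Kubernetes CronJobs can fail silently, for example when the cluster is unhealthy, a pod can't be scheduled, or a container is killed for exceeding its memory limit. Heartbeat monitoring verifies from outside the cluster that each run actually completed.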

Implementation patterns

Basic: Ping on success

#!/bin/bash
# Only ping if the job succeeds
python /app/daily_report.py && curl -fsS https://updog.watch/ping/TOKEN
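
If the job is launched from a Python scheduler script instead of a shell, the same ping-on-success rule can be sketched with the standard library. `run_and_ping` is a hypothetical helper, and `urllib.request` stands in for curl here; like `curl -f`, `urlopen` raises an error on an HTTP error status.

```python
import subprocess
import urllib.request
from typing import Sequence


def run_and_ping(cmd: Sequence[str], ping_url: str, timeout_s: float = 10.0) -> int:
    """Run cmd and ping ping_url only if it exits with status 0 (hypothetical helper).

    Returns the command's exit code so the caller can propagate failures.
    """
    result = subprocess.run(cmd)
    if result.returncode == 0:
        # Mirrors `cmd && curl -fsS URL`: the ping is only sent on success,
        # and urlopen raises on network errors or HTTP error statuses.
        with urllib.request.urlopen(ping_url, timeout=timeout_s):
            pass
    return result.returncode
```

For example, `run_and_ping(["python", "/app/daily_report.py"], "https://updog.watch/ping/TOKEN")` behaves like the shell one-liner above: a non-zero exit code skips the ping, and the monitor raises the alert when the expected window passes.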

Advanced: Start and finish pings

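#!/bin/bash
# Ping when starting (-m 10 keeps a slow monitor from delaying the job)
curl -fsS -m 10 https://updog.watch/ping/TOKEN/start

# Do the work and save its exit code
python /app/long_running_job.py
status=$?

# Ping when done (only if successful)
if [ "$status" -eq 0 ]; then
  curl -fsS -m 10 https://updog.watch/ping/TOKEN
fi

# Exit with the job's status so the scheduler also sees failures
exit "$status"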
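
A start ping gives the monitor a second data point. Assuming the monitor records the time of the latest start and finish pings, it can tell a run that never began from one that began but never finished, and it can report how long a completed run took. The function below is a hypothetical sketch of that classification, not updog.watch's documented behavior; `max_runtime` is an assumed per-job limit.

```python
from datetime import datetime, timedelta
from typing import Optional


def run_status(started: Optional[datetime], finished: Optional[datetime],
               now: datetime, max_runtime: timedelta) -> str:
    """Classify the latest run from its start/finish pings (hypothetical rules)."""
    if started is None:
        return "not started"
    # A finish ping at or after the latest start closes out the current run.
    if finished is not None and finished >= started:
        return f"finished in {finished - started}"
    # Start ping with no matching finish: still running, or stuck/crashed.
    if now - started > max_runtime:
        return "stuck or crashed"
    return "running"


t0 = datetime(2024, 1, 1, 2, 0)
limit = timedelta(hours=1)
print(run_status(t0, t0 + timedelta(minutes=42), t0 + timedelta(hours=2), limit))  # finished in 0:42:00
print(run_status(t0, None, t0 + timedelta(minutes=30), limit))  # running
print(run_status(t0, None, t0 + timedelta(minutes=90), limit))  # stuck or crashed
```

Without the start ping, a hung job and a job that never launched look identical: both are simply missing a finish ping.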

In application code

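# Python example
import logging

import requests

log = logging.getLogger(__name__)
PING_URL = "https://updog.watch/ping/TOKEN"

def process_data():
    ...  # placeholder for the job's real work

def run_scheduled_task():
    try:
        # Do the work
        process_data()
    except Exception:
        # Don't ping on failure: the missing ping triggers the alert
        log.exception("Task failed")
        raise  # re-raise so the scheduler also records a failed run

    # Signal success; a failed ping is logged but doesn't mask a successful run
    try:
        requests.get(PING_URL, timeout=10).raise_for_status()
    except requests.RequestException as e:
        log.warning("Heartbeat ping failed: %s", e)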

Configuring grace periods

Set the expected check-in interval to match your job's schedule, then add a grace period to absorb normal variation in start time and run duration:

Job schedule    Expected interval    Suggested grace
Every minute    1 minute             +1-2 minutes
Every hour      1 hour               +5-10 minutes
Daily           24 hours             +30-60 minutes
Weekly          7 days               +1-4 hours
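
As a worked example, the alert deadline is the time of the last ping plus the expected interval plus the grace period. The sketch below applies that arithmetic to each row of the table, using the upper end of each suggested grace and a hypothetical last ping at 02:00 on 1 January 2024; `alert_deadline` is an illustrative name, not part of any API.

```python
from datetime import datetime, timedelta

# (schedule, expected interval, upper end of suggested grace) from the table above
ROWS = [
    ("Every minute", timedelta(minutes=1), timedelta(minutes=2)),
    ("Every hour", timedelta(hours=1), timedelta(minutes=10)),
    ("Daily", timedelta(days=1), timedelta(minutes=60)),
    ("Weekly", timedelta(days=7), timedelta(hours=4)),
]


def alert_deadline(last_ping: datetime, interval: timedelta, grace: timedelta) -> datetime:
    """Latest time the next ping can arrive before an alert fires."""
    return last_ping + interval + grace


last_ping = datetime(2024, 1, 1, 2, 0)  # hypothetical last check-in
for schedule, interval, grace in ROWS:
    print(f"{schedule:<13} alert after {alert_deadline(last_ping, interval, grace):%Y-%m-%d %H:%M}")
# Deadlines: 2024-01-01 02:03, 2024-01-01 03:10, 2024-01-02 03:00, 2024-01-08 06:00
```

A tighter grace relative to the interval means you hear about a missed run sooner, at the cost of more false alarms from runs that are slow but healthy.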

FAQ

What is a dead man's switch in software?

A dead man's switch in software is a monitoring pattern where a process must actively signal that it's alive. If the signal stops, an alert is triggered. It's commonly used to monitor cron jobs and background tasks.

How does dead man's switch monitoring work?

Your process sends a "ping" (HTTP request) to a monitoring service at regular intervals. If the monitoring service doesn't receive a ping within the expected window, it alerts you that something is wrong.

How is this different from checking logs?

Log checking requires someone to look. Dead man's switch monitoring proactively alerts you when a job fails to run or complete, so you find out as soon as the expected window and its grace period pass, instead of whenever someone happens to notice a problem.
