How to Monitor Stripe Webhooks in Production: What Your Logs Don't Tell You

SendPromptly

Most SaaS teams think they are monitoring their Stripe webhooks because they have access to the Stripe dashboard and their application logs. They are monitoring delivery. They are not monitoring fulfillment.

This distinction matters a great deal when something goes wrong at 2 AM on a Friday.

Delivery vs. Fulfillment

Delivery is what Stripe tracks: did your endpoint receive the event and return a 2xx response?

Fulfillment is what your business cares about: did your app grant the access, add the credits, update the subscription state?

These are not the same thing. A 200 response from your endpoint tells Stripe the event was received. It says nothing about what happened afterward — whether the job was enqueued, whether the job ran, whether the database was updated.

The gap between delivery and fulfillment is where silent failures live. Your Stripe dashboard looks healthy. Your webhook endpoint is returning 200. But somewhere in your async processing pipeline, the job is stuck, crashed, or silently skipped.

What Stripe Dashboard Monitoring Covers

The Stripe webhook dashboard gives you:

  • Delivery status per event (delivered, failed, pending)
  • Response code from your endpoint
  • Retry history when delivery fails
  • The ability to resend individual events

This is useful for diagnosing delivery failures — endpoint misconfigurations, signature verification errors, network issues. It is not useful for diagnosing fulfillment failures, because fulfillment happens after the 200 response.

Layer 1: Application Log Monitoring

Your application logs are the first place to add visibility. At minimum, your webhook handler should log:

  • Event received (with event ID and type)
  • Job enqueued (with job ID and event ID)
  • Job started (with job ID)
  • Business effect applied (with customer ID, effect type, and relevant identifiers)

This creates an observable chain. If any link in the chain is missing from the logs, you know where the break occurred.
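The four log points above can be sketched as one structured log line per step. This is a minimal illustration, not a prescribed implementation: the step names, the `log_step` helper, and the `enqueue` and `apply_effect` callables are hypothetical stand-ins for your own queue and fulfillment code.

```python
import json
import logging

logger = logging.getLogger("stripe_webhooks")


def log_step(step, **fields):
    """Emit one structured log line for a link in the webhook chain."""
    record = {"step": step, **fields}
    logger.info(json.dumps(record, sort_keys=True))
    return record


def handle_webhook(event, enqueue):
    """Log receipt and enqueue, then return the status code sent to Stripe."""
    log_step("event_received", event_id=event["id"], event_type=event["type"])
    job_id = enqueue(event)  # hypothetical: returns the ID of the queued job
    log_step("job_enqueued", job_id=job_id, event_id=event["id"])
    return 200


def run_job(job_id, event, apply_effect):
    """Log job start, apply the business effect, then log the outcome."""
    log_step("job_started", job_id=job_id, event_id=event["id"])
    # hypothetical: returns e.g. {"customer_id": ..., "effect_type": ...}
    effect = apply_effect(event)
    log_step("business_effect_applied", job_id=job_id, event_id=event["id"], **effect)
    return effect
```

Carrying the event ID on every line is the design choice that matters here: it lets you join the four steps for a single event when you search the logs later.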

What to alert on:

  • A “job enqueued” log line with no matching “job started” line within your expected processing window
  • A “job started” log line with no matching “business effect applied” line within your expected completion window
  • Any exception in the job class handling payment events

Many teams have the first two log lines but not the last — they log receipt and enqueue but never log the outcome. Without the outcome log line, you cannot distinguish a working handler from a silently failing one.
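As a rough sketch of the first two alert conditions, the function below scans parsed log records for a step that never reached the next step. It assumes each record has already been parsed into a dict with hypothetical `step`, `key` (for example the job ID), and `at` fields; most log aggregation tools can express the same check as a saved query instead.

```python
from datetime import datetime, timedelta


def find_stalled(records, first_step, next_step, window, now):
    """Return keys that logged first_step more than `window` ago
    and have not logged next_step at all.

    records: iterable of dicts with "step", "key" and "at" (a datetime).
    """
    started = {}
    finished = set()
    for record in records:
        if record["step"] == first_step:
            # Keep the earliest timestamp if a step is logged twice
            started.setdefault(record["key"], record["at"])
        elif record["step"] == next_step:
            finished.add(record["key"])
    return sorted(
        key
        for key, at in started.items()
        if key not in finished and now - at > window
    )
```

Calling it with `"job_enqueued"` and `"job_started"` covers the first bullet; calling it again with `"job_started"` and `"business_effect_applied"` covers the second.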

Layer 2: Queue Worker Monitoring

If your webhook handler processes events asynchronously (the correct pattern), your queue workers are a critical point of failure that requires its own monitoring.

Metrics to monitor:

| Metric | What it tells you | Alert threshold |
| --- | --- | --- |
| Worker process count | Whether workers are running | 0 = alert immediately |
| Queue depth | Backlog accumulation | Depends on throughput; alert on sustained growth |
| Job failure rate | Exceptions in job processing | Any increase from baseline |
| Dead letter queue depth | Jobs that exhausted retries | Any increase = investigate |
| Job processing latency | Time from enqueue to execution | Alert if significantly above normal |

The dead letter queue depth alert is the most important one that most teams miss. Jobs move to the dead letter queue silently — no notification, no alert. A growing dead letter queue means jobs are failing repeatedly and your customers are affected.
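The thresholds in the table can be sketched as a single check over recent metric snapshots. How you collect the snapshots depends on your queue system, so the field names, the baseline failure rate, and the latency limit below are all assumptions you would replace with your own values.

```python
def evaluate_queue_metrics(samples, baseline_failure_rate, max_latency_seconds):
    """Return alert messages for a series of queue metric snapshots.

    samples: list of dicts ordered oldest to newest (at least two), each with
    "worker_count", "queue_depth", "failure_rate", "dead_letter_depth"
    and "latency_seconds". All field names are illustrative.
    """
    current, previous = samples[-1], samples[-2]
    alerts = []

    if current["worker_count"] == 0:
        alerts.append("no queue workers running")

    # "Sustained growth" here means depth rose in every consecutive sample
    depths = [s["queue_depth"] for s in samples]
    if all(later > earlier for earlier, later in zip(depths, depths[1:])):
        alerts.append("queue depth growing across all samples")

    if current["failure_rate"] > baseline_failure_rate:
        alerts.append("job failure rate above baseline")

    if current["dead_letter_depth"] > previous["dead_letter_depth"]:
        alerts.append("dead letter queue grew")

    if current["latency_seconds"] > max_latency_seconds:
        alerts.append("job processing latency above limit")

    return alerts
```

Comparing the dead letter depth against the previous snapshot, rather than against a fixed number, matches the "any increase" threshold: one newly dead-lettered job is enough to raise an alert.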

Layer 3: Database State Verification

The most reliable way to monitor fulfillment is to periodically verify that your database state reflects what Stripe says should have happened.

For subscription SaaS:

  • Customers with active Stripe subscriptions should have active plans in your app
  • Customers who cancelled in Stripe should have non-active plans in your app

For credit products:

  • Recent invoice payments should have corresponding credit grant records
  • Credit balance changes should correlate with invoice amounts

A scheduled job that queries both Stripe and your database to find discrepancies is a powerful detection tool. It will not catch failures in real time, but it will catch them before the customer notices if you run it frequently enough.

For a deeper walkthrough of this approach, see the Stripe payment reconciliation guide.

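# Example reconciliation check (sketch). User and alert() stand in for your
# own model lookup and alerting code; set stripe.api_key before calling.
import stripe

def reconcile_active_subscriptions():
    stripe_active = stripe.Subscription.list(status="active")

    # auto_paging_iter() walks every page of results, not just the first one
    for subscription in stripe_active.auto_paging_iter():
        user = User.find_by_stripe_customer_id(subscription.customer)

        if user is None:
            alert(f"Stripe customer {subscription.customer} has no matching user")
            continue

        # Assumes one item per subscription and that user.plan stores a Stripe price ID
        stripe_plan = subscription["items"]["data"][0]["price"]["id"]
        if user.plan != stripe_plan:
            alert(f"User {user.id} plan mismatch: app={user.plan}, stripe={stripe_plan}")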

Run this on a schedule. Route alerts to your team’s incident channel. Treat each mismatch as an incident until resolved.
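For credit products, the same idea can be sketched as a pure comparison between paid invoices and your credit grant records. The invoice fields `id`, `customer`, and `amount_paid` follow the Stripe Invoice object; the shape of the credit grant records and the `credits_for` pricing function are assumptions about your own schema.

```python
def find_credit_discrepancies(paid_invoices, credit_grants, credits_for):
    """Compare paid invoices against credit grant records.

    paid_invoices: dicts with "id", "customer", "amount_paid".
    credit_grants: dicts with "invoice_id" and "credits" (hypothetical schema).
    credits_for: function mapping amount_paid to the credits that were owed.
    Returns a list of (invoice_id, problem) tuples.
    """
    grants_by_invoice = {grant["invoice_id"]: grant for grant in credit_grants}
    problems = []

    for invoice in paid_invoices:
        grant = grants_by_invoice.get(invoice["id"])
        expected = credits_for(invoice["amount_paid"])

        if grant is None:
            problems.append((invoice["id"], "no credit grant recorded"))
        elif grant["credits"] != expected:
            problems.append(
                (invoice["id"], f"granted {grant['credits']}, expected {expected}")
            )

    return problems
```

Keeping the comparison free of API and database calls makes it easy to test; the scheduled job only has to fetch the two lists and route any returned problems to the incident channel.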

Layer 4: Payment-to-Outcome Timing Monitoring

A more targeted approach is to measure the time between a payment event and the expected business outcome, and alert when that window is exceeded.

The pattern:

  1. When your handler receives a payment event, record a “pending outcome” entry (event ID, event type, timestamp, expected outcome type)
  2. When your app applies the business effect, record a “completed outcome” entry (event ID, outcome type, completion timestamp)
  3. A monitoring job periodically scans for pending outcomes that are older than your expected processing window and have no corresponding completed entry — these are incidents

This approach catches fulfillment failures without depending on job-level monitoring, and catches them close to real time rather than waiting for a reconciliation batch.
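The three steps of the pattern can be sketched with a single table, shown here with SQLite from the Python standard library. The table name, column names, and the use of Unix timestamps are illustrative choices; in practice the table would likely live in your application database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS outcomes (
    event_id TEXT PRIMARY KEY,
    event_type TEXT NOT NULL,
    expected_outcome TEXT NOT NULL,
    received_at REAL NOT NULL,
    completed_at REAL
)
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_pending(conn, event_id, event_type, expected_outcome, received_at):
    # INSERT OR IGNORE keeps the first record if the same event arrives twice
    conn.execute(
        "INSERT OR IGNORE INTO outcomes "
        "(event_id, event_type, expected_outcome, received_at) VALUES (?, ?, ?, ?)",
        (event_id, event_type, expected_outcome, received_at),
    )
    conn.commit()


def record_completed(conn, event_id, completed_at):
    conn.execute(
        "UPDATE outcomes SET completed_at = ? WHERE event_id = ?",
        (completed_at, event_id),
    )
    conn.commit()


def find_overdue(conn, now, window_seconds):
    """Return event IDs still pending after the processing window has passed."""
    rows = conn.execute(
        "SELECT event_id FROM outcomes "
        "WHERE completed_at IS NULL AND received_at < ? ORDER BY received_at",
        (now - window_seconds,),
    )
    return [row[0] for row in rows]
```

The webhook handler would call `record_pending`, the fulfillment code would call `record_completed`, and the monitoring job would call `find_overdue` on a short interval and open an incident for each ID returned.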

What a Minimal Production Monitoring Setup Looks Like

For a small SaaS team with limited engineering time, a minimal setup that catches most fulfillment failures:

  1. Dead letter queue alerting — configure your queue to alert (email, Slack, PagerDuty) when a job moves to the dead letter queue. This is a 15-minute setup and catches the most common silent failure pattern.

  2. Worker health check — a simple endpoint that confirms at least one queue worker is running. Monitor with an uptime tool and alert if it fails.

  3. Weekly reconciliation run — a script that compares Stripe subscription and invoice data against your database. Run it manually at first, then automate it. It will find discrepancies you did not know existed.

  4. Application log alerting — configure your log aggregation tool to alert when your webhook handler logs an exception or when your credit grant job class logs a failure.

This setup requires no new infrastructure and will catch the majority of fulfillment failures before customers report them.
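One way to back the worker health check in item 2 is a heartbeat: each worker periodically records a timestamp, and the endpoint reports healthy only if at least one heartbeat is recent. The sketch below shows just the decision logic; where heartbeats are stored and how the result is served over HTTP depend on your stack, and the 60-second default is an arbitrary assumption.

```python
import json
import time


def worker_health(heartbeats, now=None, max_age_seconds=60):
    """Decide the health endpoint response from worker heartbeats.

    heartbeats: mapping of worker name to last heartbeat (Unix seconds).
    Returns (status_code, json_body): 200 if any worker is alive, else 503.
    """
    now = time.time() if now is None else now
    alive = sorted(
        name for name, seen in heartbeats.items() if now - seen <= max_age_seconds
    )
    status = 200 if alive else 503
    return status, json.dumps({"alive_workers": alive})
```

Returning 503 when no worker is alive lets an ordinary uptime monitor treat a dead worker pool the same way it treats a down website.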


The gap between Stripe delivery and application fulfillment is exactly what SendPromptly instruments. It monitors the expected outcome for each payment event and opens an incident when the business effect does not complete within your configured window, giving you real-time detection without building monitoring infrastructure from scratch.