Build a Webhook DLQ in Laravel (Replay Failed Events Safely)

A dead letter queue (DLQ) for failed webhooks in Laravel gives you a controlled recovery path when background processing keeps failing. SendPromptly retries delivery, but your app still needs a durable inbox, bounded processing retries, and a replay workflow.

In this guide, you will implement a minimal inbox + worker + DLQ pattern in Laravel, then test it end-to-end. You will also set retention and alerting rules so failures do not silently pile up.

If you already return 2xx quickly from your webhook endpoint, this pattern is the missing piece that turns “accepted” into “processed reliably.”

Why you need a DLQ even if SendPromptly retries

“Ack fast” implies you accept before full processing

Returning 200 quickly is correct for webhook reliability, but it means your endpoint confirms receipt before your domain logic is done. If your async worker fails after that 200, you still need a local recovery path.

See Webhook success criteria and retries for delivery behavior expectations.

Your app needs a place for poison / non-retryable events

Some events fail for reasons retries will not fix (invalid state transitions, missing dependencies, bad schema assumptions). Those are poison messages and should be moved out of the hot retry path into a DLQ.
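
One way to make the poison versus retryable distinction explicit is a small classifier that the worker consults before deciding whether to retry. This is a minimal sketch: the NonRetryableWebhookException class and the isRetryable function are hypothetical names introduced here, and the mapping of exception types to categories is an assumption you would adapt to your own domain.

```php
<?php

// Hypothetical marker exception for failures that a retry cannot fix
// (for example an invalid state transition). Not part of Laravel.
final class NonRetryableWebhookException extends RuntimeException
{
}

/**
 * Returns true when a failure is plausibly transient and worth retrying.
 * The exception-to-category mapping below is illustrative only.
 */
function isRetryable(Throwable $e): bool
{
    // Explicit poison markers and malformed JSON fail the same way every time.
    if ($e instanceof NonRetryableWebhookException || $e instanceof JsonException) {
        return false;
    }

    // Everything else (timeouts, lock contention, brief outages) is treated as retryable.
    return true;
}
```

In a job that uses Laravel's InteractsWithQueue trait, a non-retryable result could call $this->fail($e) instead of rethrowing, so the event skips the remaining attempts and goes straight to the failed() handler.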

Mini incident: A team acknowledged webhooks immediately but had no DLQ. A serialization bug caused every worker attempt to fail, and events were effectively lost until they rebuilt payload history from logs.

Reference architecture

Webhook inbox table (source-of-truth)

Store every accepted webhook in webhook_inbox. This table is your forensic record, replay source, and dedupe boundary.

Processing job with bounded retries

Queue a job per dedupe key, cap retries, and update attempt/error state on each failure. This prevents poison events from burning compute forever.
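
Bounded retries usually work better with a delay between attempts, so a brief outage has time to clear before the budget runs out. The sketch below computes a capped exponential delay schedule; the backoffSchedule name and the default base and cap values are assumptions for illustration, not values taken from SendPromptly or Laravel.

```php
<?php

/**
 * Builds the delays (in seconds) between attempts for a job with $tries attempts.
 * A job that may run 5 times has 4 gaps between runs, so 4 delays are returned.
 */
function backoffSchedule(int $tries, int $baseSeconds = 10, int $capSeconds = 600): array
{
    $delays = [];

    for ($attempt = 1; $attempt < $tries; $attempt++) {
        // Double the delay on each attempt, but never exceed the cap.
        $delays[] = min($capSeconds, $baseSeconds * (2 ** ($attempt - 1)));
    }

    return $delays;
}
```

A Laravel job can expose such a list through a backoff() method that returns an array of seconds, which keeps the retry budget and the wait times next to each other in the job class.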

DLQ table + admin replay

After retry exhaustion, copy the event into webhook_dlq with failure context. Replay from an admin workflow only after root cause is fixed.

Suggested diagram/visual: Show SendPromptly -> /webhooks/sendpromptly -> webhook_inbox -> queue worker -> processed OR webhook_dlq -> admin replay -> queue worker.

Use the Sample Project: generate a real webhook, then confirm your inbox rows fill up as deliveries arrive.

Laravel implementation (minimal)

Migrations

// php artisan make:migration create_webhook_inbox_tables
Schema::create('webhook_inbox', function ($table) {
    $table->id();
    $table->string('source')->default('sendpromptly');
    $table->string('dedupe_key')->unique(); // e.g. sha256(raw body)
    $table->json('payload');
    $table->timestamp('received_at');
    $table->timestamp('processed_at')->nullable();
    $table->string('status')->default('pending'); // pending|processing|processed|failed
    $table->unsignedInteger('attempts')->default(0);
    $table->text('last_error')->nullable();
    $table->timestamps();
});

Schema::create('webhook_dlq', function ($table) {
    $table->id();
    $table->string('source')->default('sendpromptly');
    $table->string('dedupe_key');
    $table->json('payload');
    $table->unsignedInteger('failed_attempts');
    $table->text('failure_reason');
    $table->timestamp('dead_lettered_at');
    $table->timestamps();

    $table->index(['source', 'dedupe_key']);
});

Controller writes inbox row + queues job

Assumption: signature verification runs before this route (for example middleware) and compares signatures with constant-time hash_equals; do not log raw signing secrets.

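use Illuminate\Http\Request;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Route;

Route::post('/webhooks/sendpromptly', function (Request $request) {
    $raw = $request->getContent();
    $dedupeKey = hash('sha256', $raw);

    // insertOrIgnore leaves an existing row untouched, so a redelivered
    // webhook cannot reset an already processed event back to "pending".
    $inserted = DB::table('webhook_inbox')->insertOrIgnore([
        'dedupe_key' => $dedupeKey,
        'source' => 'sendpromptly',
        'payload' => $raw, // raw JSON string; the query builder does not encode arrays
        'received_at' => now(),
        'status' => 'pending',
        'created_at' => now(),
        'updated_at' => now(),
    ]);

    // Queue work only for new rows; duplicates are acknowledged without reprocessing.
    if ($inserted > 0) {
        dispatch(new \App\Jobs\ProcessWebhookInbox($dedupeKey));
    }

    return response()->json(['accepted' => true], 200);
});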

Job moves to DLQ on repeated failures

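namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;

class ProcessWebhookInbox implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    // Bounded retries: after the 5th failed attempt Laravel calls failed().
    public int $tries = 5;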

    public function __construct(private string $dedupeKey) {}

    public function handle(): void
    {
        $row = \DB::table('webhook_inbox')->where('dedupe_key', $this->dedupeKey)->first();
        if (!$row || $row->status === 'processed') return;

        \DB::table('webhook_inbox')->where('dedupe_key', $this->dedupeKey)->update([
            'status' => 'processing',
            'attempts' => \DB::raw('attempts + 1'),
            'updated_at' => now(),
        ]);

        try {
            // TODO: your domain logic here
            // throw new \RuntimeException("simulate poison event");

            \DB::table('webhook_inbox')->where('dedupe_key', $this->dedupeKey)->update([
                'status' => 'processed',
                'processed_at' => now(),
                'updated_at' => now(),
            ]);
        } catch (\Throwable $e) {
            \DB::table('webhook_inbox')->where('dedupe_key', $this->dedupeKey)->update([
                'status' => 'failed',
                'last_error' => $e->getMessage(),
                'updated_at' => now(),
            ]);

            throw $e; // lets queue retry
        }
    }

    public function failed(\Throwable $e): void
    {
        $row = \DB::table('webhook_inbox')->where('dedupe_key', $this->dedupeKey)->first();
        if (!$row) return;

        \DB::table('webhook_dlq')->insert([
            'source' => 'sendpromptly',
            'dedupe_key' => $this->dedupeKey,
            'payload' => $row->payload,
            'failed_attempts' => $row->attempts,
            'failure_reason' => $row->last_error ?? $e->getMessage(),
            'dead_lettered_at' => now(),
            'created_at' => now(),
            'updated_at' => now(),
        ]);
    }
}

Test steps:

  1. Temporarily force the job to throw (uncomment throw new RuntimeException(...)).
  2. Send a webhook request to your endpoint (with valid signature headers).
curl -i -X POST "http://localhost:8000/webhooks/sendpromptly" \
  -H "Content-Type: application/json" \
  -H "X-SP-Timestamp: 1700000000" \
  -H "X-SP-Signature: <valid_signature_here>" \
  --data '{"event_key":"order.created","payload":{"order_id":"O-1001"}}'

Expected response: still 200 (accepted), while the job retries in the background and eventually inserts into webhook_dlq.

Common gotcha: teams return 200 and assume success, but never verify background completion. Track processed_at, attempts, and DLQ inserts as first-class operational signals.

Replay safely (idempotency)

Use a unique event key (delivery ID or hash)

Prefer X-SP-Message-Id (delivered in webhook headers) as the stable dedupe key when present; otherwise use a body hash. Your replay path should target the same key, not create a new identity.
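
The header-first, hash-fallback rule can be sketched as a small helper. The resolveDedupeKey name is hypothetical, and the function assumes headers arrive as an associative array whose values are either strings or lists of strings (the shape returned by $request->headers->all() in Laravel).

```php
<?php

/**
 * Picks a stable dedupe key: the X-SP-Message-Id header when present,
 * otherwise the SHA-256 hash of the raw request body.
 */
function resolveDedupeKey(array $headers, string $rawBody): string
{
    foreach ($headers as $name => $value) {
        // Header names are case-insensitive, so compare without regard to case.
        if (strcasecmp((string) $name, 'X-SP-Message-Id') !== 0) {
            continue;
        }

        // Accept either a plain string or a list of values (first one wins).
        $id = trim(is_array($value) ? (string) ($value[0] ?? '') : (string) $value);
        if ($id !== '') {
            return $id;
        }
    }

    // No usable message id: fall back to a hash of the exact bytes received.
    return hash('sha256', $rawBody);
}
```

Because the same inputs always produce the same key, the initial delivery and any later replay resolve to the same inbox row.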

See Idempotency patterns for safe replay.

Replays must be side-effect safe

To replay failed webhooks from the DLQ safely, your handler must be idempotent: every write must be duplicate-safe (upserts, uniqueness constraints, state guards).

| Replay strategy | Result | Risk |
| --- | --- | --- |
| Same dedupe key + idempotent write | Safe reprocessing | Low |
| New key per replay | Duplicate side effects | High |
| No uniqueness/state guard | Non-deterministic outcomes | High |
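
The low-risk strategy (same dedupe key plus an idempotent write) can be illustrated with an in-memory stand-in for a table that has a uniqueness constraint. OrderStore and applyOrderCreated are hypothetical names; in a real handler the guard would be a unique index or an upsert in your database rather than a PHP array.

```php
<?php

final class OrderStore
{
    /** @var array<string, array{status: string}> orders keyed by order id */
    private array $orders = [];

    /** @var array<string, true> dedupe keys whose side effects were already applied */
    private array $applied = [];

    /**
     * Applies an "order.created" event once per dedupe key.
     * Returns false when the event was already applied (replay or duplicate delivery).
     */
    public function applyOrderCreated(string $dedupeKey, string $orderId): bool
    {
        if (isset($this->applied[$dedupeKey])) {
            return false; // state guard: replay becomes a no-op
        }

        // Upsert-style write: create the order only if it does not exist yet.
        $this->orders[$orderId] ??= ['status' => 'created'];
        $this->applied[$dedupeKey] = true;

        return true;
    }

    public function orderCount(): int
    {
        return count($this->orders);
    }
}
```

Replaying with the same key leaves the store unchanged, while a new key per replay would pass the guard and repeat the side effect.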

Retention + operational checklist

How long to keep inbox and DLQ rows

A practical baseline for how long to keep inbox and DLQ rows:

  • Inbox: 7-30 days for normal forensics and replay.
  • DLQ: 30-90 days for incident windows and slower root-cause fixes.
  • Extend retention for compliance-heavy domains.
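
These windows translate into cutoff timestamps that a scheduled cleanup task can use. The sketch below assumes the upper end of each range (30 days for the inbox, 90 days for the DLQ); the retentionCutoffs name and its defaults are illustrative choices, not fixed recommendations.

```php
<?php

/**
 * Returns the oldest timestamp to keep for each table.
 * Rows older than the cutoff are candidates for deletion.
 *
 * @return array<string, DateTimeImmutable>
 */
function retentionCutoffs(DateTimeImmutable $now, int $inboxDays = 30, int $dlqDays = 90): array
{
    return [
        'webhook_inbox' => $now->modify("-{$inboxDays} days"),
        'webhook_dlq' => $now->modify("-{$dlqDays} days"),
    ];
}
```

A cleanup task could then delete inbox rows whose received_at is older than the inbox cutoff and whose status is processed, and DLQ rows whose dead_lettered_at is older than the DLQ cutoff. Restricting inbox deletion to processed rows is an assumption that keeps unresolved events available for replay.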

Also keep links handy for fast diagnosis: Interpreting failures (429, 401, 409, etc.) and Generate real traffic with a test event.

Alerts on DLQ growth

Alert on both absolute DLQ count and growth rate, and page when growth is sustained over multiple intervals.
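
A rule like this can be expressed as a function over periodic DLQ row counts (for example one sample every five minutes). The dlqAlertLevel name, the threshold of 100 rows, and the three-interval streak are assumptions for illustration; tune them to your own traffic.

```php
<?php

/**
 * Classifies DLQ health from a series of row counts, oldest first.
 * Returns 'page' for sustained growth, 'warn' for a high count or any
 * recent growth, and 'ok' otherwise.
 */
function dlqAlertLevel(array $counts, int $maxCount = 100, int $sustainedIntervals = 3): string
{
    $counts = array_values($counts);
    $n = count($counts);
    if ($n === 0) {
        return 'ok';
    }

    // Count how many of the most recent intervals grew back to back.
    $streak = 0;
    for ($i = $n - 1; $i > 0 && $counts[$i] > $counts[$i - 1]; $i--) {
        $streak++;
    }

    if ($streak >= $sustainedIntervals) {
        return 'page';
    }
    if ($counts[$n - 1] >= $maxCount || $streak > 0) {
        return 'warn';
    }

    return 'ok';
}
```

Separating the streak check from the absolute threshold means a single burst raises a warning, while only steady growth wakes someone up.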

Troubleshooting checklist for webhook deliveries stuck in a retry loop:

  • Confirm retries are bounded ($tries and queue retry/backoff settings).
  • Check whether root cause changed before replaying (config, schema, credentials).
  • Verify dedupe key is stable across initial delivery and replay.
  • Inspect latest last_error and classify retryable vs non-retryable.
  • Replay a single known event first, then expand in small batches.
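
The last item in the checklist (one event first, then small batches) can be sketched as a batch planner. planReplayBatches is a hypothetical helper, and doubling the batch size up to a cap is one possible ramp-up policy, not the only reasonable one.

```php
<?php

/**
 * Splits dedupe keys into replay batches that start at size 1 and
 * double each round, never exceeding $maxBatch.
 *
 * @param  list<string> $dedupeKeys
 * @return list<list<string>>
 */
function planReplayBatches(array $dedupeKeys, int $maxBatch = 50): array
{
    $dedupeKeys = array_values($dedupeKeys);
    $maxBatch = max(1, $maxBatch);
    $total = count($dedupeKeys);

    $batches = [];
    $size = 1;   // first batch is a single known event
    $offset = 0;

    while ($offset < $total) {
        $batches[] = array_slice($dedupeKeys, $offset, $size);
        $offset += $size;
        $size = min($maxBatch, $size * 2);
    }

    return $batches;
}
```

An admin workflow could dispatch one batch, confirm that the rows reach the processed state, and only then move on to the next batch, stopping at the first batch that dead-letters again.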

Common failure modes

  1. No inbox table -> you lose payloads if async processing fails after 2xx.
  2. No dedupe key -> DLQ replay causes duplicate side effects.
  3. Unbounded retries -> poison message burns compute forever.
  4. DLQ without replay tooling -> failures accumulate with no recovery path.
  5. Storing only partial payload -> you can’t diagnose why it failed.
  6. Replaying without fixing root cause -> immediate re-failure loop.

Conclusion

  • A fast 200 webhook response is only step one; durable inbox processing is the reliability boundary.
  • Bounded retries protect workers from poison events and push unresolved failures into a recoverable DLQ.
  • Replay must reuse the same dedupe identity and idempotent write path to avoid duplicate side effects.
  • Retention and alerts are operational controls, not optional cleanup tasks.
  • You should be able to trace any event from receipt to processed or dead-lettered state.

Recover in minutes: Open Message Log to find the failing run, fix the cause, then Replay from your DLQ safely.