YOUR WORKERS KEEP DYING AND YOU KNOW IT

Just Fucking Use

Trigger.dev

Every backend dev has that one job that randomly stops working. You check the logs. Nothing. You restart it. It works. Until it doesn't. Again.

We've all been there.

It's Monday morning. You open Slack. "Hey, did the report generation run last night?" Spoiler: it didn't. The EC2 instance ran out of memory at 3 AM and nobody noticed because your "alerting" is just you checking CloudWatch when you remember.

So you SSH into the box. You run pm2 logs. You see the same error you saw last month. You fix it with the same hacky workaround. You tell yourself you'll "do it right" next sprint.

Next sprint never comes. The technical debt compounds. And somehow you're the one on-call for a system you barely understand anymore.


Sound familiar? This is what "battle-tested infrastructure" looks like at most startups.

monday-morning.log
[slack] @channel did the PDF exports run?
[you] checking...
[you] shit

$ ssh prod-worker-01
$ pm2 logs export-service --lines 500
[ERROR] heap out of memory
[ERROR] heap out of memory
[ERROR] heap out of memory
# ...silence for 6 hours...

$ pm2 restart export-service
[PM2] Restarting...

# "Fixed"

[slack] @channel it's happening again
$ pm2 restart export-service
# repeat until you quit or the company dies

Your current "architecture" probably includes:

  • A PM2 process that restarts itself into oblivion
  • setTimeout calls pretending to be a scheduler
  • A Redis instance you're scared to look at
  • AWS Lambda functions timing out after 15 minutes
  • A Notion doc called "How to restart the workers" that's already outdated

You're not running infrastructure. You're babysitting chaos.

Enter Trigger.dev

Trigger.dev is a TypeScript-first background jobs platform. You define tasks in code. You get retries, logging, and monitoring without configuring anything. It runs serverlessly so you don't manage VMs, containers, or "worker pools."

It handles the stuff that doesn't belong in your API routes:

  • Generating PDFs and reports
  • Sending transactional emails
  • Processing uploads and media
  • Syncing data with third-party APIs
  • Running AI pipelines that take minutes, not milliseconds
  • Scheduled jobs that actually run when they're supposed to

No timeouts. No cold starts killing your long tasks. No praying that your cron job didn't silently fail.
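A scheduled job, for instance, is a few lines. Here's a sketch using the SDK's `schedules` API — the task id and the 3 AM cron expression are placeholders, not anything from your codebase:

```typescript
import { schedules } from "@trigger.dev/sdk/v3";

// Runs every night at 3 AM UTC — and shows up in the dashboard when it doesn't.
export const nightlyExports = schedules.task({
  id: "nightly-exports",
  cron: "0 3 * * *",
  run: async (payload) => {
    // payload.timestamp is the scheduled time for this run
    console.log("running exports for", payload.timestamp);
  },
});
```

No long-running process to babysit: the schedule lives with the code, and every run (or missed run) is visible in the dashboard.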

It's just TypeScript.

No YAML. No DSLs. No "infrastructure as code" that requires a PhD to debug.

One command to start

Runs your tasks locally with full observability. Click the link it gives you to see exactly what's happening.

npx trigger.dev@latest dev

Define a task

It's an async function. That's it. Trigger.dev handles the queue, the retries, and the "why did this fail" dashboard.

trigger/generateReport.ts
import { task } from "@trigger.dev/sdk/v3";
import { generatePDF } from "../lib/pdf";
import { sendEmail } from "../lib/email";
import { fetchReportData } from "../lib/reports";

export const generateReport = task({
  id: "generate-report",
  retry: { maxAttempts: 3 },
  run: async ({ userId, reportType }: { userId: string; reportType: string }) => {
    const data = await fetchReportData(userId, reportType);
    const pdf = await generatePDF(data);
    await sendEmail({ to: data.email, attachment: pdf });
    return { success: true, pages: pdf.pageCount };
  },
});
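Triggering it from your app is one call. A sketch, assuming a Next.js-style route handler (the route path and request shape are illustrative):

```typescript
// app/api/reports/route.ts — illustrative route handler
import { generateReport } from "@/trigger/generateReport";

export async function POST(req: Request) {
  const { userId, reportType } = await req.json();

  // Enqueues the task and returns immediately — the work runs
  // on Trigger.dev's infrastructure, not inside your request cycle.
  const handle = await generateReport.trigger({ userId, reportType });

  return Response.json({ runId: handle.id });
}
```

Your API route stays fast; the heavy lifting happens elsewhere, with retries and logs attached to the run id.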

What you're doing now

  • Wrapping everything in try-catch and hoping for the best
  • Using node-cron in a long-running process
  • Spinning up BullMQ + Redis + a worker dyno
  • Hitting Lambda's 15-minute timeout on legitimate workloads
  • Debugging by adding more console.logs
  • Finding out jobs failed when customers complain

What Trigger.dev gives you

  • Automatic retries with exponential backoff
  • Built-in queues with concurrency control
  • Real cron that survives deploys and restarts
  • No timeouts — run for hours if you need to
  • Full observability — see every run, every log, every error
  • Alerts before your users notice something's wrong
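"Exponential backoff" just means each retry waits longer than the last, so a flaky dependency gets room to recover instead of getting hammered. A generic sketch of the schedule — the parameter names echo Trigger.dev's `retry` options (`minTimeoutInMs`, `factor`), but the math here is a simplified illustration, not the SDK's internals:

```typescript
// Delays between attempts for an exponentially backed-off retry policy.
// One delay precedes each retry, i.e. every attempt after the first.
function backoffDelays(maxAttempts: number, minTimeoutInMs = 1000, factor = 2): number[] {
  return Array.from({ length: maxAttempts - 1 }, (_, i) => minTimeoutInMs * factor ** i);
}

console.log(backoffDelays(4)); // [1000, 2000, 4000] — 1s, 2s, 4s between attempts
```

Real implementations usually add jitter (randomized delays) so a fleet of failing jobs doesn't retry in lockstep.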

Background jobs should be boring. Trigger.dev makes them boring.

"My workflow is complex."

No it isn't. You're doing some variation of:

  1. Receive a trigger (webhook, schedule, user action)
  2. Fetch some data
  3. Do something with it (transform, call an API, generate a file)
  4. Store the result or notify someone
  5. Handle failures gracefully

That's literally what Trigger.dev is designed for. Stop reinventing orchestration.
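Those five steps map onto a single task. A sketch — `fetchOrder`, `buildInvoice`, `saveInvoice`, and `notify` are hypothetical stand-ins for your own code:

```typescript
import { task } from "@trigger.dev/sdk/v3";
// Hypothetical helpers — replace with your own modules.
import { fetchOrder, buildInvoice, saveInvoice, notify } from "../lib/orders";

export const processOrder = task({
  id: "process-order",
  retry: { maxAttempts: 5 }, // step 5: failures retried automatically
  run: async ({ orderId }: { orderId: string }) => {
    const order = await fetchOrder(orderId);   // step 2: fetch some data
    const invoice = await buildInvoice(order); // step 3: do something with it
    await saveInvoice(invoice);                // step 4: store the result
    await notify(order.email, invoice.url);    // step 4: notify someone
    return { invoiceId: invoice.id };
  },
});
// Step 1, the trigger, is wherever you call processOrder.trigger({ orderId }):
// a webhook handler, a schedule, or a button in your app.
```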

"I don't want vendor lock-in."

Your current stack has more lock-in than you think:

  • The undocumented bash scripts in your deploy pipeline
  • The Redis instance with 47 different key patterns
  • The "simple" worker that's now 3000 lines
  • The monitoring setup held together with Datadog queries and hope

Trigger.dev is open source. Your tasks are TypeScript functions. You can self-host if you want to run your own infrastructure. (But why would you?)

"It costs money."

There's a free tier. But let's talk about what you're paying now:

  • Engineer hours debugging silent failures
  • Revenue lost when batch jobs don't complete
  • Customer trust eroded by "sorry, it didn't send"
  • The EC2 instance running 24/7 "just in case"
  • Your sanity at 2 AM

Paying for reliability isn't an expense. It's buying back your weekends.

Use Trigger.dev if:

  • → You have background work that keeps breaking and you're tired of fixing it
  • → Your Lambda functions keep timing out on legitimate workloads
  • → You need scheduled jobs that actually run (and tell you when they don't)
  • → You're building AI features that take more than 30 seconds
  • → You want to see what your jobs are doing without SSHing into a box

You know your current setup sucks.

Stop pretending setTimeout is a queue.
Stop treating PM2 restarts as a monitoring strategy.
Stop being the human retry mechanism for your own infrastructure.

Just fucking use Trigger.dev.