
The boring half of bug triage is a cron job now


Our backend isn't one server. The same Laravel codebase runs as several services: an API, a queue worker, a sync service for callbacks, an MQTT ingester, an admin panel. Each deploys separately and fails differently. Every one of them streams its errors into its own Sentry project. Multiply that across a handful of Laravel products and you're looking at ten-plus error streams and hundreds of events a day.

Nobody has time to read that. And the errors that actually matter are the ones that hide: a bug that fires three times a day, buried under a couple hundred noisier ones, is invisible unless someone is actively triaging. So it goes unnoticed for days, gets re-reported across Slack and the tracker by three different people, and by the time anyone opens it, users have been hitting it all week.

The triage is boring, constant, and doesn't scale with the team: scanning, spotting the real recurring bugs, filing them, starting a fix. The review is where judgment actually lives: deciding whether a fix is right. So I automated the first half and kept the second.

The shape of it

Once an hour, a GitHub Action scans Sentry across every configured service. For each error that clears a threshold, it files a GitHub issue with the full context: error type, message, culprit, occurrence count, users affected, and a link back to Sentry. An AI coding agent (GitHub Copilot) is assigned to that issue, reads the context, explores the code, and opens a draft pull request. Then a human reviews: merge it, push back on it, or close it and write the fix by hand using the agent's analysis as a head start.

No person is in the loop until the review. That's the whole design: the machine does the finding and the first draft; the engineer does the deciding.
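As a rough illustration of the "full context" that each filed issue carries, here is a minimal sketch of how the issue payload might be assembled. The `SentryError` dataclass, its field names, and the HTML-comment marker format are assumptions made for this example, not the pipeline's actual code. The returned dict uses the `title`, `body`, and `labels` fields that GitHub's create-issue endpoint accepts.

```python
from dataclasses import dataclass


@dataclass
class SentryError:
    """Hypothetical flattened view of one Sentry issue; field names are assumptions."""
    short_id: str
    error_type: str
    message: str
    culprit: str
    event_count: int
    users_affected: int
    permalink: str


def build_issue(err: SentryError, labels: list[str]) -> dict:
    """Build a JSON-ready payload for creating a GitHub issue from one Sentry error."""
    body = "\n".join([
        f"**Error type:** {err.error_type}",
        f"**Message:** {err.message}",
        f"**Culprit:** {err.culprit}",
        f"**Occurrences:** {err.event_count}",
        f"**Users affected:** {err.users_affected}",
        f"**Sentry:** {err.permalink}",
        "",
        # Hidden marker (assumed format) so a later run can recognise this issue.
        f"<!-- sentry-id: {err.short_id} -->",
    ])
    # Keep titles short; 200 characters is an arbitrary cut-off for this sketch.
    title = f"[{err.short_id}] {err.error_type}: {err.message}"[:200]
    return {"title": title, "body": body, "labels": labels}
```

Everything the agent and the reviewer need sits in the issue itself, so neither has to open Sentry to understand what broke.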

The threshold isn't one number

The obvious filter is "file an issue for errors that happen a lot." But "a lot" isn't one number across a fleet. The config is per-service, and the interesting part is the overrides:

yaml
defaults:
  min_event_count: 5        # ignore errors that fired fewer times (lifetime)
  min_users_affected: 1     # ignore errors with no associated user
  base_labels: [auto-bug, sentry]

projects:
  - name: api
    sentry_project: <api-project>
    extra_labels: [service:api]

  - name: worker
    sentry_project: <worker-project>
    extra_labels: [service:worker]
    min_users_affected: 0     # a queue worker's errors have no user, so requiring one filters out everything

  - name: admin
    sentry_project: <admin-project>
    extra_labels: [service:admin]
    min_event_count: 2        # low-traffic but high-stakes (refunds, billing): catch it before the default 5

  - name: mqtt
    sentry_project: <mqtt-project>
    extra_labels: [service:mqtt]
    min_users_affected: 0     # an MQTT ingester's errors have no user, so requiring one filters out everything

  - name: sync
    sentry_project: <sync-project>
    extra_labels: [service:sync]
    min_users_affected: 0     # a sync service's errors have no user, so requiring one filters out everything

Two kinds of override there earned their keep. The default (at least 5 occurrences and at least 1 affected user) describes a real, recurring, user-facing bug. But the queue worker, the MQTT ingester, and the sync service have no user attached to their errors, so requiring one would filter out every single one; those services set the user threshold to zero. And the admin panel is low-traffic but high-stakes: it's where refunds and billing edits happen, so it drops the occurrence bar to 2, to catch a bug there before it reaches the default of 5. The right threshold is a property of what the service does, not a global constant.
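The override logic can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline's actual code: `effective_thresholds` and `clears_threshold` are hypothetical names, and the dicts stand in for the parsed YAML.

```python
# Stand-in for the parsed `defaults:` block from the config.
DEFAULTS = {"min_event_count": 5, "min_users_affected": 1}


def effective_thresholds(defaults: dict, project: dict) -> dict:
    """Per-project keys win; anything the project omits falls back to the default.

    Uses dict.get rather than `project.get(key) or default`, because an
    explicit override of 0 is falsy and would otherwise be silently ignored.
    """
    return {key: project.get(key, default) for key, default in defaults.items()}


def clears_threshold(event_count: int, users_affected: int, thresholds: dict) -> bool:
    """An error is filed only if it meets both the event and the user minimums."""
    return (event_count >= thresholds["min_event_count"]
            and users_affected >= thresholds["min_users_affected"])


worker = {"name": "worker", "min_users_affected": 0}
admin = {"name": "admin", "min_event_count": 2}

# A worker error with 7 events and no user passes only because of the override.
worker_ok = clears_threshold(7, 0, effective_thresholds(DEFAULTS, worker))
# An admin error with 2 events and 1 user passes only because of the lowered bar.
admin_ok = clears_threshold(2, 1, effective_thresholds(DEFAULTS, admin))
```

The detail worth getting right is the zero: an override of `0` has to be treated as a real value, not as "unset".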

It kept filing the same bug twice

The first version created duplicate issues on almost every hourly run, and the cause was a wrong assumption about GitHub's API. To avoid re-filing a bug, each issue embeds a marker, the Sentry short-id, and before creating anything the scan checks whether that id already exists. I did that check with GitHub's Search API. Search has an indexing lag of minutes to hours, so the issue I'd filed an hour ago wasn't searchable yet, the check came back empty, and I filed it again.

The fix was to stop searching and start listing:

python
# Dedup on a Sentry-ID marker embedded in each issue body. Use the List API,
# NOT Search: Search has minutes-to-hours indexing lag, so an issue filed an
# hour ago isn't searchable yet, the check returns empty, and you file it twice.
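def load_seen_sentry_ids() -> set[str]:
    # REPO, MAX_PAGES, SENTRY_ID_RE, gh_headers, and http() are assumed to be
    # defined elsewhere in the script; http() returns (status, parsed_json).
    seen: set[str] = set()
    for page in range(1, MAX_PAGES + 1):
        url = (f"https://api.github.com/repos/{REPO}/issues"
               f"?state=all&labels=auto-bug&per_page=100&page={page}")
        status, data = http("GET", url, gh_headers)
        if status != 200:
            # Fail closed: a partial "seen" set would re-file every issue we missed.
            raise RuntimeError(f"GitHub issue list failed on page {page}: HTTP {status}")
        if not data:                            # empty page means we've read everything
            break
        for item in data:
            if item.get("pull_request"):        # the List API returns PRs too, so skip them
                continue
            match = SENTRY_ID_RE.search(item.get("body") or "")
            if match:
                seen.add(match.group(1))
        if len(data) < 100:                     # short page means this was the last one
            break
    return seen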

Pull every auto-bug issue, open and closed, straight from the List API (no index, no lag) and build the set of seen ids in memory. The List API is "dumber" than Search, and that's exactly why it's right here: it returns what's true now, not what's been indexed since.
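For the marker itself, a minimal sketch of one possible format and the pattern that reads it back. The HTML-comment shape, `marker_for`, and `unseen` are assumptions for illustration; the only requirement is that the marker survives in the issue body and the pattern's first group captures the Sentry short-id.

```python
import re

# Hypothetical marker: an HTML comment, invisible when GitHub renders the issue.
SENTRY_ID_RE = re.compile(r"<!--\s*sentry-id:\s*([A-Z0-9][A-Z0-9_-]*)\s*-->")


def marker_for(short_id: str) -> str:
    """Text to append to an issue body when filing it."""
    return f"<!-- sentry-id: {short_id} -->"


def unseen(candidates: list[str], seen: set[str]) -> list[str]:
    """Keep only the short-ids that no existing issue already carries."""
    return [short_id for short_id in candidates if short_id not in seen]


body = "Full error context goes here.\n\n" + marker_for("WORKER-3F")
found = SENTRY_ID_RE.search(body)
to_file = unseen(["API-1", "API-2"], seen={"API-1"})
```

Adding each short-id to the in-memory set as soon as its issue is filed also guards against the same error appearing twice within a single run.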

The 16-minute fix that should have been 5

The first real fix the agent attempted took sixteen minutes, and almost none of it was thinking. Copilot's coding agent runs in a sandbox that turns on an outbound network firewall before it starts working. Our app pulls Composer packages (some from Packagist, one hosted on GitLab) and npm packages; with the firewall up, every download was blocked, retried, and blocked again. Sixteen minutes of a robot losing a fight with a firewall.

The fix is a workflow with a very specific name:

yaml
# Copilot's sandbox enables an outbound firewall before it starts. Anything it
# needs from the network must be installed BEFORE that, which is what this does.
# Copilot looks for a job named EXACTLY `copilot-setup-steps`; rename it and this
# silently stops working. The name is the API.
jobs:
  copilot-setup-steps:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: shivammathur/setup-php@v2
        with: { php-version: '8.3', tools: composer:v2 }
      - run: composer install --no-interaction --prefer-dist --optimize-autoloader
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci

Copilot runs that job before the firewall comes up, so every dependency downloads while the network is still open. By the time the agent starts working, firewall on, everything it needs is already on disk. Sixteen minutes down to about five, and nothing about the sandbox's security changed: at run time the agent still can't reach the network. The gotcha is that job name. Rename it and the whole optimization silently disappears. No error, just slow again.
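Because the failure mode is silent, a small guard in CI can turn a rename into a loud failure. The sketch below is one hypothetical way to do that with only the standard library: `job_names` and `has_setup_job` are made-up helpers, and the scan is deliberately naive (it assumes two-space indentation under a top-level `jobs:` key and is no substitute for a real YAML parser).

```python
import re

REQUIRED_JOB = "copilot-setup-steps"

# A job key: exactly two spaces of indent, a name, a colon, optional trailing comment.
JOB_KEY_RE = re.compile(r"^  ([A-Za-z_][\w-]*):\s*(#.*)?$")


def job_names(workflow_text: str) -> list[str]:
    """Naively collect the keys found directly under a top-level `jobs:` block."""
    names: list[str] = []
    in_jobs = False
    for line in workflow_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                      # blank lines and comments change nothing
        if not line.startswith(" "):
            # Any top-level key either opens the jobs block or closes it.
            in_jobs = stripped.split("#")[0].strip() == "jobs:"
            continue
        match = JOB_KEY_RE.match(line)
        if in_jobs and match:
            names.append(match.group(1))
    return names


def has_setup_job(workflow_text: str) -> bool:
    """True if the workflow still defines the job under its required name."""
    return REQUIRED_JOB in job_names(workflow_text)
```

Run against the setup workflow file on every push, a check like this fails the build the moment someone "tidies up" the job name.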

Making it cost nothing

Two things nearly killed it before it shipped. The first plan powered the fix attempts with a paid LLM API, a per-project monthly cost that would grow with every project we added. Moving the fix step to the Copilot coding agent, already part of our GitHub subscription, took that to zero marginal cost. And the first host we tried capped scheduled automation at fifteen runs a day; hourly checks across several projects would blow through that by lunchtime. Putting the whole pipeline on GitHub Actions (no meaningful run cap, free at our volume) removed the ceiling. The most expensive parts of this system are things we were already paying for.

Why it's allowed near production

An AI agent writing code against your production repo should make you a little nervous, so the guardrails are the point, not a footnote. The agent never merges. It opens draft PRs, and a human reviews every one before it reaches a branch that deploys. It runs in an isolated sandbox with no access to production systems, databases, or secrets, and every action it takes is logged. If a fix is wrong you close it; if it's half-right you take its analysis and finish the job yourself. The pipeline removes the toil of finding and starting. It doesn't touch the judgment of deciding.

I've built a few things on this shape now: agents that hand you a diff to approve rather than an action they take on their own. It's the only version of "let an AI near the codebase" I actually trust.

The point

Adding the next project is a config edit, not a rebuild: drop in the workflow files, list the service's Sentry projects, add two secrets, and it starts triaging within the hour.

bash
gh secret set SENTRY_AUTH_TOKEN --body "$SENTRY_TOKEN"
gh secret set FLEET_GH_PAT      --body "$GH_PAT"

That's deliberate: configuration over code, so it scales across a fleet without becoming a fleet of bespoke scripts. And the aim was never to replace the engineer. It was to delete the part of the job that's just scanning logs, so that it actually gets done, and hand the saved hours back to the part that needs a person. The boring half is a cron job now. The rest is still ours.

The whole pipeline is open source, workflows and config and all: github.com/NarimanGardi/auto-bug-triage.