Elevated Platform Error Rates

Incident Report for WorkOS

Postmortem

Summary

On Wednesday, July 16, 2025, from 20:38 to 21:06 UTC, several WorkOS services were unavailable for some customers. Impacted services included AuthKit, Dashboard, Admin Portal, SSO, Directory Sync, and their APIs. The outage lasted 28 minutes.

Customers can opt to use custom domains to serve WorkOS products as part of a branded experience. The incident occurred after a code change introduced a bug in the edge worker responsible for custom domain routing.

We identified the issue and rolled back the affected edge worker deployment. We recognize the seriousness of this outage and are committed to maintaining the highest level of reliability across our platform.

Root Cause Analysis

WorkOS supports custom domains across several products, including AuthKit, SSO, Admin Portal, and our APIs, to enable customers to deliver a branded experience. We use edge workers to handle routing logic for these domains.
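To make the routing step concrete, here is a minimal sketch of how an edge worker might map a custom domain to an upstream service. It is an illustration under stated assumptions, not our actual worker code: the `RouteTable` type, the `resolveUpstream` and `normalizeHost` functions, and all hostnames are hypothetical.

```typescript
// Illustrative sketch only: these names and hostnames are hypothetical,
// not taken from WorkOS's edge worker.

// Maps a customer's custom domain to the internal upstream that serves it.
type RouteTable = ReadonlyMap<string, string>;

interface RouteResult {
  upstream: string;
  matched: boolean; // false when the fallback upstream was used
}

// Normalize a Host header value: trim, lowercase, drop a trailing port and dot.
function normalizeHost(host: string): string {
  return host.trim().toLowerCase().replace(/:\d+$/, "").replace(/\.$/, "");
}

// Look up the upstream for a request's Host header. Unknown hosts fall back
// to a default upstream instead of throwing, so a missing entry degrades
// gracefully rather than failing the request.
function resolveUpstream(host: string, routes: RouteTable, fallback: string): RouteResult {
  const upstream = routes.get(normalizeHost(host));
  if (upstream === undefined) {
    return { upstream: fallback, matched: false };
  }
  return { upstream, matched: true };
}

// Example usage with made-up domains.
const exampleRoutes: RouteTable = new Map([
  ["auth.customer.example", "authkit.internal.example"],
  ["admin.customer.example", "admin-portal.internal.example"],
]);
const exampleResult = resolveUpstream(
  "Auth.Customer.Example:443",
  exampleRoutes,
  "default.internal.example",
);
```

Because this logic runs on every request to a custom domain, an uncaught exception anywhere in it fails the request outright, which is why errors in this layer have such broad impact.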

A recent code change introduced a bug in the edge worker that caused it to throw errors during request handling. Our test suite did not catch the bug because the tests run in Node.js, which did not replicate the behavior of the V8 runtime used by our edge workers closely enough to surface the issue. In addition, differences in how custom domains are configured in our pre-production environments prevented automated health checks from detecting it.

Remediation

Edge workers operate in the critical path of our services. In response to this incident, we are taking immediate steps to strengthen our testing and deployment safeguards for edge infrastructure. These steps include adding automated health checks for custom domain routes in pre-production and tightening our rollout procedures for edge worker changes.
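As a rough illustration of what a health check for custom domain routes could look like, the sketch below probes each configured domain and blocks a rollout if any probe fails. It is a sketch under assumptions, not a description of our tooling: the `Probe` type, the `/health` path, the function names, and the rule that any 2xx or 3xx response counts as healthy are all hypothetical.

```typescript
// Illustrative sketch only: the names, the "/health" path, and the pass
// criteria are assumptions, not WorkOS's actual checks.

// A probe requests a URL and resolves to the HTTP status code it received.
type Probe = (url: string) => Promise<number>;

interface CheckResult {
  domain: string;
  ok: boolean;
  detail: string;
}

// Treat 2xx and 3xx responses as healthy; anything else is a failure.
function classify(domain: string, status: number): CheckResult {
  const ok = status >= 200 && status < 400;
  return { domain, ok, detail: `HTTP ${status}` };
}

// Probe every custom domain concurrently. A probe that rejects (for example
// on a DNS failure or timeout) is recorded as a failed check rather than
// aborting the whole run.
async function checkCustomDomains(
  domains: readonly string[],
  probe: Probe,
  path: string = "/health",
): Promise<CheckResult[]> {
  return Promise.all(
    domains.map(async (domain): Promise<CheckResult> => {
      try {
        return classify(domain, await probe(`https://${domain}${path}`));
      } catch (err) {
        const detail = err instanceof Error ? err.message : String(err);
        return { domain, ok: false, detail };
      }
    }),
  );
}

// A rollout gate: block promotion if any custom domain check failed.
function shouldBlockRollout(results: readonly CheckResult[]): boolean {
  return results.some((result) => !result.ok);
}
```

Injecting `probe` keeps the check testable without network access. In a real check it could wrap `fetch` and return `response.status`.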

Posted Jul 16, 2025 - 19:39 EDT

Resolved

All services are operational and the incident has been resolved.
Posted Jul 16, 2025 - 18:45 EDT

Monitoring

We’ve rolled out the fix and are continuing to monitor for errors.
Posted Jul 16, 2025 - 17:17 EDT

Identified

Our team has identified the issue and is rolling out the fix.
Posted Jul 16, 2025 - 17:08 EDT

Update

We are seeing elevated error rates across our platform. Our team is investigating the issue.
Posted Jul 16, 2025 - 17:00 EDT

Investigating

We are seeing elevated error rates in AuthKit. Our team is investigating the issue.
Posted Jul 16, 2025 - 16:51 EDT

This incident affected: Supporting Services (Dashboard, Admin Portal) and Core Services (SSO, Directory Sync, AuthKit).