On Wednesday, July 16th, from 20:38 to 21:06 UTC, several WorkOS services were unavailable for some customers. Impacted services included AuthKit, Dashboard, Admin Portal, SSO, Directory Sync, and their APIs. The outage lasted 28 minutes.
Customers can opt to use custom domains to serve WorkOS products as part of a branded experience. The incident occurred after a code change introduced a bug in the edge worker responsible for custom domain routing.
We identified the issue and rolled back the affected edge worker deployment. We recognize the seriousness of this outage and are committed to maintaining the highest level of reliability across our platform.
WorkOS supports custom domains across several products, including AuthKit, SSO, Admin Portal, and our APIs, to enable customers to deliver a branded experience. We use edge workers to handle routing logic for these domains.
A recent code change introduced a bug in the edge worker that caused it to throw errors during request handling. Our test suite did not catch the bug because the tests run in Node.js, which did not replicate the behavior of the V8 runtime used by our edge workers closely enough to surface the issue. In addition, differences in how custom domains are configured in our pre-production environments prevented automated health checks from detecting the issue.
Edge workers operate in the critical path of our services. In response to this incident, we are taking immediate steps to strengthen our testing and deployment safeguards for edge infrastructure. These steps include adding automated health checks for custom domain routes in pre-production and tightening our rollout procedures for edge worker changes.
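
To make the two safeguards above more concrete, here is a minimal TypeScript sketch of how a custom domain health check could gate an edge worker rollout. It is illustrative only and does not describe WorkOS's actual tooling: the names `ProbeResult`, `isHealthy`, `decideRollout`, and `probeDomains`, the example hosts, and the rule that a 5xx response or a network failure counts as unhealthy are all assumptions.

```typescript
// Hypothetical sketch: none of these names or thresholds come from WorkOS.

interface ProbeResult {
  host: string;   // custom domain that was probed
  status: number; // HTTP status seen through the edge worker (0 = request failed)
}

type RolloutDecision = "promote" | "rollback";

// Assumption: a worker that throws during request handling surfaces as a 5xx
// response or a failed request, so anything outside 200-499 is unhealthy.
function isHealthy(result: ProbeResult): boolean {
  return result.status >= 200 && result.status < 500;
}

// Decide whether a staged rollout may continue. With the default
// maxFailureRatio of 0, a single unhealthy custom domain blocks promotion.
function decideRollout(
  results: ProbeResult[],
  maxFailureRatio = 0,
): RolloutDecision {
  // No probe data means no evidence of health, so do not promote.
  if (results.length === 0) return "rollback";
  const failures = results.filter((r) => !isHealthy(r)).length;
  return failures / results.length <= maxFailureRatio ? "promote" : "rollback";
}

// Probe each custom domain through an injected status fetcher, so the same
// logic can run against pre-production, production, or a stub in tests.
async function probeDomains(
  hosts: string[],
  fetchStatus: (url: string) => Promise<number>,
  path = "/",
): Promise<ProbeResult[]> {
  return Promise.all(
    hosts.map(async (host) => {
      try {
        return { host, status: await fetchStatus(`https://${host}${path}`) };
      } catch {
        // A network-level failure is recorded as status 0, which is unhealthy.
        return { host, status: 0 };
      }
    }),
  );
}
```

In a deployment pipeline, a step could call `probeDomains` with a fetcher such as `(url) => fetch(url).then((r) => r.status)` against a small set of pre-production custom domains, then pass the results to `decideRollout` before widening the rollout. Injecting the fetcher keeps the decision logic testable without network access.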