Payment processing is currently experiencing issues

Incident Report for Universe

Postmortem

Just before 1 a.m. ET, one of the databases we use for tracking payments hit an out-of-memory error. For the duration of the outage, this delayed the final step in the automatic capture of credit card funds. Once service was restored, payment collection resumed automatically and retroactively finalized the transactions created during the outage.
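
As an illustration of the retroactive finalization described above, here is a minimal Python sketch of deferring captures while a tracking store is unreachable and replaying them once it recovers. Every name in it (`PendingCapture`, `CaptureQueue`, `record_capture`) is hypothetical and not taken from Universe's systems; it only shows one common way such a recovery can work.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingCapture:
    """A capture whose final bookkeeping step has not been recorded yet."""
    transaction_id: str
    amount_cents: int


@dataclass
class CaptureQueue:
    """Keeps captures that could not be finalized while the store was down."""
    # Hypothetical write to the payment-tracking store; assumed to raise
    # ConnectionError while the store is unavailable.
    record_capture: Callable[[PendingCapture], None]
    pending: List[PendingCapture] = field(default_factory=list)

    def finalize(self, capture: PendingCapture) -> bool:
        """Try to finalize now; on failure, keep the capture for a later retry."""
        try:
            self.record_capture(capture)
            return True
        except ConnectionError:
            self.pending.append(capture)
            return False

    def drain(self) -> int:
        """Retry deferred captures in order, stopping at the first failure."""
        finalized = 0
        while self.pending:
            try:
                self.record_capture(self.pending[0])
            except ConnectionError:
                break
            self.pending.pop(0)
            finalized += 1
        return finalized
```

In this sketch nothing is dropped during an outage: a failed `finalize` call only moves the capture into `pending`, and a later `drain` call finalizes it once writes succeed again.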

While we have extensive monitoring in place to notify us in advance of disruptive conditions, we’ve determined that the alarm for this particular metric was of the wrong type, which made the impending outage very easy to miss. The database disruption caused upstream health checks to fail, and those failures were immediately surfaced to the entire dev team in Slack.

With the problem identified, memory was increased for the payment tracking database and service was quickly restored.

We take reliability and stability very seriously at Universe. While our already robust infrastructure allowed us to recover from this outage without dropping any transactions, we’re committed to improving that resilience further. We’re taking two immediate steps to prevent this type of outage in the future. First, we’re migrating to an autoscaling database system, which would prevent this type of error. Second, we’re conducting a full review of our alarms and notifications to ensure that we’re alerted proactively and consistently when resources are under stress.
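
To make "alerted proactively when resources are under stress" concrete, here is a minimal Python sketch of a sustained-threshold check on memory utilization. The function name, the 85% threshold, and the three-sample window are all assumptions chosen for illustration; they do not describe the alarm configuration actually used at Universe.

```python
from typing import Sequence


def should_alert(
    samples: Sequence[float],
    threshold: float = 0.85,  # assumed: fraction of memory in use
    sustained: int = 3,       # assumed: consecutive samples required
) -> bool:
    """Return True when the most recent `sustained` samples are all at or
    above `threshold`, i.e. memory pressure is persistent rather than a blip."""
    if sustained <= 0 or len(samples) < sustained:
        return False
    return all(s >= threshold for s in samples[-sustained:])
```

A check like this fires while there is still headroom before memory is exhausted, and requiring several consecutive samples keeps a single spike from paging anyone.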

Posted Aug 10, 2022 - 15:44 EDT

Resolved

This incident has been resolved.

Posted Aug 09, 2022 - 17:13 EDT

Monitoring

All systems are operational, and all outstanding payments have been processed.

We are continuing to closely monitor our systems to confirm their ongoing stability.

We will post a postmortem of this incident once a thorough investigation is complete.

Posted Aug 09, 2022 - 03:56 EDT

Identified

We have identified the cause of our payment processing delays, and a fix is being deployed now. Outstanding payments will be processed automatically once the fix is deployed.

Posted Aug 09, 2022 - 03:09 EDT

Investigating

We are currently investigating this issue.

Posted Aug 09, 2022 - 02:44 EDT

This incident affected: Ticket Processing.