Product: Jitterbit iPaaS
Issue: There was an issue impacting Jitterbit iPaaS in EMEA that impaired services for some customers.
Impact: Major
Services Impacted: This outage affected services across EMEA. Customers were unable to log in, and operation execution was impacted, including failed API invocations in API Manager.
Location: EMEA
Problem Description
Our high-availability (HA) messaging system experienced an outage. The primary server ran out of memory, but its process stayed active, so the HA mechanism did not fail over. This prevented the backup server from fully taking over, causing a communication breakdown between microservices.
Timeline of Events (UTC)
2025-11-14 08:36 - Messaging Broker 1 goes OOM and Broker 2 is promoted
2025-11-14 10:04 - Harmony Rest Service goes out of service
2025-11-14 10:41 - Broker 1 rebooted, service restored
2025-11-14 10:43 - Harmony Rest Service restored
Root Cause
Memory Exhaustion: The primary Message Broker exhausted its heap memory and entered an Out of Memory (OOM) condition, which degraded the service.
HA Failover Failure: Although the system was in a degraded state, the primary broker's process did not completely fail or terminate. This state prevented the high-availability (HA) mechanism from successfully triggering the expected failover, leaving the backup server unable to assume the primary role and restore message communication between microservices.
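The failure mode above, a process that is alive but unable to serve, can be illustrated with a minimal sketch of a failover decision that checks more than process liveness. This is not the actual HA mechanism of the messaging system; the `BrokerStatus` fields, the `should_fail_over` function, and the 95% heap threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BrokerStatus:
    """Hypothetical snapshot of a broker's health (illustrative fields only)."""
    process_alive: bool       # the OS process still exists
    responded_to_probe: bool  # the broker answered an application-level probe
    heap_used_bytes: int
    heap_max_bytes: int


def should_fail_over(status: BrokerStatus, heap_threshold: float = 0.95) -> bool:
    """Return True when the backup should take over.

    A liveness-only check returns False for a broker that is out of memory
    but still running. This sketch also treats an unresponsive broker or a
    nearly exhausted heap as grounds for failover.
    """
    if not status.process_alive:
        return True
    if not status.responded_to_probe:
        return True
    if status.heap_max_bytes <= 0:
        # Heap size unknown: treat as unhealthy rather than divide by zero.
        return True
    heap_ratio = status.heap_used_bytes / status.heap_max_bytes
    return heap_ratio >= heap_threshold


# A broker in the state described above: process alive, heap exhausted.
degraded = BrokerStatus(
    process_alive=True,
    responded_to_probe=False,
    heap_used_bytes=4_000,
    heap_max_bytes=4_000,
)
print(should_fail_over(degraded))
```

A check based only on `process_alive` would leave this broker as primary, which matches the behavior described in this section.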
Action
Immediate Action:
The primary broker was restarted, and its available memory (heap space) was increased.
Strategic Action:
Implement a process to ensure the backup server successfully takes over immediately in the event of memory exhaustion.
Implement tighter and deeper alerting for the messaging system, focused on resource exhaustion and HA status.
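As one possible shape for the alerting described above, the sketch below evaluates two conditions: heap usage that stays above a warning level for several consecutive samples, and an HA pair that does not have exactly one primary. The function name `evaluate_alerts`, the role labels, the 80% warning level, and the three-sample window are assumptions for illustration, not the actual monitoring configuration.

```python
from typing import Dict, List


def evaluate_alerts(
    heap_samples: List[float],
    roles: Dict[str, str],
    warn_ratio: float = 0.80,
    sustained: int = 3,
) -> List[str]:
    """Return alert messages for resource exhaustion and HA status.

    heap_samples: heap usage ratios (0.0 to 1.0), oldest first.
    roles: hypothetical mapping of broker name to "primary" or "backup".
    """
    alerts: List[str] = []

    # Resource exhaustion: alert only when the last `sustained` samples
    # are all above the warning level, to avoid firing on a brief spike.
    recent = heap_samples[-sustained:]
    if len(recent) == sustained and all(s >= warn_ratio for s in recent):
        alerts.append("heap usage sustained above warning level")

    # HA status: a healthy pair has exactly one primary.
    primaries = [name for name, role in roles.items() if role == "primary"]
    if len(primaries) != 1:
        alerts.append(
            f"expected exactly one primary broker, found {len(primaries)}"
        )

    return alerts


print(evaluate_alerts(
    [0.50, 0.85, 0.90, 0.92],
    {"broker1": "primary", "broker2": "backup"},
))
```

Requiring several consecutive samples trades a short detection delay for fewer false alarms; the window and threshold would need tuning against real broker behavior.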