Root Cause Analysis
We apologize for the service disruption. We appreciate your patience and understanding. We are fully committed to resolving this and minimizing further disruptions.
Issue: The cloud agents scheduler service failed to restart after a scheduled upgrade, disrupting customer schedules during the outage.
Impact: Major
Services Impacted: Customer scheduled operations using Jitterbit Sandbox and Production Cloud Agent groups
Location: APAC Cloud
Problem Description: The Jitterbit iPaaS scheduler service uses an external library for timezone-related functions. On initialization, this library attempts to update its timezone data file. An incorrect value in the updated file caused the scheduler service to fail during startup.
Timeline:
9/6/2024 (UTC)
04:00 - The upgrade process began and the old servers were drain-stopped
05:15 - The upgrade finished; internal testing began and discovered the issue with the scheduler service
08:20 - Identified the cause and began developing a workaround to apply to the Cloud Agents
08:55 - Workaround applied and the issue confirmed as mitigated
Root Cause: A syntax error in the third-party timezone data file
Action:
Immediate Action:
- Implemented a workaround by fixing the timezone data file and restarting the service
- Document the workaround in the release notes for customers who may run into this issue with their private agents
Strategic Action:
- A future agent version will not update the timezone data file automatically.
- Correct the syntax of the timezone data file
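
Illustration: the failure mode and the strategic actions above can be sketched as a startup step that validates a refreshed timezone data file and falls back to a known-good copy instead of aborting. This is a minimal sketch under stated assumptions, not the scheduler's actual code. This report does not describe the real data file format or the library's API, so the JSON layout (zone name mapped to a UTC offset in minutes) and the names load_tz_data and load_with_fallback are hypothetical.

```python
import json
from pathlib import Path


def load_tz_data(path: Path) -> dict:
    """Parse and validate a timezone data file.

    ASSUMPTION: the file is JSON mapping zone names to integer UTC offsets
    in minutes. The real third-party format is not stated in this report.
    """
    # json.JSONDecodeError is a subclass of ValueError, so a syntax error
    # in the file surfaces here as ValueError.
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict) or not data:
        raise ValueError("timezone data must be a non-empty object")
    for zone, offset in data.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(offset, bool) or not isinstance(offset, int):
            raise ValueError(f"invalid offset for zone {zone!r}: {offset!r}")
    return data


def load_with_fallback(updated: Path, bundled: Path) -> tuple[dict, str]:
    """Prefer the refreshed file, but fall back to the bundled copy.

    Returns the parsed data and which source was used, so that the caller
    can log a warning rather than fail during startup.
    """
    try:
        return load_tz_data(updated), "updated"
    except (OSError, ValueError):
        # Missing, unreadable, or malformed update: use the known-good copy.
        return load_tz_data(bundled), "bundled"
```

With a guard of this kind, a malformed update would degrade to slightly stale timezone data rather than a service that cannot start. If the bundled copy is also invalid, the error still propagates, which is the intended signal that manual correction is needed.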