Harmony 8.30 External RCA
Harmony 8.30 was released to production on Thursday, April 12. Customers experienced several issues with this release and with the subsequent release rollback.
- NetSuite connection errors, experienced by a subset of Cloud Agent customers
- Temp directory and chunking failures, experienced by a number of Cloud Agent customers
- Project download errors, experienced by customers with an ampersand (&) in the project name or in certain project metadata
- Stuck operations and Apache/Tomcat errors, experienced by a subset of Cloud Agent users during the Cloud Agent rollbacks
All of the issues were reported, identified, and rectified within 24 hours, by April 13, 6 pm Pacific Time.
Note: All times in this document are in Pacific Time.
NetSuite Connection Errors
What was the root cause of the problem?
A change in support of NetSuite 2018.1 was made based on the recommendation from NetSuite. A new parameter (?c) was added to the NetSuite URL (see screenshot below) to enable the connection to work in both NetSuite’s production and preview environments.
While this was based on the recommendation from NetSuite itself and was thoroughly tested internally and independently by Jitterbit, this change, however, was not (forward) compatible with some of the upgrades/maintenance NetSuite was performing in production at the same time and several NetSuite servers/accounts rejected new URLs.
We have opened Case #3011384 with NetSuite to investigate this issue further and have fully rolled back the change until it is fully resolved by NetSuite.
What was the timeline?
Apr 12, 9:20 pm: Completed deployment to production.
Apr 13, 7:30 am: Received initial reports of NetSuite connection errors.
Apr 13, 9:20 am: Development team identified the URL change as the cause of the problem.
Apr 13, 12:20 pm: Ready with an 8.30 agent build with the NetSuite fix.
Apr 13, 12:30 pm: Decision made to roll back to 8.29 agents instead of deploying updated agents.
Apr 13, 4:00 pm: Completed agent rollback.
Temp Directory Errors
What was the root cause of the problem?
As part of 8.30, a change was made to the way the temp directory is cleaned up after an operation finishes. For operations with chunking enabled, this change led to data being prematurely removed prior to chunking fully completing. While testing the chunking feature is part of our standard suite of regression tests, we did not observe any issues in our QA environments. As such, until we can fully evaluate the reasons for false positives and inability to reproduce this behavior under controlled conditions, we have reverted this change and continue to research the scope and reasons for this inconsistency.
What was the timeline?
Apr 12, 9:20 pm: Completed deployment to production.
Apr 13, 1:20 am: Received initial reports of temp directory errors.
Apr 13, 6:20 am: Rolled back download links for agents from 8.30 to 8.29.
Apr 13, 12:20 pm: Ready with an 8.30 agent build with rollback of temp directory cleanup changes.
Apr 13, 12:30 pm: Decision made to roll back to 8.29 agents instead of deploying updated agents.
Apr 13, 4:00 pm: Completed agent rollback.
Project Download Errors
What was the root cause of the problem?
As part of 8.30, a change was made to better handle international characters when a project is downloaded by Studio. Due to the new way the data was encoded and passed between some of the internal portions of our Harmony Cloud architecture, an ampersand (&) character in the name of a project or in certain project metadata field resulted in issues with internal XML encoding. The immediate fix was to revert these changes from 8.30.
What was the timeline?
Apr 12, 9:20 pm: Completed deployment to production.
Apr 13, 6:10 am: Received initial reports of project download errors.
Apr 13, 12:20 pm: Ready with an 8.30 build of the Harmony Cloud component with rollback of project download changes.
Apr 13, 6:00 pm: Deployed Harmony Cloud component to production.