I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We Are Glad We Made Them)”. This is the second in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series here.
If you are interested in hearing this talk or some of the other awesome speakers and topics that will be covered at the event, you can learn more about the conference and purchase tickets here.
Everything I will share here happened at Global Custom Commerce (GCC), a Home Depot Company, as we developed and improved our 2nd generation e-commerce platform, called Autobahn, designed to make it easy for customers to shop for and buy more complex configurable products and services like custom window blinds, custom windows, flooring and decks.
In 2011, when the Autobahn platform started development, GCC was already the #1 seller of custom window coverings online and owned several brands including blinds.com. We were a couple years away from acquisition by the Home Depot and had about 80 employees. The existing e-commerce platform had been in production for a number of years and was still actively being improved by a large team following a traditional project management philosophy using Gantt charts and reporting on percent complete.
The Autobahn project marked a number of firsts for GCC including our first use of cloud hosting at AWS and our first use of agile methodologies. This article highlights one of our bigger mistakes and how we were able to improve as a result.
In the early days of the effort to deliver our 2nd generation e-commerce platform, Autobahn, we adopted the Scrum methodology and were following all the practices “by the book” including sprint planning, stand-ups, sprint review and sprint retrospectives. However, the company continued to use more traditional QA techniques and processes. As a result, QA engineers were assigned to the team but continued to work semi-independently with their own manager. This obvious mistake was rectified fairly quickly and is not really at the heart of what I want to tell you about here. It’s just important to note since it could be tempting to attribute what came next to where we started with QA engineers sitting half inside and half outside the team.
We also made one large decision around architecture that impacted this story somewhat. After attending Udi Dahan’s distributed architecture course, many on the team wanted to focus on building a microservices architecture with small, loosely coupled components connected generally by asynchronous messaging. Remember, this was 2011 when these ideas were just gaining widespread attention. After consulting with experts, including Udi, we were advised to build a monolithic web application using a more traditional layered architecture to provide separation of concerns. For the first year or so, that is exactly what we did. By the time we started to introduce microservices, there was a pretty significant monolith sitting at the center of our platform. In retrospect, this was clearly a mistake. This certainly contributed to what follows, but was not the sole or even the most notable cause.
When the Autobahn effort started, the business and the development team all agreed that quality was one of the most important things required to make the new platform successful. In fact, the new effort was chartered, in part, based on the promise that quality would be baked into the new platform from the beginning. After all, the company had suffered for years from quality issues on the existing platform and was tired of spending too many cycles fixing problems and not enough time truly innovating.
The team invested in quality from the beginning. Early in the first sprint, we had automated continuous integration including a comprehensive unit testing suite that ran on every commit and failed the build if any tests failed. We also implemented code coverage reporting and focused on achieving as close to 100% test coverage as we could get. The team cared deeply about quality and was fully committed to writing and maintaining unit tests to make sure things worked as designed and continued to work as the code base evolved.
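The gating logic described above can be sketched in a few lines. This is a hypothetical illustration of the rule our CI enforced (any failing test breaks the build, and coverage must stay near a target), not the actual build scripts we used:

```python
# Sketch of a CI quality gate: fail the build if any test fails or if
# line coverage drops below a target. Function and threshold names are
# illustrative, not the real GCC build configuration.

def build_passes(test_failures: int, line_coverage: float,
                 coverage_target: float = 0.95) -> bool:
    """Return True only if every test passed and coverage meets the target."""
    if test_failures > 0:
        return False  # any failing test breaks the build
    return line_coverage >= coverage_target

print(build_passes(0, 0.97))  # True: all tests green, coverage on target
print(build_passes(0, 0.90))  # False: coverage slipped below target
print(build_passes(2, 1.00))  # False: failing tests trump coverage
```

In practice, a script like this would sit between the test runner and the build server, turning the team's "close to 100% coverage" goal into a hard rule rather than a dashboard number.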
Besides unit tests, our definition of done included an expectation for QA system, integration and regression testing. The QA engineers on the team were responsible for writing test cases based on the stories. Once stories were ready for testing, the QA engineers took responsibility for executing the test cases and recording issues on pink stickies that were added to the physical scrum board maintained in the team area. Software engineers took responsibility for fixing all the bugs the business deemed important in the sprint. Once stories were completed and integrated into the main branch, QA engineers focused on testing for regression using a rapidly growing set of test cases stored in a test case management system.
Within a few sprints, a clear cadence and separation of duties naturally developed within the team. QA engineers would start the sprint trying to automate key use cases from the sprint before. They would also work with the PO to produce a set of test cases for the stories committed in the sprint. Meanwhile, software engineers would start several stories in parallel. By the early part of the second week, software engineers would start finishing up stories with passing unit tests and QA engineers would start UI and integration testing. By mid-week, the team would have all the stories in good shape and would start merging everything into a release branch. Thursday morning we would lock down the release branch so our QA engineers could focus on regression testing and work with the software engineers to make everything ready for review on Friday.
After a few sprints of this, the team started referring to the key testing day in the sprint as “Bug Fix Thursday”. It was a neat way to describe the code freeze that would happen each sprint after the merge completed and regression testing started. Up until Bug Fix Thursday, the team was able to focus on developing new features for the sprint. Starting in the morning on Bug Fix Thursday, the software engineers would generally work ahead on stories lined up for the next sprint if they weren’t busy fixing bugs identified by QA engineers on the team.
Sometimes we had trouble getting stories ready in time for Bug Fix Thursday. Most of the time we simply relaxed the code freeze rule to allow ourselves to add to the release branch later on Thursday, or, in extreme cases, Friday morning. This put a lot of pressure on the QA engineers to either rush through regression testing or to perform multiple rounds of regression testing. It also led to some unhealthy behaviors like allowing regression testing to leak into the beginning of the next sprint. Since the team was small and the platform was not in production yet, we were able to live with some of these problems for quite a while.
As Autobahn gained momentum and the team grew, Bug Fix Thursday got a little uncomfortable. As one agile team grew to two and then to three, we started feeling the pinch of Bug Fix Thursday more and more often as teams struggled to merge and test all the sprint’s stories in time for the demo on Friday afternoon. Although we introduced more cleanly separated microservices that could be deployed independently, most sprints included functionality that touched the monolithic customer or associate web sites and required extensive regression testing to ensure everything worked as expected. QA engineers felt the pressure the most as regression testing routinely leaked into the following sprint even for stories the team was calling “done”.
Processes were improved to compensate. The one that seemed to help the most was focusing teams on getting more stories done and ready to deploy in the first week of the sprint. This forced teams to work on one or two stories at a time and to make sure they were merged and regression tested before moving on to another story. Although this did not eliminate Bug Fix Thursday, it gave the QA engineers enough confidence to time box regression testing by reducing the number of test cases checked on Bug Fix Thursday.
As we grew from three teams to six and started exploring new business opportunities, Bug Fix Thursday started to get very uncomfortable again. The team exploring new businesses started to release pilot components more frequently, mainly because these systems had very small impacts. However, when they touched critical system components, which was far too often due to the monolithic nature of the system core, their code had to be merged into what was becoming one very big and complex sprint release. The team was also surprised by how these “safe” releases managed to break things in unanticipated ways. We beefed up our unit testing. We added integration tests. We tried adding a QA engineer to float outside the teams and focus on writing more automated UI tests. We brought automated UI testing into the sprint. We challenged our software engineers to work more closely with the QA engineers on the team to finish regression testing at the end of the sprint. We even turned Bug Fix Thursday into Bug Fix Wednesday for a little while to allow more time for regression testing to complete. Some of these changes worked and stuck, some didn’t, but overall the various changes seemed to help us keep Bug Fix Thursday manageable. We got to the point where releases would happen the Tuesday after the sprint and the business was reasonably satisfied.
Behind the scenes, our QA engineers were barely holding things together. They worked long hours on Bug Fix Thursday often testing late into the night. They tested Fridays after sprint review to make sure the release was ready. Testing often continued through the weekend and into Monday. Occasionally, testing could not get done by Tuesday and releases would slip into Thursday and, in extreme cases, into the following Tuesday.
By the time we added our eighth development team, the unrelenting pressure had led us to make a number of quiet compromises on quality. The pressure to finish last sprint’s testing left QA engineers with little time to write and maintain automated UI tests. Because comprehensive regression testing was taking too long, manual regression testing focused on areas the team thought could be impacted by the changes in the sprint and very little time would be spent testing other areas. Because schedule pressure was almost always present, the team did not believe they had the time they needed to clean up the monolithic components so technical debt was growing and it was getting harder to accurately identify the parts of the system that really needed regression testing.
Once we grew to 12 teams, the symptoms were clearly visible to the team and our business. One sprint’s release took so long to test that we decided to combine it with the subsequent sprint’s work into one gigantic release. “Hot fixes”, intra-sprint releases made to fix critical bugs that were impacting our customers, became common. In fact, we were starting to see cases where one hot fix would introduce yet another critical issue requiring a second hot fix to repair.
Finally, the pace of change completely overwhelmed our teams and processes. Release after release either failed and required rollback or resulted in a flurry of hot fixes. In one particularly bad week, the sprint release spawned a furious hydra: each time we fixed one problem, two more would show up to replace it. By that time, I was leading the IT organization and, after consulting with team members and leaders, I mandated strict rules around regression testing, hot fixes and releases to stop the bleeding.
Simultaneously, we launched a small team of three people dedicated to improving quality and our ability to release reliable software frequently. We named it Yoda. We claimed it was an acronym, but I can’t find anyone that remembers what the letters were supposed to mean. Its biggest concrete deliverable would be an improved automated regression testing suite. We also asked the Yoda team to find ways to simplify the release process and improve the overall engineering culture.
Over the next several months, the Yoda team made progress. As expected, automated tests improved. However, the biggest gains came from changes to the release management process and the culture.
By this time the web sites were still fairly monolithic, but they were surrounded by microservices that were independently deployable. The teams had also made progress on making aspects of the web sites independently deployable. The Yoda team spent some time documenting the various components and worked with development teams across the company to determine which were truly independent enough to release on their own and which required more system-wide regression testing. Yoda improved the continuous delivery process and added a chatbot to make it easier for development team members to deploy reliably. They also worked with the development teams to make releases easier to roll back.
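The heart of the chatbot was a simple routing decision: components Yoda had classified as independent could be deployed on their own, while anything touching the monolith had to ride the full sprint release. A minimal sketch of that idea, with hypothetical component names and responses (our actual bot and registry were more elaborate):

```python
# Hypothetical deploy-bot command handler. The component sets mirror the
# classification work described above: independently releasable services
# versus monolithic pieces that need system-wide regression testing.

INDEPENDENT = {"pricing-service", "shipping-service"}   # safe to release alone
MONOLITH = {"customer-site", "associate-site"}          # full regression needed

def handle(command: str) -> str:
    """Respond to a 'deploy <component>' chat command."""
    parts = command.split()
    if len(parts) != 2 or parts[0] != "deploy":
        return "usage: deploy <component>"
    component = parts[1]
    if component in INDEPENDENT:
        return f"deploying {component} independently"
    if component in MONOLITH:
        return f"{component} requires the sprint release and regression pass"
    return f"unknown component: {component}"

print(handle("deploy pricing-service"))
print(handle("deploy customer-site"))
```

Encoding the classification in the bot meant team members did not have to remember which components were "safe"; the release process itself enforced the distinction.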
Once the Yoda effort gained momentum and the development teams were ready, we relaxed the rules around regression testing and releases for the components that Yoda identified as reasonably separated and safe to release independently. Over the next couple of months, we went from 1 large release per 2-week sprint to over 50 per week. Because releases were smaller, they were easier to test and quality improved. Hot fixes became rare again. Rollbacks occurred from time to time, but, because teams planned for the possibility, did not create the kind of drama we observed in the past.
Process changes were also required. As the number of releases per sprint increased, we realized visible functionality was making it to production before business stakeholders had a chance to formally review and approve it. As a result, teams started to demo stories to stakeholders as soon as they were done and ready to deploy. For some teams, that made the traditional end of sprint review exercise far less useful. Therefore, some teams stopped performing the end of sprint review though they continue to value and practice retrospectives based on the feedback received from the many stakeholder reviews and releases that happen during the sprint. As they work more story by story, teams are gradually starting to look at things like cumulative flow diagrams and cycle times and are starting to experiment with other agile methodologies, such as Kanban.
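For teams moving story by story, cycle time is just the elapsed time from starting a story to finishing it. A tiny sketch of that calculation, with made-up story dates (not real team metrics):

```python
# Minimal sketch of per-story cycle time, the metric teams working
# story by story started tracking. Story names and dates are illustrative.
from datetime import date

stories = [
    ("story-1", date(2019, 1, 2), date(2019, 1, 5)),   # (name, started, done)
    ("story-2", date(2019, 1, 3), date(2019, 1, 10)),
]

def cycle_time_days(started: date, done: date) -> int:
    """Calendar days from starting a story to finishing it."""
    return (done - started).days

times = [cycle_time_days(started, done) for _, started, done in stories]
avg = sum(times) / len(times)
print(times, avg)  # [3, 7] 5.0
```

Plotting the count of stories in each state over time on top of data like this yields the cumulative flow diagram mentioned above.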
And so Bug Fix Thursday lived and mostly died within our agile process. At times, it served us well. At times, it reflected problems in our process or our code. At times, it created additional problems and raised stress levels. The solution, though obvious in retrospect, was terribly counter-intuitive especially in a world where the codebase includes some critical monolithic components: Create and nurture a culture that values releasing more and more frequently. Smaller, more frequent releases make testing easier and the risks smaller. Independently testable and deployable components are an important part of the story, but don’t do much good without the commitment to release more frequently. Although we had always talked about it and even built much of the necessary infrastructure to support it, we never brought it into focus until we launched Yoda and truly changed our culture.
Unlike some of the other stories in this series, we’re still not quite done with Bug Fix Thursday. We just found a way to make it smaller and ensured that it can’t get any bigger by limiting its impact to the monolithic pieces of our system that are left over from the early days of the Autobahn platform. We’re also committed to shrinking it further over the coming months by focusing a small team, called Supercharge Autobahn, on breaking down the highly complex remaining pieces of the original monolith into truly independent components. We also continue to work on our engineering culture to make sure we don’t backslide.