The Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them) — Working at Home is More Productive

Image courtesy of inc.com

I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them)”. This is the fourth in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series here.

If you are interested in hearing this talk or some of the other awesome speakers and topics that will be covered at the event, you can learn more about the conference and purchase tickets here.

Everything I will share here happened at Global Custom Commerce (GCC), a Home Depot Company, as we developed and improved our 2nd generation e-commerce platform, called Autobahn, designed to make it easy for customers to shop for and buy more complex configurable products and services like custom window blinds, custom windows, flooring and decks.  

In 2011, when the Autobahn platform started development, GCC was already the #1 seller of custom window coverings online and owned several brands including blinds.com. We were a couple years away from acquisition by the Home Depot and had about 80 employees. The existing e-commerce platform had been in production for a number of years and was still actively being improved by a large team following a traditional project management philosophy using Gantt charts and reporting on percent complete.

The Autobahn project marked a number of firsts for GCC including our first use of cloud hosting at AWS and our first use of agile methodologies. This article highlights one of our bigger mistakes and how we were able to improve as a result.

In the early days of the Autobahn project, when the team was still a small one with only four team members and a PO, our manager established a policy that each of us could and should work from home one or two days each week. Since we were practicing Scrum more or less by the book, the only exceptions were sprint planning day, the first Monday of each two-week sprint, and sprint review day, the last Friday. Most of the team took advantage. I was a notable exception, mostly because I had twin baby boys in the house, which made it difficult if not impossible to do focused coding at home.

The truth is I also hated working at home. I had plenty of experience. For years, I had run a small consulting company and spent a lot of my time working out of the house often on projects with geographically distributed teams. Along the way, I wrote a monthly column for an industry journal and a best-selling technical book on a tool nobody uses anymore (C++ Builder from the long gone, but fondly remembered IDE pioneer, Borland). Maybe it was the technology back in the late 90s, but I always found it more productive to lean over to the person next to me for a quick chat rather than getting on the phone or hopping on the latest flavor of chat.

Anyway, the rest of the team loved it. No getting up early and fighting traffic to make stand-up, no expensive lunches out and no long commutes home. It also was a great time to focus on writing code with no distractions, no background noise and no meetings.

We had all the best technology 2012 had to offer at our disposal. Our CI/CD infrastructure and our test environments were hosted at AWS. Our source code was at GitHub. Our office network featured a VPN that was already supporting dozens of call center associates who worked at home on a daily basis. The company paid for our cell phones and our contacts were up to date. We had Skype accounts and we were not afraid to use them. We all had fast Internet connections at home too.

Over the next few months a very clear and rather disturbing pattern developed. When everyone was in the office, things moved along very quickly. If you ran into a problem, you talked to the person next to you and solved it instantly. If you had a question for the PO, you stood up, walked 3 feet and tapped him on the shoulder. We used stickies on a whiteboard to track our work, and it was super-easy to walk up there and grab the next task. The team often went to lunch together and talked about architecture, the business and sometimes nothing at all, but always enjoyed the camaraderie.

The work at home days were very different even though they weren’t supposed to be. The day before, whoever wasn’t going to be in the office would be careful to grab a couple of tasks and move them into the “work in process” column on our whiteboard. Although the remote person would call into stand-up, she would usually have a very hard time hearing the conversation and, when talking about the work she was going to do that day, would struggle to point out the right cards on the board. During the day, the remote person would generally work in a pretty isolated fashion. We rarely spoke to remote workers. Often, when we tried, we ended up leaving a voice mail and got a call back within an hour or so. Pretty much the same thing would happen with Skype. After a while, it was easier to find someone in the office or wait for the next day. It felt almost like the work at home folks fell into a short-lived black hole where the speed of collaboration fell asymptotically close to zero.

The good news was working code often came out of that black hole thanks to the lack of interruptions, but not as much as we thought. Working at home was harder than people thought. It was far easier to keep banging away at a problem than it was to get a second set of eyes on the code when it involved Skype calls, screen sharing and Internet lag. Every technical glitch and every missed call just made it more likely that everyone would wait for tomorrow to collaborate. Technical whiteboard design sessions just worked more smoothly with a real whiteboard. As a result, work at home productivity did not match what we saw from the same people in the office.

Of course, the team noticed and started talking about it more and more. Two camps formed: those for work at home days and those opposed. We all tried very hard. We experimented with new technologies. For example, we started to use video conferencing on an iPad to try to bring remote workers into the daily stand-up. We also moved to Jira to make it easier for the remote team members to share the task board. It all helped a little, but it was not able to close the gap.

Working in the office was just easier and more productive. One by one, the work at home advocates started coming into the office more frequently. After a few months, work at home became a rare thing used mostly when a workman was expected to fix something or the kids were off school. We had all come to value face to face interaction and the speed of collaboration it allowed us.

Even today, our teams highly favor face to face interaction. Although most, if not all of them, use electronic tools to track their tasks, they still put various artifacts in physical form on whiteboards. We use Slack extensively, but we talk in person far more. Team members value collaboration so much that they willingly change desks to sit close together with other people working on a shared initiative even if it is only planned to run a month or two. We hold as many development-related meetings as possible in public spaces near the teams so people working at their desks can overhear what is being discussed and join the conversation if they think they have something to contribute. Even when doors are closed, engineers know they can simply walk over and interrupt if something important has come up. All of these things are just harder to do when you have team members working remotely.

That is not to say we never work at home or we don’t sometimes work with geographically distributed teams. Technology, people and business realities all are driving a demand for more and more remote work. We have worked very hard over the last couple of years to remove barriers to remote work. However, collaboration is still easier and more fun when you are in the same space. The benefits gained from remote work, such as better work-life balance and more control over interruptions, typically are outweighed by the tax you pay in collaboration friction. Although the technology has advanced, it simply cannot match co-location.

The Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them) — Agile Lawyers

Image from lawyersfavorite.com

I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them)”. This is the third in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series here.

Because he plays such a big role in this story, I’ll be referring to our first PO, Wade Pinder, by name. He played a big role in our agile journey as well. Although we have lots of POs these days to serve our twelve agile teams, Wade remains the strongest agilist here at GCC. You can find Wade speaking and coaching agile around Houston and on LinkedIn.

Several years ago, we started having conversations like the following between development teams and Wade Pinder, our PO at the time, often very late in the sprint when the team was rushing to finish up the last few stories to meet their commitment for the sprint:

“Wait a second,” Wade says, pointing at the screen. “This is the first time I am seeing the screen, and such and such doesn’t work the way we need. I can make some specific suggestions now that we have a screen to look at. Maybe we can get some of the end-users to take a look and help as well.”

The engineer doing the demo grimaces and says, “Well, the story is really done and the requirements weren’t called out in the acceptance criteria. We won’t have time to do any of that this sprint. We can write it into a new story and maybe pick it up next sprint.”


Sometimes the conversation would turn into a more extensive debate. Wade would remind the developers that a well-written story was “a reminder to have a conversation” and that the agile manifesto calls for “customer collaboration over contract negotiation”. He would also accurately make the point that the intent of the story was quite clear by the time sprint planning started, based on all the conversations that had taken place. Deep down, the development team knew they had failed to deliver on that intent. However, most times a narrow reading of the acceptance criteria gave them the cover they needed. They were acting like lawyers who get a guilty client off on a legal technicality.

The changes needed were generally not huge. They often ended up in what we would categorize as a small or medium story. Equally important, these weren’t cases where a story turned out to be too big and sprawling to actually be a single story. The team simply was not delivering on the intent of the story. It was happening more and more often and resulted in far too many sprints where one or more stories would be “done” but did not deliver the expected business value until the follow-up story was completed and deployed in some future sprint.

After months of this, Wade decided to take action. He decided to write stories that captured every single aspect of everything expected in excruciating detail. Small stories that used to have fewer than five acceptance criteria had 20 or more. Every UI change came with a mockup and a clearly written expectation for pixel-perfect delivery. Every field was described, every validation was specified, expected response times were documented and every error message was spelled out. By the time he was done, every story ran for several printed pages. Each one was like the most cunning contract ever assembled by the most skillful corporate lawyer: completely devoid of wiggle room, full of landmines and utterly impossible for mere mortals to understand.

The development team was horrified by the more detailed stories when they first saw them in a backlog grooming session. There was no room for creativity and no room for their input. Every detail was locked in. Where was the room for discussion? What if there was a better way? What if there was an easier way?

Then we really started to talk. Wade and the developers saw eye to eye for the first time in months. The developers agreed to show Wade screens and other story artifacts as quickly as they became available to provide time and context for meaningful conversation and course adjustment. In fact, many teams adopted an extra step at daily stand-up to ask out loud “what can we show Wade today”. Wade agreed to go back to writing stories that were reminders to have conversations instead of contracts that specified every detail. Together, everyone agreed to focus on delivering the business value each story was intended to deliver even when it meant missing a sprint commitment.

Agile Lawyers still pop up from time to time, but they are easier to deal with now. If it’s a developer, someone who’s been around for a while simply tells this story and reminds them of what can happen if they force the PO to turn into an agile lawyer. If it’s a PO, it almost always turns out that the PO is reacting to an agile lawyer on the development team and, well, you know that story.

The Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them) — Bug Fix Thursday

I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them)”. This is the second in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series here.

In the early days of the effort to deliver our 2nd generation e-commerce platform, Autobahn, we adopted the Scrum methodology and were following all the practices “by the book” including sprint planning, stand-ups, sprint review and sprint retrospectives.  However, the company continued to use more traditional QA techniques and processes. As a result, QA engineers were assigned to the team but continued to work semi-independently with their own manager. This obvious mistake was rectified fairly quickly and is not really at the heart of what I want to tell you about here. It’s just important to note since it could be tempting to attribute what came next to where we started with QA engineers sitting half inside and half outside the team.

We also made one large decision around architecture that impacted this story somewhat. After attending Udi Dahan’s distributed architecture course, many on the team wanted to focus on building a microservices architecture with small, loosely coupled components connected generally by asynchronous messaging. Remember, this was 2011 when these ideas were just gaining widespread attention. After consulting with experts, including Udi, we were advised to build a monolithic web application using a more traditional layered architecture to provide separation of concerns. For the first year or so, that is exactly what we did. By the time we started to introduce microservices, there was a pretty significant monolith sitting at the center of our platform. In retrospect, this was clearly a mistake. This certainly contributed to what follows, but was not the sole or even the most notable cause.

When the Autobahn effort started, the business and the development team all agreed that quality was one of the most important things required to make the new platform successful. In fact, the new effort was chartered, in part, based on the promise that quality would be baked into the new platform from the beginning. After all, the company had suffered for years from quality issues on the existing platform and was tired of spending too many cycles fixing problems and not enough time truly innovating.

The team invested in quality from the beginning. Early in the first sprint, we had automated continuous integration including a comprehensive unit testing suite that ran on every commit and failed the build if any tests failed. We also implemented code coverage reporting and focused on achieving as close to 100% test coverage as we could get. The team cared deeply about quality and was fully committed to writing and maintaining unit tests to make sure things worked as designed and continued to work as the code base evolved.

Besides unit tests, our definition of done included an expectation for QA system, integration and regression testing. The QA engineers on the team were responsible for writing test cases based on the stories. Once stories were ready for testing, the QA engineers took responsibility for executing the test cases and recording issues on pink stickies that were added to the physical scrum board maintained in the team area. Software engineers took responsibility for fixing all the bugs the business deemed important in the sprint. Once stories were completed and integrated into the main branch, QA engineers focused on testing for regression using a rapidly growing set of test cases stored in a test case management system.

Within a few sprints, a clear cadence and separation of duties naturally developed within the team. QA engineers would start the sprint trying to automate key use cases from the sprint before. They would also work with the PO to produce a set of test cases for the stories committed in the sprint. Meanwhile, software engineers would start several stories in parallel. By the early part of the second week, software engineers would start finishing up stories with passing unit tests and QA engineers would start UI and integration testing. By mid-week, the team would have all the stories in good shape and would start merging everything into a release branch. Thursday morning we would lock down the release branch so our QA engineers could focus on regression testing and work with the software engineers to make everything ready for review on Friday.

After a few sprints of this, the team started referring to the key testing day in the sprint as “Bug Fix Thursday”. It was a neat way to describe the code freeze that would happen each sprint after the merge completed and regression testing started. Up until Bug Fix Thursday, the team was able to focus on developing new features for the sprint. Starting in the morning on Bug Fix Thursday, the software engineers would generally work ahead on stories lined up for the next sprint if they weren’t busy fixing bugs identified by QA engineers on the team.

Sometimes we had trouble getting stories ready in time for Bug Fix Thursday. Most of the time we simply relaxed the code freeze rule to allow ourselves to add to the release branch later on Thursday, or, in extreme cases, Friday morning. This put a lot of pressure on the QA engineers to either rush through regression testing or to perform multiple rounds of regression testing. It also led to some unhealthy behaviors like allowing regression testing to leak into the beginning of the next sprint. Since the team was small and the platform was not in production yet, we were able to live with some of these problems for quite a while.

As Autobahn gained momentum and the team grew, Bug Fix Thursday got a little uncomfortable. As one agile team grew to two and then to three, we started feeling the pinch of Bug Fix Thursday more and more often as teams struggled to merge and test all the sprint’s stories in time for the demo on Friday afternoon. Although we introduced more cleanly separated microservices that could be deployed independently, most sprints included functionality that touched the monolithic customer or associate web sites and required extensive regression testing to ensure everything worked as expected. QA engineers felt the pressure the most as regression testing routinely leaked into the following sprint even for stories the team was calling “done”.

Processes were improved to compensate. The one that seemed to help the most was focusing teams on getting more stories done and ready to deploy in the first week of the sprint. This forced teams to work on one or two stories at a time and to make sure they were merged and regression tested before moving onto another story. Although this did not eliminate Bug Fix Thursday, it gave the QA engineers enough confidence to time box regression testing by reducing the number of test cases checked on Bug Fix Thursday.

As we grew from three teams to six and started exploring new business opportunities, Bug Fix Thursday started to get very uncomfortable again. The team exploring new businesses started to release pilot components more frequently, mainly because these systems had very small impacts. However, when they touched critical system components, which was far too often due to the monolithic nature of the system core, their code had to be merged into what was becoming one very big and complex sprint release. The team was also surprised by how these “safe” releases managed to break things in unanticipated ways. We beefed up our unit testing. We added integration tests. We tried adding a QA engineer to float outside the teams and focus on writing more automated UI tests. We brought automated UI testing into the sprint. We challenged our software engineers to work more closely with the QA engineers on the team to finish regression testing at the end of the sprint. We even turned Bug Fix Thursday into Bug Fix Wednesday for a little while to allow more time for regression testing to complete. Some of these changes worked and stuck, some didn’t, but overall the various changes seemed to help us keep Bug Fix Thursday manageable. We got to the point where releases would happen the Tuesday after the sprint and the business was reasonably satisfied.

Behind the scenes, our QA engineers were barely holding things together. They worked long hours on Bug Fix Thursday often testing late into the night. They tested Fridays after sprint review to make sure the release was ready. Testing often continued through the weekend and into Monday. Occasionally, testing could not get done by Tuesday and releases would slip into Thursday and, in extreme cases, into the following Tuesday.

By the time we added our eighth development team, the unrelenting pressure had led us to make a number of quiet compromises on quality. The pressure to finish last sprint’s testing left QA engineers with little time to write and maintain automated UI tests. Because comprehensive regression testing was taking too long, manual regression testing focused on areas the team thought could be impacted by the changes in the sprint and very little time would be spent testing other areas. Because schedule pressure was almost always present, the team did not believe they had the time they needed to clean up the monolithic components so technical debt was growing and it was getting harder to accurately identify the parts of the system that really needed regression testing.

Once we grew to 12 teams, the symptoms were clearly visible to the team and our business. One sprint’s release took so long to test that we decided to combine it with the subsequent sprint’s work into one gigantic release. “Hot fixes”, intra-sprint releases made to fix critical bugs that were impacting our customers, became common. In fact, we were starting to see cases where one hot fix would introduce yet another critical issue requiring a second hot fix to repair.

Finally, the pace of change completely overwhelmed our teams and processes. Release after release either failed and required rollback or resulted in a flurry of hot fixes. In one particularly bad week, the sprint release spawned a furious hydra: each time we fixed one problem, two more would show up to replace it. By that time, I was leading the IT organization and, after consulting with team members and leaders, I mandated strict rules around regression testing, hot fixes and releases to stop the bleeding.

Simultaneously, we launched a small team of three people dedicated to improving quality and our ability to release reliable software frequently. We named it Yoda. We claimed it was an acronym, but I can’t find anyone that remembers what the letters were supposed to mean. Its biggest concrete deliverable would be an improved automated regression testing suite. We also asked the Yoda team to find ways to simplify the release process and improve the overall engineering culture.

Over the next several months, the Yoda team made progress. As expected, automated tests improved. However, the biggest gains came from changes to the release management process and the culture.

Although the web sites were still pretty monolithic at this point, they were surrounded by microservices that were independently deployable. The teams had also made progress on making aspects of the web sites independently deployable. The Yoda team spent some time documenting the various components and worked with development teams across the company to determine which were truly independent enough to release on their own and which required more system-wide regression testing. Yoda improved the continuous delivery process and added a chatbot to make it easier for development team members to reliably deploy. They worked with the development teams to make releases easier to roll back too.
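As a rough illustration of the kind of classification described above, the sketch below treats a component as safe to release on its own only if nothing it depends on, directly or transitively, is part of the monolith. The component names, the dependency map and the rule itself are hypothetical. They are invented for this example and are not the criteria the Yoda team actually used.

```python
# Hypothetical sketch: component names and the dependency map are invented
# for illustration and do not describe the real Autobahn platform.

MONOLITH = {"customer_site", "associate_site"}

# component -> components it depends on directly
DEPENDENCIES = {
    "customer_site": set(),
    "associate_site": set(),
    "pricing_service": set(),
    "order_export": {"pricing_service"},
    "cart_widget": {"customer_site", "pricing_service"},
}


def touches_monolith(component, deps=DEPENDENCIES, monolith=MONOLITH):
    """Return True if the component is in, or transitively depends on, the monolith."""
    seen = set()
    stack = [component]
    while stack:
        current = stack.pop()
        if current in monolith:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(deps.get(current, ()))
    return False


def release_plan(components):
    """Split components into independently releasable ones and the rest."""
    independent = sorted(c for c in components if not touches_monolith(c))
    needs_regression = sorted(c for c in components if touches_monolith(c))
    return independent, needs_regression


independent, needs_regression = release_plan(DEPENDENCIES)
print("release on their own:", independent)
print("system-wide regression:", needs_regression)
```

In this toy map, anything that reaches one of the two web sites lands in the system-wide regression group, and everything else can ship by itself.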

Once the Yoda effort gained momentum and the development teams were ready, we relaxed the rules around regression testing and releases for the components that Yoda identified as reasonably separated and safe to release independently. Over the next couple of months, we went from 1 large release per 2-week sprint to over 50 per week. Because releases were smaller, they were easier to test and quality improved. Hot fixes became rare again. Rollbacks occurred from time to time, but, because teams planned for the possibility, did not create the kind of drama we observed in the past.

Process changes were also required. As the number of releases per sprint increased, we realized visible functionality was making it to production before business stakeholders had a chance to formally review and approve it. As a result, teams started to demo stories to stakeholders as soon as they were done and ready to deploy. For some teams, that made the traditional end of sprint review exercise far less useful. Therefore, some teams stopped performing the end of sprint review though they continue to value and practice retrospectives based on the feedback received from the many stakeholder reviews and releases that happen during the sprint. As they work more story by story, teams are gradually starting to look at things like cumulative flow diagrams and cycle times and are starting to experiment with other agile methodologies, such as Kanban.
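For readers unfamiliar with cycle time, here is a minimal sketch of how it can be computed for a handful of stories. The story names and dates are made up for illustration, and measuring from the day work started to the day the story was deployed is an assumption. Teams define the start and end points in different ways.

```python
from datetime import date
from statistics import mean, median

# Hypothetical sample data: (story, date work started, date deployed).
STORIES = [
    ("story-1", date(2019, 1, 7), date(2019, 1, 9)),
    ("story-2", date(2019, 1, 7), date(2019, 1, 14)),
    ("story-3", date(2019, 1, 10), date(2019, 1, 11)),
    ("story-4", date(2019, 1, 14), date(2019, 1, 18)),
]


def cycle_times(stories):
    """Return the cycle time of each story in calendar days."""
    return [(done - started).days for _, started, done in stories]


times = cycle_times(STORIES)
print("cycle times (days):", times)
print("mean:", mean(times), "median:", median(times))
```

Tracking the median alongside the mean helps keep a single long-running story from hiding how quickly typical stories move.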

And so Bug Fix Thursday lived and mostly died within our agile process. At times, it served us well. At times, it reflected problems in our process or our code. At times, it created additional problems and raised stress levels. The solution, though obvious in retrospect, was terribly counter-intuitive, especially in a world where the codebase includes some critical monolithic components: create and nurture a culture that values releasing more and more frequently. Smaller, more frequent releases make testing easier and the risks smaller. Independently testable and deployable components are an important part of the story, but don’t do much good without the commitment to release more frequently. Although we had always talked about it and even built much of the necessary infrastructure to support it, we never brought it into focus until we launched Yoda and truly changed our culture.

Unlike some of the other stories in this series, we’re still not quite done with Bug Fix Thursday. We just found a way to make it smaller and ensured that it can’t get any bigger by limiting its impact to the monolithic pieces of our system that are left over from the early days of the Autobahn platform. We’re also committed to shrinking it further over the coming months by focusing a small team, called Supercharge Autobahn, on breaking down the highly complex remaining pieces of the original monolith into truly independent components. We also continue to work on our engineering culture to make sure we don’t backslide.

The Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them) — Failure IS NOT an Option

I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them)”. This is the first in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series here.

In the early days of the effort to deliver our 2nd generation e-commerce platform, Autobahn, we adopted the Scrum methodology and were following all the practices “by the book” including sprint planning, stand-ups, sprint review and sprint retrospectives. However, the company continued to use more traditional project management techniques and reporting processes. As a result, the team was required to work with the PMO to keep a Gantt chart updated with progress by mapping completed stories to estimates of percent complete (a mistake I’ll cover in another article). The team was also concerned that, even though the CEO was shepherding the project himself, many leaders in the company were not happy about the company investing resources to build a new e-commerce platform instead of investing more in the existing one. There was also the issue of racing to replace the existing platform, which remained under active development since nobody in the company was interested in moving our e-commerce business to the new platform until it was more capable than the existing one. Despite these challenges, the team was optimistic that we would deliver on time and within budget.

For the first few months, things appeared to be going well. Every sprint we delivered exactly as promised. I know the Scrum Guide talks about forecasts these days, but back then the book called for commitment: a very clear promise from the team to the business that they would deliver what they said they would deliver at planning. As the project started to gain momentum, we shifted focus from basic CRUD to the critical functionality required to sell any sort of configurable product or service.

As the required functionality got more complex, we started to have more difficulty delivering the way we intended. However, we could usually figure out a way to get enough done to demo by interpreting the story narrowly or, in some cases, by pushing the PO to pull out things that we convinced ourselves were not truly critical into new stories to be tackled in future sprints. At times, a couple of us also worked crazy hours over the weekend to make sure things got done as planned. The good news, we thought, was that velocity kept increasing so there was no doubt that we could tackle all those new stories and still deliver according to the original plan. We certainly thought that the combination of increasing velocity and the on-time trend shown in the project plan would make us look good to the business and help us keep our project alive.

Then came THAT story. It’s not really important which story exactly or what it was expected to deliver. What does matter is that after the first week of our two-week sprint we knew we were in trouble and we knew THAT story was the problem. As per usual, a couple of us decided to work over the weekend to break through the hard bits so the team could finish wrapping it up in the second week of the sprint. Sunday night we still had not broken through. In fact, we had started to recognize that we had more work ahead of us than we had believed back on Friday evening.

The following Monday, the whole team got together after daily stand-up to talk about THAT story yet again. The team members who worked over the weekend shared details about the issues they had solved and the new issues they had uncovered. Nobody seemed overly worried. After all, we were the team that always delivered and failure just wasn’t an option. We spent an hour or so breaking down the remaining work into a set of tasks and put them up on the board. We updated our burndown and were pleased to note that we were still going to deliver THAT story, and everything else we had committed to, within the sprint even if it was going to require a little extra effort along the way.

Late in the day on Monday, I recognized I was in for a long night. As I stood up to stretch, I noticed other team members around me still hard at work and realized they were likely behind on their tasks too. A few hours later, I realized I was simply staring at the screen and was not really accomplishing much so I packed up to head home with a plan to start fresh early the next morning. The rest of the team appeared to be in about the same place as they headed out after a very, very long day.

It was pretty much the same story Tuesday and Wednesday. Despite lots of conversation, creative re-planning each morning and long days trying very hard to finish tasks, we just couldn’t seem to break through into the home stretch. Left unsaid was a rapidly growing sense of dread — THAT story wasn’t getting done.

Thursday morning we spent significant time after stand-up focusing on how to trim THAT story back, move bits out or otherwise bend the fabric of reality to allow us to call it done.  Our newest team member then said what desperately needed saying, “THAT story is not getting done this sprint. If we keep wasting time on it, we won’t finish up testing the other stories in the sprint either.”  He then advanced a heresy we had not even allowed ourselves to consider, “Perhaps,” he said, “we should fail with honor. We tried our best and we’ll get everything else done this sprint”.

We stood there for a moment and silently asked ourselves essentially the same question: how will the business react and how will we explain it? We started to discuss the idea and very quickly got past the fear. We would simply acknowledge the story wasn’t done and that it could go into the next sprint if the business still wanted it at the top of the priority list. We wouldn’t make excuses and we wouldn’t talk about percent complete. Our direct manager, who coded with the team, was very uncomfortable with the idea and did not want us to do it at first. He knew the political environment and worried about the backlash. In the end, he came around and the team decided to put THAT story aside and make sure the rest of the stories in the sprint were truly done before the demo.

I got the dubious honor of demoing the last completed story and discussing the failed one. Despite all the brave talk the morning before, I distinctly remember the little flutter in my gut as I finished up demoing the last completed story.  I remember staring at the ugly unfinished story on the projector screen as I said, “and THAT story did not get done”.

I took a deep breath and just a second passed before one of the business leaders asked the obvious question, at least the obvious question for someone used to reviewing projects using Gantt charts clearly illustrating percent complete. She asked, “So how close is it to done?”

I delivered the answer, carefully vetted with the team and rehearsed in advance, four simple words designed to explain our commitment to agile principles and to clear communications and transparency, “It’s simply not done”.

Of course, there was more to say after that. It started with a discussion about why percentage complete is an illusion and quickly moved into a group effort to figure out the best way to move forward. After a few minutes, I realized everyone there, the developers and the business alike, was engaged in looking for solutions and not spending any time at all assigning blame.

None of us should have been surprised, though we were. After all, one of our four company values is “experiment without fear of failure”. In essence, that’s just another way of saying that sometimes you’ll stumble and that’s OK as long as you learn and improve based on the experience. We saw it and lived it that day for sure. It was the beginning of a true partnership between the team and the business to figure out how to really deliver the benefit of a new e-commerce platform to our customers.

It’s also between the lines in the Agile Manifesto. All of the unintentional spin and irrational optimism inherent in those pretty Gantt charts showing tasks 80% done get pushed aside by a perfectly clear idea: What’s done is done. Transparency from a development team is critical to actually making informed decisions about what to do next. “Responding to change”, one of the four values in the Agile Manifesto, is as much about what you learn on the technical side as it is about reacting to evolving business realities. The day our team and our business embraced that simple idea was one of the most important days in our agile journey.

When I Started Writing Code for a Living

When I started writing code for a living, C++ was just getting started in the lab. C#, Java, JavaScript and HTML, the technologies at the center of almost everything I do at work these days, had not been invented yet. My first job involved changing reel-to-reel computer tapes overnight and writing COBOL on punch cards. Holy hell I’m old.

And yet I am not. I get to build systems that help people improve their homes and their lives. I love Node.js, my applications live in the cloud and my eyes don’t glaze over when talk turns to a debate between Angular and React. I help figure out ways to make other software engineers happy too. Sure, I do budgets and I sit in plenty of meetings. However, I still love everything about the act of creating software and keep my hands (and my heart) in it. It’s what got me here and it’s what helps me remain a vital part of the team at GCC and The Home Depot. What more can an old coder (and entrepreneur) want?

What is so special about Global Custom Commerce (a Home Depot Company)?

Here’s a Q&A I did for the company website a couple years ago. I must admit that I feel even more strongly about this place now than I did then. If you are interested in joining the team, check out our jobs site.

Q: What was your “Aha” moment that made you choose to join Global Custom Commerce (GCC)?

After I sold my software company, my original plan was to do a little consulting while I took some time to figure out my next entrepreneurial venture. One of my clients was GCC. During that time, I got to meet lots of great people and saw how special it was. For me, it wasn’t one “aha” moment. It was more a process of figuring out that GCC was a place where I could be my entrepreneurial self without having to start another company. I liked GCC so much that I volunteered to defer my contracting invoices for a couple months around an investment round so I could stick around instead of moving on to other opportunities. Shortly after that, I became a full-time member of the team. It’s the best move I’ve ever made.

Q: Which of the core values influences you the most in your life and why?

Improve continuously, hands down. I’ve been a software developer for more than 30 years and have had to re-make myself more times than I can count. It’s great to work in a place that emphasizes what I think has been one of the key difference makers in my career.

Q: What’s some advice you’d give to someone who is considering being a part of the team?

Be passionate about what you do and can bring to the team. We’re passionate about what we do and we’re looking for the same thing in everyone that joins us.

Q: What experiences have shaped you the most as you enjoy the GCC ride?

Our CEO and Founder, Jay Steinfeld, stood up in front of the whole company and said the Autobahn team showed “true grit” as we fought through challenges to finish the Home Depot launch on time. That moment was almost indescribable for me and something I will never forget. It perfectly expressed what I thought made the team successful and also why I was so proud to be part of it. For me, enjoying the ride is not about ping pong, cake or dressing up, though I do enjoy those things. For me, enjoying the ride is about building things and sharing the joy and pain of creation with other great people. Those simple words from Jay convinced me that I was understood and valued.

Q: Describe your relationships with other GCC associates. What’s the environment like?

I’m a programmer at heart so I’ll speak to development culture here. First of all, I get to work side by side with some brilliant developers and I’m happy to say I learn something every day. Everyone, and I mean everyone, has a voice in architecture and design. Debates are brief, passionate and productive. We’re truly agile and we get stuff done. Our business is fun to work with too. They are super-creative and intimately involved in the development process. When things go wrong, they are supportive and trust that we will work our tails off to make things right. Finger pointing is never part of the equation. The focus is always on finding a solution that works.

Q: How would you describe GCC in three sentences or less?

I need just one word: Awesome

Interview on DevOps

From time to time, I get the opportunity to talk to industry reporters about agile and DevOps. Today, I was interviewed via email for the first time, which turned out pretty interesting. Here are the questions and answers from that interview.

Please briefly describe how the company is using DevOps, including when it began, which DevOps tools and for which types of projects.

We see DevOps as a culture that encompasses people, practices, tools and philosophy. In that sense, it has become central to everything we do to develop, maintain and operate our e-commerce sites for Blinds.com, JustBlinds.com, AmericanBlinds.com and, of course, Home Depot custom window coverings. Infrastructure is code that evolves in concert with our other software components. DevOps happens inside our agile development teams and often draws in specialized resources from our operations group. It also happens inside our infrastructure group and often draws in developers. It’s part of our DNA.

The tools aspect of it is pretty standard stuff. We use Git and GitHub for source control. All our application and infrastructure code is there. Puppet helps us with rolling out and managing servers. Our backends are mostly .NET so we use Octopus Deploy to help with rolling out our code. TeamCity is in the middle of our development process and code there is used to expose deployments and tie them together with builds. Logs are mostly managed by Splunk though we’ve played with an ELK stack for this as well. Nagios is used for infrastructure monitoring. New Relic is our app monitoring tool and we depend on it to alert us to problems with the user experience. All our alerts get fed into PagerDuty for escalation management. We’ve been experimenting with Consul for discovery and config. We’re also experimenting with Docker. What’s holding us back there is .NET on Windows. Of course, that story is changing with .NET Core and Windows Server 2016 on the horizon so we have high hopes for Docker as a next step.

What were the business drivers for deploying DevOps?

Agile drove our adoption of DevOps. Our adoption of agile was driven by our organization’s culture more than anything else. One of our key values is “experiment without fear of failure”. Another is “improve continuously”. Over the years, our whole IT process had gotten into that uncomfortable place where limited resources lead to a difficult relationship with the rest of the business. They saw us as standing in the way of all the cool experimentation and improvement they wanted to do. Agile helped us break down the walls that had developed and form a true partnership for innovation. DevOps is a necessary part of the agile process. How can you innovate constantly if deployment requires an over-the-wall handoff and lots of manual intervention to get done? If operations and infrastructure are not intimately involved in the process, how can you support and manage it once it gets into production?

What benefits has the company seen from DevOps? 

DevOps enables agile, which allows us to continuously improve. It’s a big part of how we were able to deliver on all the promises of our new e-commerce platform, which led directly to the acquisition by Home Depot. It has allowed us to continue to innovate and thrive inside a Fortune 50 corporation and take on new challenges to help drive innovation outside of the custom window coverings business. DevOps is like oxygen for the agile process. Without it, it’s very possible that we would have ended up with “agile in name only” where agile terminology is used but nothing really changes and the organization doesn’t see the kind of exponential increase in innovation that we’ve benefited from here.

Any challenges of deploying and using DevOps, and how were they addressed?

Our biggest challenges revolve around security and compliance, especially now that we are part of one of the largest retailers in the world. We’re still learning how to deal with all that when it comes to sharing responsibility for deployment and infrastructure among developers, infrastructure engineers and operations engineers. We’re constantly tempted to solve these problems with handoffs and work hard to avoid that. Now that we have trust across all the impacted groups, it’s much easier to work through them and come up with ways to address compliance without undermining the velocity of innovation.