We all see the world changing around us as intelligent machines devour data about us and everything we do to offer personalized services at a low cost and immense scale that was unimaginable 20 years ago. We are clearly on the cusp of a new economic age and this time knowledge workers and managers see their jobs on the line. What will the future of work look like? Are we destined to see tens or even hundreds of millions of people across the developed world wholly dependent on government to pay their bills? Will all the accountants be replaced by highly intelligent machines that do their job faster, more accurately and at far lower cost than the humans do it today?
This book offers up some very interesting answers as well as specific advice on how to help your company leverage systems of intelligence to lower operating costs, improve services and win this next wave. It offers an optimistic view of the future where intelligent machines become trusted assistants that allow humans to rise to new levels of performance. All of it is grounded in a pragmatic understanding of past economic transitions and how they impacted business and the work people do to earn a living.
I definitely recommend this book to anyone interested in how big data, AI and automation can transform a business. I also recommend the book to anyone engaged in knowledge work or thinking about entering almost any white collar field of work. You need to be ready for the future and this book will help get you thinking about what is required. Check it out on Audible.
The story of Palmer Luckey, the visionary genius behind Oculus, the company that made VR real, makes for fascinating reading (or listening). The first two thirds of the book tells an interesting tale for sure, but more or less a conventional one. A lone genius toils in relative anonymity and without much recognition or pay for years with a single-minded determination to change the world. In this case, his vision was to bring affordable VR to everyone. And then, through a combination of circumstance and a bit of luck, he bursts onto the scene thanks to help from some other brilliant serial entrepreneurs to start a revolution so visible and so exciting that Facebook comes along and buys the company for around $2 billion, making Luckey very rich. It also gave Oculus the financial and business muscle it needed to bring its product to market and make VR a huge success at last.
But then Luckey made a huge mistake. He was always a free thinker and a libertarian at heart, and, as he looked at the 2016 Presidential candidates, he decided he could not support Hillary, the darling of Silicon Valley, and found the most to like about Donald Trump. He did so quietly for the most part, but he also donated some money to a PAC that put up a billboard. A series of inaccurate press reports led to a suspension that led, eventually and inevitably, to his firing from Facebook.
Surprisingly, the book ends up telling a cautionary tale about how the culture at companies like Facebook no longer tolerates anything like free speech or thought, destroys lives and poses a danger to the long-term health of our country. It’s important to note that Luckey did not do or say anything racist, sexist or illegal; he simply supported a presidential candidate not favored by the majority of Facebook employees, and he did so in a state that makes it illegal to discriminate against or fire employees for their political views. It was sad and shocking to read though not surprising given the environment we all live in today.
Superhuman, the $30 per month Gmail client, has slowly but surely changed my email habits for the better. The interface is clean and simple with a strong emphasis on a zero inbox workflow driven by hot keys that let me power through emails quickly and with confidence. It’s trained me to use my inbox as a todo list — handle what I can now and set reminders on the rest so I can deal with them at the appropriate time.
On the downside, there’s no Android client yet and I’m not dependent enough on the Superhuman workflow to justify switching to iPhone after using Android for the last ten years or so though I am getting curious. It’s not feature-rich by any means and it really doesn’t do anything Gmail can’t do. Except, of course, for the hotkeys and the strong opinions it brings to the table to drive a certain set of behaviors that clearly benefit people with busy inboxes and work lives.
Is it worth $360 per year? That depends. If your inbox is busy, I think it is a slam-dunk. If you are a busy executive with time always at a premium, I have zero doubt. Would it help a busy customer success or technical support rep handle a busy inbox? Very possibly. On the other hand, if you are a programmer or some other hands-on knowledge worker where email is secondary to your daily job, you are better off training yourself to check email only twice per day as part of your routine to get more benefit for less cost.
I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX, titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them)”. This is the ninth in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series as well as some of the background about our agile journey here.
Around noon, you and your team go to lunch at your favorite restaurant. The waiter walks up to your table, turns to you and asks, “May I take your order?”.
You order the cheeseburger with sweet potato fries. Your teammates order as well. The waiter leaves, brings back drinks, returns later with your food, checks in with you to see if you need anything from time to time, brings the check and so on. If the waiter does his job well and the chef prepares your food properly, you are happy with the experience and leave a nice tip. If not, you reduce your tip and might even complain to the manager.
Your lunch experience is not terribly unlike how stakeholders experience agile in many cases. The business identifies a feature they need to solve a problem they have. For example, a stakeholder might ask for the ability to search customers by last name. Your PO turns that into a story that gets prioritized and assigned to your team. Your team picks up the story and gets it done. If your team collaborates well with the business, you probably check in with your stakeholders as you work on the story to get questions answered, show mockups and generally refine your approach. At sprint review, you show the stakeholders what you’ve produced and they are usually happy enough to accept the story as completed. If not, they provide feedback and might even complain to your manager.
However, the model above is a terrible way to work and ends up undermining the team’s ability to deliver business value. When developers take orders, as the old joke goes, they end up delivering exactly what was asked for but not what was wanted or needed.
Our teams fall into order taking mode from time to time, but they usually pull themselves out of it by simply reminding themselves and the business that they need to understand the problem that needs solving rather than the way the stakeholders would like to see it solved.
In the example above, customer service needed to find the customer record for people calling in on the phone. Once the team understood the problem, they proposed a comprehensive search that would allow the customer service rep to look up the customer by phone number, email address or name. Although it cost about the same to implement as the simple name search, it actually solved the underlying problem more completely since phone numbers and email addresses were more likely to return the single customer record the service rep wanted. It also opened the door for more sophisticated features in the future. For example, once we tie in the phone system and our CRM system, the same search API the team developed for this feature can be used to automatically pop the customer information onto the screen as soon as the customer service associate answers the call.
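To make the idea concrete, here is a minimal sketch of what a unified customer lookup along those lines might look like. It is illustrative only, written in Python for brevity; the names, data shapes and matching rules are my assumptions, not the team’s actual search API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Customer:
    id: int
    name: str
    email: str
    phone: str

def normalize_phone(raw: str) -> str:
    """Strip formatting so '(713) 555-0100' and '7135550100' compare equal."""
    return "".join(ch for ch in raw if ch.isdigit())

def search_customers(query: str, customers: List[Customer]) -> List[Customer]:
    """Single entry point that matches on phone, email or name.

    Phone and email matches are most likely to return a single record,
    so they are tried first; name search is the broader fallback.
    """
    q = query.strip().lower()
    digits = normalize_phone(q)

    if len(digits) >= 7:  # looks like a phone number
        return [c for c in customers if normalize_phone(c.phone).endswith(digits)]
    if "@" in q:          # looks like an email address
        return [c for c in customers if c.email.lower() == q]
    return [c for c in customers if q in c.name.lower()]
```

The same entry point is what a screen-pop integration could call with the inbound caller ID once the phone and CRM systems are tied in.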
There’s nothing unique about how our teams avoid order taking mode. Truly, it is Agile 101. The INVEST model for user stories calls this making stories negotiable. The idea is to leave room for the developers to negotiate the details of the implementation with the stakeholders. In this model, the PO works with the business to understand the needs, provides them to the developers in the form of a story and then the team collaborates with the business to converge on a solution.
However, Agile 101 is not good enough. In practice, teams want some certainty around stories before they are willing to estimate them. In addition, stakeholders require more detail around screen layout and workflows before deciding on what stories to prioritize. Therefore, POs end up working ahead with designers and stakeholders to flesh out the details. Inevitably, implementation details leak into stories in the form of screen mockups and more specific acceptance criteria. By the time the team sees the story, the business rules are cast in stone even though technical details remain perfectly negotiable.
In traditional agile fashion, we tried to fix this by asking developers to spend more time grooming stories with the business. In practice, this meant select developers would sit in on design sessions. This gave them the opportunity to inject the technical viewpoint on stories. For example, suggesting a popup screen instead of a new web form because it would make the flow easier and faster to implement.
Our teams got pretty comfortable with this. It seemed to fit the agile model. The business decided on the business stuff. The technical people decided on how to solve the problem technically. It was efficient because everyone stayed in their comfort zone to deliver working software that solved the problem at hand, or so it seemed.
Unfortunately, when we looked closer we realized things weren’t running quite as smoothly as we thought. Instead of being part of the decision-making process on the most important business aspects, developers tended to accept business decisions as gospel and focused entirely on the technical implementation details. Although this seemed efficient, it undermined our ability to innovate because we were not truly leveraging all the intellectual capital our development team could bring. Far too often, we focused on building features that the business wanted and thought it needed instead of finding innovative ways to radically improve our business. The same artificial divide between technical concerns and business concerns handcuffed our stakeholders and limited their ability to drive different thinking around implementation. Because everyone stayed in their comfort zone, we were having trouble actually driving innovation.
Over the last couple of years, we’ve changed that in a big way by focusing more on building truly cross-functional teams of business and technology experts that work together to define, prioritize and deliver innovation.
Part of the solution is process. For example, our exploratory division has a fire team structure that combines one or two development teams with a dedicated leadership group (a PO, a UX expert, a technical leader and a business decision maker) that works together to deliver a comprehensive solution to a big business problem, given audacious goals and some basic constraints around budget and timeline. One of those fire teams is currently focused on revolutionizing the way we sell decks and decking materials online and in-store. We are also using the OKR framework from “Measure What Matters” to frame the objectives and how we measure results.
The bigger part is culture. For us, this starts with encouraging people to ask questions they wouldn’t normally ask. These days, it’s not unusual for one of our business leaders to ask a developer rather technical questions about a proposed implementation or for a developer to probe a business person for data to support a proposed story. A couple of years ago, these kinds of questions weren’t asked because people naturally stayed on either the business side or the technical side. These days, we actually demand cross-functional debate.
Every initiative that we work on in a truly cross-functional way ends up delivering pretty spectacular results. For example, our core business used a cross-functional effort to double our services business over a period of 90 days. Furthermore, pretty much everybody involved ended up broadening their skill set. In essence, our developers are becoming better business people and our business people are becoming more technical. They are also enjoying the work despite the challenge of working hard to deliver on some pretty audacious goals.
Crossing a greenfield is often harder than you think
I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX, titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them)”. This is the seventh in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series here.
If you are interested in hearing this talk or some of the other awesome speakers and topics that will be covered at the event, you can learn more about the conference and purchase tickets here.
Everything I will share here happened at Global Custom Commerce (GCC), a Home Depot Company, as we developed and improved our 2nd generation e-commerce platform, called Autobahn, designed to make it easy for customers to shop for and buy more complex configurable products and services like custom window blinds, custom windows, flooring and decks. In 2011, when the Autobahn platform started development, GCC was already the #1 seller of custom window coverings online and owned several brands including blinds.com. We were a couple years away from acquisition by the Home Depot and had about 80 employees. The existing e-commerce platform had been in production for a number of years and was still actively being improved by a large team following a traditional project management philosophy using Gantt charts and reporting on percent complete.
The Autobahn project marked a number of firsts for GCC including our first use of cloud hosting at AWS and our first use of agile methodologies. This article highlights one of our bigger mistakes and how we were able to improve as a result.
This mistake pushed me to the edge. I lost sleep over it, I started hating my job and I almost left the company because of it. You need to know that before you read this story. I can even pinpoint the date when all my pent-up emotion spilled out in a post I published back in June of 2013. I’ll share that post later. For now, let’s see if time has improved my perspective.
Autobahn was one of those rare opportunities to build something from the ground up. Everybody likes starting a project from scratch. You get to use all the latest tools. You just know your design will be perfect, your code stellar, your tests complete. If you are replacing a system already in use, like we were, you know that your new system will be much better than the buggy, messy thing that nobody wants to work on anymore.
But that was the problem. The existing e-commerce platform that had been developed over about ten years was successfully serving customers across four web sites to the tune of around $100M annually. A team of developers, designers and marketers worked together every day to continually improve that platform, driving up conversion and growing the business at double-digit rates. There was no way our business could afford to stop that team from improving the platform while we waited for Autobahn to have enough capability to replace it. The market was simply too competitive. As a result, the Autobahn effort was planned to run in parallel with a small, dedicated team while the existing team continued to improve our proverbial cash cow.
We called it parity-plus. Autobahn had to do at least as much as the existing system before we could switch over our busy e-commerce web sites to the new platform. On top of that, Autobahn had to do more. It had to be more stable. It had to be cleaner and easier for developers to modify. It had to have better automated testing. It had to live in the public cloud where it could easily scale with our business. It had to be faster. Of course, it had to have tons of new, customer-facing improvements too. Finally and most importantly, it had to have a better and more flexible product configurator capable of selling any kind of complex, hard-to-buy product so that we could easily expand our business to other product categories in the future. It wasn’t exactly required to make coffee. However, we did decide that, at least in theory, it would have to be capable of selling everything complex and configurable, from blinds to the evil flying monkeys envisioned in “The Wizard of Oz”.
We weren’t concerned. We baked parity-plus into the plan as well as into the psyche of our business leaders. The Autobahn pitch was like one of those kitchen gadget commercials you see on TV — it slices, it dices and it’s all easier and faster than ever before. All our stories and demos talked about parity-plus. In fact, our business would often provide acceptance criteria that said more or less “this has to work exactly like the one in the existing system plus the following”.
You may think I’m joking about blue flying monkeys, but I’m not; they were a recurring theme in our design discussions
We talked about flying monkeys all the time too. When we designed key features that involved how products and configurations worked, we would consider how the feature would handle blinds and then ask ourselves if it would work to sell a flying monkey too. As silly as it sounds, I can say with 100% honesty those soaring simians played a key role in our biggest success: The core of the new Autobahn system was capable of handling almost any complex product we could imagine.
The rest of our parity-plus strategy was not going so well. Almost immediately, it became clear that the existing system did lots of little things that almost nobody knew about, which made it impossible to deal with those acceptance criteria that said, “does everything the existing system does plus the following”. As a result, the development team started insisting on stories with acceptance criteria to cover the parity-related features. This proved harder than expected as it was difficult to identify all the little things the existing system did and exactly who relied on the resulting capability.
Meanwhile, the company continued to focus most of its energy and resources on improving the existing platform. In fact, the development team for the cash cow platform was larger than the one focused on Autobahn. Of course, this made the Autobahn backlog grow regularly as each new feature added to the existing system would be needed in the new one. To make matters worse, the necessary focus on driving the existing business made it hard to get time from key stakeholders to work on developing Autobahn.
To the credit of all involved, we worked together to try and solve the problem. The Autobahn development team got bigger. Business leaders freed up time to put more attention on the new platform. We even slowed development on the existing platform to some extent by asking hard questions about every new feature and how long it would be in production.
Despite all those efforts, the schedule continued to slip away. By late spring of 2013, our most optimistic estimate put Autobahn at least one year away from achieving parity-plus after almost 18 months in development. This assumed development of new features on the existing platform stopped completely, which wasn’t going to happen. This also assumed that many team members continued working long hours to try and push the effort to parity-plus and launch. Like I said, one year was a very optimistic estimate.
Team morale suffered. Some of us started advocating for radical solutions including killing the Autobahn project. We felt this would free up resources to focus on radically improving the existing e-commerce product to make it capable of delivering on our CEO’s future vision. Although the business carefully considered even our most radical ideas, it was decided to continue on course.
Artist’s depiction of Pickett’s Charge, a terrible march towards death across a greenfield on July 3, 1863; the metaphor I chose for my blog post
Many team members, including me, started to wonder if we were on some kind of death march. I expressed my extreme frustration in June of 2013 when I published “When Green Fields Become Killing Fields” on my blog.
But we didn’t give up. The next few months were a blur as we buckled down and tried to get on course to launch at least one website on Autobahn by the end of 2013. To some extent we succeeded. By around October 2013, Autobahn was fully capable of selling custom blinds to customers though it still remained far from reaching the parity-plus goal.
At that point, the development team gathered offsite with our PO, Wade Pinder, to brainstorm how to get Autobahn into production. The discussion centered on two options. Most of the team favored targeting our smallest e-commerce site, Blinds.ca, and convincing the business that we could launch it on Autobahn well short of our parity-plus goal without negatively impacting sales. That group believed we should put a fairly tight timebox on the remaining effort to force everyone to focus on only the most critical features. In agile terms, this group wanted to aim at a minimum viable product, or MVP. The rest wanted to maintain our focus on the biggest site, Blinds.com. They thought we could deliver closer to parity-plus given more time and a development freeze on the existing platform. After vigorous debate, the team decided to take the Blinds.ca MVP strategy to business leaders. We then worked together to put rough estimates on the remaining stories so we could help the business decide what would end up in the MVP.
A few days later, Wade led a meeting with the development team and all the key stakeholders that he ended up calling the “Red Line Exercise”. He put all the remaining stories up on a board in priority order. Because we had rough estimates, he was able to draw a red line at the six-month mark. The exercise for the business was simple. Stories above the line would be in the MVP. Stories below the line would not. If somebody felt a story below the line was critical, they moved it up in priority order so it was above the line. However, this would move stories of lesser priority down the list and some would end up below the red line. Some of the resulting trade-offs were easy, others inspired spirited conversation, but ultimately we converged on a solution. By the end of the exercise, we had successfully killed the notion that parity-plus was needed before the first launch and had a path to launch Blinds.ca in six months.
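The mechanics behind the red line are simple enough to sketch in code. The toy Python snippet below is purely illustrative (the story names, estimates and capacity are invented): walk the backlog in priority order and draw the line where the running total of rough estimates exceeds the available capacity.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Story:
    title: str
    estimate_weeks: float  # rough estimate, not a commitment

def red_line(backlog: List[Story], capacity_weeks: float) -> Tuple[List[Story], List[Story]]:
    """Split a prioritized backlog at the point where estimates exceed capacity."""
    used = 0.0
    for i, story in enumerate(backlog):
        if used + story.estimate_weeks > capacity_weeks:
            return backlog[:i], backlog[i:]  # the red line falls here
        used += story.estimate_weeks
    return backlog, []

backlog = [
    Story("Core checkout for custom blinds", 6),
    Story("Saved carts", 3),
    Story("Gift cards", 4),
    Story("Loyalty points", 5),
    Story("Wish lists", 4),
]
mvp, deferred = red_line(backlog, capacity_weeks=16)
print("In the MVP:", [s.title for s in mvp])
print("Below the line:", [s.title for s in deferred])
```

Promoting a story that sits below the line simply means moving it up the list and re-running the split, which pushes something of lower priority below the line; that is exactly the trade-off the stakeholders argued through in that meeting.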
It took a little longer than we expected, but we launched Autobahn with Blinds.ca on June 10, 2014, roughly eight months after the original red line exercise. Although it launched missing a number of key features from the existing platform, Blinds.ca revenue actually showed a small year-over-year increase and conversion improved too. It also gave us a platform to continually improve as we drove towards the launch of our other brands.
Our exploratory arm likes to refer to the PO’s “golden chainsaw”, which they use to trim feature sets to fit within timeboxes
Over the years that followed, timeboxes became our most effective tool for driving major releases that required multi-sprint efforts before going public. Although we can usually deploy these larger changes incrementally using techniques like feature toggles, we really can’t expose them to the public until they deliver an MVP. We timeboxed our Blinds.com launch. A timebox drove the rapid delivery of custom window coverings on Homedepot.com after the merger. We also use timeboxes extensively in our exploratory division to rapidly deliver new product categories and buying experiences both in-store and online for The Home Depot.
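For readers who haven’t used them, here is a minimal sketch of a feature toggle, in Python purely for illustration (Autobahn was not a Python system, and the flag name and configuration source are hypothetical). The point is that unfinished work can be merged and deployed dark, then exposed only when the toggle flips on.

```python
import json
import os

class FeatureToggles:
    """Tiny toggle registry; real systems typically back this with a config
    service or database and support per-user or percentage rollouts."""

    def __init__(self, flags: dict):
        self._flags = flags

    @classmethod
    def from_env(cls, var: str = "FEATURE_FLAGS") -> "FeatureToggles":
        # e.g. FEATURE_FLAGS='{"new_checkout_flow": true}'
        return cls(json.loads(os.environ.get(var, "{}")))

    def is_enabled(self, name: str) -> bool:
        return bool(self._flags.get(name, False))

def checkout(cart: list, toggles: FeatureToggles) -> str:
    # Deployed-but-dark code stays invisible to customers until the toggle is on.
    if toggles.is_enabled("new_checkout_flow"):
        return "new checkout flow"     # placeholder for the in-progress path
    return "legacy checkout flow"      # placeholder for the current production path

print(checkout(cart=[], toggles=FeatureToggles.from_env()))
```

Flipping the flag is what turns a timeboxed, multi-sprint effort into a single visible launch without a risky big-bang deployment.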
If you ever find yourself stuck on a project that is expected to deliver parity-plus or you can’t seem to finish your minimum viable product, you need to introduce a timebox. This happens naturally in startups due to financial constraints and is a big reason they often seem to get so much done with so little. Make the timebox smaller than anyone likes and try a red line exercise. Your MVP will likely get smaller, you’ll definitely deliver faster and your customers will end up far happier than anyone believes. Just remember, the MVP is not the end; it is only the beginning of a journey of continual improvement.
Busy bees are fine as long as they are not just keeping busy in their specialty
I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX, titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them)”. This is the sixth in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series here.
If you are interested in hearing this talk or some of the other awesome speakers and topics that will be covered at the event, you can learn more about the conference and purchase tickets here.
Everything I will share here happened at Global Custom Commerce (GCC), a Home Depot Company, as we developed and improved our 2nd generation e-commerce platform, called Autobahn, designed to make it easy for customers to shop for and buy more complex configurable products and services like custom window blinds, custom windows, flooring and decks.
In 2011, when the Autobahn platform started development, GCC was already the #1 seller of custom window coverings online and owned several brands including blinds.com. We were a couple years away from acquisition by the Home Depot and had about 80 employees. The existing e-commerce platform had been in production for a number of years and was still actively being improved by a large team following a traditional project management philosophy using Gantt charts and reporting on percent complete.
The Autobahn project marked a number of firsts for GCC including our first use of cloud hosting at AWS and our first use of agile methodologies. This article highlights one of our bigger mistakes and how we were able to improve as a result.
From the start, the Autobahn development team included engineers with a diverse set of skills. Some team members had T-shaped expertise — deep in a couple areas with some knowledge across the whole technology stack. Some were I-shaped — very deep in one critical technology with little expertise in the rest. Some of us specialized on testing and some of us on building features.
In planning, we worked as a team to estimate stories by breaking down each one into a set of tasks. Usually, the people who specialized in a certain kind of task ended up providing the estimate. Although this meant many planned tasks were only understood by a select few on the team, it didn’t appear to be a problem. Planning remained fast and seemed quite collaborative.
Once the sprint started, team members naturally focused on tasks within their expertise. This usually meant starting several stories because there was almost never exactly the right amount of specialized work needed to keep everyone busy on a single story.
We developed a comfortable rhythm. Within the first couple days of the sprint, we’d have four or more stories in process. Around the middle of the second week, we’d start finishing up stories and merging them into a release branch. Late in the sprint, we’d finish up testing the stories and call them done.
It was also very efficient. Because team members worked in their strongest areas, tasks got done quickly. Team members that ran out of work in the sprint would start on stories prioritized for the next sprint. It always seemed like we were ahead because we were all busy bees all the time.
But then we hit a snag. Late in the second week of a sprint we realized we could not finish regression testing the release branch by the end of the sprint. No problem. QA engineers could perform regression testing at the start of the next sprint. After all, they weren’t very busy the first few days of each sprint because the software engineers were busy starting multiple stories.
And then we hit another snag. One sprint, one of our specialists ran out of work and, per our standard operating procedure, started working on what was prioritized as the first story for the next sprint. Even better, it seemed, he actually knocked off the front-end work for the first five stories planned for the next sprint. Unfortunately, most of that work was wasted because the last four of those stories got deprioritized by the business before the next sprint started and eventually fell off the backlog entirely.
That caught our attention. Clearly, we were doing something wrong. Despite keeping all our specialists busy doing what they did best, we were having trouble getting stories done by the end of the sprint.
An agile consultant suggested limiting ourselves to one story at a time. The term we heard at the time for this practice was “swarming”. The entire team would work together to finish one story before moving onto the next. If a specialist ran out of work, she would pair with others to work on tasks outside her specialty or would spend the idle time developing new skills, working on a pet project or helping the team out in other ways.
Swarming worked. The most important stories were guaranteed to complete in the sprint. Collaboration improved too. Since the PO saw completed stories earlier and provided feedback sooner, the team was more open to responding to feedback and making changes within the sprint. Overall throughput improved. That is, stories were getting completed at a steady rate throughout the sprint and delivered what the business needed. In formal terms, we found that decreasing the amount of work in process, or WIP, increased the throughput of our process.
However, swarming didn’t feel efficient. Specialists often ran out of tasks they felt competent to complete on their own. Some were willing to pair and learned new skills along the way, but often felt they were not making the best use of their time. To make matters worse, the people that were experts on the tasks that remained felt like pairing with someone less skilled was distracting and slowed them down.
Increasing the WIP limit helped up to a point. Because it allowed specialists to go faster, more tasks got done. Often, it also meant that more stories got completed in the sprint. In formal terms, we learned that increasing the WIP limit increased our efficiency and our overall throughput up to some maximum value where throughput would peak.
But increasing the WIP limit came with risks. When WIP got too high we started to see some of the same old problems — stories wouldn’t come together at the end of the sprint and would remain incomplete. Sometimes that was OK, but sometimes that unfinished work became waste. We also found that our ability to respond to change within the sprint was compromised when WIP crept up because the PO was rarely able to provide feedback early enough in the sprint.
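The shape of that curve is easy to reproduce with a toy model. The simulation below is illustrative only (Python, invented numbers, not our actual data): stories flow through a develop-then-test pipeline inside a fixed sprint, only so much effort can usefully land on any one story per day, and only stories fully tested by sprint end count as done. Throughput climbs a little as the WIP limit rises above one, plateaus, and then collapses once too many stories are left open at the deadline.

```python
import random

def simulate_sprint(wip_limit: int, sprint_days: int = 10, dev_rate: float = 2.0,
                    test_rate: float = 1.5, max_per_story: float = 1.2,
                    trials: int = 500) -> float:
    """Toy develop-then-test pipeline inside a fixed-length sprint.

    Each day the team spreads its dev and test capacity across the stories
    in progress, but only max_per_story of effort can usefully land on a
    single story per day (why a WIP limit of one wastes specialist time).
    Only stories fully tested by sprint end count toward throughput.
    """
    total_done = 0
    for _ in range(trials):
        # Each story needs some dev effort and some test effort, in team-days.
        backlog = [[random.uniform(1, 3), random.uniform(0.5, 2)] for _ in range(12)]
        in_dev, in_test, done = [], [], 0
        for _day in range(sprint_days):
            # Pull new stories until the WIP limit is reached.
            while len(in_dev) + len(in_test) < wip_limit and backlog:
                in_dev.append(backlog.pop(0))
            if in_dev:
                share = min(dev_rate / len(in_dev), max_per_story)
                for story in in_dev:
                    story[0] -= share
                in_test.extend(s for s in in_dev if s[0] <= 0)
                in_dev = [s for s in in_dev if s[0] > 0]
            if in_test:
                share = min(test_rate / len(in_test), max_per_story)
                for story in in_test:
                    story[1] -= share
                done += sum(1 for s in in_test if s[1] <= 0)
                in_test = [s for s in in_test if s[1] > 0]
        total_done += done
    return total_done / trials

for limit in (1, 2, 3, 4, 8, 12):
    print(f"WIP limit {limit:2d}: ~{simulate_sprint(limit):.1f} stories done per sprint")
```

The exact numbers mean nothing; the shape is the point, and it matches what we lived through.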
I’d like to be able to say that we worked through the challenges and found the right balance over the next few sprints. The truth is we didn’t.
Part of the problem was rooted in how we interpreted the Scrum process to require one release per sprint. After all, review happened at the end and we really couldn’t deploy until we had approval from the stakeholders. The focus on one release per sprint made it seem reasonable to merge code late in the sprint and focus on regression testing the entire sprint release in the last couple days of the sprint.
Part of the problem was how we misused velocity to some degree. Pretty naturally, we focused on measuring velocity and used it to judge progress against our overall plan. When our specialists worked ahead on the next sprint instead of sitting idle or working slowly on tasks they weren’t good at, average velocity increased. Unfortunately, our measurements ignored some of the waste we were building up when testing spilled into the next sprint or stakeholder feedback couldn’t be incorporated into the current sprint.
As we grew from one team to twelve, we continued to control WIP with limited success; it kept popping back up as a problem from time to time as teams would slip back into chasing efficiency.
We finally fixed that by changing our culture to focus on deploying increments of functionality as soon as they were ready. This naturally focused teams on completing one or two stories at a time. The reduction in WIP has resulted in greater throughput, faster feedback and fewer bugs and production problems. Obviously, there’s more to that story, which you can read about in my earlier article on Bug Fix Thursday.
If you want to learn more about why WIP limits work to reduce waste and improve throughput, I highly recommend “The Goal: A Process of Ongoing Improvement” by Eliyahu M. Goldratt and Jeff Cox, which helped popularize lean manufacturing back in 1984. This skinny book, written in a fast-paced, thriller form, does a great job of explaining some of the core principles of what makes agile effective even though it was written almost two decades before the Agile Manifesto.
I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX, titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them)”. This is the fourth in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series here.
If you are interested in hearing this talk or some of the other awesome speakers and topics that will be covered at the event, you can learn more about the conference and purchase tickets here.
Everything I will share here happened at Global Custom Commerce (GCC), a Home Depot Company, as we developed and improved our 2nd generation e-commerce platform, called Autobahn, designed to make it easy for customers to shop for and buy more complex configurable products and services like custom window blinds, custom windows, flooring and decks.
In 2011, when the Autobahn platform started development, GCC was already the #1 seller of custom window coverings online and owned several brands including blinds.com. We were a couple years away from acquisition by the Home Depot and had about 80 employees. The existing e-commerce platform had been in production for a number of years and was still actively being improved by a large team following a traditional project management philosophy using Gantt charts and reporting on percent complete.
The Autobahn project marked a number of firsts for GCC including our first use of cloud hosting at AWS and our first use of agile methodologies. This article highlights one of our bigger mistakes and how we were able to improve as a result.
In the early days of the Autobahn project when the team was still a small one with only 4 team members and a PO, our manager established a policy that each of us could and should work from home one or two days each week. Since we were practicing Scrum more or less by the book, the only exceptions were sprint planning day, the first Monday of each two-week sprint, and sprint review day, the last Friday. Most of the team took advantage. I was a notable exception, mostly because I had twin baby boys in the house, which made it difficult if not impossible to do focused coding at home.
The truth is I also hated working at home. I had plenty of experience. For years, I had run a small consulting company and spent a lot of my time working out of the house often on projects with geographically distributed teams. Along the way, I wrote a monthly column for an industry journal and a best-selling technical book on a tool nobody uses anymore (C++ Builder from the long gone, but fondly remembered IDE pioneer, Borland). Maybe it was the technology back in the late 90s, but I always found it more productive to lean over to the person next to me for a quick chat rather than getting on the phone or hopping on the latest flavor of chat.
Anyway, the rest of the team loved it. No getting up early and fighting traffic to make stand-up, no expensive lunches out and no long commutes home. It also was a great time to focus on writing code with no distractions, no background noise and no meetings.
We had all the best technology 2012 had to offer at our disposal. Our CI/CD infrastructure and our test environments were hosted at AWS. Our source code was at GitHub. Our office network featured a VPN that was already supporting dozens of call center associates that worked at home on a daily basis. The company paid for our cell phones and our contacts were up to date. We had Skype accounts and we were not afraid to use them. We all had fast Internet connections at home too.
Over the next few months a very clear and rather disturbing pattern developed. When everyone was in the office, things moved along very quickly. If you ran into a problem, you talked to the person next to you and solved it instantly. If you had a question for the PO, you stood up, walked 3 feet and tapped him on the shoulder. We used stickies on a whiteboard to track our work, and it was super-easy to walk up there and grab the next task. The team often went to lunch together and talked about architecture, the business and sometimes nothing at all, but always enjoyed the camaraderie.
The work at home days were very different even though they weren’t supposed to be. The day before, whoever wasn’t going to be in the office would be careful to grab a couple tasks and move them into the “work in process” column on our whiteboard. Although the remote person would call into stand-up, she would usually have a very hard time hearing the conversation and, when talking about the work she was going to do that day, would struggle to point out the right cards on the board. During the day, the remote person would generally work in a pretty isolated fashion. We rarely spoke to remote workers. Often, when we tried, we ended up leaving a voice mail and got a call back within an hour or so. Pretty much the same thing would happen with Skype. After a while, it was easier to find someone in the office or wait for the next day. It felt almost like the work at home folks fell into a short-lived black hole where the speed of collaboration fell asymptotically close to zero.
The good news was working code often came out of that black hole thanks to the lack of interruptions, but not as much as we expected. Working at home was harder than people thought. It was far easier to keep banging away at a problem than it was to get a second set of eyes on the code when it involved Skype calls, screen sharing and Internet lag. Every technical glitch and every missed call just made it more likely that everyone would wait for tomorrow to collaborate. Technical whiteboard design sessions just worked more smoothly with a real whiteboard. As a result, work at home productivity did not match what we saw from the same people in the office.
Of course, the team noticed and started talking about it more and more. Two camps formed — those for work at home days and those opposed. We all tried very hard. We experimented with new technologies. For example, we started to use video conferencing on an iPad to try and bring remote workers into the daily stand-up. We also moved to Jira to make it easier for the remote team members to share the task board. It all helped a little, but it was not able to close the gap.
Working in the office was just easier and more productive. One by one, the work at home advocates started coming into the office more frequently. After a few months, work at home became a rare thing used mostly when a workman was expected to fix something or the kids were off school. We had all come to value face-to-face interaction and the speed of collaboration it allowed us.
Even today, our teams highly favor face to face interaction. Although most, if not all of them, use electronic tools to track their tasks, they still put various artifacts in physical form on whiteboards. We use Slack extensively, but we talk in person far more. Team members value collaboration so much that they willingly change desks to sit close together with other people working on a shared initiative even if it is only planned to run a month or two. We hold as many development-related meetings as possible in public spaces near the teams so people working at their desks can overhear what is being discussed and join the conversation if they think they have something to contribute. Even when doors are closed, engineers know they can simply walk over and interrupt if something important has come up. All of these things are just harder to do when you have team members working remotely.
That is not to say we never work at home or we don’t sometimes work with geographically distributed teams. Technology, people and business realities all are driving a demand for more and more remote work. We have worked very hard over the last couple of years to remove barriers to remote work. However, collaboration is still easier and more fun when you are in the same space. The benefits gained from remote work, such as better work-life balance and more control over interruptions, typically are outweighed by the tax you pay in collaboration friction. Although the technology has advanced, it simply cannot match co-location.
I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX, titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them)”. This is the third in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series here.
If you are interested in hearing this talk or some of the other awesome speakers and topics that will be covered at the event, you can learn more about the conference and purchase tickets here.
Everything I will share here happened at Global Custom Commerce (GCC), a Home Depot Company, as we developed and improved our 2nd generation e-commerce platform, called Autobahn, designed to make it easy for customers to shop for and buy more complex configurable products and services like custom window blinds, custom windows, flooring and decks.
In 2011, when the Autobahn platform started development, GCC was already the #1 seller of custom window coverings online and owned several brands including blinds.com. We were a couple years away from acquisition by the Home Depot and had about 80 employees. The existing e-commerce platform had been in production for a number of years and was still actively being improved by a large team following a traditional project management philosophy using Gantt charts and reporting on percent complete.
The Autobahn project marked a number of firsts for GCC including our first use of cloud hosting at AWS and our first use of agile methodologies. This article highlights one of our bigger mistakes and how we were able to improve as a result.
Because he plays such a big role in this story, I’ll be referring to our first PO, Wade Pinder, by name. He played a big role in our agile journey and an even bigger one in this story. Although we have lots of POs these days to serve our twelve agile teams, Wade remains the strongest agilist here at GCC. You can find Wade speaking and coaching agile around Houston and on LinkedIn.
Several years ago, we started having conversations like the following between development teams and Wade Pinder, our PO at the time, often very late in the sprint when the team was rushing to finish up the last few stories to meet their commitment for the sprint:
“Wait a second,” Wade says, pointing at the screen. “This is the first time I am seeing the screen, and such and such doesn’t work the way we need. I can make some specific suggestions now that we have a screen to look at. Maybe we can get some of the end-users to take a look and help as well.”
The engineer doing the demo grimaces and says, “Well, the story is really done and the requirements weren’t called out in the acceptance criteria. We won’t have time to do any of that this sprint. We can write it into a new story and maybe pick it up next sprint.”
Sometimes the conversation would turn into a more extensive debate. Wade would remind the developers that a well-written story was “a reminder to have a conversation” and the agile manifesto calls for “customer collaboration over contract negotiation”. He would also accurately make the point that the intent of the story was quite clear by the time sprint planning started based on all the conversations that had taken place. Down deep, the development team knew they had failed to deliver on that intent. However, most times a narrow reading of the acceptance criteria gave them the cover they needed. They were acting like lawyers that get a guilty client off on a legal technicality.
The changes needed were generally not huge. They often ended up in what we would categorize as a small or medium story. Equally importantly, these weren’t cases when a story turned out to be too big and sprawling to actually be a single story. It was really the case that the team was not delivering on the intent of the story. It was happening more and more often and resulted in far too many sprints where one or more stories would be “done” but did not deliver the expected business value until the follow-up story was completed and deployed in some future sprint.
After months of this, Wade decided to take action. He decided to write stories that captured every single aspect of everything expected in excruciating detail. Small stories that used to have fewer than five acceptance criteria now had 20 or more. Every UI change came with a mockup and a clearly-written expectation for pixel-perfect delivery. Every field was described, every validation was specified, expected response times were documented and every error message was spelled out. By the time he was done, every story ran for several printed pages. Each one was like the most cunning contract ever assembled by the most skillful corporate lawyer — completely devoid of wiggle-room, full of landmines and utterly impossible for mere mortals to understand.
The development team was horrified by the more detailed stories when they first saw them in a backlog grooming session. There was no room for creativity and no room for their input. Every detail was locked in. Where was the room for discussion? What if there was a better way? What if there was an easier way?
Then we really started to talk. Wade and the developers saw eye to eye for the first time in months. The developers agreed to show Wade screens and other story artifacts as quickly as they became available to provide time and context for meaningful conversation and course adjustment. In fact, many teams adopted an extra step at daily stand-up to ask out loud, “What can we show Wade today?” Wade agreed to go back to writing stories that were reminders to have conversations instead of contracts that specified every detail. Together, everyone agreed to focus on delivering the business value each story was intended to deliver even when it meant missing a sprint commitment.
Agile lawyers still pop up from time to time, but they are easier to deal with now. If it’s a developer, someone that’s been around for a while simply tells this story and reminds them of what can happen if they force the PO to turn into an agile lawyer. If it’s a PO, it almost always turns out that the PO is reacting to an agile lawyer on the development team and, well, you know that story.
I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX, titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them)”. This is the second in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series here.
If you are interested in hearing this talk or some of the other awesome speakers and topics that will be covered at the event, you can learn more about the conference and purchase tickets here.
Everything I will share here happened at Global Custom Commerce (GCC), a Home Depot Company, as we developed and improved our 2nd generation e-commerce platform, called Autobahn, designed to make it easy for customers to shop for and buy more complex configurable products and services like custom window blinds, custom windows, flooring and decks.
In 2011, when the Autobahn platform started development, GCC was already the #1 seller of custom window coverings online and owned several brands including blinds.com. We were a couple years away from acquisition by the Home Depot and had about 80 employees. The existing e-commerce platform had been in production for a number of years and was still actively being improved by a large team following a traditional project management philosophy using Gantt charts and reporting on percent complete.
The Autobahn project marked a number of firsts for GCC including our first use of cloud hosting at AWS and our first use of agile methodologies. This article highlights one of our bigger mistakes and how we were able to improve as a result.
In the early days of the effort to deliver our 2nd generation e-commerce platform, Autobahn, we adopted the Scrum methodology and were following all the practices “by the book” including sprint planning, stand-ups, sprint review and sprint retrospectives. However, the company continued to use more traditional QA techniques and processes. As a result, QA engineers were assigned to the team but continued to work semi-independently with their own manager. This obvious mistake was rectified fairly quickly and is not really at the heart of what I want to tell you about here. It’s just important to note since it could be tempting to attribute what came next to where we started with QA engineers sitting half inside and half outside the team.
We also made one large decision around architecture that impacted this story somewhat. After attending Udi Dahan’s distributed architecture course, many on the team wanted to focus on building a microservices architecture with small, loosely coupled components connected generally by asynchronous messaging. Remember, this was 2011 when these ideas were just gaining widespread attention. After consulting with experts, including Udi, we were advised to build a monolithic web application using a more traditional layered architecture to provide separation of concerns. For the first year or so, that is exactly what we did. By the time we started to introduce microservices, there was a pretty significant monolith sitting at the center of our platform. In retrospect, this was clearly a mistake. This certainly contributed to what follows, but was not the sole or even the most notable cause.
When the Autobahn effort started, the business and the development team all agreed that quality was one of the most important things required to make the new platform successful. In fact, the new effort was chartered, in part, based on the promise that quality would be baked into the new platform from the beginning. After all, the company had suffered for years from quality issues on the existing platform and was tired of spending too many cycles fixing problems and not enough time truly innovating.
The team invested in quality from the beginning. Early in the first sprint, we had automated continuous integration including a comprehensive unit testing suite that ran on every commit and failed the build if any tests failed. We also implemented code coverage reporting and focused on achieving as close to 100% test coverage as we could get. The team cared deeply about quality and was fully committed to writing and maintaining unit tests to make sure things worked as designed and continued to work as the code base evolved.
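The gate itself amounts to something like the short script below. It is a sketch for illustration, using Python, pytest and coverage.py as stand-ins (Autobahn was not a Python codebase, and the 90% floor is a hypothetical number): run the unit tests on every commit and fail the build if any test fails or coverage drops below the floor.

```python
"""Minimal CI quality gate: fail the build on a failing test or low coverage."""
import subprocess
import sys

COVERAGE_FLOOR = 90  # percent; hypothetical threshold, tuned per project

def main() -> int:
    # Run the unit test suite under coverage; any failing test fails the gate.
    tests = subprocess.run(["coverage", "run", "-m", "pytest", "--quiet"])
    if tests.returncode != 0:
        print("Unit tests failed; failing the build.")
        return tests.returncode

    # 'coverage report --fail-under' exits non-zero when coverage is below the floor.
    report = subprocess.run(["coverage", "report", f"--fail-under={COVERAGE_FLOOR}"])
    if report.returncode != 0:
        print(f"Coverage is below {COVERAGE_FLOOR}%; failing the build.")
    return report.returncode

if __name__ == "__main__":
    sys.exit(main())
```

The CI server runs this, or its equivalent, on every commit and treats any non-zero exit code as a broken build, which is what kept the suite honest as the codebase grew.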
Besides unit tests, our definition of done included an expectation for QA system, integration and regression testing. The QA engineers on the team were responsible for writing test cases based on the stories. Once stories were ready for testing, the QA engineers took responsibility for executing the test cases and recording issues on pink stickies that were added to the physical scrum board maintained in the team area. Software engineers took responsibility for fixing all the bugs the business deemed important in the sprint. Once stories were completed and integrated into the main branch, QA engineers focused on testing for regression using a rapidly growing set of test cases stored in a test case management system.
Within a few sprints, a clear cadence and separation of duties naturally developed within the team. QA engineers would start the sprint trying to automate key use cases from the sprint before. They would also work with the PO to produce a set of test cases for the stories committed in the sprint. Meanwhile, software engineers would start several stories in parallel. By the early part of the second week, software engineers would start finishing up stories with passing unit tests and QA engineers would start UI and integration testing. By mid-week, the team would have all the stories in good shape and would start merging everything into a release branch. Thursday morning we would lock down the release branch so our QA engineers could focus on regression testing and work with the software engineers to make everything ready for review on Friday.
After a few sprints of this, the team started referring to the key testing day in the sprint as “Bug Fix Thursday”. It was a neat way to describe the code freeze that would happen each sprint after the merge completed and regression testing started. Up until Bug Fix Thursday, the team was able to focus on developing new features for the sprint. Starting in the morning on Bug Fix Thursday, the software engineers would generally work ahead on stories lined up for the next sprint if they weren’t busy fixing bugs identified by QA engineers on the team.
Sometimes we had trouble getting stories ready in time for Bug Fix Thursday. Most of the time we simply relaxed the code freeze rule to allow ourselves to add to the release branch later on Thursday, or, in extreme cases, Friday morning. This put a lot of pressure on the QA engineers to either rush through regression testing or to perform multiple rounds of regression testing. It also led to some unhealthy behaviors like allowing regression testing to leak into the beginning of the next sprint. Since the team was small and the platform was not in production yet, we were able to live with some of these problems for quite a while.
As Autobahn gained momentum and the team grew, Bug Fix Thursday got a little uncomfortable. As one agile team grew to two and then to three, we started feeling the pinch of Bug Fix Thursday more and more often as teams struggled to merge and test all the sprint’s stories in time for the demo on Friday afternoon. Although we introduced more cleanly separated microservices that could be deployed independently, most sprints included functionality that touched the monolithic customer or associate web sites and required extensive regression testing to ensure everything worked as expected. QA engineers felt the pressure the most as regression testing routinely leaked into the following sprint even for stories the team was calling “done”.
Processes were improved to compensate. The one that seemed to help the most was focusing teams on getting more stories done and ready to deploy in the first week of the sprint. This forced teams to work on one or two stories at a time and to make sure they were merged and regression tested before moving onto another story. Although this did not eliminate Bug Fix Thursday, it gave the QA engineers enough confidence to time box regression testing by reducing the number of test cases checked on Bug Fix Thursday.
As we grew from three teams to six and started exploring new business opportunities, Bug Fix Thursday started to get very uncomfortable again. The team exploring new businesses started to release pilot components more frequently, mainly because those components had very little impact on the rest of the system. However, when they touched critical system components, which was far too often due to the monolithic nature of the system core, their code had to be merged into what was becoming one very big and complex sprint release. The team was also surprised by how these “safe” releases managed to break things in unanticipated ways. We beefed up our unit testing. We added integration tests. We tried adding a QA engineer to float outside the teams and focus on writing more automated UI tests. We brought automated UI testing into the sprint. We challenged our software engineers to work more closely with the QA engineers on the team to finish regression testing at the end of the sprint. We even turned Bug Fix Thursday into Bug Fix Wednesday for a little while to allow more time for regression testing to complete. Some of these changes worked and stuck, some didn’t, but overall they seemed to help us keep Bug Fix Thursday manageable. We got to the point where releases would happen the Tuesday after the sprint and the business was reasonably satisfied.
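To make the automated UI testing piece a bit more concrete, here is a minimal sketch of the kind of browser-level check we were talking about. It assumes a Python and Selenium setup, and the URL, element IDs and product details are hypothetical placeholders rather than anything from the actual Autobahn test suite.

```python
# Minimal sketch of an automated UI smoke test (hypothetical Python + Selenium setup).
# The URL and element locators below are illustrative, not the real storefront's.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

BASE_URL = "https://example.com"  # placeholder for the site under test


def test_configure_and_add_to_cart():
    driver = webdriver.Chrome()
    try:
        # Load a hypothetical configurable product page.
        driver.get(f"{BASE_URL}/products/custom-blinds")
        wait = WebDriverWait(driver, 10)

        # Fill in the configuration options a customer would choose.
        wait.until(EC.presence_of_element_located((By.ID, "width"))).send_keys("36")
        driver.find_element(By.ID, "height").send_keys("48")

        # Add the configured product to the cart.
        driver.find_element(By.ID, "add-to-cart").click()

        # Assert the cart reflects the new line item.
        cart_count = wait.until(EC.visibility_of_element_located((By.ID, "cart-count")))
        assert cart_count.text == "1"
    finally:
        driver.quit()
```

A handful of checks shaped like this, run against every merge, is the sort of safety net we were slowly trying to build.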
Behind the scenes, our QA engineers were barely holding things together. They worked long hours on Bug Fix Thursday, often testing late into the night. They tested Fridays after sprint review to make sure the release was ready. Testing often continued through the weekend and into Monday. Occasionally, testing could not get done by Tuesday and releases would slip into Thursday and, in extreme cases, into the following Tuesday.
By the time we added our eighth development team, the unrelenting pressure had led us to make a number of quiet compromises on quality. The pressure to finish last sprint’s testing left QA engineers with little time to write and maintain automated UI tests. Because comprehensive regression testing was taking too long, manual regression testing focused on the areas the team thought could be impacted by the changes in the sprint, and very little time was spent testing anything else. Because schedule pressure was almost always present, the team did not believe they had the time to clean up the monolithic components, so technical debt kept growing and it was getting harder to accurately identify the parts of the system that really needed regression testing.
Once we grew to 12 teams, the symptoms were clearly visible to the team and our business. One sprint’s release took so long to test that we decided to combine it with the subsequent sprint’s work into one gigantic release. “Hot fixes”, intra-sprint releases made to fix critical bugs that were impacting our customers, became common. In fact, we were starting to see cases where one hot fix would introduce yet another critical issue requiring a second hot fix to repair.
Finally, the pace of change completely overwhelmed our teams and processes. Release after release either failed and required rollback or resulted in a flurry of hot fixes. In one particularly bad week, the sprint release spawned a furious hydra: each time we fixed one problem, two more would show up to replace it. By that time, I was leading the IT organization and, after consulting with team members and leaders, I mandated strict rules around regression testing, hot fixes and releases to stop the bleeding.
Simultaneously, we launched a small team of three people dedicated to improving quality and our ability to release reliable software frequently. We named it Yoda. We claimed it was an acronym, but I can’t find anyone who remembers what the letters were supposed to mean. Its biggest concrete deliverable would be an improved automated regression testing suite. We also asked the Yoda team to find ways to simplify the release process and improve the overall engineering culture.
Over the next several months, the Yoda team made progress. As expected, the automated tests improved. The bigger gains, however, came from changes to the release management process and to the culture.
Although the web sites were still pretty monolithic at this point, they were surrounded by microservices that were independently deployable. The teams had also made progress on making aspects of the web sites independently deployable. The Yoda team spent some time documenting the various components and worked with development teams across the company to determine which were truly independent enough to release on their own and which required more system-wide regression testing. Yoda improved the continuous delivery process and added a chatbot to make it easier for development team members to reliably deploy. They worked with the development teams to make releases easier to roll back, too.
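The chatbot idea is easier to picture with a sketch. The snippet below is not the bot Yoda built; it is a hypothetical Python handler showing the general ChatOps pattern: a team member types a deploy command in chat, the bot confirms the component is one of the independently deployable ones, and then hands off to the existing delivery pipeline. All of the names and the pipeline hook are illustrative assumptions.

```python
# Hypothetical sketch of a ChatOps-style deploy command handler in Python.
# Component names and the pipeline hook are illustrative placeholders.

# Components identified as safe to release independently.
INDEPENDENT_COMPONENTS = {"pricing-service", "cart-service", "search-service"}


def trigger_pipeline(component: str, environment: str) -> bool:
    """Placeholder for a call into the continuous delivery system's API."""
    print(f"Starting pipeline: {component} -> {environment}")
    return True


def handle_chat_command(user: str, text: str) -> str:
    """Respond to a chat message like 'deploy pricing-service to production'."""
    parts = text.strip().split()
    if len(parts) != 4 or parts[0] != "deploy" or parts[2] != "to":
        return "Usage: deploy <component> to <environment>"

    component, environment = parts[1], parts[3]
    if component not in INDEPENDENT_COMPONENTS:
        return (f"{component} still needs the full sprint regression pass; "
                "it can't be deployed on its own yet.")

    ok = trigger_pipeline(component, environment)
    return (f"{user} deployed {component} to {environment}."
            if ok else f"Deploy of {component} failed; see the pipeline logs.")


if __name__ == "__main__":
    print(handle_chat_command("alice", "deploy pricing-service to staging"))
```

The point of putting the rules in the bot, rather than in a wiki page, is that the safe path becomes the easy path: anyone on the team can ship an independent component, and anything that still needs the full regression pass gets turned away automatically.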
Once the Yoda effort gained momentum and the development teams were ready, we relaxed the rules around regression testing and releases for the components that Yoda identified as reasonably separated and safe to release independently. Over the next couple of months, we went from one large release per two-week sprint to over 50 per week. Because releases were smaller, they were easier to test and quality improved. Hot fixes became rare again. Rollbacks occurred from time to time, but, because teams planned for the possibility, they did not create the kind of drama we had observed in the past.
Process changes were also required. As the number of releases per sprint increased, we realized visible functionality was making it to production before business stakeholders had a chance to formally review and approve it. As a result, teams started to demo stories to stakeholders as soon as they were done and ready to deploy. For some teams, that made the traditional end-of-sprint review far less useful. Some of those teams stopped performing the end-of-sprint review altogether, though they continue to value and practice retrospectives based on the feedback received from the many stakeholder reviews and releases that happen during the sprint. As they work more story by story, teams are gradually starting to look at things like cumulative flow diagrams and cycle times and to experiment with other agile methodologies, such as Kanban.
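For readers less familiar with cycle time, the measurement itself is simple: the elapsed time from when work starts on a story to when that story is done and deployable. The sketch below, using made-up story data, shows one way to compute it in Python; a team's real numbers would normally come straight from its work tracking tool.

```python
# Illustrative Python sketch: computing per-story cycle time from start/finish dates.
# The stories below are made-up examples, not real Autobahn data.
from datetime import date
from statistics import mean

# (story id, date work started, date the story was done and deployable)
stories = [
    ("AUT-101", date(2019, 1, 7), date(2019, 1, 9)),
    ("AUT-102", date(2019, 1, 7), date(2019, 1, 14)),
    ("AUT-103", date(2019, 1, 10), date(2019, 1, 11)),
]

cycle_times = {
    story_id: (finished - started).days
    for story_id, started, finished in stories
}

for story_id, days in cycle_times.items():
    print(f"{story_id}: {days} day(s)")

print(f"Average cycle time: {mean(cycle_times.values()):.1f} days")
```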
And so Bug Fix Thursday lived and mostly died within our agile process. At times, it served us well. At times, it reflected problems in our process or our code. At times, it created additional problems and raised stress levels. The solution, though obvious in retrospect, was terribly counter-intuitive, especially in a world where the codebase includes some critical monolithic components: create and nurture a culture that values releasing more and more frequently. Smaller, more frequent releases make testing easier and the risks smaller. Independently testable and deployable components are an important part of the story, but they don’t do much good without the commitment to release more frequently. Although we had always talked about it and even built much of the necessary infrastructure to support it, we never brought it into focus until we launched Yoda and truly changed our culture.
Unlike some of the other stories in this series, we’re still not quite done with Bug Fix Thursday. We just found a way to make it smaller and ensured that it can’t get any bigger by limiting its impact to the monolithic pieces of our system that are left over from the early days of the Autobahn platform. We’re also committed to shrinking it further over the coming months by focusing a small team, called Supercharge Autobahn, on breaking down the highly complex remaining pieces of the original monolith into truly independent components. We also continue to work on our engineering culture to make sure we don’t backslide.
I am working with Tim Coonfield to develop a talk for the one-day Agile Shift conference scheduled for April 12, 2019 in Houston, TX titled “10 Biggest Mistakes We Made on Our Agile Journey (and Why We are Glad We Made Them)”. This is the first in a series of articles that Tim and I will use to explore some of those mistakes and what we learned from them. You can find other articles in this series here.
If you are interested in hearing this talk or some of the other awesome speakers and topics that will be covered at the event, you can learn more about the conference and purchase tickets here.
Everything I will share here happened at Global Custom Commerce (GCC), a Home Depot Company, as we developed and improved our 2nd generation e-commerce platform, called Autobahn, designed to make it easy for customers to shop for and buy more complex configurable products and services like custom window blinds, custom windows, flooring and decks.
In 2011, when the Autobahn platform started development, GCC was already the #1 seller of custom window coverings online and owned several brands including blinds.com. We were a couple years away from acquisition by the Home Depot and had about 80 employees. The existing e-commerce platform had been in production for a number of years and was still actively being improved by a large team following a traditional project management philosophy using Gantt charts and reporting on percent complete.
The Autobahn project marked a number of firsts for GCC including our first use of cloud hosting at AWS and our first use of agile methodologies. This article highlights one of our bigger mistakes and how we were able to improve as a result.
In the early days of the effort to deliver our 2nd generation e-commerce platform, Autobahn, we adopted the Scrum methodology and followed all the practices “by the book”, including sprint planning, stand-ups, sprint review and sprint retrospectives. However, the company continued to use more traditional project management techniques and reporting processes. As a result, the team was required to work with the PMO to keep a Gantt chart updated with progress by mapping completed stories to estimates of percent complete (a mistake I’ll cover in another article). The team was also concerned that, even though the CEO was shepherding the project himself, many leaders in the company were not happy about investing resources to build a new e-commerce platform instead of investing more in the existing one. There was also the issue of racing to replace the existing platform, which remained under active development, since nobody in the company was interested in moving our e-commerce business to the new platform until it was more capable than the existing one. Despite these challenges, the team was optimistic that we would deliver on time and within budget.
For the first few months things appeared to be going well. Every sprint we delivered exactly as promised. I know the Scrum guide talks about forecasts these days, but back then the book called for commitment — a very clear promise from the team to the business that they would deliver what they said they would deliver at planning. As the project started to gain momentum, we shifted focus from basic CRUD to the critical functionality required to sell any sort of configurable product or service.
As the required functionality got more complex, we started to have more difficulty delivering the way we intended. However, we could usually figure out a way to get enough done to demo by interpreting the story narrowly or, in some cases, by pushing the PO to split out things we convinced ourselves were not truly critical into new stories to be tackled in future sprints. At times, a couple of us also worked crazy hours over the weekend to make sure things got done as planned. The good news, we thought, was that velocity kept increasing, so there was no doubt that we could tackle all those new stories and still deliver according to the original plan. We certainly thought that the combination of increasing velocity and the on-time trend shown in the project plan would make us look good to the business and help us keep our project alive.
Then came THAT story. It’s not really important which story exactly or what it was expected to deliver. What does matter is that after the first week of our two week sprint we knew we were in trouble and we knew THAT story was the problem. As per usual, a couple of us decided to work over the weekend to break through the hard bits so the team could finish wrapping it up in the second week of the sprint. Sunday night we still had not broken through. In fact, we had started to recognize that we had more work ahead of us than we had believed back on Friday evening.
The following Monday, the whole team got together after daily stand-up to talk about THAT story yet again. The team members who worked over the weekend shared details about the issues they had solved and the new issues they had uncovered. Nobody seemed overly worried. After all, we were the team that always delivered and failure just wasn’t an option. We spent an hour or so breaking down the remaining work into a set of tasks and put them up on the board. We updated our burn down and were pleased to note that we were still going to deliver THAT story, and everything else we had committed to, within the sprint even if it was going to require a little extra effort along the way.
Late in the day on Monday, I recognized I was in for a long night. As I stood up to stretch, I noticed other team members around me still hard at work and realized they were likely behind on their tasks too. A few hours later, I realized I was simply staring at the screen and was not really accomplishing much so I packed up to head home with a plan to start fresh early the next morning. The rest of the team appeared to be in about the same place as they headed out after a very, very long day.
It was pretty much the same story Tuesday and Wednesday. Despite lots of conversation, creative re-planning each morning and long days trying very hard to finish tasks, we just couldn’t seem to break through into the home stretch. Left unsaid was a rapidly growing sense of dread — THAT story wasn’t getting done.
Thursday morning we spent significant time after stand-up focusing on how to trim THAT story back, move bits out or otherwise bend the fabric of reality to allow us to call it done. Our newest team member then said what desperately needed saying: “THAT story is not getting done this sprint. If we keep wasting time on it, we won’t finish up testing the other stories in the sprint either.” He then advanced a heresy we had not even allowed ourselves to consider. “Perhaps,” he said, “we should fail with honor. We tried our best and we’ll get everything else done this sprint.”
We stood there for a moment and silently asked ourselves essentially the same question: how will the business react and how will we explain it? We started to discuss the idea and very quickly got past the fear. We would simply acknowledge the story wasn’t done and that it could go into the next sprint if the business still wanted it at the top of the priority list. We wouldn’t make excuses and we wouldn’t talk about percent complete. Our direct manager, who coded with the team, was very uncomfortable with the idea and did not want us to do it at first. He knew the political environment and worried about the backlash. In the end, he came around and the team decided to put THAT story aside and make sure the rest of the stories in the sprint were truly done before the demo.
I got the dubious honor of demoing the last completed story and discussing the failed one. Despite all the brave talk the morning before, I distinctly remember the little flutter in my gut as I finished up demoing the last completed story. I remember staring at the ugly unfinished story on the projector screen as I said, “and THAT story did not get done”.
I took a deep breath and just a second passed before one of the business leaders asked the obvious question, at least the obvious question for someone used to reviewing projects using Gantt charts clearly illustrating percent complete. She asked, “So how close is it to done?”
I delivered the answer, carefully vetted with the team and rehearsed in advance: four simple words designed to demonstrate our commitment to agile principles, clear communication and transparency. “It’s simply not done.”
Of course, there was more to say after that. It started with a discussion about why percentage complete is an illusion and quickly moved into a group effort to figure out the best way to move forward. After a few minutes, I realized that everyone there, the developers and the business alike, was engaged in looking for solutions and not spending any time at all assigning blame.
None of us should have been surprised, though we were. After all, one of our four company values is “experiment without fear of failure”. In essence, that’s just another way of saying that sometimes you’ll stumble and that’s OK as long as you learn and improve based on the experience. We saw it and lived it that day for sure. It was the beginning of a true partnership between the team and the business to figure out how to really deliver the benefit of a new e-commerce platform to our customers.
It’s also between the lines in the Agile Manifesto. All of the unintentional spin and irrational optimism inherent in those pretty Gantt charts showing tasks 80% done gets pushed aside by a perfectly clear idea: What’s done is done. Transparency from a development team is critical to actually making informed decisions about what to do next. “Responding to change”, one of the four values in the Agile Manifesto, is as much about what you learn on the technical side as it is about reacting to evolving business realities. The day our team and our business embraced that simple idea was one of the most important days in our agile journey.