Interview on DevOps

From time to time, I get the opportunity to talk to industry reporters about agile and DevOps. Today, I was interviewed via email for the first time, which turned out pretty interesting. Here are the questions and answers from that interview.

Please briefly describe how the company is using DevOps, including when it began, which DevOps tools and for which types of projects.

We see DevOps as a culture that encompasses people, practices, tools and philosophy. In that sense, it has become central to everything we do to develop, maintain and operate our e-commerce sites for Blinds.com, JustBlinds.com, AmericanBlinds.com and, of course, Home Depot custom window coverings. Infrastructure is code that evolves in concert with our other software components. DevOps happens inside our agile development teams and often draws in specialized resources from our operations group. It also happens inside our infrastructure group and often draws in developers. It’s part of our DNA.

The tools aspect of it is pretty standard stuff. We use Git and GitHub for source control. All our application and infrastructure code is there. Puppet helps us with rolling out and managing servers. Our backends are mostly .NET, so we use Octopus Deploy to help with rolling out our code. TeamCity sits in the middle of our development process, and code there exposes deployments and ties them together with builds. Logs are mostly managed by Splunk, though we’ve played with an ELK stack for this as well. Nagios is used for infrastructure monitoring. New Relic is our app monitoring tool and we depend on it to alert us to problems with the user experience. All our alerts get fed into PagerDuty for escalation management. We’ve been experimenting with Consul for discovery and config. We’re also experimenting with Docker. What’s holding us back there is .NET on Windows. Of course, that story is changing with .NET Core and Windows Server 2016 on the horizon, so we have high hopes for Docker as a next step.

What were the business drivers for deploying DevOps?

Agile drove our adoption of DevOps. Our adoption of agile was driven by our organization’s culture more than anything else. One of our key values is “experiment without fear of failure”. Another is “improve continuously”. Over the years, our whole IT process had gotten into that uncomfortable place where limited resources lead to a difficult relationship with the rest of the business. They saw us as standing in the way of all the cool experimentation and improvement they wanted to do. Agile helped us break down the walls that had developed and form a true partnership for innovation. DevOps is a necessary part of the agile process. How can you innovate constantly if deployment requires an over-the-wall handoff and lots of manual intervention to get done? If operations and infrastructure are not intimately involved in the process, how can you support and manage it once it gets into production?

What benefits has the company seen from DevOps? 

DevOps enables agile, which allows us to continuously improve. It’s a big part of how we were able to deliver on all the promises of our new e-commerce platform, which led directly to the acquisition by Home Depot. It has allowed us to continue to innovate and thrive inside a Fortune 50 corporation and take on new challenges to help drive innovation outside of the custom window coverings business. DevOps is like oxygen for the agile process. Without it, it’s very possible that we would have ended up with “agile in name only,” where agile terminology is used but nothing really changes and the organization doesn’t see the kind of exponential increase in innovation that we’ve benefited from here.

Any challenges of deploying and using DevOps, and how were they addressed?

Our biggest challenges revolve around security and compliance, especially now that we are part of one of the largest retailers in the world. We’re still learning how to deal with all that when it comes to sharing responsibility for deployment and infrastructure among developers, infrastructure engineers and operations engineers. We’re constantly tempted to solve these problems with handoffs and work hard to avoid that. Now that we have trust across all the impacted groups, it’s much easier to work through these problems and come up with ways to address compliance without undermining the velocity of innovation.

Thumbs Down on WalkMe

WalkMe is one of the leaders in the market for SaaS tools that let you add guided tours to your website.  The product itself is quite good.  I’ll even say it is better than much of the competition I looked at, including tools like TourMyApp.  Unfortunately, unlike most SaaS offerings, they do not advertise pricing on their website, nor did they give me more than 20 minutes to evaluate the product before I received a phone call with a high-pressure sales pitch.  I quickly found out why.  We are planning to put walkthroughs on our back-end administrative site, where we will have about 100 users and a couple dozen walkthroughs.  WalkMe pricing for that scenario starts at $12,000.  Compare that to the $75/month ($900/yr) that TourMyApp costs for 10,000 tours a month and you start to understand why WalkMe has such aggressive salespeople.

WalkMe clearly has more features than much of the competition.  It’s sort of like the difference between a BMW 3 Series and a Ford Fusion.  They are about the same size and will both get you to your destination.  The BMW does it with more style and has several nicer features you can’t even get in the Fusion.  The BMW costs quite a bit more than the Fusion too.  As always, you get what you pay for.  However, WalkMe prices itself more like a Bentley.  Unfortunately, it is not nearly that far ahead of its competition.  I am walking away from WalkMe.

Brief Review of MacBook Pro with Retina for Windows Development

I recently made the switch from using a Windows box for my everyday development tasks to a MacBook Pro 15″ Retina.  I’ve gotten a few questions from Windows developers I know about things they’ve heard: blurry displays in Windows VMs, slow Mac performance when running VMs and other unpleasantness.  Another side question is whether it is best to use VMware Fusion or Parallels.  I figured I’d take a minute to write down what I’ve learned while it is fresh in my mind.

I’ve been developing on Windows VMs for several years now.  I generally keep my productivity stuff in the host and put my development environment in the VM.  This lets me snapshot and restore the development environment easily.  It also lets me experiment with upgrades.  I always develop with multiple screens.  I used to insist on three when I was stuck with 1920×1080, but now that 27″ monitors with a resolution of 2560×1440 have become affordable, I am quite comfortable using the laptop as one screen and the 27″ as a second screen.  When I’m writing Windows code, the development VM runs full screen on the big monitor while I use the laptop screen from the host to look up documentation, handle email and do other office productivity tasks.  I usually give the development VM half of the host machine’s memory and CPU.  For the last couple of years, my hosts have all been quad-core i7s with at least 16GB of RAM and the fastest SSD possible, so VM performance has been snappy.  It’s not as fast as the host, especially when it comes to disk-intensive operations like compiling applications, but it is still faster than working directly on a host with a traditional hard drive.

I purchased a mid-2012 MacBook Pro with a 2.7GHz i7, 16GB of RAM and a 750GB onboard SSD in May of 2013.  I got some discounts since it was near the end of the product cycle, but it still cost about 20% more than a roughly equivalent 15″ laptop from Dell.  The Dell in question has a faster, 3.2GHz processor and a smaller, 512GB SSD.  Like the Mac, it does not have a touch screen.  Of course, it also has a much lower resolution screen and is a bit heavier and thicker.  I’m not doing a comparison review here, but it is important to note that you pay a premium for the Mac’s design and you get significantly less in raw specs.  What you get in return is a far better user experience with a crisper display, a better, more usable touchpad and superior battery life.  I also purchased the MacBook because it is a better platform for work on things like Node.js since the underlying OS is a Unix derivative.

I first tried VMware Fusion.  It installed easily and guided me through the setup of a Windows 8 VM in minutes.  It starts out in a scaled mode that basically doubles pixels on the Retina display, giving me a perfectly usable experience with sharp text in things like Visual Studio.  When I moved the VM to my 27″ monitor, the host re-scaled, giving me more screen real estate while maintaining sharp text and graphics.  After I manually increased the guest’s memory to 8GB and gave it four cores, performance was a little better than what I was seeing when hosting on my big Windows desktop (3.06GHz quad-core i7, a fast SSD and 24GB of RAM).  Visual Studio running ReSharper with code analysis turned on performed well.  Compiling and running all the tests on my main work project was about 15% faster than what I was seeing in my old hosting environment.

I tried out VMware’s Retina mode and that’s when things got a bit ugly.  The idea here is to let the Windows guest run at full resolution on the Retina display.  It looks crisp, but everything is just too small to read.  As recommended by VMware, I turned up DPI settings in Windows and that’s when I started seeing the blurriness that some of my friends mentioned.  At 125% DPI, everything in Windows was sharp but still way too small for my taste.  At 150% DPI, menu bars and other navigational elements were barely big enough to use, but I started noticing blurriness in graphical elements.  This is because Windows applications are not developed or tested to work at high DPI levels.  At 200% DPI, text was good, but things really started to break down.  For example, maximizing Chrome lost the title bar.  I probably could have gotten things working reasonably using 125% DPI and then tuning text sizes and zoom levels of various applications, but it was just too much work.  Furthermore, turning up the DPI and font sizes in Windows made Windows applications appear way too large when running in Unity mode.

I had a couple of gripes with VMware Fusion.  Their choice of hotkey mappings for Windows 8 has lots of annoyances.  Reaching for what you think should be search for applications shuts down the VM.  Unity mode, which lets you see host windows and guest windows side-by-side, is a little clunky.  On two occasions VMware froze and forced me to reboot the host.

My experience with Parallels and Retina was about the same.  Setup was a bit easier.  Performance was a bit better, especially on compiles.  The display modes were roughly equivalent, with scaled mode the best choice when you are running with the laptop screen and an external, non-Retina display.  Their side-by-side mode, Coherence, is much nicer than the one in VMware.  It never crashed on me.  Overall the app seems like a much better Mac citizen.  It costs more than VMware, but the benefits made it worth the extra cost for me.

My bottom line is simple: A MacBook with Retina running a Windows VM using Parallels is an excellent choice for Windows developers.  If you are interested in things like Node.js and even JavaScript, it also gives you quicker and easier access to the best open source tools and libraries, often long before they get ported to Windows.  The hardware is a little pricey, but the value you get is well worth the extra cost.

 

Why My Team Gave Up on NCover4

This is the third of several posts based on my experience using NCover4 on a large, new development project.  Click here to see all the updates in order.

After struggling with problems caused by NCover4 for nearly a year, my team finally gave up on it on New Year’s Eve.  The writing was on the wall for a couple of weeks.  NCover4 Code Central had gotten so slow that it was completely unusable.  Chrome would pop up a dialog saying the server was unresponsive with buttons to kill or wait on the extremely long running request.  Trying to delete history was impossible because it would take 30 minutes plus from the time you pressed the button to delete one 25 record page of history to the time when the screen was ready to delete another.  Support recommended an index fix utility that ended up running in excess of 12 hours that did improve things a little.  Unfortunately, it still took up to a minute or two to draw one page of results and deletes still took 30 minutes per page.

After looking at the resources NCover4 was using, I moved its data onto a dedicated RAID 0 array on our Amazon EC2 server, which roughly tripled the available I/O performance.  This dropped screen redraws to a still unacceptable 30 seconds to 1 minute.  It did not have any measurable impact on delete performance.

The final straw was intermittent test failures at the build server caused by NCover4 hogging resources for 15 to 20 minutes after the completion of the previous build.  If a build started too soon after the previous one finished, some tests involving an in-memory RavenDB would time out waiting for stale indexes to update.  The second build would also take nearly three times as long as the first thanks to the load put on the server by NCover4.

After discussing the issue with the team, we pulled the plug on New Year’s Eve.  I spent an hour switching us over to JetBrains DotCover.  Although it does not offer the breadth of statistics that NCover4 does and has its flaws, it provides access to the basic code coverage metrics needed to identify and fix poorly covered code.  It is less expensive than NCover4 on the desktop, and, if you use TeamCity, it is free on the build server.  It puts less load on the server as evidenced by a 20% drop in build times.  It does not cause any of our tests to fail even when running builds back-to-back.  Because it is built into TeamCity, it is quite easy to integrate with the build process.  TeamCity also has a nice page that shows trends over time:

[Image: TeamCity page showing code coverage trends over time]

The folks at NCover are working to improve performance.  They plan to add automatic history archiving to cut down on the amount of data that needs to be processed to draw their overview graphs.  They also plan to cache the coverage statistics they currently calculate for each page to cut down on the CPU load.  A release including these improvements is expected soon.  However, my team is pushing towards our release and we no longer have time to risk on something we are not sure is production ready.  Therefore, we’re going to stick with DotCover at least until Q4 of this year.  Even after that, NCover4 would have to be substantially better to justify the investment in time and money it would take to switch back.  I cannot recommend NCover4 for any team on any project at this time.

TeamCity 7.1 Branch Builds Rock

I’ve been using TeamCity as my CI server of choice for years now because the folks at JetBrains just keep making it better.  Branch builds are just another example of the kind of thoughtful goodness I have come to expect from these guys.

It’s all designed to fix a problem that occurs when a team takes advantage of the power of a DVCS system like Git or Mercurial.  When a developer starts working on a feature, he or she makes a local feature branch that gets pushed to the main repository periodically.  Once the feature is done, the branch is merged into the main development branch.  Traditionally, the CI server is configured to build the main development branch.  That means developers lose the benefit of all the checks the CI server does whenever they are working on a feature branch.

It gets even worse when you have multiple team members collaborating on feature branches or working on closely related feature branches.   For example, Mark checks in Feature X that Mary would like to merge with her work.  The problem is she has no way to know the code in Feature X passes tests or even compiles before she pulls it down to her local machine.  If it is broken, she ends up wasting valuable time on something she probably should not have tried to merge in the first place.

TeamCity solves this by allowing you to set up your configuration to automatically build all or selected active branches without treating them like they are supposed to be stable.  Although TeamCity shows their status, broken feature branches do not impact the overall project status.  As long as the main branch builds and passes all tests, the project status will still be good.
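The status rule described above can be sketched in a few lines of Python.  This is only an illustration of the idea, not TeamCity's implementation; the function names, the branch names and the dictionary of results are all hypothetical.

```python
def project_status(branch_results, main_branch="master"):
    """Return the overall project status from per-branch build results.

    branch_results maps a branch name to True (build and tests passed)
    or False (broken). Only the main branch decides the overall status;
    feature branches are reported but never fail the project.
    """
    return "good" if branch_results.get(main_branch, False) else "broken"


def broken_branches(branch_results):
    """List the branches that are currently failing, for display only."""
    return sorted(name for name, ok in branch_results.items() if not ok)


# Hypothetical results: the main branch is green, one feature branch is red.
results = {"master": True, "feature-x": False, "feature-y": True}
print(project_status(results))   # the project is still reported as good
print(broken_branches(results))  # feature-x still shows up as broken
```

In the Mark and Mary example, Mary would look at the per-branch list before merging Feature X, while the overall status keeps reflecting only the main development branch.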

Setup could not be easier especially if you just want to build all feature branches.  You simply go to the VCS root and configure which branches you want to build as shown here:

The specification “+:*” tells TeamCity to build all branches automatically.
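To make the idea of a branch specification a little more concrete, here is a minimal Python sketch of how rules like “+:*” could be evaluated.  It is a simplified model for illustration, not TeamCity's actual matching code: the helper names are hypothetical, and the assumption that the last matching rule wins is mine, so check the TeamCity documentation for the real precedence rules.

```python
import re


def parse_rule(rule):
    """Split a rule such as '+:feature-*' into (include, compiled_pattern)."""
    sign, _, pattern = rule.partition(":")
    if sign not in ("+", "-") or not pattern:
        raise ValueError(f"not a valid rule: {rule!r}")
    # Escape everything, then turn the escaped '*' back into a wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return sign == "+", re.compile(regex)


def should_build(branch, spec_lines):
    """Decide whether a branch is built (assumption: last matching rule wins)."""
    decision = False  # branches that match no rule are not built
    for line in spec_lines:
        include, pattern = parse_rule(line.strip())
        if pattern.fullmatch(branch):
            decision = include
    return decision


print(should_build("feature-x", ["+:*"]))                       # True
print(should_build("experiment-1", ["+:*", "-:experiment-*"]))  # False
```

With the single rule “+:*”, every branch name matches, which is why that one line is enough to build all feature branches.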

Once you do this, you’ll start seeing branch names next to the builds on the main screen.  You can also see a screen with a summary of the state of each branch like this:

Very useful indeed.

TeamCity Professional is free if you have fewer than 20 build configurations.  It includes three free build agents.  If you need more than 20 build configurations or you need Active Directory integrated security, the Enterprise edition is $1,999.  Additional build agents cost $299 each.  You can see the complete price list and license details on the TeamCity web site.
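As a rough way to budget, the prices quoted above can be turned into a small Python estimate.  This is a sketch under stated assumptions, not an official calculator: it assumes the three free build agents apply to both editions and that exactly 20 build configurations still fits the free tier, neither of which is spelled out above, and the function name is hypothetical.

```python
ENTERPRISE_LICENSE = 1999  # USD, Enterprise edition price quoted above
EXTRA_AGENT = 299          # USD per additional build agent
FREE_CONFIGS = 20          # assumed limit for the free Professional edition
FREE_AGENTS = 3            # assumed to be included with either edition


def teamcity_cost(build_configs, agents, needs_active_directory=False):
    """Estimate the license cost in USD for a given setup."""
    needs_enterprise = build_configs > FREE_CONFIGS or needs_active_directory
    license_cost = ENTERPRISE_LICENSE if needs_enterprise else 0
    extra_agents = max(0, agents - FREE_AGENTS)
    return license_cost + extra_agents * EXTRA_AGENT


print(teamcity_cost(10, 3))  # 0: small team on the free edition
print(teamcity_cost(30, 5))  # 2597: Enterprise plus two extra agents
```

Always confirm against the current price list on the TeamCity web site before relying on numbers like these.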

NCover4 Gets a Little Faster (and Hopefully Will Stop Hanging)

This is the second of several posts based on my experience using NCover4 on a large, new development project.  Click here to see all the updates in order.

Yesterday, I installed the latest upgrade to NCover4 (version 4.1.2078.723).  NCover support tells me that their developers found and fixed a thread deadlock that was causing the hanging build issue we observed on several occasions.  They also significantly improved the performance of the screen used to view the list of coverage results for a project.  Now it draws the main part of the screen in about 30 seconds.  Still a little slow, but considerably better than the many minutes it used to take.  I will post a more complete update after using this release for another week or two.

NCover4 — Could Be a Good Product Someday

This is the first of several posts based on my experience using NCover4 on a large, new development project.  Click here to see all the updates in order.

My team has been using NCover4 the last several months and I must say I have mixed emotions.  I think it could be a good product — someday.  The problem is right now it has all the rough edges of an early beta with the high price of an enterprise-ready development tool.

So how about what’s good?  Well, let’s start with Code Central, their centralized web server.  The idea of it is pure genius.  All your test results can go there whether they are run at developer stations, the build server or manually by your QA testers.  It tracks trends.  It tracks line coverage, branch coverage and provides a number of other interesting metrics like CRAP scores.  It lets you drill down into the details of any run.  It’s even reasonably easy to configure.

Unfortunately, like most things in NCover4, the implementation has some pretty significant flaws.  The web user interface is pretty but ridiculously slow.  We’re talking minutes to finish drawing a page-worth of coverage stats.  Drill downs are not much faster.  Installation with defaults is easy, but step outside of the defaults and you are in for a painful experience.  Want to install to the D drive?  Well, you’d better crack open the command line.  Version upgrades, which they claim are automatic, have been hit and miss.  Sometimes they work.  Sometimes they don’t.  You quickly learn to allocate a few hours to each version upgrade.

So how about the desktop?  Well, it’s a web application too, which is kind of strange.  It works about the same as the server.  Low-overhead coverage, reasonably easy to configure but slow as heck to draw each web page when you have recorded more than a handful of results.  It can manage and use configurations from the server, which is a very nice feature.  It can send test results to the Code Central server as well.

The Visual Studio plugin, included with the desktop, has promise.  When it is working, it highlights lines in your code with coverage information.  But it too seems unfinished. It behaves strangely at times.  It causes error dialogs in Visual Studio periodically. It just doesn’t seem quite ready for prime time.

As far as build server integration goes, it is again a story with lots of promise and spotty results.  Because of the way it works, you really don’t integrate it with the build server; rather, you run the collector at the build server just like you run the collector anywhere else.  We use TeamCity, and once we switched to running NUnit instead of TC’s built-in test runner, NCover4 was able to capture results.  The latest couple of NCover4 versions periodically hang our build when tests start running.  When this occurs, the NCover4 service cannot be stopped.  Instead, we have to set it to manual start and then reboot the server.  As long as we let one build run without coverage after the boot, we can then start up the NCover4 service and it works again for a while.

That brings up the subject of support.  It’s friendly and professional but hamstrung by the product.  The bottom line is our typical downtime when we have a problem is measured in days rather than hours.  Early on, licensing for the desktops stopped working.  We had to wait more than a week for a release to fix the problem.  At one point, coverage stopped working altogether.  We spent hours over several days running utilities to capture information for support to examine, only to arrive at the solution of doing a clean install, which, of course, lost all our history in the process.  On average, NCover4 has been partially or completely down two days out of each of our ten-day sprints.

In summary, NCover4 is currently early beta quality — full of bugs and lacking the polish of a finished product.  Given time, it can be very good and worth every penny of its price.  If you have the time to deal with the flaws, it might be worth a try in your project.  However, be prepared to spend significant time and effort to keep it running.