Advanced Distributed System Design with Udi Dahan — Part 2 — Setting the Stage

You can find part 1 of this series here.

I can say for sure that the course has lived up to its reputation so far.  I can also say that Udi is well prepared and practiced.  As expected, he had mastery of the material.  However, unlike some technical presenters, he delivered a very polished presentation.  The material was broken into units that ran about an hour and a quarter each, so we got breaks on a regular basis.  He built a time and place for questions into his presentation and answered them well.  His slides weren’t much to look at, but the information behind them was very good.  He used a whiteboard to illustrate some concepts.  At times, he also asked questions to engage the audience and used their answers to lead into his next point.  The pace was just a little slow for me at times, but that’s because I’ve been practicing some of these architectural ideas myself.  Overall, I would say his pace was a reasonable compromise for the audience.

For the first half of the day, Udi led us through the back story.  He started with a module that discussed the ten (actually eleven — computers start counting at zero, right?) fallacies of distributed computing.  The first hour or so passed quietly for me.  I know the network isn’t reliable, I know latency is a problem, I know bandwidth is a problem, and so on.  I’m not saying the material was boring.  There was plenty of interesting detail I didn’t know, like exactly how little bandwidth is actually usable on gigabit Ethernet, but none of it made me uncomfortable.

However, when he got to the last fallacy — “Business logic can and should be centralized” — I felt it a little in my gut.  He made the point that code reuse was originally based on the false premise that the best way to improve developer productivity is to reduce the number of lines of code a developer writes.  After all, the thinking went, the design is complete and all the programmers have to do is type in the code.  Of course, that is false.  Is code reuse more important than performance or scalability?

He wrapped up the first module by stating a theme I suspect he will repeat: “There is no such thing as a best practice”.  No solution is absolute.  If you optimize to reduce latency, you usually end up using more bandwidth.  If you make code reuse your god, you will end up compromising on other aspects of your system.  You have to find a balance for each situation.

The second module, “Coupling in Distributed Systems”, dug deep into the old adage that you want your components to be loosely coupled.  For example, if you have a common logging component that has no callers, it is loosely coupled.  Is that a good thing?  No, it would be unused code.  The truth is that different kinds of components should have different kinds of coupling.  He illustrated that brilliantly with an audience-participation exercise that had us voting on generic components with various levels of afferent (inbound) and efferent (outbound) coupling.  He then delved into the three aspects of coupling — platform, temporal, and spatial — and how each can impact the performance and reliability of your system.  His discussion of how a slow web service under load could end up bringing down a system, or at least making it look unreliable to users, was quite interesting.  He used that example to introduce the concept of reducing temporal coupling using messaging.
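
To make the temporal coupling idea concrete, here is a quick sketch of my own (not Udi’s code; the names and the raw MSMQ usage are mine, and NServiceBus wraps all of this up for you).  The first method stalls whenever the order service stalls; the second just hands the work to a durable queue:

    using System.Messaging;  // needs a reference to System.Messaging.dll (MSMQ)

    class CheckoutExamples
    {
        // Temporally coupled: the caller blocks until the remote service answers.
        // If the order service is slow or down under load, every caller stalls
        // with it, and the whole site starts to look unreliable.
        static void PlaceOrderRpc(IOrderService service, Order order)
        {
            service.PlaceOrder(order); // synchronous call across the network
        }

        // Temporally decoupled: drop the request into a durable queue and return
        // immediately.  The order service can be slow, restarting, or offline;
        // the message just waits for it.  (Assumes a transactional MSMQ queue
        // already exists at this path.)
        static void PlaceOrderViaQueue(Order order)
        {
            using (var queue = new MessageQueue(@".\private$\orders"))
            {
                queue.Send(order, MessageQueueTransactionType.Single);
            }
        }
    }

    interface IOrderService { void PlaceOrder(Order order); }

    public class Order  // public fields so the default XML formatter can serialize it
    {
        public string Id;
        public decimal Total;
    }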

The final module of the day, messaging patterns, started to explain the benefits of messaging in some detail.  Although RPC is faster and lighter-weight to call, especially if you ignore potential network issues, messaging is more resilient and scales better.  He spent a good hour going through typical failure scenarios to show how messaging protects you from losing data when a web server crashes or a database server goes down.

He made the point that transactions can give you a false sense of security.  Imagine a user is placing an order.  The database server has a problem and rolls back the transaction.  Although the database is consistent, you’ve lost an order.  Can you get it back?  Probably not.  With durable messaging, the order would still be sitting in a queue, waiting to be processed again once the database recovers.  He showed us several examples using NServiceBus to illustrate some of his points.  His demonstration of replaying messages from the error queue was especially good.
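
The demos aren’t mine to share, but the shape of an NServiceBus handler looks something like the sketch below.  I’m assuming the classic IHandleMessages<T> signature here, and PlaceOrder and SaveOrder are names I invented for illustration:

    using NServiceBus;

    public class PlaceOrder : IMessage
    {
        public string OrderId { get; set; }
    }

    public class PlaceOrderHandler : IHandleMessages<PlaceOrder>
    {
        public void Handle(PlaceOrder message)
        {
            // Runs inside a transaction that spans the input queue and the database.
            // If SaveOrder throws, everything rolls back and the message goes back
            // on the queue to be retried; after enough failures it is moved to the
            // error queue, where it can be inspected and replayed.  Either way, the
            // order itself is never lost.
            SaveOrder(message.OrderId);
        }

        void SaveOrder(string orderId)
        {
            // write the order to the database here
        }
    }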

It looks like we’re going to start digging into SOA on Day 2.  Should be fun.

You can find part 3 of this series here.

Advanced Distributed System Design with Udi Dahan — Part 1 — Before the Brain Storm

I head to Austin, Texas today to attend Udi Dahan’s 5-day course titled “Advanced Distributed System Design”.  Assuming I have any brain power left, I intend to blog my impressions at the end of each day.  I thought it might be fun to start off the series with a brief post about my expectations.

This course is doubly interesting to me because I also start a new job on Monday at a growing e-commerce company.  I got involved with them as a consultant helping them vet aspects of their new application design.  It was largely based on SharpArchitecture and followed a fairly typical layered approach.  I strongly advocated a more distributed approach, perhaps even going as far as using CQRS.  In the end, the team decided to adopt some distributed concepts in the higher-traffic, customer-facing application.  The administrative application will continue to rely on SQL Server and transactions to guarantee data consistency.

Unfortunately, it is becoming clear that the administrative application will not be able to use transactions across the board as the team hoped.  It has to handle things, like media, that don’t fit well in SQL Server and probably won’t be able to participate in transactions.  This is forcing the team to wrestle with some of the concepts I felt justified a distributed approach for the administrative application too.

So now I am off to see the wizard along with some of my new colleagues.  Everything I’ve read and heard tells me that I am in for a mind-blowing experience.  Although I have been working with distributed systems for a very long time, I fully expect to feel overwhelmed and probably pretty stupid at times.  My main goal is to exit the class with a broader perspective on distributed system design.  My secondary goal is to learn how to better explain the benefits and costs of building a distributed system to others.

I also expect to see some benefits from a team perspective.  I can just imagine the dinner conversations with the team as we wrestle with the implications of what we learned during the day.  My hope is that the team will leave with confidence about where and how to apply distributed system concepts to solve some of the problems that we currently face.  Construction kicks off March 1, so there is still some time to adjust course if we feel it is necessary.

Go on to part 2.

Cross-Domain Post With Silverlight or Why I Hate Hackers

All I wanted was a nifty Silverlight demo of my company’s new .NET package rating API for the commercial print industry.  The main goal of the demo was to provide sample code that developers could look at to understand how easy it is to use.  To keep the sample code simple, I wanted the Silverlight application to call the API directly.  Should have been easy, right?  Well, it wasn’t, mainly because the API does an HTTP POST to a site in a different domain and Silverlight just doesn’t allow that.

I’m not going to regurgitate the various ways you can get around this limitation.  Just do a Google search on “cross domain post silverlight” if you want the technical details.  I settled on a proxy approach, which I implemented grumbling all the way.
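
For the curious, the proxy boils down to a small relay hosted in the same domain as the Silverlight app.  Here is a minimal sketch as an ASP.NET IHttpHandler (the target URL and class name are placeholders, not our real API, and Stream.CopyTo assumes .NET 4):

    using System.Net;
    using System.Web;

    public class RatingApiProxy : IHttpHandler
    {
        // Placeholder: the real rating API lives in another domain.
        const string TargetUrl = "https://rating.example.com/api/rate";

        public void ProcessRequest(HttpContext context)
        {
            var forward = (HttpWebRequest)WebRequest.Create(TargetUrl);
            forward.Method = "POST";
            forward.ContentType = context.Request.ContentType;

            // Relay the Silverlight client's request body to the real API...
            using (var body = forward.GetRequestStream())
                context.Request.InputStream.CopyTo(body);

            // ...and stream the API's response straight back to the client.
            using (var response = (HttpWebResponse)forward.GetResponse())
            using (var stream = response.GetResponseStream())
            {
                context.Response.ContentType = response.ContentType;
                stream.CopyTo(context.Response.OutputStream);
            }
        }

        public bool IsReusable
        {
            get { return true; }
        }
    }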

Grumbling, because the only reason it has to be this annoying is that hackers have developed a number of malicious ways to use cross-domain posts to do nasty things.  Again, I’ll spare you the details, but you can follow the link if you really want them.  Suffice it to say that because some people find it amusing, profitable or fulfilling to lie, cheat and steal, you have to jump through hoops to post something to an outside domain from Silverlight.  It’s really the same reason I have to waste $20 per month on alarm monitoring for my home.  If a very few people did not suck, we would not have to lock our doors, we would not have to install annoying virus-checking software on our computers and we could do cross-domain posts from Silverlight without requiring the domain we are calling to host some magic file.
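
That magic file, for the record, is clientaccesspolicy.xml, served from the root of the domain you want to call.  A wide-open version looks roughly like this (it allows any caller; a real deployment should narrow the domain list):

    <?xml version="1.0" encoding="utf-8"?>
    <access-policy>
      <cross-domain-access>
        <policy>
          <allow-from http-request-headers="*">
            <domain uri="*" />
          </allow-from>
          <grant-to>
            <resource path="/" include-subpaths="true" />
          </grant-to>
        </policy>
      </cross-domain-access>
    </access-policy>

No file on the other end, no cross-domain call from Silverlight.  That is the hoop.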

End rant.

Review of “Coders at Work” by Peter Seibel

As young as programming is, it seems that most of the practitioners in the field have little sense of its history.  I’m not entirely sure why this is.  Perhaps it is caused by the twin realities of rapidly evolving technologies and the relative youth of the folks who write code.  No matter the reasons, it seems like nobody thinks very much about what happened in languages like Assembly, Lisp, Fortran and Smalltalk on platforms like early IBM mainframes, Altos and PDP-11s back in the dark ages before the dawn of the Internet age.  It seems to me that it pays to understand where your field has come from.  If you feel as I do about this, you should take the time to read “Coders at Work”, a collection of interviews with some of the all-time greatest programmers that most of us have never heard of.

Well, hopefully you’ve heard of some of them.  Guys like Ken Thompson (co-creator of UNIX) and Donald Knuth (author of the masterwork “The Art of Computer Programming”) have set down bodies of work that demand attention.  Some of them, like Brendan Eich (CTO at the Mozilla Foundation) and Joshua Bloch (Chief Java Architect at Google), might be well-known because of their current positions more than their past work.  Others, like Fran Allen (2002 Turing Award winner), are probably unknown to 99% of the programmers on the planet.  No matter their fame, every single one of them has fascinating things to say about where we’ve been and where we’re going as a profession.

I think what struck me the most as I read through the first-hand accounts was just how little computer these guys had to work with back in the early days.  I thought I had it rough at the start of my career writing systems using HP3000 Basic and its two-character limit on variable names until I read about programming analog computers in the 50s and the early digital computers that supported as little as 200 words of memory.   It makes you wonder what people will think of our mildly multi-core servers thirty years from now.  It is also amazing how programming has remained just about as hard as it was back then.  Sure, we have better tools now, but our users expect much more too.

Although this book does not offer the sweeping, dispassionate viewpoint of a true history, it provides the invaluable personal perspective of the people who made history happen.  Reading this book is about the closest many of us will ever get to joggling punch cards and toggling switches to enter code.  It’s definitely worthwhile reading for every professional coder.

Check out “Coders at Work” at Amazon.

Check Your Assumptions at the Door

Every year around this time I start going to McDonalds almost every day for lunch for one reason and one reason only: Monopoly.  I’ve faithfully stuck game pieces to my little board every year in hopes of winning some valuable prize.  I’m not greedy.  I never dream about winning the big one.  No, anything worth more than about $2 would be just fine with me.  As you can guess, I’ve won plenty of fries and small cokes over the years but never the elusive $2+.

Anyway, these last couple of years they’ve gone to a web-based component located at www.playatmcd.com.  It’s a rather pretty, Flash-based thing that asks you to put in annoyingly long codes from each of your stamps.  Once it has validated the code, it lets you roll the dice and moves your piece around the board.  All in all, it provides a perfectly satisfying McDonalds Monopoly experience.  One particular feature of the game is that when you land on community chest or chance you get Coke rewards.  When you land on the winning square, a message pops up over the game board to inform you of your good fortune and a follow-up email shows up in your inbox with the info on how to log on to the Coke rewards site to claim your points.  I never gave the feature much thought.  I’d land on the square, I’d get the popup and a few minutes later the email would show up.  It all seemed to work perfectly well until yesterday when I moved my email from Go Daddy to GMail.

It turns out the system isn’t very smart about the possibility of fast email.  First, I rolled an 11 and while my piece was still moving GMail alerted me about a new message from McDonalds about a Coke reward.  A little odd but not too bad.  Clearly, the server sent the email at the same time it told the Flash client where to land my piece.  Not perfect but probably unavoidable.  The second thing that happened really bugged me.  I rolled double threes and landed on a property.  Just after landing, I got an email about another Coke reward, which did not make any sense since the game had told me nothing about winning another Coke reward.  My second and last roll (no doubles this time) landed me on community chest, where I finally won the Coke reward I had received the email about a couple of minutes before.

So what does this tell me?  Well, when the site receives the code from my game piece it must calculate the square where I will land.  It might take one roll or it might take a couple.  Regardless of how many moves it will take, it sends the prize email at the same time it tells the Flash client what to do.  The Flash client adds some nice animation of rolling dice and shows the piece moving around the board.  While that is going on, the email is making its way to my inbox, and email delivery is now so fast that I see the email before the piece stops moving.  I never noticed this flaw in the logic before because my old email didn’t show up nearly fast enough to expose it.  My guess is the developers of the site either missed this or figured nobody would notice.  Once you see it happen, the illusion that pressing the button to roll the dice means something is shattered and the game is no longer much fun.  It makes me wonder why they bothered to implement the animated game board at all.  After all, they have a facility that allows you to enter a code and see what you get without the animation.
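
If my guess is right, the server-side flow looks something like the sketch below.  To be clear, this is pure speculation on my part and every name in it is invented:

    using System;

    // Speculative reconstruction of the game server's logic.  The bug: the prize
    // email is sent the moment the outcome is computed, racing the client-side
    // dice animation that is supposed to reveal that same outcome.
    class MonopolyServer
    {
        static readonly Random Dice = new Random();

        static void Main()
        {
            MoveResult result = RedeemCode("4ZQ7-KPW2-XN9"); // made-up code format
            Console.WriteLine("Telling the Flash client to animate a roll of {0}.", result.Roll);
            // By the time the animation finishes, the email may already have arrived.
        }

        static MoveResult RedeemCode(string code)
        {
            // The outcome is fully decided here, before the player sees anything.
            var result = new MoveResult { Roll = Dice.Next(1, 7) + Dice.Next(1, 7) };
            result.WonCokeReward = LandsOnRewardSquare(result.Roll);

            if (result.WonCokeReward)
                SendRewardEmail(); // fires immediately; the fix would be to defer
                                   // this until the client reports the animation done

            return result;
        }

        static bool LandsOnRewardSquare(int roll)
        {
            return roll % 5 == 0; // stand-in for the real board layout
        }

        static void SendRewardEmail()
        {
            Console.WriteLine("(email) Congratulations, you won Coke rewards!");
        }
    }

    class MoveResult
    {
        public int Roll;
        public bool WonCokeReward;
    }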

In the end, it all comes down to where bad assumptions can lead.  The developers assumed I would not notice the pointlessness of rolling the dice.  Given my reaction to what I saw, I guess somewhere in the primitive part of my brain I actually thought the dice roll mattered.

I leave you with my favorite version of the old saying about assumptions.