Advanced Distributed System Design with Udi Dahan — Part 2 — Setting the Stage

You can find part 1 of this series here.

I can say for sure that the course has lived up to its reputation so far. Udi is well prepared and practiced, and as expected, he has mastery of the material. Unlike some technical presenters, though, he also gave a very polished presentation. The material was broken into units that ran about an hour and a quarter, so we got breaks on a regular basis. He built time for questions into his presentation and answered them well. His slides weren’t much to look at, but the information behind them was very good. He used a whiteboard to illustrate some concepts. At times, he also asked questions to engage the audience and used their answers to lead into his next point. The pace was a little slow for me at times, but that’s because I’ve been practicing some of these architectural ideas myself. Overall, I would say his pace was a reasonable compromise for the audience.

For the first half of the day, Udi led us through the back story. He started with a module that discussed the ten (actually 11; computers start counting at zero, right?) fallacies of distributed computing. The first hour or so passed quietly for me. I know the network isn’t reliable, I know latency is a problem, I know bandwidth is a problem, and so on. I’m not saying the material was boring. There was plenty of interesting detail I didn’t know, like exactly how little bandwidth is actually available on gigabit Ethernet, but none of it made me uncomfortable.

However, when he got to the last fallacy, “Business logic can and should be centralized”, I felt it a little in my gut. He made the point that code reuse was originally based on the false premise that the best way to improve developer productivity is to reduce the number of lines of code a developer writes. That premise assumes the design is already complete and all programmers have to do is type in the code. Of course, that is false. And even if it were true, is code reuse more important than performance or scalability?

He wrapped up the first module by stating a theme I suspect he will repeat: “There is no such thing as a best practice”.  No solution is absolute.  If you optimize to reduce latency, you usually end up using more bandwidth.  If you make code reuse your god, you will end up compromising on other aspects of your system.  You have to find a balance for each situation.

The second module, “Coupling in Distributed Systems”, dug deep into the old adage that you want your components to be loosely coupled. For example, if you have a common logging component that has no callers, it is loosely coupled. Is that a good thing? No, it would be unused code. The truth is that different kinds of components should have different kinds of coupling. He illustrated that brilliantly with an audience-participation exercise that had us voting on generic components with various levels of afferent (inbound) and efferent (outbound) coupling. He then delved into the three aspects of coupling (platform, temporal and spatial) and how each could impact the performance and reliability of your system. His discussion of how a slow web service under load could end up bringing down a system, or at least making it look unreliable to users, was quite interesting. He used that example to introduce the concept of reducing temporal coupling using messaging.
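To convince myself of the slow web service point, I put together a rough back-of-the-envelope model in Python. This is my own sketch, not something shown in the course: the function name, the pool size of 100 threads and the timings are all made-up assumptions. The idea is simply that a front end can accept at most as many requests per second as it has threads, divided by how long each request holds a thread.

```python
def max_requests_per_second(pool_size: int, ms_thread_is_held: int) -> float:
    """Upper bound on the request rate a fixed thread pool can accept.

    Each request occupies one thread for ms_thread_is_held milliseconds,
    so the pool can start at most pool_size * 1000 / ms_thread_is_held
    requests per second.
    """
    return pool_size * 1000 / ms_thread_is_held


POOL_SIZE = 100  # assumed number of front-end threads

# Temporally coupled: the thread waits for the slow service (assumed 2 s under load).
blocking = max_requests_per_second(POOL_SIZE, ms_thread_is_held=2000)

# Decoupled with messaging: the thread only waits for the enqueue (assumed 5 ms).
messaging = max_requests_per_second(POOL_SIZE, ms_thread_is_held=5)

print(f"blocking call: {blocking:.0f} requests/s")
print(f"messaging:     {messaging:.0f} requests/s")
```

The model ignores a lot (queue depth, how fast the back end eventually drains the work), but it shows why a slow dependency hurts so much when the caller has to wait for it: the front end runs out of threads long before it runs out of anything else.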

The final module of the day, messaging patterns, started to explain the benefits of messaging in some detail. Although RPC is faster and lighter-weight to call, especially if you ignore potential network issues, messaging is more resilient and scales better. He spent a good hour going through typical failure scenarios to show how messaging protects you from losing data when a web server crashes or a database server goes down.

He made the point that transactions can give you a false sense of security.  Imagine a user is placing an order.  The database server has a problem and rolls back the transaction.  Although the database is consistent, you’ve lost an order.  Can you get it back?  Probably not.  He showed us several examples using NServiceBus to illustrate some of his points.  His demonstration of replaying messages from the error queue was especially good.
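The lost-order scenario and the error queue replay can be sketched in a few lines of Python. To be clear, this is not NServiceBus code and not what Udi showed; it is a toy of my own that uses in-memory deques where a real system would use durable queues, and `FlakyDatabase`, `process` and `replay` are hypothetical names I made up for illustration.

```python
from collections import deque


class FlakyDatabase:
    """Hypothetical stand-in for a database that may be unavailable."""

    def __init__(self):
        self.available = False
        self.orders = []

    def save(self, order):
        if not self.available:
            raise ConnectionError("database unavailable")
        self.orders.append(order)


def process(input_queue, error_queue, db, max_attempts=3):
    """Handle each message; after max_attempts failures, park it in the error queue."""
    while input_queue:
        message = input_queue.popleft()
        for _ in range(max_attempts):
            try:
                db.save(message)
                break  # saved successfully, move on to the next message
            except ConnectionError:
                continue  # try again
        else:
            # Every attempt failed: keep the message instead of dropping it.
            error_queue.append(message)


def replay(error_queue, input_queue):
    """Move parked messages back to the input queue so they are processed again."""
    while error_queue:
        input_queue.append(error_queue.popleft())


db = FlakyDatabase()
input_queue = deque([{"order_id": 1, "item": "book"}])
error_queue = deque()

process(input_queue, error_queue, db)  # database is down: the order is parked, not lost
parked = len(error_queue)

db.available = True                    # someone fixes the database
replay(error_queue, input_queue)
process(input_queue, error_queue, db)  # this time the order is saved
```

The point of the toy is the difference in what survives a failure. With a plain transaction, the rollback leaves the database consistent but the order exists nowhere. Here the order is still sitting in a queue after the failure, so it can be processed once the database is back.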

It looks like we’re going to start digging into SOA on Day 2.  Should be fun.

You can find part 3 of this series here.
