Jul 14
Retrospective tips
Darrell Mozingo | Uncategorized | July 14th, 2015 | 2 Comments »
My friend Jeremy wrote an excellent post about spicing up retrospectives. I started writing this up as a comment to post there, but it got a little long, so I thought I'd break it out as a blog post. Jeremy's experiences mirror mine exactly from running and participating in many retros over the years. Actively making sure they're not getting routine and becoming an afterthought is an absolute must. Here are a few additional tips we use to run, spice up, and manage retros:
  • Retro bag: We keep a small bag in the office filled with post-its, sharpies, markers, Blu-Tack, etc., to make retro facilitators' lives easier – they can just grab and go. We also keep a printed copy of Jeremy's linked retr-o-mat in it.
  • Facilitator picker: A small internal app which lets teams enter their retro info and randomly selects someone to facilitate. It favours those who haven't done one recently and are available for the needed time span. Sure saves on walking around and asking for a facilitator!
  • Cross-company retros: We've gotten great value out of doing larger cross-company retros after big projects. These are bigger affairs (upwards of 20 people), representing as many of the teams involved as possible (developers, systems, product owners, management, sales, client ops, etc.). We used the mailbox technique Jeremy mentioned and had attendees generate ideas beforehand to get everything in, limiting the retro to 1.5 hours. Making sure everyone knew the prime directive was also a must, as many hadn't been involved in retros before. The actions that came out were aimed at future similar projects, and each was assigned to a team to champion. Sure enough, they came in very handy a few months later as we embarked on a similarly large project.
  • Retro ideas: (I don't remember where I got these, but they're not original!)
    1. Only listing 3 of the good things that happened in a given period. At first I didn't think focusing purely on the good would result in any actionable outcomes, but the perspective brought about some interesting ideas.
    2. Making a “treasure map” of the retro time period, with some members adding a “mountain of tech debt”, “bog of infrastructure”, and “sunny beach of automation”. A fun take on the situation to get at new insights.
    3. Amazon reviews of the period, with a star rating and “customer feedback”.
    4. I'm excited to try out story cubes at the next retro I run – sounds good!
Sep 4
Managing the Unexpected
Darrell Mozingo | Books | September 4th, 2013 | No Comments »

I recently read Managing the Unexpected. It's a brilliant book about running highly resilient organisations. While it's mostly based on high-risk organisations like nuclear power plants and wildland firefighting units, it's still highly applicable to any company trying to increase its resiliency to failures and outages.

A lot of the points in the book fall into that “that sounds so obvious” category after you read them, but I think those are the best kind, as they help crystallise ideas you couldn't quite articulate yourself and give you a good way to communicate them to your colleagues. There's still plenty in there to give you something new to think about, too. The first half of the book discusses five principles the authors feel all highly resilient organisations need to follow, while the second half goes over ways to introduce them to your organisation, complete with rating systems for how you function now.

The five main principles the book harps on are (the first three are for avoiding incidents, while the last two are for dealing with them when they occur):

  • Tracking small failures – don’t let errors slip through the cracks and go unnoticed.
  • Resisting oversimplification – don’t simply write off errors as “looking like the same one we see all the time”, but investigate them.
  • Remaining sensitive to operations – employees working on the front line are more likely to notice something out of the ordinary, which could indicate an impending failure. Listen to them.
  • Maintaining capabilities for resilience – shy away from removing things that’ll keep resilience in your system when there’s an outage.
  • Taking advantage of shifting locations of expertise – don’t leave all decision making power in the hands of managers that may be separated from the incident. Let front line members call the shots.

Here are some of my favourite bits of wisdom from the book:

  • “… try to hold on to those feelings and resist the temptation to gloss over what has just happened and treat it as normal. In that brief interval between surprise and successful normalizing lies one of your few opportunities to discover what you don’t know. This is one of those rare moments when you can significantly improve your understanding. If you wait too long, normalizing will take over, and you’ll be convinced that there is nothing to learn.” (pg 31) There have been too many times in the past where I've been involved in system outages where everyone goes into panic mode, gets the problem solved, but then sits around afterwards going “yeah, it was just because of that usual x or y issue that we know about”. Never assume a failure was because of a known situation (that's lying to yourself). Dig in and find out what happened with a blank slate after each failure. Keep asking why.
  • “Before an event occurs, write down what you think will happen. Be specific. Seal the list in an envelope, and set it aside. After the event is over, reread your list and assess where you were right and wrong.” (pg 49) Basically following the scientific method: set up a hypothesis with expectations that you can check after an event (software upgrade, new feature, added capacity, etc.). It's definitely not something I'm used to, but I'm trying to build it into my workflow. I love the idea of Etsy's Catapult tool, where they set up expectations for error rates, client retention, etc. before releasing a feature, then do A/B testing to show whether it met or failed each criterion.
  • “Resilience is a form of control. ‘A system is in control if it is able to minimize or eliminate unwanted variability, either in its own performance, in the environment, or in both… The fundamental characteristic of a resilient organization is that it does not lose control of what it does but is able to continue and rebound.'” (pg 70) – Don’t build highly resilient applications assuming they’ll never break, but instead assume that each and every piece will break or slow down at some point (even multiple together) and design your app to deal with it. We’ve built our streaming platform to assume everything will break, even our dependencies on other internal teams, and we’ll just keep going as best we can when they’re down and bounce back after.
  • “Every unexpected event has some resemblance to previous events and some novelty relative to previous events. […] The resilient system bears the marks of its dealings with the unexpected not in the form of more elaborate defences but in the form of more elaborate response capabilities.” (pg 72) – When you have an outage and determine the root cause, don't focus on just stopping that one specific error from ever happening again. Instead, try to build resilience into the system to stop that whole class of problem from having an effect in the future. If your cache throwing a specific error was the root cause, for instance, build the system to handle any error from the cache rather than that specific one, and add metrics around those failures to respond faster in the future (see the sketch below this list).
  • “Clarify what constitutes good news. Is no news good news, or is no news bad news? Don’t let this remain a question. Remember, no news can mean either that things are going well or that someone is […] unable to give news, which is bad news. Don’t fiddle with this one. No news is bad news.” (pg 152) – If your alerting system hasn't made a peep for a few days, it's probably a bad thing. Some nominal level of errors will always be present, and if you're hearing nothing, that's itself a warning sign. Never assume your monitoring and alerting systems are working smoothly!

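To make that cache example concrete, here's a minimal sketch (my own C# illustration, not from the book – the type and delegate names are hypothetical) of catching any failure at a dependency boundary rather than special-casing the one error behind your last outage:

using System;

public class ResilientCache<T>
{
    private readonly Func<string, T> readFromCache;   // fast but unreliable dependency
    private readonly Func<string, T> readFromSource;  // slower, authoritative fallback

    public ResilientCache(Func<string, T> readFromCache, Func<string, T> readFromSource)
    {
        this.readFromCache = readFromCache;
        this.readFromSource = readFromSource;
    }

    public T Get(string key)
    {
        try
        {
            return readFromCache(key);
        }
        catch (Exception ex)
        {
            // Any cache failure, not just last week's specific exception: log it
            // so small failures stay visible (principle 1), then degrade gracefully.
            Console.Error.WriteLine("Cache read failed for '" + key + "': " + ex.Message);
            return readFromSource(key);
        }
    }
}

The same shape works for any dependency: catch broadly at the boundary, record the failure so it doesn't slip through the cracks, and bounce back to something sensible.
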
Overall the book is an excellent read. A bit dense in writing style at times, but I'd recommend it if you're working on a complex system that demands uptime in the face of shifting requirements and operating conditions.

Mar 18
DevOps Days London 2013
Darrell Mozingo | Events | March 18th, 2013 | No Comments »
I spent this past Friday & Saturday at DevOpsDays London. There have been a few reviews written already about various bits (and a nice collection of resources by my co-worker Anna), and I wanted to throw my thoughts out there too. The talks each morning were all very good and well presented, but the real meat of the event for me was the 3 tracks of Open Spaces each afternoon, along with the various break-time and hallway discussions. I didn't take notes as detailed as others did, but here are the bits I took away from each Open Space:
  • Monitoring: Discussed using Zabbix, continuous monitoring, and some companies trying out self-healing techniques with limited success (be careful with services flapping off and on).
  • Logstash: Windows client support (not as good as it sounds), architecture (ZeroMQ everything to one or two servers, then into Elasticsearch), and what to log (everything!).
  • Configuration Management 101 (w/ Puppet & Chef): It was great having the guys from PuppetLabs and Opscode there to give views on both products (and trade some friendly jabs!). Good discussion about Windows support, including a daily-growing community with package support and the real possibility of actually doing config management on Windows. We're using CFEngine, and while I got crickets after bringing it up, a few people were able to offer some good advice and compare it with Puppet & Chef (stops on error like Chef, good for legacy support, promise support is nice, etc.).
  • Ops-to-dev feedback cycle: Besides the usual “put devs on call” suggestion (which I still feel is a bad idea), there was discussion about getting bugs like memory leaks prioritised above features. One of the better suggestions to me was simply going and talking to the devs, putting faces to names and getting to know one another. Suggestions were also made for ops to just patch the code themselves, which throws up a lot of alarms for me (going through back channels, perhaps not properly tested, etc.). I say make a pull request.
  • Deployment orchestration: BitTorrent for massive deploys (Twitter's Murder), Jenkins/TeamCity/et al. are still best for kicking off deploys, and MCollective for orchestration.
  • Ops user stories: Creating user stories for ops project prioritisation is hard, as is fitting the work into sprints. It ended up coming down to standard estimation difficulties – more work popping up, unknown unknowns, etc. I left a bit before the end to pop into a Biz & DevOps Open Space, but didn't get much from it before it ended.
Overall it was a great conference. Well planned, good food, and great discussions. Nothing completely groundbreaking, but a lot of really good tips & recommendations to dig into.
Jun 24
Software Craftsmanship 2012
Darrell Mozingo | Events | June 24th, 2012 | No Comments »

I attended the Software Craftsmanship 2012 conference last Thursday up at Bletchley Park. It was an awesome event run mostly by Jason Gorman and the staff at the park. The company I work for, 7digital, sponsored the event so all ticket proceeds went directly to help the park, which is very cool. They're in desperate need of funding, and this event has brought in a hefty amount over the past few years.

I did the Pathfinding Peril track in the morning. They went over basic pathfinding algorithms, including brute force and A*, and their applicability outside the gaming world. The rest of the session was spent pairing on bots that compete against other bots, trying to automatically navigate a maze the fastest (using this open source tournament server). Unfortunately they didn't have Mono installed, so my pair and I wasted some time getting NetBeans installed and a basic Java app up and running. Very interesting, and it spurred a co-worker to set up a tournament server at work too. I'm looking forward to submitting a bot there to try out some pathfinding algorithms.

During our lunch break they gave a nice, albeit quick, tour of the park. We got to see the main sites, including Colossus. Very interesting stuff, and amazing to hear how they pulled off all those decoding and computational feats during the war.

For the afternoon I went to the Team Dojo session. We were told to write our strongest languages on name badges, then break off into teams of 4-6 based on that. I got together with a group of 6 devs, some of them co-workers. After a brief overview of the Google PageRank algorithm and a generic nearest-neighbour one, we were set loose to create a developer-centric LinkedIn clone from a complete standing start. We had to figure out where to host our code, how to integrate, code the algorithms, parse in XML data, and throw it all up on the screen somehow, all in around 2 hours. Unfortunately we spent way too much time shaving yaks, as it were, with testing and our CI environment, and didn't get to the algorithms until the end (although we were close to finishing!). I learned a bit about trying to jump-start a project like that with different personalities and making it all mesh together. It'd be interesting to see how we'd all do it again, especially since katas are meant to be repeated.

Between the talks, lunch, hog roast dinner, tour, and the great little side discussions in between it all, it was an excellent event (although they could try doing something about those beer prices!). Everyone did a great job putting it on. Here's a video of the day Jason put together (I'm one of the last pair of interviews during our afternoon session). I'm quite looking forward to attending again in the future.

Dec 30
Continuous Delivery
Darrell Mozingo | Build Management | December 30th, 2011 | No Comments »

I recently finished reading Continuous Delivery. It’s an excellent book that manages to straddle that “keep it broad to help lots of people yet specific enough to actually give value” line pretty well. It covers testing strategies, process management, deployment strategies, and more.

At my former job we had a PowerShell script that would handle our deployment and related tasks. Each type of build – commit, nightly, push, etc. – worked off its own artifacts, created right then, duplicating any compilation, testing, or pre-compiling tasks. That eats up a lot of time. Here's a list of posts where I covered how that script generally works:

  • Part 1
  • Part 2
  • Part 3
  • Part 4

The book talks about creating a single set of artifacts from the first commit build, and passing those same artifacts through the pipeline of UI tests, acceptance tests, manual testing, and finally deployment. I really like that idea, as it cuts down on unnecessary rework and gives you more confidence that this one set of artifacts is truly ready to go live. Sure, our tasks could call the same function to compile the source or run unit tests, so they were effectively the same, but the assemblies produced from the commit build could still have been slightly different from those in the push build.

I also like how they mention getting automation in your project from day one if you’re lucky enough to work on a green-field app. I’ve worked on production deployment scripts for legacy apps and for ones that weren’t production yet, but still a year or so old. The newer an app is and the less baggage it has, the easier it is to get started, and getting started is the hardest part. Once you have a script that just compiles and copies files, you’re 90% of the way there. You can tweak things and add rollback functionality later, but the meat of what’s needed is there.

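To show how small that starting point can be, here's a rough compile-and-copy sketch in C# (the solution name and server paths are made up for illustration; our actual scripts were PowerShell, as covered in the posts above):

using System.Diagnostics;
using System.IO;

class Deploy
{
    static void Main()
    {
        // Compile the solution in Release mode (assumes msbuild is on the PATH).
        var build = Process.Start("msbuild", "MyApp.sln /p:Configuration=Release");
        build.WaitForExit();
        if (build.ExitCode != 0) return; // don't ship a broken build

        // Copy the build output to the target server.
        var source = @"MyApp\bin\Release";
        var target = @"\\webserver\apps\MyApp";
        Directory.CreateDirectory(target);
        foreach (var file in Directory.GetFiles(source))
        {
            File.Copy(file, Path.Combine(target, Path.GetFileName(file)), true);
        }
    }
}

Rollback, config transforms, and smoke tests can all bolt on later – the compile-and-copy core is the hard first step.
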
However you slice it, you have to automate your deployments. If you're still copying files out by hand, you're flat out doing it wrong. In the age of PowerShell, there's really no excuse not to automate your line-of-business app deployment. The faster deliveries, more transparency, and increased confidence that automation gives you can only lead to one place: the pit of success, and that's a good place to be.

Nov 14
Moving on
Darrell Mozingo | Misc. | November 14th, 2011 | No Comments »

I've been at Synergy Data Systems for over 7 years now (I know, the site is horrible). I've worked with a lot of great people on some very interesting projects, and learned a boatload during that time. Unfortunately, they can't offer the one thing my wife and I wanted: living abroad.

To that end, we're moving to London and I'll be starting at 7digital in early January. I'm super excited about both moves. 7digital seems like a great company working with a lot of principles and practices that are near and dear to me, and c'mon, it's London. For two people who grew up in small-town Ohio, this'll be quite the adventure!

I’m looking forward to getting involved in the huge developer community over there, playing with new technologies, and working with fellow craftsmen!

Sep 29
Painfully slow clone speeds with msysgit & GitExtensions
Darrell Mozingo | Quickie | September 29th, 2011 | 7 Comments »

UPDATE: See Paul's comment below – it sounds like the latest Cygwin upgrade process isn't as easy as it used to be.

If you install GitExtensions, up through the current 2.24 version (which comes bundled with the latest msysgit version 1.7.6-preview20110708), and use OpenSSH for your authentication (as opposed to Plink), you’ll likely notice some painfully slow cloning speeds. Like 1MB/sec on a 100Mb network kinda slow.

Thankfully, it’s a pretty easy fix. Apparently msysgit still comes bundled with an ancient version of OpenSSH:

$ ssh -V
OpenSSH_4.6p1, OpenSSL 0.9.8e 23 Feb 2007

Until they get it updated, it's easy to do yourself. Simply install the latest version of Cygwin, making sure to search for and install OpenSSH on the package screen. Then go into the bin directory of wherever you installed Cygwin, and copy the following files into C:\Program Files\Git\bin (or Program Files (x86) if you're on 64-bit):

  • cygcrypto-0.9.8.dll
  • cyggcc_s-1.dll
  • cygssp-0.dll
  • cygwin1.dll
  • cygz.dll
  • ssh.exe
  • ssh-add.exe
  • ssh-agent.exe
  • ssh-keygen.exe
  • ssh-keyscan.exe

Checking the OpenSSH version should yield something a bit higher now:

$ ssh -V
OpenSSH_5.8p1, OpenSSL 0.9.8r 8 Feb 2011

Your clone speeds should be faster too. This upgrade bumped ours from literally around 1MB/sec to a bit over 10MB/sec. Nice.

Sep 15
Getting started with TDD
Darrell Mozingo | Musings, Testing | September 15th, 2011 | No Comments »

When I first read about TDD and saw all the super simple examples that litter the inter-tubes, like the calculator that does nothing but add and subtract, I thought the whole thing was pretty stupid and its approach to development was too naive. Thankfully I didn’t write the practice off – I started trying it, plugging away here and there. One thing I eventually figured out was that TDD is a lot like math. You start out easy (addition/subtraction), and continue building on those fundamentals as you get used to it.

So my suggestion to those starting down the TDD path is: don't brush it off. Start simple. Do the simple calculator, the stack, or the bowling game. Don't start thinking about how to mix in databases, UIs, web servers, and all that other crud with the tests. Yes, these examples are easy, and yes, they ignore a lot of stuff you need in your daily job, but that's sort of the point. They'll seem weird and contrived at first, but that's OK – it serves a very real purpose. TDD has been around for a good while now; it's not some fad that's going away. People use it and get real value out of it.

The basic practice examples get you used to the TDD flow – red, green, refactor. That's the whole point of things like katas: convert that flow into muscle memory. Get it ingrained in your brain, so when you start learning the more advanced practices (DIP, IoC containers, mocking, etc.), you'll just be building on that same basic flow. Write a failing test, make it pass, clean up. You don't want to abandon that once you start learning more and going faster.
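
Here's roughly what one turn of that loop looks like on the calculator example – a sketch using NUnit (the names are mine, pick whatever reads well to you):

using NUnit.Framework;

[TestFixture]
public class CalculatorTests
{
    // Red: this fails (it won't even compile) until Add exists.
    [Test]
    public void Adds_two_numbers()
    {
        var calculator = new Calculator();
        Assert.AreEqual(5, calculator.Add(2, 3));
    }
}

// Green: the simplest thing that could possibly pass.
// Refactor comes next, with the test as your safety net.
public class Calculator
{
    public int Add(int a, int b)
    {
        return a + b;
    }
}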

It seems everyone gets the red-green-refactor part down when they're doing the simple examples, but forgets it once they start working on production code. Sure, you don't always know what your code is going to do or look like, but that's why we have the tests. If you can't even begin to imagine how your tests will work, write some throwaway spike code. Get it working functionally, then delete it all and start again using TDD. You'll be surprised how it changes.

Good luck with your journey. If you're in the Canton area, don't forget to check out the monthly Canton Software Craftsmanship meetup. There are experienced people there who are eager to help you out.

Jul 28
Commenting out old code kills puppies
Darrell Mozingo | Musings | July 28th, 2011 | 1 Comment »

There, I said it. Actually, I’m kind of worried that title won’t adequately state the intensity of this situation.

This is one of the fundamental reasons we have source control, people: so we can go back through a file's history and see the different revisions. Please, for the love of all that is holy, don't comment out old code. Just delete it! Feel free to slap your own knuckles with a ruler if you start to think about commenting it. Don't try to recreate a source control system through commented-out code. Everyone knows exactly what I'm talking about:

// John Doe - 7/5/2011 - Changed to allow a higher limit.
// dozens of lines of old code....
 
// John Doe - 7/18/2011 - Changed algorithm slightly.
// dozens of lines of old code....
 
// random dozen lines of old code with no comment at all
 
public void ActualCode() { }

Those extra comment chunks are just crap to sift through to get to the real code, extra stuff you'll have to parse to see if it's relevant to the current situation, and more false positives for ReSharper (and, I'm guessing, other refactoring tools) to pick up when you rename a variable or method that's used inside those commented chunks. That chunk of old code at the bottom without even a hint as to why it's commented out? That's the worst of the worst – someone's going to sit there and stare at it for a good while before they figure out why it was commented out, and you just know that when the author committed this file, the commit message was blank too. Awesome.

So anyway, just remember what actually happens the next time you're about to comment out old code, and don't do it. You'll be doing future programmers (and more than likely yourself) a huge service…

Jul 20
Consistent modal dialogs, the easy way
Darrell Mozingo | Web | July 20th, 2011 | No Comments »

So we all know the default alert dialog box visually sucks. Any of the hundreds of jQuery modal plugins work wonderfully for replacing it with something a bit snazzier (although putting the information on the page for the user is even better, but that's for another post). The biggest problems with most of those dialogs are the setup cost and the memory cost:

  • Setup cost: having to set heights, widths, button names, text & title fields, yada, yada, yada. A lot of that can be skinned through CSS, and a lot of plugins reduce the noise to virtually nil, but many leave a lot on your pages. It's ugly to look at in your code, and ugly to configure. Not to mention all those config settings spread through your code base like the freakin' ground ivy is spreading through my lawn as I type this. Want to change the widths for a new redesign, or localise the button names? Good luck!
  • Memory cost: relates to the Pit of Success. Do you really want the burden of always remembering to use that modal dialog instead of alert? What about the new guy – is he going to know or remember? Sure, forgetting isn't that big of a deal, but given enough slip-ups, your nice consistent UI goes to hell. Tests checking for calls to alert are also possible, via straight searching through files or somehow through UI tests, but I can see a future of false positives ahead for that idea.

How about a better way? With some very slight JavaScript-fu, you can override the default alert and confirm dialogs so that not only is there nothing to copy & paste between pages, but you don't even have to remember to use your nifty modal boxes – it'll just happen. We'll use the jQuery UI Dialog plugin inside a stock ASP.NET MVC app, though this is easily transferable to any other platform or modal plugin.

First we’ll override the default alert method on the window object, calling the dialog function from jQuery UI and setting some default parameters:

window.alert = function (message) {
	// Wrap the message in a jQuery UI dialog – the defaults below are an
	// assumed sketch; tweak the title and buttons to taste.
	$('<div></div>').html(message).dialog({
		modal: true,
		title: 'Alert',
		buttons: {
			OK: function () { $(this).dialog('close'); }
		}
	});
};