kitchen table math, the sequel: Googlemaster on learning from disaster vs success

Wednesday, July 21, 2010

People don't usually analyze the reasons for success, but they do analyze the reasons for failure. The company I work for practices "root cause analysis" using "the five whys": we dig and dig and dig to try to find the real reason for a failure. We don't usually do this for successes. Maybe we should.

- The customer reported a primary key violation on table FOO.
- Why is the customer getting a PK violation on FOO? Because the software is trying to insert two records for the same thing.
- Why is the software trying to insert duplicate records? Because two threads are trying to do the same work.
- Why are two threads trying to do the same work? Because there's a bug that wasn't caught.
- Why wasn't the bug caught before release? Because we didn't test the multi-thread scenario.
- Why didn't we test the multi-thread scenario? We were going to test it, but we ran out of time. (Or, you could branch off in a different direction here and go with "Because we didn't know the software needed to support multiple threads of execution.")
- Why did we run out of time? Because "just a small feature request" was added to the schedule after we did the planning and estimation.
- And so on...
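The first two "whys" in the chain above (two threads doing the same work, then a PK violation on FOO) can be reproduced in a few lines. This is only a rough sketch, not the actual software in question: SQLite stands in for whatever database the product really used, and the column names, the `worker` function, and the barrier that forces the unlucky timing are all assumptions made up for illustration.

```python
import sqlite3
import threading


def run_duplicate_insert_demo(num_threads=2):
    """Race several workers on the same key; return (row_count, errors)."""
    # One shared in-memory database. Columns "id" and "payload" are hypothetical.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE FOO (id INTEGER PRIMARY KEY, payload TEXT)")

    # The lock serializes individual statements only. It does NOT make the
    # check-then-insert sequence atomic, which is the bug being illustrated.
    db_lock = threading.Lock()
    # The barrier forces the bad interleaving so the demo is repeatable.
    barrier = threading.Barrier(num_threads)
    errors = []

    def worker(name):
        with db_lock:
            exists = conn.execute("SELECT 1 FROM FOO WHERE id = 1").fetchone()
        barrier.wait()  # every thread finishes its check before any insert
        if exists is None:
            try:
                with db_lock:
                    conn.execute(
                        "INSERT INTO FOO (id, payload) VALUES (1, ?)", (name,)
                    )
                    conn.commit()
            except sqlite3.IntegrityError as exc:
                # This is the primary key violation the customer reported.
                errors.append(str(exc))

    threads = [
        threading.Thread(target=worker, args=(f"thread-{i}",))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    row_count = conn.execute("SELECT COUNT(*) FROM FOO").fetchone()[0]
    conn.close()
    return row_count, errors


if __name__ == "__main__":
    rows, errs = run_duplicate_insert_demo()
    print(f"rows in FOO: {rows}, PK violations: {len(errs)}")
```

With two threads, both see "no row yet", both try to insert, and exactly one of them hits the constraint. A test like this is the "multi-thread scenario" that ran out of time in the chain above.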

3 comments:

Joshua Fisher said...

I have had an ongoing discussion related to this ongoing discussion. It has to do with a friend of mine trying to coax me back into playing tournament chess:

"What, are you afraid to lose? You know, you learn by losing."

And actually, I don't think that's true at all. You don't learn by losing; you learn by trying to win.

There's a big difference.

ChrisA said...

And then there's not testing the antenna on the outside of the case. Of course, almost any electrical engineer worth 10 cents would at least suspect that it's a very bad idea.

Featuritis and innovation by decree. Gotta love it.

Anonymous said...

"And then there's not testing the antenna on the outside of the case. Of course, almost any electrical engineer worth 10 cents would at least suspect that it's a very bad idea."

True. Of course, a lot of engineers are not happy with many/most of Apple's products (the OS is fairly slow at what OSes are supposed to do, and it uses a fair amount of memory; iPods often lose out to competing products on capacity/features at a given cost; etc.). Still, Apple sells a lot of products and makes a lot of money doing so, which suggests that this sort of thing might be more of a business tradeoff than a clearly "bad idea." One can make a very good argument that the iPhone 4 reception problem is small enough (or affects few enough people) that Apple will still do better with it than if Apple had released a more engineeringly sound product.

Or maybe not. We'll know in six months or so.

-Mark Roulo