<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CyberFOX Software Inc. &#187; work</title>
	<atom:link href="http://cyberfox.com/blog/category/work/feed" rel="self" type="application/rss+xml" />
	<link>http://cyberfox.com/blog</link>
	<description>Coding, Connections, and Other Bloggy Bits of Goodness</description>
	<lastBuildDate>Sat, 04 Feb 2012 10:21:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Approaching an inherited Rails codebase</title>
		<link>http://cyberfox.com/blog/56-approaching-an-inherited-rails-codebase</link>
		<comments>http://cyberfox.com/blog/56-approaching-an-inherited-rails-codebase#comments</comments>
		<pubDate>Fri, 22 May 2009 06:20:29 +0000</pubDate>
		<dc:creator>Cyberfox</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[startup]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[maintenance]]></category>
		<category><![CDATA[rails]]></category>

		<guid isPermaLink="false">http://www.cyberfox.com/blog/?p=56</guid>
		<description><![CDATA[Greetings, [Edit: Since writing this article up back in early March, I've moved on from this job. The folks who are now maintaining it at least know where the pain points are, can run migrations safely, deploy it locally, and to dev servers, and to the main deployment area.  It's a working app, although I [...]]]></description>
			<content:encoded><![CDATA[<p>Greetings,</p>
<p><small><em>[Edit: Since writing this article up back in early March, I've moved on from this job. The folks who are now maintaining it at least know where the pain points are, can run migrations safely, and can deploy it locally, to dev servers, and to the main deployment area.  It's a working app, and although I never got code coverage above about 45%, at least the coverage was decent in the core app areas by the time I left.]</em></small></p>
<p><a title="A Fresh Cup" href="http://afreshcup.com/">Mike Gunderloy</a> had an interesting article entitled &#8216;<a href="http://afreshcup.com/2009/03/23/batting-clean-up/">Batting Clean-up</a>&#8217;, which was very timely for me.  I’ve just started maintaining and trying to improve a Rails app developed by an ‘outsourced’ group. The only tests were the ones generated automatically by ‘restful authentication’, and they were never maintained, so they didn’t come close to passing. Swaths of the program are written in terribly complex (and sometimes computed) SQL, migrations didn’t bring up a fresh database (poor use of acts_as_enumerated causes great hurt), and vendor/plugins should have just had one named ‘kitchen_sink’.</p>
<p>It hurts to see Rails abused like that; you want to take the poor application under your arm and say, ‘It’ll be okay…we’ll add some tests and get you right as rain in no time!’, but you know you’d be lying…</p>
<p>I did much of what Mike <a title="Batting Clean-Up" href="http://afreshcup.com/2009/03/23/batting-clean-up/">described</a> (half the gems it used were config.gem’ed, the other half weren’t), vendor’ed rails (it breaks on newer than 2.1.0), and brought the development database kicking and screaming into life. There was no schema.rb, it had been .gitignore’d, and the migrations added data, used models, and everything else you can imagine doing wrong. (Including using a field on a model after adding that column in the previous line…I don’t know what version of Rails that ever worked on…) I didn’t want a production database; who knows what’s been done to that by hand. I want to know what the database is _supposed_ to look like; I can figure out the difference with production later.</p>
<p>Once the clean (only data inserted by migrations) dev database was up, I brought the site up to see if it worked. Surprisingly enough, it did; apparently they used manual QA as their only testing methodology. I appreciate their QA a lot; it means it’s a working application, even if it’s not going to help me refactor it.</p>
<p>I ran <a title="Flog (grotesque art, great tool)" href="http://ruby.sadi.st/Flog.html">flog</a> and <a href="http://ruby.sadi.st/Flay.html">flay</a> and looked at the pain points they found to get an idea how bad things might be. I picked an innocuous join table (with some extra data and functionality) to build the first set of tests for, which gave me insight into both sides of the join without having to REALLY dig into the ball of fur on either side. I viciously stripped all the ‘test_truth’ tests. I looked for large files that flog and flay hadn’t picked up to pore over. Check out custom rake tasks, because those often are clear stories and easy to quickly understand in a small context.</p>
<p>Checking out the deployment process tells you a lot also, although it turns out this was stock engine yard capistrano.</p>
<p>Skimming views (sort by size!) will tell you a lot also, especially when you find SQL queries being run in them…</p>
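<p>As a rough illustration of the &#8216;sort by size&#8217; idea, here is a minimal Python sketch that lists the files under a directory, largest first. The function name <code>files_by_size</code> is made up for this example, and <code>app/views</code> is only the conventional Rails location; treat both as assumptions and point it wherever the views actually live.</p>

```python
import os


def files_by_size(root):
    """Return (size_in_bytes, path) pairs for every file under root, largest first."""
    sized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sized.append((os.path.getsize(path), path))
    return sorted(sized, reverse=True)


if __name__ == "__main__":
    # "app/views" is the conventional Rails view directory (an assumption here).
    for size, path in files_by_size("app/views")[:20]:
        print(f"{size:8d}  {path}")
```

<p>The biggest templates tend to be the ones hiding logic that belongs in a model or helper, so they are a reasonable place to start reading.</p>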
<p>Use the site for a little while, and watch the log in another window. Just let it skim by; if you’ve looked at log files much, things that seem wrong will jump out even if it’s going faster than you can really read.</p>
<p>In my case, the code’s mine now, so it’s my responsibility to make it better before anybody else has to touch it. I’ve got about a week of ‘free fix-it-up time’ before I need to start actually implementing new features and (thankfully) stripping out old ones… At my previous company, I was the guy pushing folks to test, now I’ve inherited a codebase with zero tests. Poetic justice, I suppose&#8230; <img src='http://cyberfox.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Good luck!</p>
<p>—  Morgan Schweers, Cyber<strong>FOX</strong>!</p>
]]></content:encoded>
			<wfw:commentRss>http://cyberfox.com/blog/56-approaching-an-inherited-rails-codebase/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>To fire, or not to fire, &#8216;workaholics&#8217;&#8230;</title>
		<link>http://cyberfox.com/blog/37-to-fire-or-not-to-fire-workaholics</link>
		<comments>http://cyberfox.com/blog/37-to-fire-or-not-to-fire-workaholics#comments</comments>
		<pubDate>Sat, 08 Mar 2008 01:07:15 +0000</pubDate>
		<dc:creator>Cyberfox</dc:creator>
				<category><![CDATA[business]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[McAfee Associates]]></category>
		<category><![CDATA[passion]]></category>
		<category><![CDATA[PayPal]]></category>
		<category><![CDATA[startup]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.vixen.com/blog/2008/03/07/37</guid>
		<description><![CDATA[Greetings, There&#8217;s an interesting few blog posts going on about folks who work really hard. It started from Jason Calacanis&#8217;s article of tips on how to save money when running a startup (many of which are good, but #11 is &#8216;Fire people who are not workaholics&#8230;&#8217;) and that was picked up at the 37signals SvN [...]]]></description>
			<content:encoded><![CDATA[<p>Greetings,<br />
There&#8217;s an interesting few blog posts going on about folks who work really hard.  It started from Jason Calacanis&#8217;s article of <a title="How to save money running a startup" href="http://www.calacanis.com/2008/03/07/how-to-save-money-running-a-startup-17-really-good-tips/">tips on how to save money when running a startup</a> (many of which are good, but #11 is &#8216;Fire people who are not workaholics&#8230;&#8217;) and that was picked up at the <a title="Signal vs. Noise" href="http://www.37signals.com/svn/">37signals SvN blog</a> which <a title="Fire the workaholics" href="http://www.37signals.com/svn/posts/902-fire-the-workaholics">comes out strongly against workaholics</a>.</p>
<p>As with everything else, it&#8217;s not that simple&#8230;</p>
<p>In the successful startups I&#8217;ve worked at, a core of people staying late, working long hours, was a symptom of having an idea that people can believe in.</p>
<p>I have not seen any very successful startups where the developers weren&#8217;t at least a little monomaniacal about their work.</p>
<p>On the contrary, I&#8217;ve been at two successful startups (defined here as wildly successful IPOs) where having those fanatic developers was a core reason why they were successful.</p>
<p>The people who were putting in overwhelming hours at those companies weren&#8217;t doing it because they&#8217;re workaholics.  They were doing it because they were true believers.  Both in the company itself and the product they were building.</p>
<p>It&#8217;s not about the workaholics making the company successful, it&#8217;s about the company being one that the employees can believe in, to the point of _wanting_ to be there, wanting to be making it better.</p>
<p>In those cases, you don&#8217;t fire the people who are passionate about building your company.  You support them, and accept that they&#8217;re going to crash occasionally, and try to nerf the crash some&#8230;</p>
<p>In my experience, it&#8217;s the fervent employees who are the core of successful startups.  This was true at McAfee Associates (went public in 1992), and PayPal (went public in 2002), both successful startups that I was part of.</p>
<p>You also need people who aren&#8217;t as fervent, who can see a wider view, so it&#8217;s always a balance.  So you can&#8217;t really &#8216;fire&#8217; either of them, out of hand.</p>
<p>I&#8217;ve been that true believer, focusing everything into a job or project that I deeply care about.  I&#8217;m a much calmer, more balanced person now, though.  We&#8217;ll see what happens in 2012&#8230;  <img src='http://cyberfox.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>&#8211;  Morgan</p>
]]></content:encoded>
			<wfw:commentRss>http://cyberfox.com/blog/37-to-fire-or-not-to-fire-workaholics/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>TDD: The &#8216;Logan&#8217;s Run&#8217; of Software Development&#8230;</title>
		<link>http://cyberfox.com/blog/35-tdd-the-logans-run-of-software-development</link>
		<comments>http://cyberfox.com/blog/35-tdd-the-logans-run-of-software-development#comments</comments>
		<pubDate>Sat, 06 Oct 2007 03:09:23 +0000</pubDate>
		<dc:creator>Cyberfox</dc:creator>
				<category><![CDATA[BDD]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[TDD]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.vixen.com/blog/2007/10/05/35</guid>
		<description><![CDATA[Greetings, I want to start by making it clear that I know why testing is good, and that it&#8217;s really important, but I think that the TDD proponents are glossing over the most difficult part of a project. I would very much like someone to address the issue of modifying code that is not new, [...]]]></description>
			<content:encoded><![CDATA[<p>Greetings,</p>
<p>I want to start by making it clear that I know <strong>why</strong> testing is good, and that it&#8217;s <em>really</em> important, but I think that the TDD proponents are glossing over the most difficult part of a project.</p>
<p>I would very much like someone to address the issue of modifying code that is <strong>not new</strong>, and <strong>not already perfectly tested</strong> (or even completely specified!).  That is to say, the vast majority of actual code out there.</p>
<p>TDD <em>is</em> intensely focused on the early development phase (or at least TDD proponents are), and on writing <em>new</em> code, as opposed to what the majority of software developers actually do: maintain and update existing code.</p>
<p>It&#8217;s really straightforward (and fun!) to write entirely new code in the TDD fashion.  I&#8217;ve done it for about 3 decent sized projects now (one Java and two Rails), and it can be really pleasant, and a great focusing tool.  No arguments there; when you do it from the start, it&#8217;s really wonderful.</p>
<p>On the other hand, when you&#8217;re making incremental changes here and there throughout a very large, pre-existing, only partially tested codebase, it&#8217;s vastly less pleasant to try and do it test-first.</p>
<p>The Ruby Autotest tool is not such a pleasant tool at that point.  You stop wanting to write failing tests, because fixing it means autotest is going to try to do a <em>full retest</em>, which <strong>sucks</strong> for developer flow&#8230;  Even if your tests take &#8216;only&#8217; 5 minutes to run, breaking a test makes you wince, and writing a failing test and then fixing it is for masochists only (and ones who want to miss the project milestones at that).</p>
<p>The focus of <em>every</em> presentation (<a title="How I learned to love testing..." href="http://www.railsenvy.com/2007/10/4/how-i-learned-to-love-testing-presentation">this one</a> included) I&#8217;ve seen on TDD being on the <em>start</em> of a project makes me wonder why nobody&#8217;s talking about later in projects&#8230;</p>
<p>Nearly every developer out there is going to face a large codebase with poor testing coverage, and will have to make changes that aren&#8217;t entirely new code, to existing code that isn&#8217;t entirely tested.  Does TDD have a solution for the &#8216;large, crufty codebase&#8217;, or is it suited only for 1.0 versions, small projects, and projects that were TDD from the start?</p>
<p>This isn&#8217;t really a rhetorical question for me.  I <em>really</em> want to get my organization&#8217;s culture more oriented towards testing.  I&#8217;ve got buy-in from lots of people that when they&#8217;re writing new modules and services, they&#8217;ll do it test-first (or at least &#8216;test around the same time as the code&#8217;, which is all I can ask for at this point), and that&#8217;s great.  But has anybody developed <strong>any</strong> tools to make TDD better suited to <em>maintenance</em> and improving existing code?</p>
<p>&#8211;  Morgan</p>
<p>p.s.  I&#8217;m skipping BDD entirely, because BDD is so hardcore in the &#8216;only for already very well specified solutions&#8217; camp, that it&#8217;s meaningless for this question.  I&#8217;m also using &#8216;TDD&#8217; and &#8216;test-first&#8217; interchangeably, and I probably shouldn&#8217;t be.</p>
<p>p.p.s.  The title refers to how in Logan&#8217;s Run, everybody was destroyed at 30, so there weren&#8217;t any old people. In the world of TDD (or at least TDD presentations), there are no old projects, every one is fresh and new, so the issues that come with an old code base are never addressed.</p>
]]></content:encoded>
			<wfw:commentRss>http://cyberfox.com/blog/35-tdd-the-logans-run-of-software-development/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Work/Life Balance versus The Passion of the Code.</title>
		<link>http://cyberfox.com/blog/11-worklife-balance-versus-the-passion-of-the-code</link>
		<comments>http://cyberfox.com/blog/11-worklife-balance-versus-the-passion-of-the-code#comments</comments>
		<pubDate>Fri, 23 Dec 2005 20:33:00 +0000</pubDate>
		<dc:creator>Cyberfox</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[passion]]></category>
		<category><![CDATA[PayPal]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.vixen.com/blog/2005/12/23/11</guid>
		<description><![CDATA[Greetings, Robert Scoble takes Mark Lucovsky to task over seeing passion in Google workers sticking around until all hours of the night. This is a hard thing to explain if you haven&#8217;t been there. I&#8217;ve been there twice, once with McAfee Associates, in full-bore, turbo-charged engineer mode, fighting against the world-wide virus writing epidemic in [...]]]></description>
			<content:encoded><![CDATA[<p>Greetings,<br />
<a href="http://scobleizer.wordpress.com/2005/12/23/markl-loves-4-am-workers">Robert Scoble takes Mark Lucovsky to task</a> over seeing passion in Google workers <a href="http://minimsft.blogspot.com/2005/12/comment-report-markl-nee-of-microsoft.html">sticking around until all hours of the night</a>.</p>
<p>This is a hard thing to explain if you haven&#8217;t been there.  I&#8217;ve been there twice, once with McAfee Associates, in full-bore, turbo-charged engineer mode, fighting against the world-wide virus writing epidemic in the very early days of McAfee Associates (from 7 people to more than 120 people).  Fewer people would recognize that world as would recognize the world of a modern web site developer, so I&#8217;ll focus on the second, PayPal.</p>
<p>I remember working at PayPal during their heyday just before and just after going public, when we were fighting the &#8216;good fight&#8217; against eBay, and I&#8217;d work all hours of the day and night not because I had to, but because it was deeply, personally important to me that we win, and because the work was so deeply enthralling that I lost track of time entirely.  That everything be right, and that we be first to market with features, that our code be spectacular, that we be innovative and brilliant and FAST was our world.  And until we were bought by eBay finally (demonstrating, imo, that we had the better service), no matter the hour, I never was alone at the office.</p>
<p>When you&#8217;re part of a brilliant team, trying to honest to god change the world, it&#8217;s not about deadlines.  It&#8217;s about a form of love.  It can be thoroughly, caustically destructive to everything else in your life, but it&#8217;s an experience I would be a lesser person if I had missed.</p>
<p>If you&#8217;re clocking hours at work, and the passion of what you&#8217;re doing isn&#8217;t keeping you rooted to your chair at all hours, loving the pure joy of creating, fighting the good fight, and trying to change the world, it doesn&#8217;t mean it&#8217;s not a good job.  It&#8217;s just not THAT experience.</p>
<p>I promise, there&#8217;s far more call for people who work regular hours, meeting normal deadlines, doing solid, good work, than for those of us who burn so very, very brightly, but for so short a time.</p>
<p>If you&#8217;re really in passionate love with the work you&#8217;re doing, it&#8217;s not about working to 3am to meet a deadline.  It&#8217;s about finally reaching a temporary point of closure for the day&#8217;s work, and raising up your head to suddenly discover it&#8217;s 3am.</p>
<p>And if you really, truly believe in your company, and you believe in your project, and have a fire to &#8216;win&#8217; in some way (usually against a more powerful competitor) it&#8217;s not about accepting an imposed deadline that makes you work hard.  It&#8217;s about DEMANDING a deadline that makes you work hard, but that you know you can meet.  Because you know it&#8217;s important, and that every second counts, and you CARE about the company being not just first, but first with a brilliant, innovative, wonderful experience.</p>
<p>After it&#8217;s all over, it&#8217;s draining.  It&#8217;s exhausting.  It&#8217;s mind-numbing.  You feel&#8230;dead, somehow, once the work is over, and you&#8217;ve been brilliant for so long, that you feel like your brain cells have used up all their energy.  You go home, don&#8217;t show up to work for a week, recharge, find out if you still have any RL friends, do something physical (skydiving, rock climbing, hiking, etc.) to get in touch with your body again.  You come back to work eventually, and you work with others to clean up any loose ends, and slowly you get back the energy from your co-workers, and the ambience in the office, and eventually you&#8217;re back on track to start another feature that&#8217;ll knock the socks off your competitors.</p>
<p>If you&#8217;re in a company where you are the dominant force, you don&#8217;t work like that.  You don&#8217;t need to, the hunger isn&#8217;t there.</p>
<p>PayPal lost that hunger when eBay bought us.  There wasn&#8217;t anything to fight for, anymore.  We&#8217;d won, in a way, and lost in a way.  I even asked it, when eBay management had a big meeting with everybody to tell us about their vision for us.  I don&#8217;t remember if it was the meeting Meg Whitman was at, or not, but I asked something like, &#8220;We&#8217;ve been fighting eBay for all this time, and now we don&#8217;t have to.  What will replace that, to keep the drive going?&#8221;  The answer was a mealy-mouthed mess of future strategy and becoming the dominant payment platform.  It wasn&#8217;t a battle anymore, we&#8217;d become the big company.</p>
<p>I left not terribly long after that, for health reasons.  (Remember what I said about caustically destructive?  <img src='http://cyberfox.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  )  But also because I didn&#8217;t feel the passion in the hallways anymore.</p>
<p>The woman who I will marry in 36 days stood by me through it despite almost never seeing me, my friends teased that they&#8217;d forgotten my name, I ended up needing major surgery for a condition I let go too long&#8230;  But I was part of one of those winning teams, fighting against terrible odds, doing brilliant work, burning so very, very brightly, and changing the world one line of code at a time.</p>
<p>I think Mark understands that, as Google has Microsoft with an unlimited war-chest bearing down on them.  From what I&#8217;ve read, I don&#8217;t think Scoble completely does get it.  He gets that passion is important (Channel 9 certainly shows that), but the fight against overwhelming odds that drives it to fevered peaks, that brings it to a different level&#8230;that&#8217;s what&#8217;s missing.</p>
<p>But it&#8217;s okay.  Even I&#8217;m working a day job these days.  I still have the intense passion to program on my own projects, doing it until 4am regularly, but I&#8217;m in a larger cycle of recharge, get in touch with life, etc., before maybe doing it again if I find the right company.  Or maybe not.  I&#8217;ll be a married man shortly, settling down in theory.  Maybe I can&#8217;t fight those fights anymore.  Maybe I should work at Microsoft.  <img src='http://cyberfox.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />   Just kidding&#8230;</p>
<p>&#8211;  Morgan Schweers, Cyber<b>FOX</b>!</p>
]]></content:encoded>
			<wfw:commentRss>http://cyberfox.com/blog/11-worklife-balance-versus-the-passion-of-the-code/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Thoughts on remote diff&#8230;</title>
		<link>http://cyberfox.com/blog/8-thoughts-on-remote-diff</link>
		<comments>http://cyberfox.com/blog/8-thoughts-on-remote-diff#comments</comments>
		<pubDate>Sat, 12 Nov 2005 00:54:17 +0000</pubDate>
		<dc:creator>Cyberfox</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[interview]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.vixen.com/blog/2005/11/11/8</guid>
		<description><![CDATA[Greetings, I sat in my office today, listening to a phone interview in which the candidate was asked a number of simple problems, and then the harder design question: Given two computers linked over a slow link (say dialup), each has a 1TB file, how would you determine (1) if the two are different, and [...]]]></description>
			<content:encoded><![CDATA[<p>Greetings,<br />
I sat in my office today, listening to a phone interview in which the candidate was asked a number of simple problems, and then the harder design question:</p>
<blockquote><p>Given two computers linked over a slow link (say dialup), each has a 1TB file, how would you determine (1) if the two are different, and (2) what that difference is?</p></blockquote>
<p>Me being the gadfly that I am, I spoke with my office-mate a little about what the &#8216;right&#8217; answer should have.  He acknowledged that he was looking for a subdivision answer, which determined the file&#8217;s first difference location.  It&#8217;s a pretty simple answer: conceptually subdivide the file into (say) 4096 separate 256MB blocks.  Do the MD5s on them, pass them over the wire, so each side knows the MD5s at each 256MB step.  If there&#8217;s a change, instantly at that point you know it was in the last 256MB, and you can subdivide that down by 4096, so you&#8217;re MD5&#8217;ing 64K blocks.  This finds you the 64K block where the first change happened, and you can either subdivide one more time, or just transmit the 64K containing the difference and figure it out right there.</p>
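<p>The subdivision answer can be sketched in a few lines of Python. This is a minimal sketch, not a real network tool: both files are plain byte strings in one process, <code>remote</code> stands in for the far side of the link, and the names <code>block_digests</code> and <code>first_difference</code> plus the small <code>parts</code> and <code>leaf</code> defaults are invented for the example (the post uses 4096 parts and 64K leaves). It also assumes the two files have the same length.</p>

```python
import hashlib


def block_digests(data, start, end, parts):
    """MD5 each of roughly `parts` slices of data[start:end].

    Returns (lo, hi, digest) triples; only the digests would cross the wire.
    """
    size = max(1, -(-(end - start) // parts))  # ceiling division
    return [(lo, min(lo + size, end),
             hashlib.md5(data[lo:min(lo + size, end)]).digest())
            for lo in range(start, end, size)]


def first_difference(local, remote, parts=16, leaf=64):
    """Return the offset of the first differing byte, or None if identical."""
    if len(local) != len(remote):
        raise ValueError("this sketch assumes equal-length files")
    start, end = 0, len(local)
    while end - start > leaf:
        mine = block_digests(local, start, end, parts)
        theirs = block_digests(remote, start, end, parts)
        for (lo, hi, a), (_, _, b) in zip(mine, theirs):
            if a != b:
                start, end = lo, hi  # narrow to the first mismatching block
                break
        else:
            return None  # every digest matched
    # The range is now small enough to ship whole and compare byte by byte.
    for i in range(start, end):
        if local[i] != remote[i]:
            return i
    return None
```

<p>Each pass sends only <code>parts</code> digests, so the traffic grows with the number of subdivision levels rather than with the file size.</p>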
<p>I pointed out that this only finds the first byte where there is a difference, and that if you wanted to know if some portion of the *rest* of the file is the same (or worse, you want to construct a minimum edit difference), you start to need to do more complex things.  I went off, and started thinking about algorithms to make the rest of the problem tractable.</p>
<p>This is what I came up with, recorded here for posterity, and in case anybody with a decent sense of algorithmics would like to correct me.</p>
<p>It&#8217;s CPU-intensive, firstly.  It&#8217;s CPU-intensive, because that&#8217;s what you&#8217;ve got.  You can&#8217;t use bandwidth, because it&#8217;s defined as too low for this, so you have to use what you&#8217;ve got, and that&#8217;s the CPU power on each side.</p>
<p>So one machine designates itself as the &#8216;compare to&#8217; server.  It then chunks the *remainder* (point from diff on) of the 1TB file into fixed size chunks.  (Say 64K chunks, which in the worst case would be 16 million of them.)  Each 64K chunk is separately MD5&#8217;ed, and the results stored on disk, hashed by their bottom two bytes into a linked-list, along with the block number they&#8217;re for.  The bottom two bytes are used as an index into a 64KBit bit-vector, and the bit at that location is set.  Once the remainder of the file is completely processed, the bit vector is transmitted.</p>
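<p>A minimal Python sketch of the server-side step might look like the following. It keeps everything in memory, where the description above stores the lists on disk, and it assumes &#8216;bottom two bytes&#8217; means the last two bytes of the MD5 digest. The names <code>build_index</code> and <code>maybe_known</code> are hypothetical.</p>

```python
import hashlib

CHUNK = 64 * 1024        # chunk size used in the post
VECTOR_BITS = 2 ** 16    # one bit per possible two-byte value


def low_key(digest):
    """Assumed meaning of 'bottom two bytes': the last two digest bytes."""
    return int.from_bytes(digest[-2:], "big")


def build_index(data, start=0, chunk=CHUNK):
    """Return (buckets, bitvec) for the fixed-size chunks of data[start:]."""
    buckets = {}                           # stands in for the on-disk linked lists
    bitvec = bytearray(VECTOR_BITS // 8)   # 8 KB; this is what gets transmitted
    for block_no, off in enumerate(range(start, len(data), chunk)):
        digest = hashlib.md5(data[off:off + chunk]).digest()
        key = low_key(digest)
        buckets.setdefault(key, []).append((digest, block_no))
        bitvec[key // 8] |= 2 ** (key % 8)
    return buckets, bitvec


def maybe_known(bitvec, digest):
    """True if the bit for this digest is set (false positives are possible)."""
    key = low_key(digest)
    return (bitvec[key // 8] // 2 ** (key % 8)) % 2 == 1
```

<p>The bit vector works as a cheap pre-filter: a cleared bit proves the server has no such chunk, while a set bit only means the full digest is worth sending.</p>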
<p>The &#8216;compare from&#8217; client then creates (this is the computationally sucky part) 64K-1 simultaneous MD5s, chunking from (diff+1) to (diff+65535).  When it&#8217;s done chunking 64K of data (all 64K MD5s will complete at the same time), they&#8217;re checked, reset, and they all start again (at diff+65536 to&#8230;).  This is defined as 64K-1 MD5s under the hope that some kind of internal MD5 optimization could be found that would prevent it from behaving as slowly as a single MD5 run on the order of (64K*(EOF-DiffPoint)) times.</p>
<p>During the &#8216;check&#8217;, the bottom two bytes are used as a lookup in the bit vector.  If any MD5s match a bit, all matching ones are transmitted, and the server checks if any are a recognized block.  If any is, it returns the block number, and the next 64K MD5, which the client can quickly check by running a single MD5 on that data.  If they match, the client and server can consider the rest of the file as if it were to be checked from the first step again, because there may be more than one point of difference.  More precision can be gained by transmitting the difference data from the client.  Since we know there is a difference at the first byte (by the first algorithm), we do a backwards compare from the end of the last different block on the server versus the block before we found an MD5 match.  We then find the last byte of difference, as well as the first.  (Because of the size of the data, you could transmit it to the server, and run a more normal minimum edit difference algorithm against that block, to get more precision.)</p>
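<p>To make the client-side check concrete, here is a deliberately naive Python sketch. It hashes a window at every byte offset one after another instead of running 64K-1 MD5s side by side, so it shows the logic rather than the performance. The server is simulated with in-memory structures, and the names <code>server_index</code> and <code>find_resync</code> are invented for the example.</p>

```python
import hashlib


def low_key(digest):
    """Assumed meaning of 'bottom two bytes': the last two digest bytes."""
    return int.from_bytes(digest[-2:], "big")


def server_index(data, chunk):
    """Digest every full fixed-size chunk; return (digest to block number, key set)."""
    blocks, keys = {}, set()
    for block_no, off in enumerate(range(0, len(data) - chunk + 1, chunk)):
        digest = hashlib.md5(data[off:off + chunk]).digest()
        blocks.setdefault(digest, block_no)  # keep the first block with this content
        keys.add(low_key(digest))            # stands in for the transmitted bit vector
    return blocks, keys


def find_resync(client, blocks, keys, chunk, start=0):
    """Scan client data from `start` for the first window the server recognizes.

    Returns (client_offset, server_block_no, digests_sent), with None for the
    first two values when nothing matches.
    """
    sent = 0
    for off in range(start, len(client) - chunk + 1):
        digest = hashlib.md5(client[off:off + chunk]).digest()
        if low_key(digest) in keys:   # local check, no traffic yet
            sent += 1                 # only now would a digest cross the link
            if digest in blocks:      # the server confirms a real match
                return off, blocks[digest], sent
    return None, None, sent
```

<p>In this sketch the gap between the original difference point and the returned client offset is the region that would then be handed to an ordinary edit-difference algorithm.</p>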
<p>As I said before, treat the rest of the file (from server block &#8216;N&#8217; and client block DIFF+n bytes to the block &#8216;N&#8217; match) as a new file to compare.  Scale down the blocksizes in this new instance, for quicker/greater precision, and do it again.</p>
<p>This should result in a decent minimum edit difference &#8216;collection&#8217; on the server (or whichever side aggregates this data), without transmitting the entire 1TB of data over the wire.  These numbers can be tuned, e.g. the two-byte bit vector can become 3 bytes, and still only takes 2MB up front to transmit the bit-vector, but substantially cuts down the likelihood of having sub-hash matches, and thus fewer packets sent later.</p>
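<p>The sizes quoted above are easy to check. The short Python snippet below just redoes the arithmetic; the helper name <code>bitvector_bytes</code> is made up for the example.</p>

```python
KB, MB, TB = 2 ** 10, 2 ** 20, 2 ** 40


def bitvector_bytes(key_bytes):
    """Size in bytes of a bit vector with one bit per possible key value."""
    return 2 ** (8 * key_bytes) // 8


chunks = TB // (64 * KB)
print(chunks)                           # 16777216 chunks of 64K in 1TB
print(bitvector_bytes(2) // KB, "KB")   # 8 KB for a two-byte key
print(bitvector_bytes(3) // MB, "MB")   # 2 MB for a three-byte key
```

<p>With about 16 million chunks and only 65,536 bits, a two-byte vector ends up almost entirely set when most chunks are distinct, which is why widening the key to three bytes is likely to pay for its 2MB.</p>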
<p>Now, a real problem comes if 1 byte in every 64K is modified on the client.  The problem is that it&#8217;ll never match another block.  The CPU load becomes pretty excessive at this point, effectively doing 64K MD5s of a 1TB(-64K) file.</p>
<p>At every block size, this problem remains, unfortunately, and with a file as large as 1TB, you really want to start with large block sizes.  If, for instance, every 64th byte is altered, you&#8217;re pretty much doomed as that&#8217;s the blocksize for MD5 and you&#8217;ll never get a full match, and at that point in order to get a full edit difference between the two files, you will need to transmit one of them to the other.</p>
<p>So this approach is basically restricted to cases where you have an idea of how many changes have been made (or expect that the answer should be zero).</p>
<p>The original interview question, however, is even more restricted in that it solely finds the first location of a change, and does not identify what kind of change it is, which is in my book a not terribly useful answer.</p>
<p>&#8211;  Morgan Schweers, Cyber<b>FOX</b>!</p>
]]></content:encoded>
			<wfw:commentRss>http://cyberfox.com/blog/8-thoughts-on-remote-diff/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Filing a toilet ticket&#8230;</title>
		<link>http://cyberfox.com/blog/6-filing-a-toilet-ticket</link>
		<comments>http://cyberfox.com/blog/6-filing-a-toilet-ticket#comments</comments>
		<pubDate>Mon, 26 Sep 2005 22:37:24 +0000</pubDate>
		<dc:creator>Cyberfox</dc:creator>
				<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.vixen.com/blog/?p=6</guid>
		<description><![CDATA[Wherein our hero learns he has to file a ticket to fix a toilet...]]></description>
			<content:encoded><![CDATA[<p>Greetings,<br />
I work for a major ecommerce corporation.  I&#8217;m not used to that; I&#8217;ve usually worked for much smaller companies.</p>
<p>So today, when I found the toilet not working quite right, I tried to call the operator to find out who I should call, and maybe get transferred to them.</p>
<p>They told me to &#8216;File a <a href="http://www.remedy.com">Remedy</a> ticket.&#8217;  Now Remedy is an absurdly complex application which tracks trouble tickets, and is generally used to track IT issues.  They are, indeed, also using it to track plumbing issues.  I cannot imagine who decided to swat this mosquito with this nuclear weapon, but the scariest part to me is that I now am absolutely confident that even the <em>plumbers</em> have SLAs (Service Level Agreements) they have to meet.</p>
<p>Another sign of the ossification of a company: The focus on &#8216;process&#8217; as the means of getting things done.</p>
<p>I take this in stride, however, because this was a Very Good Weekend for me.  I&#8217;ll get to that in a bit.</p>
<p>&#8211;  Morgan Schweers, CyberFOX!</p>
]]></content:encoded>
			<wfw:commentRss>http://cyberfox.com/blog/6-filing-a-toilet-ticket/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

