<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CyberFOX Software Inc. &#187; algorithms</title>
	<atom:link href="http://cyberfox.com/blog/category/algorithms/feed" rel="self" type="application/rss+xml" />
	<link>http://cyberfox.com/blog</link>
	<description>Coding, Connections, and Other Bloggy Bits of Goodness</description>
	<lastBuildDate>Sat, 04 Feb 2012 10:21:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>My response to eBay&#8217;s Bid Assistant&#8230;</title>
		<link>http://cyberfox.com/blog/29-my-response-to-ebays-bid-assistant</link>
		<comments>http://cyberfox.com/blog/29-my-response-to-ebays-bid-assistant#comments</comments>
		<pubDate>Sat, 19 May 2007 03:37:31 +0000</pubDate>
		<dc:creator>Cyberfox</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[ebay]]></category>
		<category><![CDATA[jbidwatcher]]></category>
		<category><![CDATA[sniping]]></category>

		<guid isPermaLink="false">http://www.vixen.com/blog/2007/05/18/29</guid>
		<description><![CDATA[Greetings, eBay recently launched their Bid Assistant, which acts similarly to JBidwatcher&#8217;s Multisniping feature, except without the sniping. None the less, it&#8217;s good to see them adding features JBidwatcher had six years ago. Being serious for a moment, I&#8217;m actually really happy to see them do this. It&#8217;s a straightforward feature, and one they should [...]]]></description>
			<content:encoded><![CDATA[<p>Greetings,</p>
<p><a title="eBay!" target="_blank" href="http://rover.ebay.com/rover/1/711-1751-2978-71/1?AID=5463217&#038;PID=2430448&#038;mpre=http%3A%2F%2Fwww.ebay.com">eBay</a> recently launched their <a title="eBay&#039;s new Bid Assistant feature" target="_blank" href="http://rover.ebay.com/rover/1/711-1751-2978-71/1?AID=5463217&#038;PID=2430448&#038;mpre=http%3A%2F%2Fpages.ebay.com%2Fhelp%2Fbuy%2Fbid-assistant.html">Bid Assistant</a>, which acts similarly to JBidwatcher&#8217;s Multisniping feature, except without the sniping.</p>
<p>Nonetheless, it&#8217;s good to see them adding features JBidwatcher had six years ago.  <img src='http://cyberfox.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Being serious for a moment, I&#8217;m actually really happy to see them do this.  It&#8217;s a straightforward feature, and one they should have had long ago.  If they make it easy to use, it&#8217;ll increase bid volumes, and thereby their end revenue.  Speaking as an ex-employee, and a shareholder, this is great!</p>
<p>I always considered multisniping a &#8216;good for eBay&#8217; feature, because it meant one bid could get applied to a number of items, without the user really intervening after the initial setup.</p>
<p>It&#8217;s also a hint that they might still be putting new ideas into their platform, which I&#8217;m very happy to see.  It also ups the ante for programs like mine, to add more extensive algorithmic bidding.  (e.g., &#8216;If I win this, THEN put a snipe on that, because if I can combine shipping I&#8217;d want them both&#8230;&#8217;)</p>
<p>What it is not, is sniping.  <a title="The last sentence of the third bullet point" target="_blank" href="http://rover.ebay.com/rover/1/711-1751-2978-71/1?AID=5463217&#038;PID=2430448&#038;mpre=http%3A%2F%2Fpages.ebay.com%2Fhelp%2Fbuy%2Fbid-assistant.html#status">Specifically</a>:<br />
<em>You cannot schedule bids to be placed at a specific time.</em><br />
The first version of JBidwatcher that included <a title="My multisniping guide." target="_blank" href="http://www.jbidwatcher.com/help/multisniping.shtml">Multisniping</a> was released on December 16, 2001, and is the earliest implementation of bid groups / multisniping / bid assistant functionality that I know of.  It&#8217;s not world-changing, but I&#8217;m proud of it.  <img src='http://cyberfox.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Best of luck with your auctions!</p>
<p><span class="sg">&#8211;  Morgan Schweers, CyberFOX!</span></p>
]]></content:encoded>
			<wfw:commentRss>http://cyberfox.com/blog/29-my-response-to-ebays-bid-assistant/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Thoughts on remote diff&#8230;</title>
		<link>http://cyberfox.com/blog/8-thoughts-on-remote-diff</link>
		<comments>http://cyberfox.com/blog/8-thoughts-on-remote-diff#comments</comments>
		<pubDate>Sat, 12 Nov 2005 00:54:17 +0000</pubDate>
		<dc:creator>Cyberfox</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[interview]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.vixen.com/blog/2005/11/11/8</guid>
		<description><![CDATA[Greetings, I sat in my office today, listening to a phone interview in which the candidate was asked a number of simple problems, and then the harder design question: Given two computers linked over a slow link (say dialup), each has a 1TB file, how would you determine (1) if the two are different, and [...]]]></description>
			<content:encoded><![CDATA[<p>Greetings,<br />
I sat in my office today, listening to a phone interview in which the candidate was asked a number of simple questions, and then the harder design question:</p>
<blockquote><p>Given two computers linked over a slow link (say dialup), each has a 1TB file, how would you determine (1) if the two are different, and (2) what that difference is?</p></blockquote>
<p>Me being the gadfly that I am, I spoke with my office-mate a little about what the &#8216;right&#8217; answer should be.  He acknowledged that he was looking for a subdivision answer, one which determines the location of the first difference between the files.  It&#8217;s a pretty simple answer: conceptually subdivide the file into (say) 4096 separate 256MB blocks.  Compute the MD5 of each block and pass the MD5s over the wire, so each side knows the MD5 at each 256MB step.  As soon as one differs, you know the change is somewhere in that 256MB block, and you can subdivide that block by 4096 again, so you&#8217;re MD5&#8217;ing 64K blocks.  This finds you the 64K block where the first change happened, and you can either subdivide one more time, or just transmit the 64K containing the difference and figure it out right there.</p>
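<p>The subdivision idea above can be sketched in Python roughly as follows. This is a minimal illustration, not the interviewer&#8217;s reference answer: both &#8216;files&#8217; are in-memory byte strings, the function names are hypothetical, and the block sizes are scaled down from 256MB and 64K to 4096 and 64 bytes so it runs instantly. In a real setting only the digest lists would cross the wire.</p>

```python
import hashlib


def block_digests(data, block_size):
    """MD5 digest of each fixed-size block; the last block may be shorter."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def first_differing_block(mine, theirs):
    """Index of the first digest that differs, or None if the lists are equal."""
    for index, (a, b) in enumerate(zip(mine, theirs)):
        if a != b:
            return index
    if len(mine) != len(theirs):
        return min(len(mine), len(theirs))
    return None


def first_difference(local, remote, block_sizes=(4096, 64)):
    """Offset of the first differing byte, or None if the inputs are identical."""
    start, end = 0, max(len(local), len(remote))
    for size in block_sizes:
        # Only the digest lists would be exchanged at each level.
        index = first_differing_block(block_digests(local[start:end], size),
                                      block_digests(remote[start:end], size))
        if index is None:
            return None
        start += index * size
        end = start + size
    # The remaining block is small enough to send whole and compare directly.
    a, b = local[start:end], remote[start:end]
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return start + offset
    return start + min(len(a), len(b)) if len(a) != len(b) else None
```

<p>Each level narrows the search to a single block of the previous level, so the amount of digest data exchanged stays small no matter where the first change sits.</p>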
<p>I pointed out that this only finds the first byte where there is a difference, and that if you wanted to know if some portion of the *rest* of the file is the same (or worse, you want to construct a minimum edit difference), you start to need to do more complex things.  I went off, and started thinking about algorithms to make the rest of the problem tractable.</p>
<p>This is what I came up with, recorded here for posterity, and in case anybody with a decent sense of algorithmics would like to correct me.</p>
<p>It&#8217;s CPU-intensive, firstly.  It&#8217;s CPU-intensive, because that&#8217;s what you&#8217;ve got.  You can&#8217;t use bandwidth, because it&#8217;s defined as too low for this, so you have to use what you&#8217;ve got, and that&#8217;s the CPU power on each side.</p>
<p>So one machine designates itself as the &#8216;compare to&#8217; server.  It then chunks the *remainder* of the 1TB file (from the point of the first difference onward) into fixed-size chunks.  (Say 64K chunks, of which there would be about 16 million in the worst case.)  Each 64K chunk is separately MD5&#8217;ed, and the results are stored on disk along with the block number they belong to, in a hash table of linked lists keyed by the bottom two bytes of the MD5.  Those same two bytes are also used as an index into a 64Kbit bit-vector, and the bit at that location is set.  Once the remainder of the file is completely processed, the bit-vector is transmitted.</p>
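<p>A rough Python sketch of the server-side index described above. The names are hypothetical, the data lives in memory rather than on disk, the block size is shrunk to 64 bytes, and &#8216;bottom two bytes&#8217; is assumed to mean the last two bytes of the MD5 digest.</p>

```python
import hashlib
from collections import defaultdict


def build_server_index(data, block_size=64):
    """Return (bit_vector, buckets) for the fixed-size blocks of data.

    bit_vector: 8KB bytearray holding 65,536 bits, one per two-byte value.
    buckets: maps each two-byte value to a list of (digest, block_number).
    """
    bit_vector = bytearray(65536 // 8)
    buckets = defaultdict(list)
    for block_number, offset in enumerate(range(0, len(data), block_size)):
        digest = hashlib.md5(data[offset:offset + block_size]).digest()
        # Assumption: the "bottom two bytes" are the last two digest bytes.
        key = int.from_bytes(digest[-2:], "big")
        bit_vector[key >> 3] |= 1 << (key & 7)
        buckets[key].append((digest, block_number))
    return bit_vector, buckets


def bit_is_set(bit_vector, key):
    """True if the bit for this two-byte key has been set."""
    return bool(bit_vector[key >> 3] & (1 << (key & 7)))
```

<p>Only the bit-vector would be sent to the client; the buckets stay on the server to resolve candidate digests later.</p>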
<p>The &#8216;compare from&#8217; client then creates (this is the computationally sucky part) 64K-1 simultaneous MD5s, chunking from (diff+1) to (diff+65535).  When it&#8217;s done chunking 64K of data (all 64K MD5s will complete at the same time), they&#8217;re checked, reset, and they all start again (at diff+65536 to&#8230;).  This is defined as 64K-1 MD5s under the hope that some kind of internal MD5 optimization could be found that would prevent it from behaving as slowly as a single MD5 run on the order of (64K*(EOF-DiffPoint)) times.</p>
<p>During the &#8216;check&#8217;, the bottom two bytes of each MD5 are used as a lookup in the bit-vector.  If any MD5s hit a set bit, all of the matching ones are transmitted, and the server checks whether any of them is a recognized block.  If one is, the server returns the block number, along with the MD5 of the next 64K block, which the client can quickly check by running a single MD5 on its own next 64K of data.  If they match, the client and server can treat the rest of the file as if it were to be checked from the first step again, because there may be more than one point of difference.  More precision can be gained by transmitting the difference data from the client.  Since we know there is a difference at the first byte (by the first algorithm), we do a backwards compare from the end of the last different block on the server versus the block before we found an MD5 match.  We then find the last byte of difference, as well as the first.  (Because the differing data is small, you could transmit it to the server, and run a more normal minimum edit difference algorithm against that block, to get more precision.)</p>
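<p>The client-side scan and check can be sketched as below, under several simplifying assumptions: everything runs in one process, so a dictionary lookup stands in for the round trip to the server; the block size is 64 bytes rather than 64K; and each window is hashed from scratch instead of keeping 64K-1 staggered running MD5s, which yields the same digests, only more slowly. All names are hypothetical.</p>

```python
import hashlib

BLOCK = 64  # scaled-down stand-in for the 64K block size


def key_of(digest):
    """Two-byte bit-vector index (assumed: the last two digest bytes)."""
    return int.from_bytes(digest[-2:], "big")


def server_index(data):
    """Bit-vector plus digest -> block number table for full server blocks."""
    bits, table = bytearray(8192), {}
    for number, offset in enumerate(range(0, len(data) - BLOCK + 1, BLOCK)):
        digest = hashlib.md5(data[offset:offset + BLOCK]).digest()
        key = key_of(digest)
        bits[key >> 3] |= 1 << (key & 7)
        table.setdefault(digest, number)
    return bits, table


def client_resync(client_data, start, bits, table):
    """First (client_offset, server_block) at or after start that matches."""
    for offset in range(start, len(client_data) - BLOCK + 1):
        digest = hashlib.md5(client_data[offset:offset + BLOCK]).digest()
        key = key_of(digest)
        if not bits[key >> 3] & (1 << (key & 7)):
            continue  # filtered out locally; nothing is sent to the server
        block = table.get(digest)  # stands in for asking the server
        if block is not None:
            return offset, block
    return None
```

<p>Once a match is found, everything between the first difference and the matched offset is the changed region, and the scan can restart from the matched block onward.</p>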
<p>As I said before, treat the rest of the file (from server block &#8216;N&#8217; and client block DIFF+n bytes to the block &#8216;N&#8217; match) as a new file to compare.  Scale down the blocksizes in this new instance, for quicker/greater precision, and do it again.</p>
<p>This should result in a decent minimum edit difference &#8216;collection&#8217; on the server (or whichever side aggregates this data), without transmitting the entire 1TB of data over the wire.  These numbers can be tuned: e.g., the two-byte bit-vector index can become three bytes, which still takes only 2MB to transmit up front, but substantially cuts down the likelihood of sub-hash matches, and thus the number of packets sent later.</p>
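<p>The sizes quoted above can be checked with a few lines of Python. The second function is an added back-of-the-envelope estimate, not something from the original reasoning: it assumes MD5 digests are uniformly distributed and computes what fraction of the bit-vector would end up set.</p>

```python
def bit_vector_bytes(index_bytes):
    """Size in bytes of a bit-vector with one bit per possible index value."""
    return 2 ** (8 * index_bytes) // 8


def fraction_of_bits_set(blocks, index_bytes):
    """Expected fraction of set bits, assuming uniformly distributed digests."""
    slots = 2 ** (8 * index_bytes)
    return 1 - (1 - 1 / slots) ** blocks


blocks = 2 ** 40 // 2 ** 16  # 1TB in 64K blocks: 16,777,216
for index_bytes in (2, 3):
    print(index_bytes, bit_vector_bytes(index_bytes),
          round(fraction_of_bits_set(blocks, index_bytes), 3))
```

<p>Under that uniformity assumption, a two-byte index (8KB) is effectively saturated by 16 million blocks, while a three-byte index (2MB) leaves a bit over a third of the bits clear, so the larger vector is what actually lets the client filter candidates locally at this scale.</p>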
<p>Now, a real problem comes if 1 byte in every 64K is modified on the client.  The problem is that it&#8217;ll never match another block.  The CPU load becomes pretty excessive at this point, effectively doing 64K MD5s of a 1TB(-64K) file.</p>
<p>At every block size, this problem remains, unfortunately, and with a file as large as 1TB, you really want to start with large block sizes.  If, for instance, every 64th byte is altered, you&#8217;re pretty much doomed, as 64 bytes is the blocksize for MD5 and you&#8217;ll never get a full match.  At that point, in order to get a full edit difference between the two files, you will need to transmit one of them to the other.</p>
<p>So this approach is basically restricted to cases where you have an idea how many changes have been made (or expect that the answer should be zero).</p>
<p>The original interview question, however, is even more restricted in that it solely finds the first location of a change, and does not identify what kind of change it is, which is in my book a not terribly useful answer.</p>
<p>&#8211;  Morgan Schweers, Cyber<b>FOX</b>!</p>
]]></content:encoded>
			<wfw:commentRss>http://cyberfox.com/blog/8-thoughts-on-remote-diff/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

