Old 02-21-2012, 09:32 PM
sl4shd0t
Posts: n/a
  #1  
Default How Mailinator Compresses Its Email Stream By 90%

</img>
</img>
An anonymous reader writes "Paul Tyma, creator of Mailinator, writes about a greedy algorithm to analyze the huge amount of email Mailinator receives and finds ways to reduce its memory footprint by 90%. Quoting: 'I grabbed a few hundred megs of the Mailinator stream and ran it through several compressors. Mostly just stuff I had on hand 7z, bzip, gzip, etc. Venerable zip reduced the file by 63%. Not bad. Then I tried the LZMA/2 algorithm (7z) which got it down by 85%! Well. OK! Article is over! Everyone out! 85% is good enough. Actually &mdash; there were two problems with that result. One was that, LZMA, like many compression algorithms build their dictionary based on a fixed dataset. As it compresses it builds a dictionary of common sequences and improves and uses that dictionary to compress everything thereafter. That works great on static files &mdash; but Mailinator is not a static file. Its a big, honking, several gigabyte cache of ever changing email. If I compressed a million emails, and then some user wanted to read email #502,922 &mdash; I'd have to "seek" through the preceding half-million or so to build the dictionary in order to decompress it. That's probably not feasible.'"

Read more of this story at Slashdot.


More...
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump


All times are GMT. The time now is 10:42 AM.


Powered by vBulletin® Version 3.0.1
Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.