Ultimate Web Site Drop Down Menu Forum

Ultimate Web Site Drop Down Menu Forum (http://www.udm4.com/forum/index.php)
-   General Web Trends and News (http://www.udm4.com/forum/forumdisplay.php?f=10)
-   -   How Mailinator Compresses Its Email Stream By 90% (http://www.udm4.com/forum/showthread.php?t=8215)

02-21-2012 09:32 PM

How Mailinator Compresses Its Email Stream By 90%
 
</img>
</img>
An anonymous reader writes "Paul Tyma, creator of Mailinator, writes about a greedy algorithm to analyze the huge amount of email Mailinator receives and finds ways to reduce its memory footprint by 90%. Quoting: 'I grabbed a few hundred megs of the Mailinator stream and ran it through several compressors. Mostly just stuff I had on hand 7z, bzip, gzip, etc. Venerable zip reduced the file by 63%. Not bad. Then I tried the LZMA/2 algorithm (7z) which got it down by 85%! Well. OK! Article is over! Everyone out! 85% is good enough. Actually &mdash; there were two problems with that result. One was that, LZMA, like many compression algorithms build their dictionary based on a fixed dataset. As it compresses it builds a dictionary of common sequences and improves and uses that dictionary to compress everything thereafter. That works great on static files &mdash; but Mailinator is not a static file. Its a big, honking, several gigabyte cache of ever changing email. If I compressed a million emails, and then some user wanted to read email #502,922 &mdash; I'd have to "seek" through the preceding half-million or so to build the dictionary in order to decompress it. That's probably not feasible.'"

Read more of this story at Slashdot.


More...


All times are GMT. The time now is 08:31 AM.

Powered by vBulletin® Version 3.0.1
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.