GZ-Sort

A utility for sorting really big files.

Here entropy is a fudge factor between 1.5 and 3 that reflects how unsorted the source data is. The Freebase dataset appeared to be well grouped locally, so 1.5 seems appropriate. Plugging everything in gives 20.4 hours unthreaded. The speedup from threading appears to scale with the square root of the number of threads, so using 4 threads cuts the run time in half, to 10.2 hours. The measured run time was 9.5 hours, 6.8% off the estimate.
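
The threading adjustment can be checked with a few lines of Python. This is only a sketch of the arithmetic in the paragraph above, not code from gz-sort itself: the function names `threaded_estimate` and `relative_error` are made up for illustration, and the square-root scaling is the observed rule of thumb rather than a guarantee.

```python
import math


def threaded_estimate(unthreaded_hours, threads):
    """Scale an unthreaded estimate, assuming speedup ~ sqrt(threads)."""
    return unthreaded_hours / math.sqrt(threads)


def relative_error(estimate, measured):
    """Fraction by which the measurement differs from the estimate."""
    return abs(estimate - measured) / estimate


# Figures quoted in the text: 20.4 hours unthreaded, 4 threads, 9.5 hours measured.
estimate = threaded_estimate(20.4, 4)
error = relative_error(estimate, 9.5)
print(f"estimate: {estimate:.1f} hours, relative error: {error:.3f}")
```

With the rounded figures quoted here the relative error comes out just under 0.07, in line with the figure given above; any small difference in the last digit is presumably down to rounding of the inputs.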
