- The input: 3 billion lines, 30GB compressed, 425GB uncompressed.
- The rig: quad-core A8-7600, 16GB RAM, 256GB 850 Pro SSD with 90GB available.
- The algorithm: a simple merge sort, predicted to finish in 10.2 hours (actual time: 9.5 hours).
- The output: 25.2GB compressed (16% smaller) with 5 million duplicate lines removed.
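
The summary above gives only "a simple merge sort" with duplicate removal, so the following is a minimal sketch of one common way to do that when the data does not fit in RAM: sort fixed-size chunks in memory, write each as a sorted run on disk, then stream a k-way merge over the runs and drop adjacent duplicates. The function name `external_sort_unique`, the `chunk_size` parameter, and the use of uncompressed temporary files are assumptions for illustration; the actual job worked on compressed data within 90GB of free disk, which this sketch does not attempt to reproduce.

```python
import heapq
import itertools
import os
import tempfile


def _read_run(handle):
    """Yield the lines of one sorted run with the trailing newline stripped."""
    for line in handle:
        yield line.rstrip("\n")


def external_sort_unique(lines, chunk_size=100_000):
    """Yield the distinct lines of `lines` in sorted order.

    Hypothetical helper: at most `chunk_size` lines are sorted in memory
    at a time. Lines must not contain newline characters.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Phase 1: cut the input into sorted runs stored on disk.
        run_paths = []
        source = iter(lines)
        while True:
            chunk = sorted(itertools.islice(source, chunk_size))
            if not chunk:
                break
            path = os.path.join(tmp, f"run{len(run_paths)}.txt")
            with open(path, "w", encoding="utf-8") as out:
                out.writelines(line + "\n" for line in chunk)
            run_paths.append(path)

        # Phase 2: k-way merge of all runs, reading one line per run at a time.
        handles = [open(p, encoding="utf-8") for p in run_paths]
        try:
            previous = None
            for line in heapq.merge(*(_read_run(h) for h in handles)):
                # Equal lines are adjacent after sorting, so comparing with
                # the previously emitted line is enough to remove duplicates.
                if line != previous:
                    yield line
                    previous = line
        finally:
            for h in handles:
                h.close()


if __name__ == "__main__":
    sample = ["pear", "apple", "fig", "apple", "pear"]
    print(list(external_sort_unique(sample, chunk_size=2)))
```

Because duplicates are dropped during the merge pass, deduplication costs no extra pass over the data. At the scale described above, the chunk size would be chosen to fit the 16GB of RAM, and the runs would presumably need to be compressed to fit on disk.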