1brc - my implementations for the billion row challenge

author	Roman Musin <995612+roman-r-m@users.noreply.github.com>	2024-01-11 11:29:08 +0000
committer	GitHub <noreply@github.com>	2024-01-11 12:29:08 +0100
commit	4b3f959812dfe3387c584a6b8371cf4aa21c452b (patch)
tree	5d6569a16259b63436c486b08a577812597fce17 /tocsv.sh
parent	52c490cc24505640497805b1ec1bfd98273cc512 (diff)

First version - roman_r_m (#193)

* initial commit
* - use loop
  - use mutable object to store results
* get rid of regex
* Do not allocate measurement objects
* MMap + custom double parsing ~ 1:30 (down from ~ 2:05)
* HashMap for accumulation and only sort at the end - 1:05
* MMap the whole file
* Use graal
* no GC
* Store results in an array list to avoid double map lookup
* Adjust max buf size
* Manual parsing number to long
* Add --enable-preview
* remove buffer size check (has no effect on performance)
* fix min & max initialization
* do not check for \r
* Revert "do not check for \r"
  This reverts commit 9da1f574bf6261ea49c353488d3b4673cad3ce6e.
* Optimise parsing. Now completes in 31 sec down from ~43
* trying to parse numbers faster
* use open address hash table instead of the standard HashMap
* formatting
* Rename the script to match github username (change underscores to dashes)
  Enable transparent huge pages, seems to improve by ~2 sec
* Revert "formatting"
  This reverts commit 4e90797d2a729ed7385c9000c85cc7e87d935f96.
* Revert "use open address hash table instead of the standard HashMap"
  This reverts commit c784b55f61e48f548b2623e5c8958c9b283cae14.
* add prepare_roman-r-m.sh
* SWAR tricks to find semicolon (-2 seconds to run time)
* remove time call
* fix test
* Parallel version (~6.5 seconds)
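
The "SWAR tricks to find semicolon" step is not shown in this path-limited view, so the following is only a minimal sketch of the general technique, not the code from this commit. It assumes little-endian 8-byte loads through `ByteBuffer` and uses the classic "has zero byte" bit trick; the class and method names (`SwarSemicolon`, `firstSemicolon`, `indexOfSemicolon`) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of SWAR (SIMD within a register) search for ';'.
final class SwarSemicolon {
    private static final long SEMICOLONS = 0x3B3B3B3B3B3B3B3BL; // ';' repeated in every byte
    private static final long LOW_BITS = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;

    // Returns the index (0..7) of the first ';' in a little-endian word, or 8 if there is none.
    static int firstSemicolon(long word) {
        long x = word ^ SEMICOLONS;                   // bytes equal to ';' become 0x00
        long match = (x - LOW_BITS) & ~x & HIGH_BITS; // lowest set bit marks the first zero byte
        return Long.numberOfTrailingZeros(match) >>> 3; // 64 >>> 3 == 8 when nothing matched
    }

    // Scans data 8 bytes at a time from 'from'; returns the index of the first ';' or -1.
    static int indexOfSemicolon(byte[] data, int from) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int i = from;
        for (; i + Long.BYTES <= data.length; i += Long.BYTES) {
            int pos = firstSemicolon(buf.getLong(i));
            if (pos < Long.BYTES) {
                return i + pos;
            }
        }
        for (; i < data.length; i++) { // byte-by-byte tail shorter than one word
            if (data[i] == ';') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] line = "Reykjavik;4.5\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(indexOfSemicolon(line, 0));
    }
}
```

The point of the trick is that one 64-bit comparison replaces up to eight per-byte branches, which is where a saving like the one reported above would plausibly come from.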

Diffstat (limited to 'tocsv.sh')

0 files changed, 0 insertions, 0 deletions
