1brc - my implementations for the billion row challenge

author	Olivier Bourgain <olivierbourgain02@gmail.com>	2024-01-07 20:15:53 +0100
committer	GitHub <noreply@github.com>	2024-01-07 20:15:53 +0100
commit	143132e8dff054708f9695229d8ec3b3e2344246 (patch)
tree	3b22fad48b341b9a6e41deca685a9c4eb7538b0f /src/main/java/dev/morling/onebrc/CalculateAverage_yehwankim23.java
parent	2bb44311064d84e0aa01af9b89605349afcc0de4 (diff)

My implementation is in dev.morling.onebrc.CalculateAverage_obourgain and runnable with provided script calculate_average_obourgain.sh (#75)

Runs with standard JDK 21.

On my computer (i5 13500, 20 cores, 32 GB RAM), my best run with the file fully in the page cache is:

    49.78user 0.69system 0:02.81elapsed 1795%CPU

A slightly older version of the code on a Mac pro M1 with 32 GB:

    real 0m2.867s
    user 0m23.956s
    sys 0m1.329s

As I wrote in comments in the code, a few of my results are rounded differently than in the reference implementation. I have seen that there is an issue about that, but no specific rule yet.

Main points:
- use MemorySegment, it's faster than ByteBuffer
- split the work into a lot of chunks and distribute them to a thread pool
- fast measurement parser that uses a lot of domain knowledge
- very low allocation
- visit each byte only once

Things I tried that were in fact pessimizations:
- use some internal JDK code to vectorize the hashCode computation
- use a MemorySegment to represent the keys instead of byte[], to avoid copying

Hope I won't have a bad surprise when running on the target server 😱
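
To illustrate what a measurement parser built on domain knowledge can look like, here is a minimal sketch. It is not the code from this commit. It assumes the challenge's input format, where every reading has an optional minus sign, one or two integer digits, and exactly one fractional digit (for example 7.4 or -12.3). The class name MeasurementParser and the method parseTenths are hypothetical names chosen for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the implementation from this commit.
final class MeasurementParser {

    // Parses a reading that starts at buf[offset] and returns it in tenths
    // of a degree, so "-12.3" becomes -123.
    // Assumption: the reading matches [-]d.d or [-]dd.d. No validation is
    // done, which is what keeps the parser short and branch-light.
    public static int parseTenths(byte[] buf, int offset) {
        int pos = offset;
        boolean negative = buf[pos] == '-';
        if (negative) {
            pos++;
        }
        int value = buf[pos++] - '0';                 // first integer digit
        if (buf[pos] != '.') {
            value = value * 10 + (buf[pos++] - '0');  // optional second digit
        }
        pos++;                                        // skip the '.'
        value = value * 10 + (buf[pos] - '0');        // single fractional digit
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        byte[] line = "Hamburg;-12.3\n".getBytes(StandardCharsets.UTF_8);
        // The reading starts right after the ';' at index 7.
        System.out.println(parseTenths(line, 8));     // prints -123
    }
}
```

Working in integer tenths avoids floating-point parsing entirely and touches each byte of the reading once. The conversion to a decimal value can wait until the final output.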

Diffstat (limited to 'src/main/java/dev/morling/onebrc/CalculateAverage_yehwankim23.java')

0 files changed, 0 insertions, 0 deletions

