From 143132e8dff054708f9695229d8ec3b3e2344246 Mon Sep 17 00:00:00 2001
From: Olivier Bourgain
Date: Sun, 7 Jan 2024 20:15:53 +0100
Subject: My implementation is in dev.morling.onebrc.CalculateAverage_obourgain and runnable with provided script calculate_average_obourgain.sh (#75)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Runs with standard JDK 21.

On my computer (i5 13500, 20 cores, 32GB ram) my best run is (file fully in page cache):
49.78user 0.69system 0:02.81elapsed 1795%CPU

A slightly older version of the code on Mac pro M1 32 GB:
real 0m2.867s
user 0m23.956s
sys 0m1.329s

As I wrote in comments in the code, a few of my roundings differ from the reference implementation. I have seen that there is an issue about that, but no specific rule yet.

Main points:
- use MemorySegment, it's faster than ByteBuffer
- split the work into a lot of chunks and distribute them to a thread pool
- fast measurement parser by using a lot of domain knowledge
- very low allocation
- visit each byte only once

Things I tried that were in fact pessimizations:
- use some internal JDK code to vectorize the hashCode computation
- use a MemorySegment to represent the keys instead of byte[], to avoid copying

Hope I won't have a bad surprise when running on the target server 😱
---
 calculate_average_obourgain.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100755 calculate_average_obourgain.sh

diff --git a/calculate_average_obourgain.sh b/calculate_average_obourgain.sh
new file mode 100755
index 0000000..67c91b3
--- /dev/null
+++ b/calculate_average_obourgain.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+#
+# Copyright 2023 The original authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# runs with -Xmx24m on my machine, playing it safe with a larger heap
+JAVA_OPTS="-Xmx64m --enable-preview"
+# to use some black magic options
+JAVA_OPTS="$JAVA_OPTS -XX:+UnlockExperimentalVMOptions"
+# no GC, not needed
+JAVA_OPTS="$JAVA_OPTS -XX:+UseEpsilonGC -XX:+AlwaysPreTouch"
+# my finals are really final
+JAVA_OPTS="$JAVA_OPTS -XX:+TrustFinalNonStaticFields"
+# to get CalculateAverage_obourgain$OpenAddressingMap::getOrCreate to inline. A compile command wasn't enough, it was still hitting 'already compiled into a big method'
+JAVA_OPTS="$JAVA_OPTS -XX:InlineSmallCode=10000"
+# seems to be a bit faster
+JAVA_OPTS="$JAVA_OPTS -XX:-TieredCompilation -XX:CICompilerCount=2 -XX:CompileThreshold=1000"
+
+time java $JAVA_OPTS --class-path target/average-1.0.0-SNAPSHOT.jar dev.morling.onebrc.CalculateAverage_obourgain
--
cgit v1.2.3
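
Note on the "fast measurement parser by using a lot of domain knowledge" point from the commit message: the Java source is not part of this patch, so the following is only a minimal sketch of what such a parser could look like, not the code from CalculateAverage_obourgain. It assumes the challenge's input format, where every measurement has an optional minus sign, one or two integer digits, a dot, and exactly one fractional digit. The class name MeasurementParserSketch and the method parseTenths are hypothetical names chosen for illustration. Working in integer tenths of a degree avoids Double.parseDouble and any allocation, and each byte is read once.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not taken from the commit.
public class MeasurementParserSketch {

    /**
     * Parses a measurement such as "-12.3" or "4.5" starting at offset and
     * returns it in tenths of a degree (-123, 45).
     * Assumed format: optional '-', one or two integer digits, '.', one fractional digit.
     * No validation is done; malformed input gives a wrong value or an exception.
     */
    static int parseTenths(byte[] line, int offset) {
        int pos = offset;
        boolean negative = line[pos] == '-';
        if (negative) {
            pos++;
        }
        // First integer digit is always present.
        int value = line[pos++] - '0';
        // Optional second integer digit: anything other than '.' must be a digit.
        if (line[pos] != '.') {
            value = value * 10 + (line[pos++] - '0');
        }
        pos++; // skip '.'
        // Exactly one fractional digit.
        value = value * 10 + (line[pos] - '0');
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        byte[] line = "Hamburg;-12.3\n".getBytes(StandardCharsets.UTF_8);
        int separator = 7; // index of ';' in the line above
        System.out.println(parseTenths(line, separator + 1));
    }
}
```

Sums can then be accumulated as int or long per station and divided by 10.0 only when printing, which is one place where rounding can diverge from a double-based reference implementation.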