path: root/src/main/java/dev/morling/onebrc/CalculateAverage_thomaswue.java
Commit message | Author | Age | Files | Lines
* One last improvement for thomaswue (#702) | Thomas Wuerthinger | 2024-02-01 | 1 | -61/+66
    * Combine <8 and 8-16 cases into one case.
    * Adopt mask-based approach for the <16 length city fast path (idea of Van Phu Do).
    * Slightly improved code layout.
    * Update perf number.
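A rough sketch of what a mask-based fast path for city names shorter than 16 bytes can look like. This is an illustrative reconstruction, not the code from #702: it assumes little-endian 8-byte loads through ByteBuffer (the log mentions Unsafe for the real code), assumes at least 16 readable bytes from the line start, and keeps two branches instead of merging the cases. Class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: mask-based handling of names shorter than 16 bytes.
final class ShortNameMask {
    private static final long SEMICOLONS = 0x3B3B3B3B3B3B3B3BL; // ';' in every byte
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;

    // Non-zero iff the word contains ';'; the lowest set bit is the high bit of the first ';' byte.
    static long semicolonMask(long word) {
        long x = word ^ SEMICOLONS;          // bytes equal to ';' become 0x00
        return (x - ONES) & ~x & HIGH_BITS;  // classic "has zero byte" test
    }

    // Clears the ';' byte and every byte after it; mask must be non-zero.
    static long keepBeforeDelimiter(long word, long mask) {
        return word & ((mask ^ (mask - 1)) >>> 8);
    }

    // Returns {word1, word2, nameLength} with all bytes from ';' onward cleared,
    // or null if the first 16 bytes hold no ';' (a longer name needs a slower path).
    static long[] shortName(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        long w1 = buf.getLong(offset);
        long w2 = buf.getLong(offset + 8);
        long m1 = semicolonMask(w1);
        long m2 = semicolonMask(w2);
        if (m1 != 0) {
            // Name ends inside the first word; the second word carries no name bytes.
            return new long[] {keepBeforeDelimiter(w1, m1), 0L, Long.numberOfTrailingZeros(m1) >>> 3};
        }
        if (m2 != 0) {
            // Name fills the first word and ends inside the second.
            return new long[] {w1, keepBeforeDelimiter(w2, m2), 8 + (Long.numberOfTrailingZeros(m2) >>> 3)};
        }
        return null;
    }
}
```

Because everything after the delimiter is zeroed, two lines with the same city yield identical word pairs, so a table lookup can compare names with two long comparisons instead of a byte loop.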
* Added comments to used flags, clean up code, final fine tuning. (#674) | Thomas Wuerthinger | 2024-01-31 | 1 | -134/+100
* Clean up, fine tuning, credit section for thomaswue (#646) | Thomas Wuerthinger | 2024-01-29 | 1 | -143/+126
    * Some clean up, fine tuning, removing non-supported options, added credit section and additional comments.
    * Put license header year back to 2023 to pass checks.
    * Remove static linking (as it requires some more setup on the target machine).
* Some fine tuning for thomaswue (#606) | Thomas Wuerthinger | 2024-01-28 | 1 | -152/+239
    * Some fine tuning.
    * Process 2MB segments to make all threads finish at the same time. Process with 3 scanners in parallel in the same thread.
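One way to hand out fixed 2MB segments so that faster threads simply claim more of them is a shared atomic cursor, sketched below. The names are hypothetical, aligning each segment to line boundaries is left out, and the "3 scanners per thread" interleaving from the same commit is not shown.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: worker threads claim fixed-size segments from a shared atomic cursor.
final class SegmentScheduler {
    static final long SEGMENT_SIZE = 2L * 1024 * 1024; // 2MB, the size named in the commit message

    private final AtomicLong cursor = new AtomicLong();
    private final long fileSize;

    SegmentScheduler(long fileSize) {
        this.fileSize = fileSize;
    }

    // Claims the next unprocessed segment start, or -1 once the file is exhausted.
    long nextSegmentStart() {
        long start = cursor.getAndAdd(SEGMENT_SIZE);
        return start < fileSize ? start : -1;
    }

    long segmentEnd(long start) {
        return Math.min(start + SEGMENT_SIZE, fileSize);
    }

    // Runs threadCount workers until all segments are claimed; returns the total bytes covered.
    static long coverWithThreads(long fileSize, int threadCount) {
        SegmentScheduler scheduler = new SegmentScheduler(fileSize);
        AtomicLong covered = new AtomicLong();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                long start;
                while ((start = scheduler.nextSegmentStart()) != -1) {
                    // A real worker would parse the lines in [start, segmentEnd(start)) here.
                    covered.addAndGet(scheduler.segmentEnd(start) - start);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return covered.get();
    }
}
```

Compared with one large static chunk per thread, small dynamically claimed segments keep every thread busy until the cursor passes the end of the file, which is the "finish at the same time" effect the commit describes.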
* Tuning and subprocess spawn for thomaswue (#533) | Thomas Wuerthinger | 2024-01-21 | 1 | -70/+97
    * Some clean up, small-scale tuning, and reduce complexity when handling longer names.
    * Do actual work in worker subprocess. Main process returns immediately and OS clean up of the mmap continues in the subprocess.
    * Update minor Graal version after CPU release.
    * Turn GC back to epsilon GC (although it does not seem to make a difference).
    * Minor tuning for another +1%.
* Improve scheduling for thomaswue (#358) | Thomas Wuerthinger | 2024-01-15 | 1 | -16/+35
    * Improve scheduling for another 6%.
    * Tune hash function and collision handling.
* Adding Scanner object and also tuning for better branch prediction for about +6%. (#341) | Thomas Wuerthinger | 2024-01-12 | 1 | -101/+182
* Second tuning for thomaswue | Thomas Wuerthinger | 2024-01-10 | 1 | -96/+103
    * Optimize checking for collisions by doing this a long at a time always.
    * Use a long at a time scanning for delimiter.
    * Minor tuning. Now below 0.80s on Intel i9-13900K.
    * Add number parsing code from Quan Anh Mai. Fix name length issue.
    * Include suggestion from Alfonso Peterssen for another 1.5%.
    * Optimize hash collision check compare for ~4% gain.
    * Add perf stats based on latest version.
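The branchless number parsing mentioned above relies on the input constraint that every temperature matches -?\d?\d\.\d. Below is a sketch of a widely circulated SWAR formulation of that idea; it may differ in detail from the code merged in this commit. It assumes the temperature starts at the lowest byte of a little-endian 8-byte word with at least 8 readable bytes, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: branchless parse of -?\d?\d\.\d into tenths of a degree.
final class BranchlessTemperature {
    static int parseTenths(long numberWord) {
        // Digits (0x30-0x39) have bit 4 set, '.' (0x2E) does not: find the '.' among bytes 1..3.
        int decimalSepPos = Long.numberOfTrailingZeros(~numberWord & 0x10101000L);
        // Shift so that the fractional digit always lands in byte 4.
        int shift = 28 - decimalSepPos;
        // -1 if byte 0 is '-' (bit 4 clear), 0 if it is a digit.
        long signed = (~numberWord << 59) >> 63;
        // Clears byte 0 when it holds '-'.
        long designMask = ~(signed & 0xFF);
        // Keep only the low nibbles of bytes 1, 2, 4: hundreds, tens, units of the value in tenths.
        long digits = ((numberWord & designMask) << shift) & 0x0F000F0F00L;
        // One multiply sums units + 10 * tens + 100 * hundreds into bits 32..41.
        long absValue = ((digits * 0x640A0001L) >>> 32) & 0x3FF;
        // Conditional negation without a branch.
        return (int) ((absValue ^ signed) - signed);
    }

    static int parseTenths(byte[] data, int offset) {
        return parseTenths(ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(offset));
    }
}
```

The trick only works because the format is so narrow: the decimal point can sit at byte 1, 2, or 3 only, and the result never exceeds 999 in magnitude, which fits in the 10 bits extracted after the multiply.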
* Use SIMD for search for delimiter and name compare | Thomas Wuerthinger | 2024-01-07 | 1 | -52/+100
* Initial version for thomaswue with Oracle GraalVM Native Image | Thomas Wuerthinger | 2024-01-06 | 1 | -0/+212
    * Initial version.
    * Make PGO feature optional off-by-default. Needs PGO_MODE environment variable to be set. Add -O3 -march=native tuning flags for better performance.
    * Adjust script to be more quiet.
    * Adjust max city length. Fix an issue when accumulating results.
    * Tune thomaswue submission. mmap the entire file, use Unsafe directly instead of ByteBuffer, avoid byte[] copies. These tricks give a ~30% speedup, over an already fast implementation.
    * Optimize parsing of numbers based on specific given constraints.
    * Fix for segment calculation for case of very small input.
    * Minor shell script fixes.
    * Separate out build step into file additional_build_step_thomaswue.sh, simplify run script and remove PGO option for now.
    * Minor corrections to the run script.
    ---------
    Co-authored-by: Alfonso² Peterssen <alfonso.peterssen@oracle.com>
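The segment calculation fix for very small input is easiest to see with a splitter that never cuts a line in half and tolerates inputs shorter than the number of segments. The sketch below is one way to do that, not the code from this commit: it works on a byte[] where the real implementation maps the whole file, and the class name is hypothetical.

```java
// Hypothetical sketch: split input into `parts` segments whose boundaries fall at line starts.
final class SegmentSplitter {
    // Returns b[0] = 0 <= b[1] <= ... <= b[parts] = data.length; segment i is [b[i], b[i+1]).
    // When data.length < parts, the chunk size is 0 and the extra segments are simply empty.
    static int[] split(byte[] data, int parts) {
        int[] bounds = new int[parts + 1];
        int chunk = data.length / parts;
        for (int i = 1; i < parts; i++) {
            // Never move backwards past the previous boundary.
            int pos = Math.max(i * chunk, bounds[i - 1]);
            // Advance until pos is just past a '\n' (or at either end), so no line is split.
            while (pos > 0 && pos < data.length && data[pos - 1] != '\n') {
                pos++;
            }
            bounds[i] = pos;
        }
        bounds[parts] = data.length;
        return bounds;
    }
}
```

Guarding on pos > 0 and clamping to the previous boundary are what keep a tiny input from producing out-of-range reads or overlapping segments.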