Commit message log
|
* Amendments
* One more locality touchup: no need to carry the entire name array
|
* another shameless copycat from thomas: less safepoints
* I have no idea what I am doing
|
Co-authored-by: Ian Preston <ianopolous@protonmail.com>
|
* Read file with multiple virtual threads and process chunks of file data in parallel.
* Updated logic to bucket every chunk of aggs into a vector and merge them into a TreeMap for printing.
* Virtual Thread / File Channels Impl.
* Renamed files with GHUsername.
* Added statement to get vals before updating.
* Added executable permission to the files.
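The entry above reads the file with multiple virtual threads, buckets each chunk's aggregates, and merges them into a TreeMap for printing. A minimal sketch of that bucket-and-merge shape, assuming a simplified `Stats` record and string chunks in place of the submission's byte-level parsing:

```java
import java.util.*;
import java.util.concurrent.*;

public class VirtualThreadMerge {
    // Per-station aggregate; field and type names are illustrative only.
    record Stats(double min, double max, double sum, long count) {
        Stats merge(Stats o) {
            return new Stats(Math.min(min, o.min), Math.max(max, o.max),
                    sum + o.sum, count + o.count);
        }
    }

    // Parse one chunk of "station;temperature" lines into a per-chunk map.
    static Map<String, Stats> parseChunk(String chunk) {
        Map<String, Stats> m = new HashMap<>();
        for (String line : chunk.split("\n")) {
            if (line.isEmpty()) continue;
            int semi = line.indexOf(';');
            double t = Double.parseDouble(line.substring(semi + 1));
            m.merge(line.substring(0, semi), new Stats(t, t, t, 1), Stats::merge);
        }
        return m;
    }

    // One virtual thread per chunk; collect the partial maps, then merge into
    // a TreeMap so the final output is sorted by station name.
    static TreeMap<String, Stats> run(List<String> chunks) throws Exception {
        List<Future<Map<String, Stats>>> parts = new ArrayList<>();
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (String c : chunks) parts.add(pool.submit(() -> parseChunk(c)));
        } // close() waits for all tasks to finish
        TreeMap<String, Stats> merged = new TreeMap<>();
        for (Future<Map<String, Stats>> f : parts)
            f.get().forEach((k, v) -> merged.merge(k, v, Stats::merge));
        return merged;
    }
}
```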
|
* Simple multi-threaded version
* Format code
* Formatted code
* More formatting
|
* fewer type conversions, fewer string casts
* adjust some comments
* fixed format issue
|
* Latest snapshot (#1)
preparing initial version
* Improved performance to 20seconds (-9seconds from the previous version) (#2)
improved performance a bit
* Improved performance to 14 seconds (-6 seconds) (#3)
improved performance to 14 seconds
* sync branches (#4)
* initial commit
* some refactoring of methods
* some fixes for partitioning
* some fixes for partitioning
* fixed hacky getcode for utf8 bytes
* simplified getcode for partitioning
* temp solution with syncing
* temp solution with syncing
* new stream processing
* new stream processing
* some improvements
* cleaned stuff
* run configuration
* round buffer for the stream to pages
* not using compute since it's slower than straightforward get/put. using own byte array equals.
* using parallel gc
* avoid copying bytes when creating a station object
* formatting
* Copy less arrays. Improved performance to 12.7 seconds (-2 seconds) (#5)
* initial commit
* some refactoring of methods
* some fixes for partitioning
* some fixes for partitioning
* fixed hacky getcode for utf8 bytes
* simplified getcode for partitioning
* temp solution with syncing
* temp solution with syncing
* new stream processing
* new stream processing
* some improvements
* cleaned stuff
* run configuration
* round buffer for the stream to pages
* not using compute since it's slower than straightforward get/put. using own byte array equals.
* using parallel gc
* avoid copying bytes when creating a station object
* formatting
* some tuning to increase performance
* some tuning to increase performance
* avoid copying data; fast hashCode with slightly more collisions
* avoid copying data; fast hashCode with slightly more collisions
* cleanup (#6)
* tidy up
|
* Some fine tuning.
* Process 2MB segments to make all threads finish at the same time.
Process with 3 scanners in parallel in the same thread.
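The 2MB-segment idea above can be sketched as a shared cursor that workers repeatedly claim small work units from, so no thread is left holding one huge chunk while the others sit idle. Names and the claiming scheme here are illustrative, not the submission's code:

```java
import java.util.concurrent.atomic.AtomicLong;

public class SegmentScheduler {
    static final long SEGMENT_SIZE = 2 * 1024 * 1024; // 2 MB, as in the commit message
    final AtomicLong cursor = new AtomicLong();
    final long fileSize;

    SegmentScheduler(long fileSize) { this.fileSize = fileSize; }

    // Each worker calls next() in a loop. getAndAdd hands out disjoint
    // segments without locks; returns {offset, length}, or null when done.
    long[] next() {
        long start = cursor.getAndAdd(SEGMENT_SIZE);
        if (start >= fileSize) return null;
        return new long[]{ start, Math.min(SEGMENT_SIZE, fileSize - start) };
    }
}
```

Because every unit is small, all threads drain the tail of the file together and finish at roughly the same time.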
|
* initial implementation
* few improvements and a cleanup (down to ~12s)
|
use more generic hashcode
|
* Initial submission for jonathan_aotearoa
* Fixing typos
* Adding hyphens to prepare and calculate shell scripts so that they're aligned with my GitHub username.
* Making chunk processing more robust in an attempt to fix the cause of the build error.
* Fixing typo.
* Fixed the handling of files less than 8 bytes in length.
* Additional assertion, comment improvements.
* Refactoring to improve testability. Additional assertion and comments.
* Updating collision checking to include checking if the station name is equal.
* Minor refactoring to make param ordering consistent.
* Adding a custom toString method for the results map.
* Fixing collision checking bug
* Fixing rounding bug.
* Fixing collision bug.
---------
Co-authored-by: jonathan <jonathan@example.com>
|
- use shared memory arena and region between worker threads
- reduce number of instructions slightly while processing file region
|
* some random changes with minimal, if any, effect
* use munmap() trick
credit: thomaswue
* some smaller tweaks
* use native image
|
* CalculateAverage_pdrakatos
* Rename to be valid with rules
* CalculateAverage_pdrakatos
* Rename to be valid with rules
* Changes on scripts execution
* Fixing bugs causing scripts not to be executed
* Changes on prepare make it compatible
* Fixing passing all tests
* Increase direct memory allocation buffer
* Fixing memory problem causing heap space exception
|
* Contribution by albertoventurini
* Use byte arrays of size 2^20
---------
Co-authored-by: Alberto Venturini <alberto.venturini@accso.de>
|
* Initial impl
* Fix bad file descriptor error in the `calculate_average_serkan-ozal.sh`
* Disable Epsilon GC and rely on default GC. Because apparently, JIT and Epsilon GC don't play well together in the eval machine for short lived Vector API's `ByteVector` objects
* Take care of byte order before processing key length with bit shift operators
* Fix key equality check for long keys
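Taking care of byte order matters here because key-length masking assumes a little-endian layout: the first k bytes of a word occupy its low 8*k bits, so the tail of a key can be isolated with a single mask. A hedged sketch with a hypothetical helper name, not the submission's code (on a big-endian platform the bytes would need swapping first):

```java
public class KeyMask {
    // Keep only the low keyBytes bytes of a little-endian word; the rest of
    // the word may contain garbage read past the end of the key.
    static long maskLow(long word, int keyBytes) {
        if (keyBytes >= 8) return word;           // whole word belongs to the key
        return word & ((1L << (8 * keyBytes)) - 1); // shift-mask off the tail
    }
}
```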
|
13.8s locally now.
Co-authored-by: Ian Preston <ianopolous@protonmail.com>
|
/**
 * Solution based on thomaswue's solution, commit:
 * commit d0a28599c293d3afe3291fc3cf169a7b25ae9ae6
 * Author: Thomas Wuerthinger <thomas.wuerthinger@oracle.com>
 * Date: Sun Jan 21 20:13:48 2024 +0100
 *
 * Changes:
 * 1) Use LinkedBlockingQueue to store partial results, which
 * are then merged into the final map.
 * As different chunks finish at different times, this allows
 * processing them as they finish, instead of joining the
 * threads sequentially.
 * This change seems more useful for the 10k dataset, as the
 * runtime difference between chunks is greater.
 * 2) Use only 4 threads if the file is >= 14GB.
 * This showed much better results in my local tests, but I only
 * ran with 200 million rows (because of limited RAM), and I have
 * no idea how it will perform on the 1brc HW.
 */
|
custom hashtable (#600)
* melgenek: ~top 15 on 10k. Buffered IO, VarHandles, vectors, custom hashtable
* Calculate the required heap size dynamically
|
* fix test rounding, pass 10K station names
* improved integer conversion, delayed string creation.
* new algorithm hash, use ConcurrentHashMap
* fix rounding test
* added the length of the string in the hash initialization.
* fix hash code collisions
|
| | |
|
| |
|
| |
More small tweaks, perf from 775~ to 738~
|
* Dirty implementation gigiblender
* Final impl gigiblender
|
* cleanup prepare script
* native image options
* fix quadratic probing (no change to perf)
* mask to get the last chunk of the name
* extract hash functions
* tweak the probing loop (-100ms)
* fiddle with native image options
* Reorder conditions in hope it makes branch predictor happier
* extracted constant
|
* improve hard disk access locality, another 8%
* add some comments & credit
* fixed format
|
Co-authored-by: Quang Hieu Dao <hieu_dq@flinters.vn>
|
* Initial submission
* fixed not executable scripts
|
* Improve hash function
* remove limit on number of cores
* fix calculation of boundaries between chunks
* fix IOOBE
---------
Co-authored-by: Jason Nochlin <hundredwatt@users.noreply.github.com>
|
Minor updates here and there, shaves off ~5% of execution time on my machine.
|
* Contribution by albertoventurini
* Shave off a couple of hundreds of milliseconds, by making an assumption on temperature readings
* Parse reading without loop, inspired by other solutions
* Use all cores
* Small improvements, only allocate 247 positions instead of 256
---------
Co-authored-by: Alberto Venturini <alberto.venturini@accso.de>
|
* Update with Rounding Bugfix
* Simplification of Merging Results
* More Plain Java Code for Value Storage
* Improve Performance by Stupid Hash
Drop around 3 seconds on my machine by
simplifying the hash to be ridiculously stupid,
but faster.
* Fix outdated comment
|
* Dmitry challenge
* Dmitry submit 2.
Use MemorySegment of FileChannel and Unsafe
to read bytes from disk. 4 seconds speedup in local test,
from 20s to 16s.
|
* tonivade improved by not using HashMap
* use java 21.0.2
* same hash same station
* remove unused parameter in sameSation
* use length too
* refactor parallelization
* use parallel GC
* refactor
* refactor
|
Use flat array for stats.
Use simd for line termination
Co-authored-by: Ian Preston <ianopolous@protonmail.com>
|
* Simplify Node class with fewer fields, improve hash mix speed
* remove some ops, a bit faster
* more inline, little bit faster but not sure
|
* first attempt
* formatting fix
---------
Co-authored-by: Gabriel <gabriel@gabriel>
|
1. Use Unsafe
2. Fit hashtable in L2 cache.
3. If we can find a good hash function, it can fit in L1 cache even.
4. Improve temperature parsing by using a lookup table
|
|
* Go implementation by AlexanderYastrebov
This is a proof-of-concept to demonstrate non-java submission.
It requires Docker with BuildKit plugin to build and export binary.
Updates
* #67
* #253
* Use collision-free id lookup
* Use number of buckets greater than max number of keys
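A minimal sketch of the bucket-sizing idea above: an open-addressing table with far more buckets than distinct keys keeps probe chains short, and with a collision-free id lookup the first probe always hits. String keys and a power-of-two table stand in here for the real implementation:

```java
public class OpenTable {
    final String[] keys;
    final long[] counts;
    final int mask; // capacity is a power of two, so hash & mask picks a bucket

    OpenTable(int capacityPow2) {
        keys = new String[capacityPow2];
        counts = new long[capacityPow2];
        mask = capacityPow2 - 1;
    }

    // Linear probing: step to the next bucket on collision. With many more
    // buckets than keys, collisions (and thus probes) are rare.
    void add(String key, long delta) {
        int i = key.hashCode() & mask;
        while (keys[i] != null && !keys[i].equals(key)) i = (i + 1) & mask;
        keys[i] = key;
        counts[i] += delta;
    }

    long get(String key) {
        int i = key.hashCode() & mask;
        while (keys[i] != null && !keys[i].equals(key)) i = (i + 1) & mask;
        return counts[i]; // 0 if the key was never added
    }
}
```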
|
locally similar time to top 5] (#431)
* Init Push
* organize imports
* Add OpenMap
* Best outcome yet
* Create prepare script and calculate script for native image, also add comments on calculation
* Remove extra hashing, and need for the set array
* Commit formatting changes from build
* Remove unneeded device information
* Make shell scripts executable, add hash collision double check for equality
* Add hash collision double check for equality
* Skip multithreading for small files to improve small file performance
|
It doesn't make a lot of sense since quite some code can be written shorter, but this is what gives the best numbers.
|
* off-the-shelf Java components, curious about official runtime results. thnx
my laptop results are around 12 seconds, e.g:
87.66user 1.32system 0:12.11elapsed 734%CPU (0avgtext+0avgdata 13980924maxresident)k
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-8400H CPU @ 2.50GHz
* off-the-shelf Java components... curious about official runtime results. thnx
laptop results are around 11 seconds, e.g:
./calculate_average_3j5a.sh 81.46s user 1.36s system 758% cpu 10.917 total
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-8400H CPU @ 2.50GHz
* off-the-shelf Java components + ArraysSupport..
laptop results are around 10.2 seconds, e.g:
./calculate_average_3j5a.sh 75.02s user 1.31s system 750% cpu 10.175 total
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-8400H CPU @ 2.50GHz
* method handle...
* full buffer read attempt
* MH
* MH cleanup
|
* Contribution from mattiz
* Formatted code
|
* final commit
changing to using MappedByteBuffer
changes before using unsafe address
using unsafe
* using graalvm, correct unsafe mem implementation
---------
Co-authored-by: Karthikeyans <karthikeyan.sn@zohocorp.com>
|
* Inline parsing name and station to avoid constantly updating the offset field (-100ms)
* Remove Worker class, inline the logic into lambda
* Accumulate results in an int matrix instead of using result row (-50ms)
* Use native image
|
* Uses vector api for city name parsing and for hash index collision resolution
* Uses lookup tables for temperature parsing
|
- inline computeIfAbsent
- replace arraycopy by copyOfRange
Co-authored-by: Yann Moisan <yann@zen.ly>
|
* Deploy v2 for parkertimmins
Main changes:
- fix hash which masked incorrectly
- do station equality check in simd
- make station array length multiple of 32
- search for newline rather than semicolon
* Fix bug - entries were being skipped between batches
At the boundary between two batches, the first batch would stop after
crossing a limit with a padding of 200 characters applied. The next
batch should then start looking for the first full entry after the
padding. This padding logic had been removed when starting a batch. For
this reason, entries starting in the 200 character padding between
batches were skipped.
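The boundary rule described above — a batch that does not start at offset 0 must skip the partial entry it lands in and begin at the byte after the next newline, while the previous batch reads past its nominal end to cover that same entry — can be sketched with a hypothetical helper (not the submission's code):

```java
public class ChunkAlign {
    // Return the offset of the first complete entry at or after `offset`.
    // Offset 0 is always an entry start; otherwise advance until the
    // previous byte is '\n'. The preceding batch is responsible for every
    // byte we skip here, so entries are neither lost nor double-counted.
    static int firstEntry(byte[] data, int offset) {
        if (offset == 0) return 0;
        int i = offset;
        while (i < data.length && data[i - 1] != '\n') i++;
        return i;
    }
}
```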
|