Commit message log
|
* Amendments
* One more locality touchup: no need to carry the entire name array
|
* another shameless copycat from thomas: less safepoints
* I have no idea what I am doing
|
Co-authored-by: Ian Preston <ianopolous@protonmail.com>
|
* Read file with multiple virtual threads and process chunks of file data in parallel.
* Updated logic to bucket every chunk of aggs into a vector and merge them into a TreeMap for printing.
* Virtual Thread / File Channels Impl.
* Renamed files with GHUsername.
* Added statement to get vals before updating.
* Added executable permission to the files.
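The entry above reads the file with multiple virtual threads, buckets each chunk's aggregates, and merges them into a TreeMap for printing. A minimal sketch of that bucket-and-merge shape, assuming a simplified `Stats` record and string chunks in place of the submission's byte-level parsing:

```java
import java.util.*;
import java.util.concurrent.*;

public class VirtualThreadMerge {
    // Per-station aggregate; field and type names are illustrative only.
    record Stats(double min, double max, double sum, long count) {
        Stats merge(Stats o) {
            return new Stats(Math.min(min, o.min), Math.max(max, o.max),
                    sum + o.sum, count + o.count);
        }
    }

    // Parse one chunk of "station;temperature" lines into a per-chunk map.
    static Map<String, Stats> parseChunk(String chunk) {
        Map<String, Stats> m = new HashMap<>();
        for (String line : chunk.split("\n")) {
            if (line.isEmpty()) continue;
            int semi = line.indexOf(';');
            double t = Double.parseDouble(line.substring(semi + 1));
            m.merge(line.substring(0, semi), new Stats(t, t, t, 1), Stats::merge);
        }
        return m;
    }

    // One virtual thread per chunk; collect the partial maps, then merge into
    // a TreeMap so the final output is sorted by station name.
    static TreeMap<String, Stats> run(List<String> chunks) throws Exception {
        List<Future<Map<String, Stats>>> parts = new ArrayList<>();
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (String c : chunks) parts.add(pool.submit(() -> parseChunk(c)));
        } // close() waits for all tasks to finish
        TreeMap<String, Stats> merged = new TreeMap<>();
        for (Future<Map<String, Stats>> f : parts)
            f.get().forEach((k, v) -> merged.merge(k, v, Stats::merge));
        return merged;
    }
}
```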
|
* Simple multi-threaded version
* Format code
* Formatted code
* More formatting
|
* fewer type conversions, fewer string casts
* adjust some comments
* fixed format issue
|
* Latest snapshot (#1)
preparing initial version
* Improved performance to 20seconds (-9seconds from the previous version) (#2)
improved performance a bit
* Improved performance to 14 seconds (-6 seconds) (#3)
improved performance to 14 seconds
* sync branches (#4)
* initial commit
* some refactoring of methods
* some fixes for partitioning
* some fixes for partitioning
* fixed hacky getcode for utf8 bytes
* simplified getcode for partitioning
* temp solution with syncing
* temp solution with syncing
* new stream processing
* new stream processing
* some improvements
* cleaned stuff
* run configuration
* round buffer for the stream to pages
* not using compute since it's slower than straightforward get/put. using own byte array equals.
* using parallel gc
* avoid copying bytes when creating a station object
* formatting
* Copy less arrays. Improved performance to 12.7 seconds (-2 seconds) (#5)
* initial commit
* some refactoring of methods
* some fixes for partitioning
* some fixes for partitioning
* fixed hacky getcode for utf8 bytes
* simplified getcode for partitioning
* temp solution with syncing
* temp solution with syncing
* new stream processing
* new stream processing
* some improvements
* cleaned stuff
* run configuration
* round buffer for the stream to pages
* not using compute since it's slower than straightforward get/put. using own byte array equals.
* using parallel gc
* avoid copying bytes when creating a station object
* formatting
* some tuning to increase performance
* some tuning to increase performance
* avoid copying data; fast hashCode with slightly more collisions
* avoid copying data; fast hashCode with slightly more collisions
* cleanup (#6)
* tidy up
|
* Some fine tuning.
* Process 2MB segments to make all threads finish at the same time.
Process with 3 scanners in parallel in the same thread.
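The 2MB-segment idea above can be sketched as a shared cursor that workers repeatedly claim small work units from, so no thread is left holding one huge chunk while the others sit idle. Names and the claiming scheme here are illustrative, not the submission's code:

```java
import java.util.concurrent.atomic.AtomicLong;

public class SegmentScheduler {
    static final long SEGMENT_SIZE = 2 * 1024 * 1024; // 2 MB, as in the commit message
    final AtomicLong cursor = new AtomicLong();
    final long fileSize;

    SegmentScheduler(long fileSize) { this.fileSize = fileSize; }

    // Each worker calls next() in a loop. getAndAdd hands out disjoint
    // segments without locks; returns {offset, length}, or null when done.
    long[] next() {
        long start = cursor.getAndAdd(SEGMENT_SIZE);
        if (start >= fileSize) return null;
        return new long[]{ start, Math.min(SEGMENT_SIZE, fileSize - start) };
    }
}
```

Because every unit is small, all threads drain the tail of the file together and finish at roughly the same time.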
|
* initial implementation
* few improvements and a cleanup (down to ~12s)
|
use more generic hashcode
|
* Initial submission for jonathan_aotearoa
* Fixing typos
* Adding hyphens to prepare and calculate shell scripts so that they're aligned with my GitHub username.
* Making chunk processing more robust in an attempt to fix the cause of the build error.
* Fixing typo.
* Fixed the handling of files less than 8 bytes in length.
* Additional assertion, comment improvements.
* Refactoring to improve testability. Additional assertion and comments.
* Updating collision checking to include checking if the station name is equal.
* Minor refactoring to make param ordering consistent.
* Adding a custom toString method for the results map.
* Fixing collision checking bug
* Fixing rounding bug.
* Fixing collision bug.
---------
Co-authored-by: jonathan <jonathan@example.com>
|
- use shared memory arena and region between worker threads
- reduce number of instructions slightly while processing file region
|
* some random changes with minimal, if any, effect
* use munmap() trick
credit: thomaswue
* some smaller tweaks
* use native image
|
* CalculateAverage_pdrakatos
* Rename to be valid with rules
* CalculateAverage_pdrakatos
* Rename to be valid with rules
* Changes on scripts execution
* Fixing bugs causing scripts not to be executed
* Changes on prepare make it compatible
* Fixing passing all tests
* Increase direct memory allocation buffer
* Fixing memory problem causing heap space exception
|
* Contribution by albertoventurini
* Use byte arrays of size 2^20
---------
Co-authored-by: Alberto Venturini <alberto.venturini@accso.de>
|
* Initial impl
* Fix bad file descriptor error in the `calculate_average_serkan-ozal.sh`
* Disable Epsilon GC and rely on default GC. Because apparently, JIT and Epsilon GC don't play well together in the eval machine for short lived Vector API's `ByteVector` objects
* Take care of byte order before processing key length with bit shift operators
* Fix key equality check for long keys
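Taking care of byte order matters here because key-length masking assumes a little-endian layout: the first k bytes of a word occupy its low 8*k bits, so the tail of a key can be isolated with a single mask. A hedged sketch with a hypothetical helper name, not the submission's code (on a big-endian platform the bytes would need swapping first):

```java
public class KeyMask {
    // Keep only the low keyBytes bytes of a little-endian word; the rest of
    // the word may contain garbage read past the end of the key.
    static long maskLow(long word, int keyBytes) {
        if (keyBytes >= 8) return word;           // whole word belongs to the key
        return word & ((1L << (8 * keyBytes)) - 1); // shift-mask off the tail
    }
}
```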
|
13.8s locally now.
Co-authored-by: Ian Preston <ianopolous@protonmail.com>
|
/**
 * Solution based on thomaswue's solution, commit:
 * commit d0a28599c293d3afe3291fc3cf169a7b25ae9ae6
 * Author: Thomas Wuerthinger <thomas.wuerthinger@oracle.com>
 * Date: Sun Jan 21 20:13:48 2024 +0100
 *
 * Changes:
 * 1) Use LinkedBlockingQueue to store partial results, which
 * are then merged into the final map.
 * As different chunks finish at different times, this allows
 * processing them as they finish, instead of joining the
 * threads sequentially.
 * This change seems more useful for the 10k dataset, as the
 * runtime difference between chunks is greater.
 * 2) Use only 4 threads if the file is >= 14GB.
 * This showed much better results in my local tests, but I only
 * ran with 200 million rows (because of limited RAM), and I have
 * no idea how it will perform on the 1brc HW.
 */
|
custom hashtable (#600)
* melgenek: ~top 15 on 10k. Buffered IO, VarHandles, vectors, custom hashtable
* Calculate the required heap size dynamically
|
* fix test rounding, pass 10K station names
* improved integer conversion, delayed string creation.
* new algorithm hash, use ConcurrentHashMap
* fix rounding test
* added the length of the string in the hash initialization.
* fix hash code collisions
|
| | |
|
| |
|
| |
More small tweaks, perf from 775~ to 738~
|
* Dirty implementation gigiblender
* Final impl gigiblender
|
* cleanup prepare script
* native image options
* fix quadratic probing (no change to perf)
* mask to get the last chunk of the name
* extract hash functions
* tweak the probing loop (-100ms)
* fiddle with native image options
* Reorder conditions in hope it makes branch predictor happier
* extracted constant
|
* improve hard disk access locality, another 8%
* add some comments & credit
* fixed format
|
Co-authored-by: Quang Hieu Dao <hieu_dq@flinters.vn>
|
* Initial submission
* fixed not executable scripts
|
* Improve hash function
* remove limit on number of cores
* fix calculation of boundaries between chunks
* fix IOOBE
---------
Co-authored-by: Jason Nochlin <hundredwatt@users.noreply.github.com>
|
Minor updates here and there, shaves off ~5% of execution time on my machine.
|
* Contribution by albertoventurini
* Shave off a couple of hundreds of milliseconds, by making an assumption on temperature readings
* Parse reading without loop, inspired by other solutions
* Use all cores
* Small improvements, only allocate 247 positions instead of 256
---------
Co-authored-by: Alberto Venturini <alberto.venturini@accso.de>
|
* Update with Rounding Bugfix
* Simplification of Merging Results
* More Plain Java Code for Value Storage
* Improve Performance by Stupid Hash
Drop around 3 seconds on my machine by
simplifying the hash to be ridiculously stupid,
but faster.
* Fix outdated comment
|
* Dmitry challenge
* Dmitry submit 2.
Use MemorySegment of FileChannel and Unsafe
to read bytes from disk. 4 seconds speedup in local test,
from 20s to 16s.
|
* tonivade improved by not using HashMap
* use java 21.0.2
* same hash same station
* remove unused parameter in sameSation
* use length too
* refactor parallelization
* use parallel GC
* refactor
* refactor
|
Use flat array for stats.
Use simd for line termination
Co-authored-by: Ian Preston <ianopolous@protonmail.com>
|
* Simplify Node class with fewer fields, improve hash mix speed
* remove some ops, a bit faster
* more inline, little bit faster but not sure
|
* first attempt
* formatting fix
---------
Co-authored-by: Gabriel <gabriel@gabriel>
|
1. Use Unsafe
2. Fit hashtable in L2 cache.
3. If we can find a good hash function, it can fit in L1 cache even.
4. Improve temperature parsing by using a lookup table
|
|
* Go implementation by AlexanderYastrebov
This is a proof-of-concept to demonstrate non-java submission.
It requires Docker with BuildKit plugin to build and export binary.
Updates
* #67
* #253
* Use collision-free id lookup
* Use number of buckets greater than max number of keys
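A minimal sketch of the bucket-sizing idea above: an open-addressing table with far more buckets than distinct keys keeps probe chains short, and with a collision-free id lookup the first probe always hits. String keys and a power-of-two table stand in here for the real implementation:

```java
public class OpenTable {
    final String[] keys;
    final long[] counts;
    final int mask; // capacity is a power of two, so hash & mask picks a bucket

    OpenTable(int capacityPow2) {
        keys = new String[capacityPow2];
        counts = new long[capacityPow2];
        mask = capacityPow2 - 1;
    }

    // Linear probing: step to the next bucket on collision. With many more
    // buckets than keys, collisions (and thus probes) are rare.
    void add(String key, long delta) {
        int i = key.hashCode() & mask;
        while (keys[i] != null && !keys[i].equals(key)) i = (i + 1) & mask;
        keys[i] = key;
        counts[i] += delta;
    }

    long get(String key) {
        int i = key.hashCode() & mask;
        while (keys[i] != null && !keys[i].equals(key)) i = (i + 1) & mask;
        return counts[i]; // 0 if the key was never added
    }
}
```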
|
locally similar time to top 5] (#431)
* Init Push
* organize imports
* Add OpenMap
* Best outcome yet
* Create prepare script and calculate script for native image, also add comments on calculation
* Remove extra hashing, and need for the set array
* Commit formatting changes from build
* Remove unneeded device information
* Make shell scripts executable, add hash collision double check for equality
* Add hash collision double check for equality
* Skip multithreading for small files to improve small file performance
|
It doesn't make a lot of sense since quite some code can be written shorter, but this is what gives the best numbers.
|
* off-the-shelf Java components, curious about official runtime results. thnx
my laptop results are around 12 seconds, e.g:
87.66user 1.32system 0:12.11elapsed 734%CPU (0avgtext+0avgdata 13980924maxresident)k
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-8400H CPU @ 2.50GHz
* off-the-shelf Java components... curious about official runtime results. thnx
laptop results are around 11 seconds, e.g:
./calculate_average_3j5a.sh 81.46s user 1.36s system 758% cpu 10.917 total
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-8400H CPU @ 2.50GHz
* off-the-shelf Java components + ArraysSupport..
laptop results are around 10.2 seconds, e.g:
./calculate_average_3j5a.sh 75.02s user 1.31s system 750% cpu 10.175 total
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-8400H CPU @ 2.50GHz
* method handle...
* full buffer read attempt
* MH
* MH cleanup
|
* Contribution from mattiz
* Formatted code
|
* final commit
changing to using MappedByteBuffer
changes before using unsafe address
using unsafe
* using graalvm, correct unsafe mem implementation
---------
Co-authored-by: Karthikeyans <karthikeyan.sn@zohocorp.com>
|
* Inline parsing name and station to avoid constantly updating the offset field (-100ms)
* Remove Worker class, inline the logic into lambda
* Accumulate results in an int matrix instead of using result row (-50ms)
* Use native image
|
* Uses vector api for city name parsing and for hash index collision resolution
* Uses lookup tables for temperature parsing
|
- inline computeIfAbsent
- replace arraycopy by copyOfRange
Co-authored-by: Yann Moisan <yann@zen.ly>
|
* Deploy v2 for parkertimmins
Main changes:
- fix hash which masked incorrectly
- do station equality check in simd
- make station array length multiple of 32
- search for newline rather than semicolon
* Fix bug - entries were being skipped between batches
At the boundary between two batches, the first batch would stop after
crossing a limit with a padding of 200 characters applied. The next
batch should then start looking for the first full entry after the
padding. This padding logic had been removed when starting a batch. For
this reason, entries starting in the 200 character padding between
batches were skipped.
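The boundary rule described above — a batch that does not start at offset 0 must skip the partial entry it lands in and begin at the byte after the next newline, while the previous batch reads past its nominal end to cover that same entry — can be sketched with a hypothetical helper (not the submission's code):

```java
public class ChunkAlign {
    // Return the offset of the first complete entry at or after `offset`.
    // Offset 0 is always an entry start; otherwise advance until the
    // previous byte is '\n'. The preceding batch is responsible for every
    // byte we skip here, so entries are neither lost nor double-counted.
    static int firstEntry(byte[] data, int offset) {
        if (offset == 0) return 0;
        int i = offset;
        while (i < data.length && data[i - 1] != '\n') i++;
        return i;
    }
}
```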
|