https://cs.nyu.edu/~shw8119/
They’re getting great scalability up to 8TB on a single node
Laxman Dhulipala emphasizes the benefits of single-node, shared memory parallelism
We have a great program with 9 talks!
If you are curious about compilers, type systems, module systems, formal proofs, and typed domain modeling, then this is the place for you
The whole system is essentially just standard unification + subtiming
⬇️
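The post doesn't show the implementation, so as a reminder of what "standard unification" means here is a minimal sketch of syntactic first-order unification in Python (all names are mine; the subtiming constraints the post mentions would layer on top and are not shown):

```python
def unify(t1, t2, subst=None):
    # Robinson-style syntactic unification, as a minimal sketch.
    # Terms: strings starting with '?' are variables, tuples
    # (functor, arg1, ...) are compound terms, anything else is a constant.
    # Returns a substitution dict on success, or None on failure.
    if subst is None:
        subst = {}
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return extend(t1, t2, subst)
    if is_var(t2):
        return extend(t2, t1, subst)
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # constructor clash

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def walk(t, subst):
    # Follow variable bindings to their current representative.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Occurs check: does variable v appear inside term t?
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

def extend(v, t, subst):
    if occurs(v, t, subst):
        return None  # would create an infinite term
    return {**subst, v: t}
```

For example, `unify(('f', '?x', 'b'), ('f', 'a', '?y'))` yields the substitution `{'?x': 'a', '?y': 'b'}`.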
Similar to region types, TypeDis associates a task identifier, or "timestamp", δ, with every allocation.
E.g., the type string@δ indicates that the string was allocated at time δ.
Unboxed types (e.g., raw integers, booleans) don't need these annotations.
⬇️
this algorithm seems to be folklore -- the original Boyer-Moore algorithm is sequential, but I've found at least two mentions of the parallel algorithm in the wild: ⬇️
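The posts don't spell the algorithm out, so here is a minimal sketch of the folklore parallel majority vote in Python (function names are mine): summarize each element as a (candidate, count) pair and reduce with a cancellation-style combine, just like sequential Boyer-Moore cancels mismatched pairs. As in the sequential algorithm, the result is only a *candidate*; a second counting pass is needed to confirm it is a true majority.

```python
from functools import reduce

def combine(a, b):
    # Merge two (candidate, count) summaries by pairwise cancellation.
    # If a true majority element exists in the combined input, it survives
    # as the candidate -- this is what makes divide-and-conquer work.
    (ca, na), (cb, nb) = a, b
    if ca == cb:
        return (ca, na + nb)
    if na >= nb:
        return (ca, na - nb)
    return (cb, nb - na)

def majority_candidate(xs):
    # Sequential reduce for clarity; in a parallel setting this would be
    # a tree reduction over per-chunk summaries.
    cand, _ = reduce(combine, ((x, 1) for x in xs))
    return cand

def majority(xs):
    # Full algorithm: candidate pass + verification counting pass.
    cand = majority_candidate(xs)
    return cand if sum(1 for x in xs if x == cand) > len(xs) // 2 else None
```

The verification pass matters: on input with no majority, the candidate pass still returns *something*, and only counting can reject it.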
now, almost exactly a year later, I’ve been to ~250 places and counting
love this city… so much to see
The loops are identical except for the spilled locations for temps x10 and x11.
On the left, they are non-adjacent (see the two `str` instructions near the bottom).
On the right, they happen to be adjacent, so both spills fit in a single `stp` (store pair).
Here's an equivalent program in MaPLe, and corresponding timings for 1-8 threads.
Out of the box, MaPLe is ~2.5x faster here, pretty consistently across thread counts.