Andrus Adamchik
@andrus.adamchik.org
Java open-source & ObjectStyle

Making Java a viable option for data processing / analysis at DFLib.org

Apache Software Foundation member / ex VP
DFLib 2, milestone 4 is out. Massive QL improvements. I finally switched all of my own projects to QL from the API-based expressions. Also some new ECharts stuff github.com/dflib/dflib/...
September 13, 2025 at 6:28 PM
The first milestone of DFLib 2 is out. Lots of improvements as described here: github.com/dflib/dflib/... A few examples:
* 8x reduction of memory footprint of BooleanSeries
* Major new capabilities in "window" expressions
* Shortened aggregation syntax of "group" expressions
* Heatmap charts
February 12, 2025 at 5:36 PM
So I am on Amtrak on Christmas Eve. Time to do benchmarking! The result is from the DFLib benchmark suite, normalized vs #java 17 (21 is blue, 23 is green). Some (vectorized) operations are up to 50% faster on 23. May not matter in a typical app, but array processing def. got much faster in 23.
December 24, 2024 at 11:35 PM
Using the Bootique test framework even for libraries that are otherwise unaware of Bootique. E.g. standing up a test web server is as simple as this (server response logic is in TestServlet).
December 24, 2024 at 3:36 PM
Somehow this is my first time in the National Air and Space Museum. Feeling excited like a teenager
December 23, 2024 at 8:38 PM
Figured out the discrepancy. Loaded data from GitHub month-by-month instead of the entire year. And the new heatmap correlates really well with the green one. The only difference is GitHub also tracks issues as activity.
December 8, 2024 at 2:13 AM
Just added heatmap charts to DFLib. Now trying to recreate GitHub activity tiles with it. Looks pretty, but the data that I got from the GitHub API seems to be missing lots of commits, so the yellow/red heatmap has more holes in it.
December 8, 2024 at 1:30 AM
Still seems to be a recurring topic: "[#dataengineering] become config more than anything else."

My own thought on this, as a programmer: now is the best time to shift parts of data engineering "to the left" with infrastructure-free pipelines developed next to the app code.
November 30, 2024 at 4:14 PM
Me: working to bring data engineering techniques to programmers

A common post on /r/dataengineering: "I am stuck at my job, I want to be a programmer"
😐
November 27, 2024 at 3:19 PM
#Apache ECharts is a really nice lib. We recently built a #Java DSL for it. The docs are not as good as ECharts' own yet 🙂 (dflib.org/dflib/docs/1...), but the visualization possibilities are amazing:

* dflib.org/charts/hocke...
* dflib.org/charts/hocke...
November 27, 2024 at 1:46 PM
Finally, a laptop with more than 16 GB of RAM that can process decent-size datasets. Of course, right now it is all dedicated to “mdworker_shared”.
November 25, 2024 at 7:35 PM
A rule of system evolution - each data processing technology inevitably ends up adding a dashboard feature 😀 DFLib is no exception. Just added a way to save semi-interactive charts to HTML.
November 18, 2024 at 2:25 PM