Vladimir Prus
vprus.bsky.social
Data Engineer in Portugal.
Reposted by Vladimir Prus
A bored scribe doodled a ten-eyed letter O in some manuscript in the 15th century. Little did they know they would be influencing international character encoding standards some 578 years later…
en.wikipedia.org/wiki/Cyrilli...
May 2, 2025 at 9:48 PM
AWS / EKS / Security Question.

If I have a pod with NET_ADMIN capability, and the default pod network namespace, and it gets compromised, what's the worst it can do?

I am interested in this question specifically, not general advice.
April 24, 2025 at 3:44 PM
In gRPC/Go, setting up weighted load balancing is surprisingly simple.

🔹 On the server side, we need to compute requests per second and application load.

🔹 On the client side, enable weight-based request distribution.

It took more than five minutes to figure out, though.
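
To make the idea concrete, here is a minimal, language-neutral sketch in Python rather than the actual gRPC-Go API. It assumes each server reports requests per second and a utilization figure, and that the client turns those into a weight of roughly qps / utilization (my reading of the gRPC design, so treat it as an assumption). The names backend_weight and smooth_weighted_round_robin, the pod names, and the load numbers are all made up for illustration, and the scheduler is a generic smooth weighted round-robin, not necessarily the one gRPC uses.

```python
from collections import Counter


def backend_weight(qps, utilization, min_utilization=0.01):
    """Hypothetical weight: requests served per unit of reported load."""
    if qps <= 0:
        return 0.0  # no traffic reported, so no weight in this sketch
    # Clamp utilization so an idle backend does not get an unbounded weight.
    return qps / max(utilization, min_utilization)


def smooth_weighted_round_robin(weights, n_picks):
    """Return n_picks backend names, spread in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one backend needs a positive weight")
    current = {name: 0.0 for name in weights}
    picks = []
    for _ in range(n_picks):
        # Every backend earns credit equal to its weight on each round.
        for name, weight in weights.items():
            current[name] += weight
        # The backend with the most credit is picked and pays the total back.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Made-up load reports: (requests per second, utilization between 0 and 1).
reports = {
    "pod-a": (100.0, 0.25),
    "pod-b": (100.0, 0.50),
    "pod-c": (100.0, 1.00),
}
weights = {name: backend_weight(qps, load) for name, (qps, load) in reports.items()}
counts = Counter(smooth_weighted_round_robin(weights, 700))
print(weights)
print(counts)
```

With these numbers the least loaded pod gets four times the traffic of the fully loaded one, and the picks are interleaved instead of sent in bursts.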
March 6, 2025 at 8:59 PM
At my previous job, instant communication was simple

- IRC (Internet Relay Chat) was used
- All messages were deleted after 2 weeks

I still believe that is the best way, and all the modern Slacks with years of history are only good for Slack's valuation.
February 26, 2025 at 5:47 PM
The largest AWS r7i instance type is 48xlarge, with 192 vCPUs, or 96 cores.

It has two Xeon 8488C processors, each with 48 cores.

Each processor has 4 silicon dies.

Each die has 15 cores.

I assume that 48 cores, and not 60, is the result of binning.

This is a very heterogeneous architecture.
February 14, 2025 at 1:51 PM
What's the easiest way to make gRPC load-balancing consider target load?

The Go client can do weighted round-robin with xDS, but xDS requires Istio, and I'd rather not.

The Go client also supports custom load balancing policies, but that's very DIY.

Does anybody have practical recommendations?
February 6, 2025 at 5:09 PM
Can anyone explain AWS EventBridge to me?

- Many services have triggers, e.g. I can have S3 trigger invoking Lambda.
- There is SQS, which I can use any way I like.

Surely, if any service could write to SQS, we would not need yet another service?
February 6, 2025 at 11:29 AM
Data engineers, do we have a canonical big data modeling methodology now?

Kimball was it. However, it requires many joins and it's not perfect for big data on S3. The methodology itself might be overkill in most cases.

Do we have anything now beyond "use wide tables" and "SCD2 if needed"?
January 31, 2025 at 8:56 AM
I sometimes host system design interviews, and for 95% of candidates, the default database choice is PostgreSQL. The remaining 5% mention MongoDB or Cassandra, but I have never heard anybody mention MySQL or MariaDB.

Is this just my bubble, or has PostgreSQL decisively won over MySQL?
January 23, 2025 at 9:02 AM
I needed to access AWS from Kubernetes in another cloud and used IAM Roles Anywhere. In this post, I detail the steps and draw some conclusions.

Spoiler: it does the job in easy cases, but for a full-blown deployment, you will need to write your own automation.

vladimirprus.com/blog/2025-01...
IAM Roles Anywhere
Notes on external access to AWS
vladimirprus.com
January 14, 2025 at 9:18 AM
Linear regression is a dangerous tool. It can fit any data set, but it has a number of assumptions. If you don't check them, the results might be invalid.

Generally, you have to either draw diagnostic charts for linear regression, or check the confidence intervals for coefficients, or both.
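
A small illustration of why those checks matter, as a sketch in plain Python with made-up data. It fits a straight line to points that actually follow a parabola, then looks at the two diagnostics mentioned above: an approximate 95% confidence interval for the slope and the pattern of residual signs. The helper names fit_line and slope_interval are hypothetical, and the interval uses a normal approximation instead of the t-distribution, which is an assumption that makes it slightly too narrow for so few points.

```python
import math
from statistics import NormalDist, mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, residuals


def slope_interval(xs, residuals, slope, level=0.95):
    """Approximate confidence interval for the slope (normal approximation)."""
    n = len(xs)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    residual_variance = sum(r * r for r in residuals) / (n - 2)
    std_err = math.sqrt(residual_variance / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope - z * std_err, slope + z * std_err


# Made-up data: y depends on x, but not linearly.
xs = list(range(-5, 6))
ys = [x * x for x in xs]

slope, intercept, residuals = fit_line(xs, ys)
low, high = slope_interval(xs, residuals, slope)
signs = "".join("+" if r > 0 else "-" for r in residuals)

print("slope:", slope, "interval:", (low, high))
print("residual signs by increasing x:", signs)
```

The fit completes without complaint, yet the interval for the slope straddles zero and the residuals are positive at both ends and negative in the middle. A residual plot would show the same U shape, which is the sign that the linearity assumption does not hold.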
January 13, 2025 at 4:11 PM
Does anybody understand whether the Gemini LLM, with its stated context size of 1M tokens, really means one can basically load all the context data and not use any RAG?

There's one benchmark, called "RULER", which claims the effective context size is ">128K", which is still fairly impressive.
January 9, 2025 at 5:04 PM
New toy: Odroid M1S. Quad-core Arm Cortex-A55, 8GB of RAM, 64GB eMMC storage, M.2 slot, Gigabit Ethernet.
December 17, 2024 at 1:40 PM
Blogged: Authenticating users with AWS ALB.

It is generally best not to do your own auth. Now that AWS ALB has built-in OAuth support, you can completely off-load authentication, with your service receiving only requests from known users.

vladimirprus.com/blog/2024-11...
Authenticating users with AWS ALB
Secure your Kubernetes app in AWS using Google user authentication
vladimirprus.com
December 2, 2024 at 11:43 AM
Spark Connect is a new feature of Spark that enables lightweight drivers to use a shared execution "cluster".

In this post, my colleague Sergey Kotlov explains when it is useful, how to make it work in practice, and what challenges you might find.

towardsdatascience.com/adopting-spa...
Adopting Spark Connect
How we use a shared Spark server to make our Spark infrastructure more efficient
towardsdatascience.com
November 27, 2024 at 12:15 PM
Hello Bluesky! I am a data engineer working on things like Spark infrastructure, A/B tests and anomaly detection.

Previously, I worked on developer tools such as GDB, Eclipse, and KDevelop.

Hopefully, this platform will be a good one for technical content.
November 27, 2024 at 8:57 AM