BetaSciGuy
betasci.bsky.social
Dad. Data enthusiast. Former academic, current data scientist/engineer. #dataengineering #datascience #dataviz
Amazon AWS experiences a major outage, impacting countless services!

...and yet...📈
October 21, 2025 at 12:48 PM
Hey DOGE! I found evidence of government wasting time and taxpayer dollars on something the people don't want!
www.congress.gov/bill/119th-c...
February 12, 2025 at 3:21 AM
Fun fact: @npr.org and FCC headquarters are directly across the street from one another
January 31, 2025 at 1:32 AM
This new government devolved into a meme in less than one day
January 22, 2025 at 12:36 AM
GOD, I love Raygun (the Midwest clothing store, not the Australian break dancer, although I have nothing against her)
www.raygunsite.com/collections/...
December 7, 2024 at 3:24 PM
The 'side of existential dread' got me 🤣
December 5, 2024 at 4:10 PM
Small detour experimenting with including lineage with charts and dashboards in Apache Superset. Took a lot of testing, but I figured out the specific necessary configuration details eventually.
#databs #DataHub #ApacheSuperset
December 4, 2024 at 5:42 AM
Returned to testing #DataHub and figuring out column-level lineage. Turns out, the issue was how #DBeaver sends the queries to Redshift and the query format being something that #DataHub can't automatically parse. I went to AWS's query editor (v2), ran the same queries and now it works!
#databs
December 3, 2024 at 5:04 PM
Duolingo got real aggressive with its end of year wrap up! Looks like Duo has some kind of secret dark side that I should be scared of...
December 2, 2024 at 4:58 PM
Tonight I went on an adventure of trying to install OpenMetadata to test its data catalog functionality so I could compare it to DataHub. Most of the features seem to be the same, but OpenMetadata appears to struggle to automatically detect the table lineage in my Redshift database.
November 30, 2024 at 5:44 AM
OK, I can create derived tables in my Redshift database and DataHub will detect the table-level lineage of those tables, but it's not giving me the column-level lineage. I may need to dig a little deeper to figure out why... but not tonight! I've done enough today.
#databs
November 29, 2024 at 2:31 AM
I realized I'm not very good with map data, especially when it comes to plotting with generic shapefiles, so I'm trying to work on that. Here's a quick map I did using a DC ward shapefile that shows the change in population from 2010->2020.
November 9, 2024 at 4:00 AM
My emotional state today in the form of a Ghibli movie poster:
November 6, 2024 at 9:28 PM
📊 Another interesting insight is to look at what day of the week people are joining. Looks like most people join near the end of the week (Thursday/Friday), but fewer join on the weekends. 6/
September 18, 2024 at 3:26 AM
Noticing a lot of people posting their user numbers and when they joined, so I thought it would be interesting to see what this information reveals about the history of Bluesky.

Two events stand out ... 1/
September 18, 2024 at 2:49 AM
Cool!
September 17, 2024 at 12:23 PM
Average VantageScore credit score by state in 2024. 📊

Source: "What Is the Average Credit Score by State?", equifax.com (2024)
July 30, 2024 at 2:41 AM
But if you make one of those columns a categorical variable, it will result in a table with every single possible combination of that categorical column and the other columns in the groupby:
June 6, 2024 at 2:41 AM
To clarify, pandas.DataFrame.groupby() does exactly what you'd expect it to do if you're dealing with columns that are strings or numbers:
June 6, 2024 at 2:38 AM
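For anyone who wants to reproduce the two groupby posts above, here's a minimal sketch. The DataFrame and its column names (store, tier, sales) are made up for illustration and are not the data from the original screenshots. I pass observed= explicitly because, as far as I know, its default differs between pandas versions.

```python
import pandas as pd

# Hypothetical toy data: "medium" never appears in the tier column.
df = pd.DataFrame({
    "store": ["A", "A", "B"],
    "tier": ["small", "large", "small"],
    "sales": [1, 2, 3],
})

# String/number columns: only combinations present in the data show up.
plain = df.groupby(["store", "tier"])["sales"].sum()

# Make one groupby column categorical, declaring a category with no rows.
df_cat = df.assign(
    tier=pd.Categorical(df["tier"], categories=["small", "medium", "large"])
)

# observed=False: one row per store x tier category combination,
# with unobserved combinations filled in (0 for a sum).
all_combos = df_cat.groupby(["store", "tier"], observed=False)["sales"].sum()

# observed=True: back to only the combinations that occur in the data.
observed_only = df_cat.groupby(["store", "tier"], observed=True)["sales"].sum()

print(plain)
print(all_combos)
print(observed_only)
```

If the extra rows aren't wanted, passing observed=True (or converting the column back to strings before grouping) should avoid them.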