Lightnews — Scholar-powered news

Jon Massey

@jonnyboy27.bsky.social

240 followers 290 following 560 posts

Researcher/data scientist/software engineer sort of thing with electronic health data. Also diver, cyclist, ferret keeper


Jon Massey

@jonnyboy27.bsky.social

I'm sure you'll give her the best of however much time she has left

November 10, 2025 at 6:52 PM

Jon Massey

@jonnyboy27.bsky.social

You deserve the cane for that one

November 10, 2025 at 12:22 PM

Jon Massey

@jonnyboy27.bsky.social

JFC

November 10, 2025 at 10:31 AM

Jon Massey

@jonnyboy27.bsky.social

Ah shame, still looks cool tho

November 8, 2025 at 2:52 PM

Jon Massey

@jonnyboy27.bsky.social

Absolutely, Dan. Do those of us who think, for example, the Iraq war was foolhardy, unjust, and illegal think any less of the men and women who put life and limb at risk in its commission? No. That's the point of military service: its value sits apart from moral judgement of those who direct it

November 7, 2025 at 11:19 PM

Jon Massey

@jonnyboy27.bsky.social

What does it look like on a TT? Does it all just fling to the outside?

November 6, 2025 at 5:55 PM

Jon Massey

@jonnyboy27.bsky.social

So yeah, even for quickie one-off prototypey type things of sufficient complexity, good test-driven development practices can make your life miles easier in the end. (fin)

November 5, 2025 at 9:07 PM

Jon Massey

@jonnyboy27.bsky.social

But you know what? For this bit of work I ended up writing a bunch of tests because in many cases I could tell the end result was wrong but without lots of stepping through the debugger I couldn't tell where/why, and couldn't tell if my fixes broke something else. 10/n

November 5, 2025 at 9:06 PM

Jon Massey

@jonnyboy27.bsky.social

The other thing is for the past couple of months I've moved into a "research innovation" team where we move fast, make prototypes with low expectations of reliability and initially I thought "yay, I don't have to write as many comprehensive tests/be beholden to coverage reports" 9/n

November 5, 2025 at 9:03 PM

Jon Massey

@jonnyboy27.bsky.social

Painful, but ultimately throwing it all away and starting again armed with the knowledge gained by doing it wrong at least once was the right decision and resulted in code that was far easier to reason about. Not sure I could yet say for sure how to detect when that decision point arrives though 8/n

November 5, 2025 at 9:01 PM

Jon Massey

@jonnyboy27.bsky.social

I rewrote the two main modules of this (scraping/parsing, applying rulesets) completely from scratch at different points in the past week or so. In both cases, the difficulty of trying to write tests, or address test failures, made me realise I'd taken a fundamentally wrong approach. 7/n

November 5, 2025 at 8:59 PM

Jon Massey

@jonnyboy27.bsky.social

At several points my colleagues suggested just throwing an LLM at it, but in this instance I just fancied the challenge, and I felt hesitant because, with no "gold standard" to test against, it would be hard to know the correctness of either my code or the LLM's. A couple of stray reflections: 6/n

November 5, 2025 at 8:56 PM

Jon Massey

@jonnyboy27.bsky.social

The dataset as presented on this site is designed for a human to read, interpret, and make decisions from, based on a set of categories in the dataset and rules in guidance notes (inline, or in a linked mass of PDFs). I needed a full set of all possible categorisations, having applied all these rules 5/n

November 5, 2025 at 8:54 PM

Jon Massey

@jonnyboy27.bsky.social

Through lots of iterations managed to get all of the types of weirdness accounted for, initially trying to make polymorphic parsers but then eventually just making a bunch of transformers to put everything into the most regular form the HTML provided and then parsing that. More pain followed ... 4/n

November 5, 2025 at 8:51 PM

Jon Massey

@jonnyboy27.bsky.social

So decided to "just" scrape their website - probably quite brittle ultimately but only gets updated about once a year so can deal with breakage as and when. 20/22 "chapters" follow a broadly regular structure with relatively easy to infer semantics, the rest is just a shit show (hand-edited HTML?) 3/n

November 5, 2025 at 8:48 PM

Jon Massey

@jonnyboy27.bsky.social

Had similar woes trying to get data out of the VMD/DEFRA/EMA during my PhD. In all cases my polite pleas were met with a "computer says no". My boss suggested FOIing it (which we've done before) but that felt a bit nuclear and might just end up with a malicious compliance result. 2/n

November 5, 2025 at 8:46 PM
