Not sure if the result would be what you're looking for, but many clustering algorithms, such as hdbscan, accept precomputed (sparse) distance/similarity matrices as input. Maybe worth trying? hdbscan.readthedocs.io/en/0.8.6/api...
OK, I thought this was pretty clear, but to spell it out: we've gone from gatekeepers that, for all their blind spots, operate in public & are subject to some accountability... to algorithmic gatekeepers that are opaque, unaccountable, & designed by a tiny, blinkered class of dudebro dipshits.
January 14, 2025 at 7:45 PM
Yes, I always retrain on the whole dataset after evaluating on the test set (or cross-validating). In my case there is also a lot of shift. Of course, you then don't really know whether your estimated performance will be representative of future data. I guess it depends on whether the shifts continue or not?
January 10, 2025 at 6:03 PM
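The evaluate-then-refit pattern can be sketched like this with scikit-learn. The data is synthetic and the split is a time-based holdout on the most recent months; the post doesn't specify the model or the exact validation setup, so `LogisticRegression` and AUC are stand-ins.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical data: 12 monthly snapshots of 50 rows each, oldest first.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=600) > 0).astype(int)
month = np.repeat(np.arange(12), 50)

model = LogisticRegression()

# 1) Estimate performance: train on older months, test on the 3 most recent.
test_mask = month >= 9
holdout_model = clone(model).fit(X[~test_mask], y[~test_mask])
auc = roc_auc_score(y[test_mask], holdout_model.predict_proba(X[test_mask])[:, 1])

# 2) Refit the same configuration on all labelled data for deployment.
final_model = clone(model).fit(X, y)
print(f"holdout AUC: {auc:.3f}")
```

Note that `auc` describes `holdout_model`, not `final_model`. The deployed model has also seen the most recent months, so if the shift keeps going, the holdout figure is only a rough guide to its future performance.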
Btw, you may be interested in this churn paper citation network: bsky.app/profile/btho... #AppliedDS. I'm planning to create more, and better, ones in other areas. Interestingly, the time-slice paper doesn't seem to be in it. Perhaps it isn't indexed by OpenAlex.
I've made this citation network of ~700 papers mentioning "churn" and "customers". I've also included the top 100 papers citing or cited by those papers: dev-embeds.graphext.com/5a5ca9660dab.... You can search and filter by any of the 54 variables related to each paper #AppliedDS
January 9, 2025 at 10:17 AM
I did try adding some temporal indicators, in case seasonality mattered, but it didn't. My intuition is that the negative impact of not being able to use the most recent data for training trumped all other potentially interesting ways to use more information.
January 9, 2025 at 10:14 AM
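The post doesn't say which temporal indicators were tried. Two common options for monthly data are sketched here as assumptions: one indicator column per calendar month, and a sine/cosine encoding that keeps December next to January. The column names are hypothetical.

```python
import math


def month_one_hot(month_of_year):
    """One indicator column per calendar month (1-12)."""
    if not 1 <= month_of_year <= 12:
        raise ValueError("month_of_year must be in 1..12")
    return {f"month_{m:02d}": int(m == month_of_year) for m in range(1, 13)}


def month_cyclic(month_of_year):
    """Sine/cosine encoding, so December and January end up close together."""
    angle = 2 * math.pi * (month_of_year - 1) / 12
    return {"month_sin": math.sin(angle), "month_cos": math.cos(angle)}


# Hypothetical snapshot row: append the seasonal columns to existing features.
row = {"tenure_months": 7}
row.update(month_one_hot(3))
row.update(month_cyclic(3))
print(row)
```

Either encoding can be added to each monthly snapshot; neither changes which months are available for training.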
With some additional details, such as avoiding overlap between the train and test sets. The main problem I found was that if the churn window is, e.g., 3 months, you cannot use the last 3 months for training (because they need to be the test set). If the data isn't very stable, this can have a big impact.
January 9, 2025 at 10:11 AM
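One way to set up such a non-overlapping split is sketched below; the function name and exact rules are hypothetical, not necessarily the scheme used in the post. It shows why the most recent snapshots drop out of training once labels need a full churn window.

```python
def split_snapshots(months, churn_window):
    """Time-based train/test split of monthly snapshots for a churn model.

    Assumptions (a hypothetical scheme):
    - the snapshot at month t is labelled from months t+1 .. t+churn_window;
    - the test set is the most recent snapshot whose label window is complete;
    - a training snapshot is only used if its whole label window ends no later
      than the test snapshot month, so no training label peeks into the
      test period.
    """
    last = max(months)
    labelled = sorted(m for m in set(months) if m + churn_window <= last)
    if not labelled:
        raise ValueError("not enough months to observe a full churn window")
    test_month = labelled[-1]
    train = [m for m in labelled if m + churn_window <= test_month]
    return train, test_month


# 24 monthly snapshots (0..23) with a 3-month churn window:
train, test_month = split_snapshots(list(range(24)), churn_window=3)
print(test_month, train[-1])  # 20 17
```

Here snapshots 18 to 20 never reach training, and months 21 to 23 only serve as test labels. If recent months differ a lot from earlier ones, that is exactly the data the model misses. scikit-learn's `TimeSeriesSplit` has a `gap` parameter for a similar purpose.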
All, hahaha. The data was monthly over a couple of years, so for each month I created features from the preceding data for the customers still active that month, and targets (churn or not) from the following months. So each person has one sample for each month they were active.
January 9, 2025 at 10:08 AM
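The monthly snapshot construction described above can be sketched in plain Python. The input format (`usage` as a per-customer dict of monthly values), the two features, and the churn definition are all hypothetical; the sketch also assumes activity is contiguous, so the first missing month marks churn.

```python
from statistics import mean


def build_snapshots(usage, churn_window, last_month):
    """Turn per-customer monthly activity into one labelled row per active month.

    usage: {customer_id: {month_index: value}} (hypothetical format); a customer
    counts as active in month m if m is a key.
    """
    rows = []
    for cust, by_month in usage.items():
        active = sorted(by_month)
        for t in active:
            # Skip snapshots whose label window runs past the end of the data.
            if t + churn_window > last_month:
                continue
            # Features use only data up to and including month t.
            history = [by_month[m] for m in active if m <= t]
            features = {
                "tenure_months": len(history),
                "mean_usage_last_3": mean(history[-3:]),
            }
            # Label: churned if inactive in any of the next `churn_window` months.
            churned = any(m not in by_month for m in range(t + 1, t + churn_window + 1))
            rows.append({"customer": cust, "month": t, **features, "churned": int(churned)})
    return rows


# Customer "a" leaves after month 2; customer "b" stays for all 6 months.
example = {"a": {0: 5, 1: 3, 2: 1}, "b": {m: 2 for m in range(6)}}
rows = build_snapshots(example, churn_window=2, last_month=5)
for r in rows:
    print(r)
```

Each customer gets one row per active month, except the final `churn_window` months of the data, which can't be labelled yet.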