William Xie
@wxie.bsky.social
phd student at cu boulder (williamxie.nyc)
contact-rich manipulation
cam showing up would've been nice but not sure that would've moved the needle
October 24, 2025 at 4:58 AM
unexpected brickage from jokic ruining the third, true, and only MPJ we ever needed
October 24, 2025 at 4:57 AM
and then to self-plug a bit, i wrote up a little case study/position paper on dual-use in VLM reasoning and robot manipulation last month: arxiv.org/abs/2505.18792. the gist is that safeguarding reduces both helpful and harmful robot control, and i opine about what that means for future model eval/dev
On the Dual-Use Dilemma in Physical Reasoning and Force
Humans learn how and when to apply forces in the world via a complex physiological and psychological learning process. Attempting to replicate this in vision-language models (VLMs) presents two challe...
arxiv.org
June 4, 2025 at 5:24 AM
seemingly a hot topic rn on my feed but i'm not sure how much more humanity can lower the floor on mass autonomous death, whereas taking care of a human is still wildly inefficient. that is to say, on the level of the individual AI researcher, there's a lot more unrealized help we can do than harm
June 4, 2025 at 5:20 AM
i have two somewhat distinct and existential concerns: 1) that we are guileless researchers operating in aggregate as an arm of the MIC and 2) that our research is directly extensible (within a few degrees) to dual-use. tbh i think many overstate 2) and others are helpless wrt 1) due to "incentives"
June 4, 2025 at 5:13 AM
but I think that's a good feeling to lean into
May 12, 2025 at 10:24 PM
yeah, I think you're right on both points. I got in the weeds on haptic teleop interfaces for LFD recently and overall am not super convinced it'll enable the data scale we need. Way more interested in self-improvement from physical interaction (w/ touch) though I feel quite out-of-depth there
May 12, 2025 at 10:24 PM
and w/o touch
May 12, 2025 at 5:52 PM
I'm generally a believer that we'll eventually do everything with vision, but I also believe that we'll need touch to get policies running closer to real-time/humans. My advisor loves to bring up these videos from a study of people trying to strike a match w/ and w/o feeling in their fingers:
May 12, 2025 at 5:50 PM
Liebherr
April 22, 2025 at 4:15 PM
going forward, i'm thinking about how we can scale good data collection with force control and improved physical models & reasoning. as of now you cannot convince me that we do not still need huge amounts of real robot data for robust contact-rich manipulation. and we are quite a ways off...
April 19, 2025 at 11:42 PM
so interesting where the field has coalesced and where it has diverged. some of it is a necessary byproduct of manipulation, some of it seems like open areas for research. anyway, here's a fun and unreadable plot: these 25 papers evaluate 64 significantly different contact-rich tasks (59 models)
April 19, 2025 at 11:36 PM
true, but humans learn implicit control laws, however relative they may be, from rich sensory information over many, many episodes. for robots, high-precision servos are just one tool to obtain such high-fidelity data. i also think such tooling is important for achieving supra-human abilities.
March 28, 2025 at 7:01 PM
I think the RL policy / teleop comparison here is not quite fair--the RL policy leverages wrench data, which is the primary supervisory signal for these kinds of insertion tasks (learning visuo-force servoing), whereas the teleop here uses a 3D CAD mouse--a huge embodiment gap in data collection
March 23, 2025 at 12:31 AM
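to make the wrench point concrete, here's a toy sketch of force-driven insertion--none of this is from the paper in question; the gains and the robot hooks (read_wrench, send_ee_velocity) are made-up placeholders, just to show why sensed forces are such a direct supervisory signal for this task:

```python
# Toy compliant-insertion loop: measured force/torque (wrench) at the wrist
# drives lateral corrections while the arm presses along the insertion axis.
# All names and gains are hypothetical, not a real robot API.
import numpy as np

KP_LATERAL = 0.002   # m/s per N: lateral compliance gain (made up)
INSERT_SPEED = 0.01  # m/s: downward insertion speed (made up)
F_CONTACT = 2.0      # N: contact-detection threshold (made up)

def insertion_step(read_wrench, send_ee_velocity):
    f = read_wrench()              # (fx, fy, fz, tx, ty, tz) at the wrist
    v = np.zeros(3)
    v[2] = -INSERT_SPEED           # keep pressing along the insertion axis
    if abs(f[2]) > F_CONTACT:      # in contact: comply with lateral forces
        v[0] = -KP_LATERAL * f[0]  # slide to reduce sensed side loads,
        v[1] = -KP_LATERAL * f[1]  # i.e. drift toward the hole axis
    send_ee_velocity(v)
```

a teleoperator with a 3D CAD mouse never sees f at all, which is the embodiment gap i mean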
adapted a preexisting repo for the DROID dataset + Franka robot to work with the UR5 and my small dataset: github.com/badinkajink/...
GitHub - badinkajink/rerun_rlds_ur5
github.com
February 28, 2025 at 7:36 PM
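for the curious, the core loop looks roughly like this--a minimal sketch, not the repo's actual code, assuming a recent-ish rerun SDK and an RLDS dataset on disk; the observation keys ("image", "joint_position") are guesses and will differ per dataset:

```python
# Stream one RLDS episode into the rerun viewer: images as an image stream,
# joint positions as scalar time series. Keys/paths are assumptions.
import rerun as rr
import tensorflow_datasets as tfds

rr.init("ur5_rlds_viewer", spawn=True)  # opens the rerun viewer

ds = tfds.builder_from_directory("path/to/rlds/dataset").as_dataset(split="train")
episode = next(iter(ds))

for t, step in enumerate(episode["steps"]):
    rr.set_time_sequence("step", t)  # index all logs by step number
    rr.log("camera/image", rr.Image(step["observation"]["image"].numpy()))
    for j, q in enumerate(step["observation"]["joint_position"].numpy()):
        rr.log(f"robot/joint_{j}", rr.Scalar(float(q)))
```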
Cool! I remember seeing a complementary approach for learning semantic placement (arxiv.org/abs/2401.07770) -- perhaps it can plug in for segmentation when VLMs cannot reasonably "point" to placement regions.
Seeing the Unseen: Visual Common Sense for Semantic Placement
Computer vision tasks typically involve describing what is present in an image (e.g. classification, detection, segmentation, and captioning). We study a visual common sense task that requires underst...
arxiv.org
February 25, 2025 at 1:59 AM