Patrice Bechard
patricebechard.bsky.social
Applied Research Scientist working on LLMs at @ServiceNow. Opinions are my own.
🔍 Extra findings:

• Models struggle most with handwritten & whiteboard sketches
• UI screenshots are easiest
• End-to-end generation beats decomposed pipelines
• Finetuning on diverse sketch data is key to generalization
May 29, 2025 at 3:34 AM
📊 We benchmarked top VLMs (GPT-4o, Claude, Gemini) vs. open-weight models (Qwen, LLaMA, Pixtral).

📈 Finetuned open models outperform proprietary ones:

Qwen2.5-VL-7B → FlowSim: 0.614
GPT-4o → FlowSim: 0.786
𝐐𝐰𝐞𝐧𝟐.𝟓-𝐕𝐋-𝟕𝐁 (𝐟𝐢𝐧𝐞𝐭𝐮𝐧𝐞𝐝) → 𝐅𝐥𝐨𝐰𝐒𝐢𝐦: 𝟎.𝟗𝟓𝟕
May 29, 2025 at 3:34 AM
🧠 We built a large dataset (22K+ samples) of workflow diagrams:

• Synthetic (Graphviz)
• Manual (hand-drawn)
• Whiteboard
• Digital
• UI screenshots

These were paired with structured JSON workflow outputs for training and evaluation.
May 29, 2025 at 3:34 AM
𝐖𝐡𝐲?

Workflow automation is powerful—but authoring flows is still complex, even with low-code tools.
💫𝐒𝐭𝐚𝐫𝐅𝐥𝐨𝐰 explores a simpler interface: 𝐣𝐮𝐬𝐭 𝐝𝐫𝐚𝐰 𝐢𝐭.

Imagine sketching a workflow on a whiteboard and getting a runnable flow in return.
May 29, 2025 at 3:34 AM
🔍 Want to learn more? Our paper covers how to:

* Build balanced training datasets for real-world tasks
* Handle data imbalance
* Design for at-scale deployment

arxiv.org/abs/2501.04652
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Retrieval-Augmented Generation (RAG) has become ubiquitous when deploying Large Language Models (LLMs), as it can address typical limitations such as generating hallucinated or outdated information. H...
January 9, 2025 at 3:46 PM
🌟 Key Features:

* One retriever for many use cases
* Works across languages! 🌍
* Handles structured data like workflows
* Lightweight & fast for production
* Generalizes to new domains & tasks
January 9, 2025 at 3:46 PM
📊 Our Results:

Multi-task instruction fine-tuning FTW! Our approach beats both BM25 and strong off-the-shelf encoder models across all retrieval tasks (in-distribution and out-of-distribution).
January 9, 2025 at 3:46 PM
💡 The Challenge:

* RAG needs domain-specific knowledge
* Multiple apps = multiple retrievers = 💰
* Different types of data (steps, tables, fields, ...)
January 9, 2025 at 3:46 PM
Ready to learn more? Check out our full paper here: arxiv.org/abs/2412.00239

If this sounds exciting, follow us! We’ve got more papers and insights on the way—don’t miss out! 🚀
Generating a Low-code Complete Workflow via Task Decomposition and RAG
AI technologies are moving rapidly from research to production. With the popularity of Foundation Models (FMs) that generate text, images, and video, AI-based systems are increasing their complexity. ...
December 3, 2024 at 3:15 PM
Finally, we outline trade-offs and practical considerations, from latency improvements to deployment strategies. If you’re designing GenAI systems, this is a goldmine of insights!
December 3, 2024 at 3:15 PM
Evaluation was key: we developed a novel tree-based metric, Flow Similarity, to assess workflow correctness. Plus, we measured each sub-task and RAG component separately for fine-grained insights.
December 3, 2024 at 3:15 PM
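To give a rough feel for what a tree-based comparison can look like, here is a small sketch that scores two workflow trees by the overlap of their root-to-node label paths. This is not the Flow Similarity metric from the paper, just a simpler stand-in; the tree layout (`label`, `children`) and the function names are assumptions made for the example.

```python
from collections import Counter


def node_paths(node, prefix=()):
    """Count every root-to-node label path in a tree (hypothetical layout)."""
    path = prefix + (node["label"],)
    paths = Counter([path])
    for child in node.get("children", []):
        paths += node_paths(child, path)
    return paths


def path_overlap_similarity(expected, predicted):
    """Shared paths divided by all distinct paths, in the range [0, 1]."""
    pa, pb = node_paths(expected), node_paths(predicted)
    shared = sum((pa & pb).values())  # multiset intersection
    total = sum((pa | pb).values())   # multiset union
    return shared / total if total else 1.0


# Toy workflows: same structure, one differing leaf step.
expected = {"label": "flow", "children": [
    {"label": "if", "children": [{"label": "send_email"}]},
    {"label": "log"},
]}
predicted = {"label": "flow", "children": [
    {"label": "if", "children": [{"label": "send_sms"}]},
    {"label": "log"},
]}

score = path_overlap_similarity(expected, predicted)
print(f"similarity: {score:.2f}")
```

Three of the five distinct paths are shared here, so a single wrong leaf lowers the score without zeroing it. A tree-based score gives partial credit for a mostly correct structure in this way.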
We dive deep into dataset creation, discussing how Task Decomposition guided our labeling efforts. By focusing on smaller tasks, we sped up labeling, reduced costs, and iteratively improved our system.
December 3, 2024 at 3:15 PM
RAG enhances the system by grounding the generation process in real-time data from the environment. This reduces hallucinations and ensures that the generated workflows are accurate and context-aware.
December 3, 2024 at 3:15 PM
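A minimal sketch of the grounding idea: retrieve the steps that actually exist in the environment and put only those in the prompt. The catalog entries, the word-overlap scoring, and the prompt wording are all assumptions for illustration; a production system would use a trained retriever and a real LLM call.

```python
def retrieve(query, catalog, k=2):
    """Rank catalog entries by word overlap with the query (toy retriever)."""
    query_words = set(query.lower().split())

    def overlap(item):
        return len(query_words & set(item["description"].lower().split()))

    return sorted(catalog, key=overlap, reverse=True)[:k]


def build_prompt(requirement, catalog, k=2):
    """Ground the generation prompt in retrieved steps only."""
    hits = retrieve(requirement, catalog, k)
    lines = [f"- {hit['name']}: {hit['description']}" for hit in hits]
    return (
        "Available steps:\n" + "\n".join(lines)
        + f"\n\nRequirement: {requirement}"
    )


# Hypothetical environment catalog.
catalog = [
    {"name": "create_record", "description": "create a record in a table"},
    {"name": "send_email", "description": "send an email to a user"},
    {"name": "look_up_user", "description": "look up the manager of a user"},
]

prompt = build_prompt("send an email to the manager", catalog)
print(prompt)
```

Because the model only sees steps that were retrieved from the environment, it has less room to invent step names that do not exist.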
Task Decomposition allows us to split the workflow generation into two sub-tasks:

1. Outlining the workflow structure
2. Populating inputs for each step

Each sub-task is easier to solve and test, boosting the system’s modularity and maintainability.
December 3, 2024 at 3:15 PM
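The two sub-tasks can be sketched as a small pipeline where each stage is a separate, swappable function. The `Step` class, the function names, and the toy stand-ins for the model calls are hypothetical, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    inputs: dict = field(default_factory=dict)


def generate_workflow(requirement, outline_fn, populate_fn):
    """Run sub-task 1 (outline), then sub-task 2 (inputs) per step."""
    steps = [Step(name=name) for name in outline_fn(requirement)]
    for step in steps:
        step.inputs = populate_fn(requirement, step.name)
    return steps


# Toy stand-ins for the two model calls (assumptions for the example).
def toy_outline(requirement):
    return ["look_up_user", "send_email"]


def toy_populate(requirement, step_name):
    return {"step": step_name, "source_text": requirement}


flow = generate_workflow(
    "When an incident is created, email the manager",
    toy_outline,
    toy_populate,
)
for step in flow:
    print(step.name, step.inputs)
```

Since each stage is passed in as a function, either one can be tested or replaced on its own. That is the modularity benefit described above.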
We tackle a real-world use case: Workflow Generation. Given a user requirement in natural language, our system generates complex workflows step by step. This involves breaking the problem into smaller, manageable tasks.
December 3, 2024 at 3:15 PM