Lightnews — Scholar-powered news

Patrice Bechard

@patricebechard.bsky.social

From notebook to workflow—just by sketching.
That’s the vision.

🔗 arxiv.org/abs/2503.21889
📝 tinyurl.com/3utdbn97

Thanks to @joanrod.bsky.social, @perouz.bsky.social, @spandanagella.bsky.social and all co-authors!
#AI #VLM #WorkflowAutomation #Sketch2Flow #arXiv

StarFlow: Generating Structured Workflow Outputs From Sketch Images

Workflows are a fundamental component of automation in enterprise platforms, enabling the orchestration of tasks, data processing, and system integrations. Despite being widely used, building workflow...

May 29, 2025 at 3:34 AM

Patrice Bechard

@patricebechard.bsky.social

🔍 Extra findings:

• Models struggle most with handwritten & whiteboard sketches
• UI screenshots are easiest
• End-to-end generation beats decomposed pipelines
• Finetuning on diverse sketch data is key to generalization

May 29, 2025 at 3:34 AM

Patrice Bechard

@patricebechard.bsky.social

📊 We benchmarked top VLMs (GPT-4o, Claude, Gemini) vs. open-weight models (Qwen, LLaMA, Pixtral).

📈 Finetuned open models outperform proprietary ones:

Qwen2.5-VL-7B → FlowSim: 0.614
GPT-4o → FlowSim: 0.786
𝐐𝐰𝐞𝐧𝟐.𝟓-𝐕𝐋-𝟕𝐁 (𝐟𝐢𝐧𝐞𝐭𝐮𝐧𝐞𝐝) → 𝐅𝐥𝐨𝐰𝐒𝐢𝐦: 𝟎.𝟗𝟓𝟕

May 29, 2025 at 3:34 AM

Patrice Bechard

@patricebechard.bsky.social

🧠 We built a large dataset (22K+ samples) of workflow diagrams:

• Synthetic (Graphviz)
• Manual (hand-drawn)
• Whiteboard
• Digital
• UI screenshots

These were paired with structured JSON workflow outputs for training and evaluation.
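
As a rough illustration of such a pairing, here is a minimal sketch of one sample record. The field names (`image_path`, `source`, `workflow`, `trigger`, `steps`) and the step types are hypothetical and chosen only for this example; the actual schema is described in the paper, not in this thread.

```python
import json

# Hypothetical sample layout: field names and step types are illustrative
# assumptions, not the dataset's actual schema.
SOURCES = {"synthetic", "manual", "whiteboard", "digital", "ui_screenshot"}

sample = {
    "image_path": "sketches/whiteboard_0001.png",  # made-up path
    "source": "whiteboard",
    "workflow": {
        "trigger": {"type": "record_created", "inputs": {"table": "incident"}},
        "steps": [
            {"order": 1, "type": "look_up_record", "inputs": {"table": "user"}},
            {"order": 2, "type": "send_email", "inputs": {"to": "assignee"}},
        ],
    },
}


def validate(record: dict) -> bool:
    """Check the source label and that steps are numbered 1..n in order."""
    if record["source"] not in SOURCES:
        return False
    orders = [step["order"] for step in record["workflow"]["steps"]]
    return orders == list(range(1, len(orders) + 1))


# The target side of the pair is plain JSON, so it serializes directly.
target_json = json.dumps(sample["workflow"], indent=2)
```

Keeping the target as plain JSON means the same record can serve as a training label and as the reference for evaluation.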

May 29, 2025 at 3:34 AM

Patrice Bechard

@patricebechard.bsky.social

𝐖𝐡𝐲?

Workflow automation is powerful—but authoring flows is still complex, even with low-code tools.
💫𝐒𝐭𝐚𝐫𝐅𝐥𝐨𝐰 explores a simpler interface: 𝐣𝐮𝐬𝐭 𝐝𝐫𝐚𝐰 𝐢𝐭.

Imagine sketching a workflow on a whiteboard and getting a runnable flow in return.

May 29, 2025 at 3:34 AM

Patrice Bechard

@patricebechard.bsky.social

🔍 Want to learn more? Our paper covers how to:

* Build balanced training datasets for real-world tasks
* Handle data imbalance
* Design for at-scale deployment

arxiv.org/abs/2501.04652

Multi-task retriever fine-tuning for domain-specific and efficient RAG

Retrieval-Augmented Generation (RAG) has become ubiquitous when deploying Large Language Models (LLMs), as it can address typical limitations such as generating hallucinated or outdated information. H...

January 9, 2025 at 3:46 PM

Patrice Bechard

@patricebechard.bsky.social

🌟 Key Features:

* One retriever for many use cases
* Works across languages! 🌍
* Handles structured data like workflows
* Lightweight & fast for production
* Generalizes to new domains & tasks

January 9, 2025 at 3:46 PM

Patrice Bechard

@patricebechard.bsky.social

📊 Our Results:

Multi-task instruction fine-tuning FTW! Our approach beats both BM25 and strong off-the-shelf encoder models across all retrieval tasks (in-distribution and out-of-distribution).

January 9, 2025 at 3:46 PM

Patrice Bechard

@patricebechard.bsky.social

💡 The Challenge:

* RAG needs domain-specific knowledge
* Multiple apps = multiple retrievers = 💰
* Different types of data (steps, tables, fields, ...)

January 9, 2025 at 3:46 PM

Patrice Bechard

@patricebechard.bsky.social

Ready to learn more? Check out our full paper here: arxiv.org/abs/2412.00239

If this sounds exciting, follow us! We’ve got more papers and insights on the way—don’t miss out! 🚀

Generating a Low-code Complete Workflow via Task Decomposition and RAG

AI technologies are moving rapidly from research to production. With the popularity of Foundation Models (FMs) that generate text, images, and video, AI-based systems are increasing their complexity. ...

December 3, 2024 at 3:15 PM

Patrice Bechard

@patricebechard.bsky.social

Finally, we outline trade-offs and practical considerations, from latency improvements to deployment strategies. If you’re designing GenAI systems, this is a goldmine of insights!

December 3, 2024 at 3:15 PM

Patrice Bechard

@patricebechard.bsky.social

Evaluation was key: we developed a novel tree-based metric, Flow Similarity, to assess workflow correctness. Plus, we measured each sub-task and RAG component separately for fine-grained insights.
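
To give a feel for what a tree-based similarity can look like, here is a minimal sketch. It is not the paper's Flow Similarity definition: it swaps in a simple top-down tree edit distance (in the style of Selkow's algorithm, where whole subtrees are inserted or deleted and nodes can be relabeled) and normalizes it to a score between 0 and 1. The `Node` class and the normalization are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A workflow element (hypothetical representation for this sketch)."""
    label: str
    children: list[Node] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


def distance(a: Node, b: Node) -> int:
    """Top-down tree edit distance: relabel a node (cost 1), or insert or
    delete a whole subtree (cost = subtree size)."""
    relabel = 0 if a.label == b.label else 1
    m, n = len(a.children), len(b.children)
    # dp[i][j]: cost of aligning the first i children of a with the first j of b.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(a.children[i - 1]),  # delete subtree
                dp[i][j - 1] + size(b.children[j - 1]),  # insert subtree
                dp[i - 1][j - 1] + distance(a.children[i - 1], b.children[j - 1]),
            )
    return relabel + dp[m][n]


def similarity(a: Node, b: Node) -> float:
    """1.0 for identical trees, 0.0 when nothing can be matched."""
    return 1.0 - distance(a, b) / (size(a) + size(b) - 1)


reference = Node("flow", [Node("trigger"), Node("look_up"), Node("send_email")])
predicted = Node("flow", [Node("trigger"), Node("send_email")])
score = similarity(reference, predicted)  # one missing step out of a small flow
```

A score of this kind gives partial credit when a generated workflow misses or mislabels a step, which an exact-match comparison would not.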

December 3, 2024 at 3:15 PM

Patrice Bechard

@patricebechard.bsky.social

We dive deep into dataset creation, discussing how Task Decomposition guided our labeling efforts. By focusing on smaller tasks, we sped up labeling, reduced costs, and iteratively improved our system.

December 3, 2024 at 3:15 PM

Patrice Bechard

@patricebechard.bsky.social

RAG enhances the system by grounding the generation process in real-time data from the environment. This reduces hallucinations and ensures that the generated workflows are accurate and context-aware.
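
The grounding idea can be sketched as follows, assuming a small catalog of step names and descriptions available in the environment. The retriever here is a toy word-overlap ranker used only for illustration, not the retriever from the paper, and `catalog`, `retrieve`, and `ground` are hypothetical names.

```python
# Hypothetical catalog of steps available in the user's environment.
catalog = {
    "send_email": "send an email to a user",
    "create_record": "create a record in a table",
    "post_message": "post a message to a chat channel",
}


def retrieve(query: str, catalog: dict[str, str], k: int = 2) -> list[str]:
    """Rank step names by word overlap between the query and each description.

    A toy stand-in for a real retriever (assumption for this sketch).
    """
    query_words = set(query.lower().split())

    def overlap(name: str) -> int:
        return len(query_words & set(catalog[name].lower().split()))

    return sorted(catalog, key=overlap, reverse=True)[:k]


def ground(generated_steps: list[str], allowed: list[str]) -> list[str]:
    """Keep only generated steps that exist among the retrieved candidates."""
    return [step for step in generated_steps if step in allowed]


requirement = "send an email when a record is created"
candidates = retrieve(requirement, catalog)

# Suppose a model proposed one step that does not exist in the environment.
proposed = ["create_record", "notify_team", "send_email"]
grounded = ground(proposed, candidates)
```

Restricting generation to retrieved candidates is one simple way a step that does not exist in the environment gets filtered out instead of reaching the final workflow.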

December 3, 2024 at 3:15 PM

Patrice Bechard

@patricebechard.bsky.social

Task Decomposition allows us to split the workflow generation into two sub-tasks:

1. Outlining the workflow structure
2. Populating inputs for each step

Each sub-task is easier to solve and test, boosting the system’s modularity and maintainability.
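
The two sub-tasks can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: `outline_fn` and `populate_fn` stand in for model calls, and the toy implementations below are hypothetical stubs, not the system described in the paper.

```python
from typing import Callable

Outline = Callable[[str], list[str]]
Populate = Callable[[str, str], dict[str, str]]


def generate_workflow(
    requirement: str, outline_fn: Outline, populate_fn: Populate
) -> list[dict]:
    """Run sub-task 1 (outline the steps), then sub-task 2 (fill each step's inputs)."""
    outline = outline_fn(requirement)
    return [
        {"step": name, "inputs": populate_fn(requirement, name)}
        for name in outline
    ]


# Toy stubs standing in for model calls (assumptions for this sketch).
def toy_outline(requirement: str) -> list[str]:
    steps = []
    if "record" in requirement:
        steps.append("create_record")
    if "email" in requirement:
        steps.append("send_email")
    return steps


def toy_populate(requirement: str, step: str) -> dict[str, str]:
    defaults = {
        "create_record": {"table": "incident"},
        "send_email": {"to": "assignee"},
    }
    return defaults.get(step, {})


workflow = generate_workflow(
    "create a record and send an email", toy_outline, toy_populate
)
```

Because each stage is a separate function, either one can be tested or replaced on its own, which is the modularity benefit mentioned above.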

December 3, 2024 at 3:15 PM

Patrice Bechard

@patricebechard.bsky.social

We tackle a real-world use case: Workflow Generation. Given a user requirement in natural language, our system generates complex workflows step by step. This involves breaking the problem into smaller, manageable tasks.

December 3, 2024 at 3:15 PM
