SG-Reg: Generalizable and Efficient Scene Graph Registration

By Kate Martin

Posted on: April 23, 2025

This paper addresses the challenges of registering two rigid semantic scene graphs, an essential capability when an autonomous agent needs to register its map against a remote agent, or against a prior map. The hand-crafted descriptors in classical semantic-aided registration, or the ground-truth an...

UFO2: The Desktop AgentOS

By Naomi Wilson

Posted on: April 23, 2025

Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language. However, most existing CUAs remain conceptual prototypes, hindered by shallow OS integration, fragile screenshot-bas...

Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation

By Javier Vásquez

Posted on: April 23, 2025

Camera and human motion controls have been extensively studied for video generation, but existing approaches typically address them separately, suffering from limited data with high-quality annotations for both aspects. To overcome this, we present Uni3C, a unified 3D-enhanced framework for precise ...

Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning

By Naomi Wilson

Posted on: April 23, 2025

Process reward models (PRMs) have proven effective for test-time scaling of Large Language Models (LLMs) on challenging reasoning tasks. However, reward hacking issues with PRMs limit their successful application in reinforcement fine-tuning. In this paper, we identify the main cause of PRM-induced ...

EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models

By Kate Martin

Posted on: April 23, 2025

In this paper, we introduce EasyEdit2, a framework designed to enable plug-and-play adjustability for controlling Large Language Model (LLM) behaviors. EasyEdit2 supports a wide range of test-time interventions, including safety, sentiment, personality, reasoning patterns, factuality, and language f...

FlowReasoner: Reinforcing Query-Level Meta-Agents

By Naomi Wilson

Posted on: April 23, 2025

This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first en...

CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation

By Kate Martin

Posted on: January 20, 2025

The synthesis of high-quality 3D assets from textual or visual inputs has become a central objective in modern generative modeling. Despite the proliferation of 3D generation algorithms, they frequently grapple with challenges such as multi-view inconsistency, slow generation times, low fidelity, an...

FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization

By Javier Vásquez

Posted on: January 20, 2025

Anomaly detection methods typically require extensive normal samples from the target class for training, limiting their applicability in scenarios that require rapid adaptation, such as cold start. Zero-shot and few-shot anomaly detection do not require labeled samples from the target class in advan...