Research Posts

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

Papers with Code
Reporter Naomi Wilson

Posted on: January 01, 2025

We introduce TangoFlux, an efficient Text-to-Audio (TTA) generative model with 515M parameters, capable of generating up to 30 seconds of 44.1kHz audio in just 3.7 seconds on a single A40 GPU. A key challenge in aligning TTA models lies in the difficulty of creating preference pairs, as TTA lacks st...

Training Software Engineering Agents and Verifiers with SWE-Gym

Papers with Code
Reporter Javier Vásquez

Posted on: January 01, 2025

We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to tra...

SoftPatch+: Fully Unsupervised Anomaly Classification and Segmentation

Papers with Code
Reporter Naomi Wilson

Posted on: January 01, 2025

Although mainstream unsupervised anomaly detection (AD) algorithms (including image-level classification and pixel-level segmentation) perform well on academic datasets, their performance is limited in practical application due to the ideal experimental setting of clean training data. Training with n...

MineStudio: A Streamlined Package for Minecraft AI Agent Development

Papers with Code
Reporter Javier Vásquez

Posted on: December 25, 2024

Minecraft has emerged as a valuable testbed for embodied intelligence and sequential decision-making research, yet the development and validation of novel agents remains hindered by significant engineering challenges. This paper presents MineStudio, an open-source software package designed to stream...

The Thousand Brains Project: A New Paradigm for Sensorimotor Intelligence

Papers with Code
Reporter Javier Vásquez

Posted on: December 25, 2024

Artificial intelligence has advanced rapidly in the last decade, driven primarily by progress in the scale of deep-learning systems. Despite these advances, the creation of intelligent systems that can operate effectively in diverse, real-world environments remains a significant challenge. In this w...

Semi-supervised Credit Card Fraud Detection via Attribute-Driven Graph Representation

Papers with Code
Reporter Javier Vásquez

Posted on: December 25, 2024

Credit card fraud incurs a considerable cost for both cardholders and issuing banks. Contemporary methods apply machine learning-based classifiers to detect fraudulent behavior from labeled transaction records. But labeled data are usually a small proportion of billions of real transactions due to e...

Efficient MedSAMs: Segment Anything in Medical Images on Laptop

Papers with Code
Reporter Naomi Wilson

Posted on: December 23, 2024

Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international...

WebLLM: A High-Performance In-Browser LLM Inference Engine

Papers with Code
Reporter Javier Vásquez

Posted on: December 23, 2024

Advancements in large language models (LLMs) have unlocked remarkable capabilities. While deploying these models typically requires server-grade GPUs and cloud-based inference, the recent emergence of smaller open-source models and increasingly powerful consumer devices has made on-device deploymen...
