TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
By Naomi Wilson
Posted on: January 01, 2025
We introduce TangoFlux, an efficient Text-to-Audio (TTA) generative model with 515M parameters, capable of generating up to 30 seconds of 44.1kHz audio in just 3.7 seconds on a single A40 GPU. A key challenge in aligning TTA models lies in the difficulty of creating preference pairs, as TTA lacks st...
Training Software Engineering Agents and Verifiers with SWE-Gym
By Javier Vásquez
Posted on: January 01, 2025
We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to tra...
SoftPatch+: Fully Unsupervised Anomaly Classification and Segmentation
By Naomi Wilson
Posted on: January 01, 2025
Although mainstream unsupervised anomaly detection (AD) algorithms (including image-level classification and pixel-level segmentation) perform well on academic datasets, their performance is limited in practical applications due to the ideal experimental setting of clean training data. Training with n...
MineStudio: A Streamlined Package for Minecraft AI Agent Development
By Javier Vásquez
Posted on: December 25, 2024
Minecraft has emerged as a valuable testbed for embodied intelligence and sequential decision-making research, yet the development and validation of novel agents remain hindered by significant engineering challenges. This paper presents MineStudio, an open-source software package designed to stream...
The Thousand Brains Project: A New Paradigm for Sensorimotor Intelligence
By Javier Vásquez
Posted on: December 25, 2024
Artificial intelligence has advanced rapidly in the last decade, driven primarily by progress in the scale of deep-learning systems. Despite these advances, the creation of intelligent systems that can operate effectively in diverse, real-world environments remains a significant challenge. In this w...
Semi-supervised Credit Card Fraud Detection via Attribute-Driven Graph Representation
By Javier Vásquez
Posted on: December 25, 2024
Credit card fraud incurs a considerable cost for both cardholders and issuing banks. Contemporary methods apply machine learning-based classifiers to detect fraudulent behavior from labeled transaction records. But labeled data are usually a small proportion of billions of real transactions due to e...
Efficient MedSAMs: Segment Anything in Medical Images on Laptop
By Naomi Wilson
Posted on: December 23, 2024
Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international...
WebLLM: A High-Performance In-Browser LLM Inference Engine
By Javier Vásquez
Posted on: December 23, 2024
Advancements in large language models (LLMs) have unlocked remarkable capabilities. While deploying these models typically requires server-grade GPUs and cloud-based inference, the recent emergence of smaller open-source models and increasingly powerful consumer devices has made on-device deploymen...