Research Posts

Exploring the Benefit of Activation Sparsity in Pre-training

Papers with Code

By Kate Martin

Posted on: October 07, 2024

Pre-trained Transformers inherently possess the characteristic of sparse activation, where only a small fraction of the neurons are activated for each token. While sparse activation has been explored through post-training methods, its potential in pre-training remains untapped. In this work, we firs...

Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration

Papers with Code

By Naomi Wilson

Posted on: October 04, 2024

With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, redundant efforts brought by exploration without proper guidance choices pose a pract...

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

Papers with Code

By Javier Vásquez

Posted on: October 02, 2024

Speech Language Models (SLMs) have demonstrated impressive performance on speech translation tasks. However, existing research primarily focuses on direct instruction fine-tuning and often overlooks the inherent reasoning capabilities of SLMs. In this paper, we introduce a three-stage training frame...

Simple and Fast Distillation of Diffusion Models

Papers with Code

By Naomi Wilson

Posted on: October 02, 2024

Diffusion-based generative models have demonstrated powerful performance across various tasks, but this comes at the cost of slow sampling speed. To achieve both efficient and high-quality synthesis, various distillation-based accelerated sampling methods have been developed recently. Howeve...

ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI

Papers with Code

By Javier Vásquez

Posted on: October 02, 2024

Simulation has enabled unprecedented compute-scalable approaches to robot learning. However, many existing simulation frameworks typically support a narrow range of scenes/tasks and lack features critical for scaling generalizable robotics and sim2real. We introduce and open source ManiSkill3, the f...

Toward Efficient Deep Blind RAW Image Restoration

Papers with Code

By Kate Martin

Posted on: September 30, 2024

Multiple low-vision tasks such as denoising, deblurring and super-resolution depart from RGB images and further reduce the degradations, improving the quality. However, modeling the degradations in the sRGB domain is complicated because of the Image Signal Processor (ISP) transformations. Despite of...

YOLOv8-ResCBAM: YOLOv8 Based on An Effective Attention Module for Pediatric Wrist Fracture Detection

Papers with Code

By Javier Vásquez

Posted on: September 30, 2024

Wrist trauma and even fractures occur frequently in daily life, particularly among children who account for a significant proportion of fracture cases. Before performing surgery, surgeons often request patients to undergo X-ray imaging first, and prepare for the surgery based on the analysis of the ...

MinerU: An Open-Source Solution for Precise Document Content Extraction

Papers with Code

By Naomi Wilson

Posted on: September 30, 2024

Document content analysis has been a crucial research area in computer vision. Despite significant advancements in methods such as OCR, layout detection, and formula recognition, existing open-source solutions struggle to consistently deliver high-quality content extraction due to the diversity in d...
