Research Posts

ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI

Papers with Code
By Javier Vásquez

Posted on: October 02, 2024

Simulation has enabled unprecedented compute-scalable approaches to robot learning. However, many existing simulation frameworks support only a narrow range of scenes/tasks and lack features critical for scaling generalizable robotics and sim2real. We introduce and open source ManiSkill3, the f...

Toward Efficient Deep Blind RAW Image Restoration

Papers with Code
By Kate Martin

Posted on: September 30, 2024

Multiple low-level vision tasks, such as denoising, deblurring and super-resolution, start from RGB images and further reduce the degradations, improving the quality. However, modeling the degradations in the sRGB domain is complicated because of the Image Signal Processor (ISP) transformations. Despite...

YOLOv8-ResCBAM: YOLOv8 Based on An Effective Attention Module for Pediatric Wrist Fracture Detection

Papers with Code
By Javier Vásquez

Posted on: September 30, 2024

Wrist trauma and even fractures occur frequently in daily life, particularly among children, who account for a significant proportion of fracture cases. Before performing surgery, surgeons often ask patients to undergo X-ray imaging first and prepare for the surgery based on the analysis of the ...

MinerU: An Open-Source Solution for Precise Document Content Extraction

Papers with Code
By Naomi Wilson

Posted on: September 30, 2024

Document content analysis has been a crucial research area in computer vision. Despite significant advancements in methods such as OCR, layout detection, and formula recognition, existing open-source solutions struggle to consistently deliver high-quality content extraction due to the diversity in d...

MCUBench: A Benchmark of Tiny Object Detectors on MCUs

Papers with Code
By Javier Vásquez

Posted on: September 30, 2024

We introduce MCUBench, a benchmark featuring over 100 YOLO-based object detection models evaluated on the VOC dataset across seven different MCUs. This benchmark provides detailed data on average precision, latency, RAM, and Flash usage for various input resolutions and YOLO-based one-stage detector...

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

Papers with Code
By Naomi Wilson

Posted on: September 25, 2024

Tuning-free personalized image generation methods have achieved significant success in maintaining facial consistency, i.e., identities, even with multiple characters. However, the lack of holistic consistency in scenes with multiple characters hampers these methods' ability to create a cohesive nar...

3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt

Papers with Code
By Naomi Wilson

Posted on: September 25, 2024

We present 3DGS-LM, a new method that accelerates the reconstruction of 3D Gaussian Splatting (3DGS) by replacing its Adam optimizer with a tailored Levenberg-Marquardt (LM) optimizer. Existing methods reduce the optimization time by decreasing the number of Gaussians or by improving the implementation of the...

Training Language Models to Self-Correct via Reinforcement Learning

Papers with Code
By Kate Martin

Posted on: September 25, 2024

Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing approaches for training self-correction either require multiple models or rely on a more capable model or other forms of supervision...