X-MeshGraphNet: Scalable Multi-Scale Graph Neural Networks for Physics Simulation

By Javier Vásquez

Posted on: November 27, 2024

Graph Neural Networks (GNNs) have gained significant traction for simulating complex physical systems, with models like MeshGraphNet demonstrating strong performance on unstructured simulation meshes. However, these models face several limitations, including scalability issues, requirement for meshi...

DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting

By Javier Vásquez

Posted on: November 27, 2024

Recent progress in scene synthesis makes standalone SLAM systems purely based on optimizing hyperprimitives with a rendering objective possible \cite{monogs}. However, the tracking performance still lags behind traditional \cite{orbslam} and end-to-end SLAM systems \cite{droid}. An optimal trade-of...

DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding

By Javier Vásquez

Posted on: November 25, 2024

In this paper, we introduce DINO-X, which is a unified object-centric vision model developed by IDEA Research with the best open-world object detection performance to date. DINO-X employs the same Transformer-based encoder-decoder architecture as Grounding DINO 1.5 to pursue an object-level represen...

MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective

By Naomi Wilson

Posted on: November 25, 2024

Large Multimodal Models (LMMs) have demonstrated remarkable capabilities. While existing benchmarks for evaluating LMMs mainly focus on image comprehension, few works evaluate them from the image generation perspective. To address this issue, we propose a straightforward automated evaluation pipelin...

Multimodal Autoregressive Pre-training of Large Vision Encoders

By Naomi Wilson

Posted on: November 22, 2024

We introduce a novel method for pre-training of large-scale vision encoders. Building on recent advancements in autoregressive pre-training of vision models, we extend this framework to a multimodal setting, i.e., images and text. In this paper, we present AIMV2, a family of generalist vision encode...

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

By Kate Martin

Posted on: November 22, 2024

Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 mi...

SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model

By Javier Vásquez

Posted on: November 22, 2024

Large Language Models (LLMs) have demonstrated the potential to address some issues within the semiconductor industry. However, they are often general-purpose models that lack the specialized knowledge needed to tackle the unique challenges of this sector, such as the intricate physics and chemistry...

Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data

By Kate Martin

Posted on: November 22, 2024

Stereo matching has been a pivotal component in 3D vision, aiming to find corresponding points between pairs of stereo images to recover depth information. In this work, we introduce StereoAnything, a highly practical solution for robust stereo matching. Rather than focusing on a specialized model, ...
