
Research Posts

Multimodal Autoregressive Pre-training of Large Vision Encoders

Papers with Code

By Naomi Wilson

Posted on: November 22, 2024

We introduce a novel method for pre-training of large-scale vision encoders. Building on recent advancements in autoregressive pre-training of vision models, we extend this framework to a multimodal setting, i.e., images and text. In this paper, we present AIMV2, a family of generalist vision encode...

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Papers with Code

By Kate Martin

Posted on: November 22, 2024

Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 mi...

SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model

Papers with Code

By Javier Vásquez

Posted on: November 22, 2024

Large Language Models (LLMs) have demonstrated the potential to address some issues within the semiconductor industry. However, they are often general-purpose models that lack the specialized knowledge needed to tackle the unique challenges of this sector, such as the intricate physics and chemistry...

Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data

Papers with Code

By Kate Martin

Posted on: November 22, 2024

Stereo matching has been a pivotal component in 3D vision, aiming to find corresponding points between pairs of stereo images to recover depth information. In this work, we introduce StereoAnything, a highly practical solution for robust stereo matching. Rather than focusing on a specialized model, ...

FunctionChat-Bench: Comprehensive Evaluation of Language Models' Generative Capabilities in Korean Tool-use Dialogs

Papers with Code

By Javier Vásquez

Posted on: November 22, 2024

This study investigates language models' generative capabilities in tool-use dialogs. We categorize the models' outputs in tool-use dialogs into four distinct types: Tool Call, Answer Completion, Slot Question, and Relevance Detection, which serve as aspects for evaluation. We introduce FunctionChat...

SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration

Papers with Code

By Kate Martin

Posted on: November 20, 2024

Although quantization for linear layers has been widely used, its application to accelerate the attention process remains limited. SageAttention utilizes 8-bit matrix multiplication, 16-bit matrix multiplication with 16-bit accumulator, and precision-enhancing methods, implementing an accurate and 2...

Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution

Papers with Code

By Kate Martin

Posted on: November 20, 2024

Image super-resolution (SR) is a classical yet still active low-level vision problem that aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts, serving as a key technique for image enhancement. Current approaches to address SR tasks, such as transformer-based a...

Motif Channel Opened in a White-Box: Stereo Matching via Motif Correlation Graph

Papers with Code

By Kate Martin

Posted on: November 20, 2024

Real-world applications of stereo matching, such as autonomous driving, place stringent demands on both safety and accuracy. However, learning-based stereo matching methods inherently suffer from the loss of geometric structures in certain feature channels, creating a bottleneck in achieving precise...
