Multimodal Autoregressive Pre-training of Large Vision Encoders
By Naomi Wilson
Posted on: November 22, 2024
We introduce a novel method for pre-training large-scale vision encoders. Building on recent advancements in autoregressive pre-training of vision models, we extend this framework to a multimodal setting, i.e., images and text. In this paper, we present AIMV2, a family of generalist vision encode...
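As a rough illustration of what a multimodal autoregressive objective can look like, here is a minimal sketch. The class and module names (`MultimodalARPretrainer`, the injected encoder/decoder) and the simple MSE-plus-cross-entropy loss are illustrative assumptions, not the exact AIMV2 recipe.

```python
# Minimal sketch of a multimodal autoregressive pre-training objective:
# image patches and text tokens form one causal sequence, and each position
# predicts the next element. All names and the loss weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalARPretrainer(nn.Module):
    def __init__(self, encoder: nn.Module, decoder: nn.Module,
                 patch_dim: int, vocab_size: int, d_model: int):
        super().__init__()
        self.encoder = encoder            # vision encoder being pre-trained
        self.decoder = decoder            # assumed: causal decoder over [patches; text]
        self.patch_head = nn.Linear(d_model, patch_dim)   # regresses the next patch
        self.text_head = nn.Linear(d_model, vocab_size)   # predicts the next token

    def forward(self, patches: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        # patches: (B, P, patch_dim), text_ids: (B, T)
        vis = self.encoder(patches)              # (B, P, d_model)
        hidden = self.decoder(vis, text_ids)     # (B, P + T, d_model), causal masking assumed
        num_patches = patches.size(1)
        # Patch positions 0..P-2 regress patches 1..P-1.
        patch_pred = self.patch_head(hidden[:, : num_patches - 1])
        patch_loss = F.mse_loss(patch_pred, patches[:, 1:])
        # Text positions P..P+T-2 predict tokens 1..T-1.
        text_logits = self.text_head(hidden[:, num_patches:-1])
        text_loss = F.cross_entropy(
            text_logits.reshape(-1, text_logits.size(-1)),
            text_ids[:, 1:].reshape(-1),
        )
        return patch_loss + text_loss
```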
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
By Kate Martin
Posted on: November 22, 2024
Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 mi...
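For readers unfamiliar with retrieval-augmented answering, the sketch below shows the basic retrieve-then-generate loop: rank passages against the query, build a citation-grounded prompt, and generate. It is not OpenScholar's full pipeline (which also involves a much larger datastore and iterative refinement); `embed_fn` and `generate_fn` are assumed stand-ins for an embedding model and a language model.

```python
# Minimal retrieve-then-generate sketch; embed_fn returns a 1-D numpy vector
# per string, generate_fn maps a prompt string to an answer string (assumptions).
import numpy as np

def retrieve(query: str, passages: list[str], embed_fn, k: int = 5) -> list[str]:
    """Rank passages by cosine similarity to the query embedding and keep the top k."""
    q = embed_fn(query)
    P = np.stack([embed_fn(p) for p in passages])
    scores = P @ q / (np.linalg.norm(P, axis=1) * np.linalg.norm(q) + 1e-8)
    top = np.argsort(-scores)[:k]
    return [passages[i] for i in top]

def answer_with_citations(query: str, passages: list[str], embed_fn, generate_fn) -> str:
    """Assemble a citation-grounded prompt from retrieved passages and generate an answer."""
    hits = retrieve(query, passages, embed_fn)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(hits))
    prompt = (
        "Answer the scientific question using only the passages below, citing them as [n].\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
    return generate_fn(prompt)
```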
SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model
By Javier Vásquez
Posted on: November 22, 2024
Large Language Models (LLMs) have demonstrated the potential to address some issues within the semiconductor industry. However, they are often general-purpose models that lack the specialized knowledge needed to tackle the unique challenges of this sector, such as the intricate physics and chemistry...
Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data
By Kate Martin
Posted on: November 22, 2024
Stereo matching has been a pivotal component in 3D vision, aiming to find corresponding points between pairs of stereo images to recover depth information. In this work, we introduce StereoAnything, a highly practical solution for robust stereo matching. Rather than focusing on a specialized model, ...
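Since the snippet only states the goal of stereo matching, here is a small sketch of the standard geometry it relies on: once correspondences give a disparity map, depth follows from depth = f · B / d. The function and parameter names are illustrative; the calibration values are assumed known.

```python
# Minimal sketch of converting a stereo disparity map to metric depth.
# focal_length_px and baseline_m come from camera calibration (assumed given).
import numpy as np

def disparity_to_depth(disparity: np.ndarray,
                       focal_length_px: float,
                       baseline_m: float) -> np.ndarray:
    """depth = f * B / d, with invalid (non-positive) disparities left at 0."""
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth
```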
FunctionChat-Bench: Comprehensive Evaluation of Language Models' Generative Capabilities in Korean Tool-use Dialogs
By Javier Vásquez
Posted on: November 22, 2024
This study investigates language models' generative capabilities in tool-use dialogs. We categorize the models' outputs in tool-use dialogs into four distinct types: Tool Call, Answer Completion, Slot Question, and Relevance Detection, which serve as aspects for evaluation. We introduce FunctionChat...
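The four output types named in the abstract lend themselves to a simple evaluation schema. The sketch below encodes them as an enum and computes per-category accuracy; the dataclass fields and helper are illustrative assumptions, not FunctionChat-Bench's actual data format.

```python
# Sketch of an evaluation schema over the four tool-use output types listed above.
# Field names and the accuracy helper are assumptions for illustration only.
from dataclasses import dataclass
from enum import Enum

class OutputType(Enum):
    TOOL_CALL = "tool_call"                      # model emits a function call with arguments
    ANSWER_COMPLETION = "answer_completion"      # model verbalizes a tool result for the user
    SLOT_QUESTION = "slot_question"              # model asks for a missing argument value
    RELEVANCE_DETECTION = "relevance_detection"  # model declines when no tool applies

@dataclass
class DialogTurnEval:
    dialog_id: str
    expected: OutputType
    predicted: OutputType

    @property
    def correct(self) -> bool:
        return self.expected is self.predicted

def accuracy_by_type(items: list[DialogTurnEval]) -> dict[OutputType, float]:
    """Per-category accuracy over a list of evaluated dialog turns."""
    out: dict[OutputType, float] = {}
    for t in OutputType:
        subset = [i for i in items if i.expected is t]
        if subset:
            out[t] = sum(i.correct for i in subset) / len(subset)
    return out
```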
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
By Kate Martin
Posted on: November 20, 2024
Although quantization for linear layers has been widely used, its application to accelerate the attention process remains limited. SageAttention utilizes 8-bit matrix multiplication, 16-bit matrix multiplication with a 16-bit accumulator, and precision-enhancing methods, implementing an accurate and 2...
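To make the quantized-attention idea concrete, here is a numerical sketch of attention with 8-bit Q/K, roughly in the spirit of the 8-bit predecessor described above. Per-tensor symmetric quantization is a simplifying assumption, the integer matmul is simulated in float32 for portability, and real speedups require fused kernels; none of this reproduces SageAttention2's 4-bit implementation.

```python
# Sketch: attention with symmetrically quantized int8 Q and K, dequantized after
# the score matmul. Real kernels accumulate in int32 and fuse these steps.
import torch

def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor quantization to int8 plus a float scale (assumption)."""
    scale = x.abs().amax().clamp(min=1e-8) / 127.0
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def quantized_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, heads, seq, dim), full precision on entry.
    q8, sq = quantize_int8(q)
    k8, sk = quantize_int8(k)
    # Simulate the int8 x int8 matmul in float32; dequantize with the two scales.
    scores = q8.float() @ k8.float().transpose(-1, -2)
    scores = scores * (sq * sk) / (q.size(-1) ** 0.5)
    probs = torch.softmax(scores, dim=-1)
    # The second matmul (P·V) stays in the values' low-precision dtype.
    return probs.to(v.dtype) @ v
```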
Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution
By Kate Martin
Posted on: November 20, 2024
Image super-resolution (SR) is a classical yet still active low-level vision problem that aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts, serving as a key technique for image enhancement. Current approaches to address SR tasks, such as transformer-based a...
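As a point of reference for the SR task the abstract defines, the sketch below sets up the usual experimental scaffold: synthesize an LR input from an HR image, reconstruct with a plain bicubic baseline, and score with PSNR. It illustrates only the task setup, not the contourlet refinement framework; the scale factor and [0, 1] value range are assumptions.

```python
# Minimal SR task scaffold: LR synthesis, a bicubic baseline, and PSNR scoring.
import torch
import torch.nn.functional as F

def make_lr(hr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """hr: (B, C, H, W) in [0, 1]; returns its bicubic-downsampled LR counterpart."""
    return F.interpolate(hr, scale_factor=1 / scale, mode="bicubic", align_corners=False)

def bicubic_sr(lr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Baseline reconstruction: plain bicubic upsampling back to HR resolution."""
    return F.interpolate(lr, scale_factor=scale, mode="bicubic", align_corners=False).clamp(0, 1)

def psnr(pred: torch.Tensor, target: torch.Tensor) -> float:
    """PSNR in dB, assuming inputs in [0, 1]."""
    mse = F.mse_loss(pred, target)
    return float(10 * torch.log10(1.0 / mse))
```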
Motif Channel Opened in a White-Box: Stereo Matching via Motif Correlation Graph
By Kate Martin
Posted on: November 20, 2024
Real-world applications of stereo matching, such as autonomous driving, place stringent demands on both safety and accuracy. However, learning-based stereo matching methods inherently suffer from the loss of geometric structures in certain feature channels, creating a bottleneck in achieving precise...