
Research Posts

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

Papers with Code

By Javier Vásquez

Posted on: December 23, 2024

Reinforcement learning from human feedback (RLHF) has proven effective in enhancing the instruction-following capabilities of large language models; however, it remains underexplored in the cross-modality domain. As the number of modalities increases, aligning all-modality models with human intentio...
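
At the heart of RLHF is a reward model trained on human preference pairs. As a minimal, framework-free sketch (the function name and the scalar rewards are illustrative, not from the paper), the standard Bradley-Terry preference loss looks like:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected).

    The reward model learns to score the human-preferred response higher;
    the loss shrinks as the reward margin grows.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With no margin the loss equals log(2); a larger margin in favor of the chosen response drives it toward zero. The paper's contribution concerns extending such feedback beyond text to all modalities, which this toy scalar version does not capture.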

Causal Diffusion Transformers for Generative Modeling

By Naomi Wilson

Posted on: December 18, 2024

We introduce Causal Diffusion as the autoregressive (AR) counterpart of Diffusion models. It is a next-token(s) forecasting framework that is friendly to both discrete and continuous modalities and compatible with existing next-token prediction models like LLaMA and GPT. While recent works attempt t...

AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark

By Javier Vásquez

Posted on: December 18, 2024

Evaluation plays a crucial role in the advancement of information retrieval (IR) models. However, current benchmarks, which are based on predefined domains and human-labeled data, face limitations in addressing evaluation needs for emerging domains both cost-effectively and efficiently. To address t...

Large Action Models: From Inception to Implementation

By Kate Martin

Posted on: December 16, 2024

As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating text...

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

By Javier Vásquez

Posted on: December 16, 2024

We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two key major upgrades. For the vision component, we incorporate a dynamic tiling vision encoding strategy designed for processi...
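
The Mixture-of-Experts idea behind such models can be pictured with a toy top-k router. This is an illustrative sketch only; DeepSeek-VL2's actual routing and load-balancing details are not described in this excerpt:

```python
import math

def topk_gate(scores, k=2):
    """Toy top-k MoE router: keep the k highest-scoring experts and
    renormalize their softmax weights, so each token activates only a
    small subset of the model's parameters."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]
```

Routing each token to a few experts is what lets MoE models grow total parameter count without a proportional increase in per-token compute.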

Byte Latent Transformer: Patches Scale Better Than Tokens

By Naomi Wilson

Posted on: December 16, 2024

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the p...
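
Dynamic patching can be pictured with a toy byte grouper. This is a sketch under loose assumptions: BLT learns patch boundaries with a small entropy model, whereas here a hypothetical whitespace heuristic plus a cap on patch length stand in for that learned scorer:

```python
def patch_bytes(data: bytes, max_patch: int = 4, boundary_score=None) -> list:
    """Toy dynamic patcher: start a new patch whenever the scorer flags a
    boundary byte or the current patch reaches max_patch bytes.

    BLT uses a learned entropy model to place boundaries; the default
    whitespace test below is a hypothetical stand-in for illustration.
    """
    if boundary_score is None:
        boundary_score = lambda b: b in b" \n\t"
    patches, cur = [], bytearray()
    for b in data:
        if cur and (boundary_score(b) or len(cur) >= max_patch):
            patches.append(bytes(cur))
            cur = bytearray()
        cur.append(b)
    if cur:
        patches.append(bytes(cur))
    return patches
```

The point of the exercise: patch boundaries adapt to the input rather than coming from a fixed tokenizer vocabulary, and concatenating the patches always reconstructs the original byte stream.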

Hidden Biases of End-to-End Driving Datasets

By Naomi Wilson

Posted on: December 13, 2024

End-to-end driving systems have made rapid progress, but have so far not been applied to the challenging new CARLA Leaderboard 2.0. Further, while there is a large body of literature on end-to-end architectures and training strategies, the impact of the training dataset is often overlooked. In this ...

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

By Javier Vásquez

Posted on: December 13, 2024

Creating AI systems that can interact with environments over long periods, similar to human cognition, has been a longstanding research goal. Recent advancements in multimodal large language models (MLLMs) have made significant strides in open-world understanding. However, the challenge of continuou...
