Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

By Javier Vásquez

Posted on: December 23, 2024

Reinforcement learning from human feedback (RLHF) has proven effective in enhancing the instruction-following capabilities of large language models; however, it remains underexplored in the cross-modality domain. As the number of modalities increases, aligning all-modality models with human intentio...

Causal Diffusion Transformers for Generative Modeling

By Naomi Wilson

Posted on: December 18, 2024

We introduce Causal Diffusion as the autoregressive (AR) counterpart of Diffusion models. It is a next-token(s) forecasting framework that is friendly to both discrete and continuous modalities and compatible with existing next-token prediction models like LLaMA and GPT. While recent works attempt t...

AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark

By Javier Vásquez

Posted on: December 18, 2024

Evaluation plays a crucial role in the advancement of information retrieval (IR) models. However, current benchmarks, which are based on predefined domains and human-labeled data, face limitations in addressing evaluation needs for emerging domains both cost-effectively and efficiently. To address t...

Large Action Models: From Inception to Implementation

By Kate Martin

Posted on: December 16, 2024

As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating text...

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

By Javier Vásquez

Posted on: December 16, 2024

We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two key upgrades. For the vision component, we incorporate a dynamic tiling vision encoding strategy designed for processi...

Byte Latent Transformer: Patches Scale Better Than Tokens

By Naomi Wilson

Posted on: December 16, 2024

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the p...

Hidden Biases of End-to-End Driving Datasets

By Naomi Wilson

Posted on: December 13, 2024

End-to-end driving systems have made rapid progress, but have so far not been applied to the challenging new CARLA Leaderboard 2.0. Further, while there is a large body of literature on end-to-end architectures and training strategies, the impact of the training dataset is often overlooked. In this ...

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

By Javier Vásquez

Posted on: December 13, 2024

Creating AI systems that can interact with environments over long periods, similar to human cognition, has been a longstanding research goal. Recent advancements in multimodal large language models (MLLMs) have made significant strides in open-world understanding. However, the challenge of continuou...