Neural audio codecs have recently gained popularity because they can represent audio signals with high fidelity at very low bitrates, making it feasible to use language modeling approaches for audio generation and understanding. Residual Vector Quantization (RVQ) has become the standard technique fo...
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
By Javier Vásquez
Posted on: October 21, 2024
Recently, sharing the key-value (KV) cache across layers has been shown to improve the efficiency of large language model (LLM) inference. To systematically investigate different techniques of cross-layer KV sharing, we propose a unified framework that covers several recent methods and their novel varian...
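To make the idea concrete, here is a minimal sketch of cross-layer KV sharing (a hypothetical layout for illustration, not the paper's exact framework): pairs of consecutive layers reuse a single KV cache, halving KV memory during decoding. The layer grouping, `kv_source` map, and `attend` helper are all assumptions made for this toy example.

```python
import numpy as np

num_layers = 8
d = 4  # head dimension (toy size)

# Map each layer to the layer whose KV cache it uses. Here layers
# (0,1) share one cache, (2,3) share another, and so on.
kv_source = {l: (l // 2) * 2 for l in range(num_layers)}

kv_cache = {}  # source layer -> (K, V) arrays, grown per decoded token

def attend(layer, q, k, v):
    """Owner layers append this step's K/V; sharing layers just read."""
    src = kv_source[layer]
    if layer == src:  # owner layer writes the shared cache
        K, V = kv_cache.get(src, (np.empty((0, d)), np.empty((0, d))))
        kv_cache[src] = (np.vstack([K, k]), np.vstack([V, v]))
    K, V = kv_cache[src]  # sharing layers reuse the owner's cache
    w = np.exp(q @ K.T / np.sqrt(d))
    w /= w.sum()
    return w @ V
```

In a real model the sharing layers would also skip their K/V projections entirely; this sketch only shows the cache-reuse bookkeeping.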
aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Completion
By Naomi Wilson
Posted on: October 18, 2024
Large Language Models (LLMs) are widely used for code completion, and researchers are focusing on scaling them up to improve accuracy. However, larger LLMs increase the response time of code completion and reduce developers' productivity. In this paper, we propose a lightweight...
MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation
By Kate Martin
Posted on: October 16, 2024
Multimodal remote sensing data, collected from a variety of sensors, provide a comprehensive and integrated perspective of the Earth's surface. By employing multimodal fusion techniques, semantic segmentation offers more detailed insights into geographic scenes compared to single-modality approaches...
Agent S: An Open Agentic Framework that Uses Computers Like a Human
By Naomi Wilson
Posted on: October 14, 2024
We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks. Agent S aims to address three key challenges in automating computer tas...
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
By Javier Vásquez
Posted on: October 07, 2024
The transformer architecture now dominates models of many kinds. As the heart of the transformer, attention has a computational complexity of O(N^2), compared with O(N) for linear transformations. For large sequence lengths, attention becomes the primary time-consuming component. Although qu...
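As a rough illustration of the quantization idea (not SageAttention's exact scheme), the sketch below quantizes Q and K to INT8 with per-tensor scales, accumulates the QK^T matmul in int32, and dequantizes before the softmax; P @ V stays in full precision here. The function names and the per-tensor scaling choice are assumptions for this toy example.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: returns int8 values + scale."""
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

def attention_int8(Q, K, V):
    q8, sq = quantize_int8(Q)
    k8, sk = quantize_int8(K)
    # int32 accumulation of the int8 matmul, then rescale back to float
    scores = (q8.astype(np.int32) @ k8.astype(np.int32).T) * (sq * sk)
    scores = scores / np.sqrt(Q.shape[-1])
    # numerically stable softmax in full precision
    P = np.exp(scores - scores.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V
```

On typical inputs the INT8 scores stay close to the full-precision ones, which is why the softmax output barely changes while the dominant matmul can run on fast low-precision hardware units.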
The last decade of deep learning has brought increasingly capable systems deployed across a wide variety of applications. In natural language processing, the field has been transformed by a number of breakthroughs, including large language models, which are used in increasingly many user-faci...
Lightning UQ Box: A Comprehensive Framework for Uncertainty Quantification in Deep Learning
By Kate Martin
Posted on: October 07, 2024
Uncertainty quantification (UQ) is an essential tool for applying deep neural networks (DNNs) to real-world tasks, as it attaches a degree of confidence to DNN outputs. However, despite its benefits, UQ is often left out of the standard DNN workflow due to the additional technical knowledge required...
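To show what attaching a confidence to a DNN output can look like, here is a minimal sketch of one common UQ technique, Monte Carlo dropout (a generic illustration, not Lightning UQ Box's API): keep dropout active at prediction time and treat the spread across repeated stochastic forward passes as an uncertainty estimate. The tiny network, weights, and helper names are all assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(1, 32))
W2 = rng.normal(size=(32, 1)) / 32

def forward(x, p_drop=0.5):
    h = np.maximum(x @ W1, 0.0)          # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop  # dropout stays ON at test time
    h = h * mask / (1.0 - p_drop)        # inverted-dropout scaling
    return h @ W2

def predict_with_uncertainty(x, n_samples=100):
    """Mean prediction plus per-output standard deviation across samples."""
    samples = np.stack([forward(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)
```

Frameworks like the one described here wrap such methods (and many others, e.g. deep ensembles or Bayesian layers) behind a uniform interface so that the extra technical knowledge is not a barrier.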