
Research Posts

BitNet a4.8: 4-bit Activations for 1-bit LLMs

Papers with Code
By Kate Martin

Posted on: November 11, 2024

Recent research on the 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining their performance. In this work, we introduce BitNet a4.8, enabling 4-bit activations for 1-bit LLMs. BitNet a4.8 employs a hybrid...

Read More
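
The BitNet a4.8 summary pairs 1.58-bit (ternary) weights with 4-bit activations. Purely as an illustration of that combination, and not the paper's actual hybrid quantization and sparsification scheme or kernels, the sketch below quantizes activations to symmetric int4 per token and weights to ternary values with a per-tensor scale, then runs a linear layer in the quantized domain; all function names are hypothetical.

```python
import numpy as np

def quantize_activations_int4(x):
    """Per-token symmetric quantization of activations to 4 bits (levels -8..7)."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0 + 1e-8
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale

def quantize_weights_ternary(w):
    """BitNet-style 1.58-bit weights: ternary {-1, 0, +1} with one per-tensor scale."""
    scale = np.abs(w).mean() + 1e-8
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

def low_bit_linear(x, w):
    """Linear layer computed with int4 activations and ternary weights."""
    xq, xs = quantize_activations_int4(x)
    wq, ws = quantize_weights_ternary(w)
    # Integer-domain matmul, then rescale back to floating point.
    return (xq @ wq.T) * xs * ws

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)    # 4 tokens, hidden size 64
w = rng.standard_normal((128, 64)).astype(np.float32)  # 128 output features
print(low_bit_linear(x, w).shape)  # (4, 128)
```

In an actual deployment the integer matmul would run in a fused low-bit kernel; here NumPy floats merely stand in for it.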

Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research

Papers with Code
By Naomi Wilson

Posted on: November 11, 2024

In recent years, the application of generative artificial intelligence (GenAI) in financial analysis and investment decision-making has gained significant attention. However, most existing approaches rely on single-agent systems, which fail to fully utilize the collaborative potential of multiple AI...

Read More
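
The investment-analysis post argues for multi-agent collaboration over a single agent. The sketch below is a generic orchestration pattern, not the paper's framework: specialist agents each produce a view and a coordinator synthesizes them. The Agent class, the role strings, and the stubbed LLM callable are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stand-in for a real LLM call; in practice this would hit an API.
LLM = Callable[[str], str]

@dataclass
class Agent:
    role: str          # e.g. "fundamental analyst", "risk analyst"
    llm: LLM

    def analyze(self, task: str) -> str:
        prompt = f"You are a {self.role}. Give a short view on: {task}"
        return self.llm(prompt)

def collaborate(task: str, agents: List[Agent], coordinator: LLM) -> str:
    """Fan the task out to specialist agents, then have a coordinator synthesize."""
    views = [f"[{a.role}] {a.analyze(task)}" for a in agents]
    summary_prompt = "Combine these analyst views into one recommendation:\n" + "\n".join(views)
    return coordinator(summary_prompt)

# Toy "LLM" so the sketch runs without any API key.
fake_llm: LLM = lambda prompt: f"(stub answer to: {prompt[:40]}...)"

agents = [Agent("fundamental analyst", fake_llm),
          Agent("risk analyst", fake_llm)]
print(collaborate("Assess Company X's Q3 earnings", agents, coordinator=fake_llm))
```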

MVSplat360: Feed-Forward 360° Scene Synthesis from Sparse Views

Papers with Code
By Javier Vásquez

Posted on: November 11, 2024

We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations. This setting is inherently ill-posed due to minimal overlap among input views and insufficient visual information provided, making it challenging fo...

Read More
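
MVSplat360 is described as a feed-forward model that maps a handful of input views directly to a renderable scene representation. The toy torch module below only gestures at that interface, predicting a fixed-size bundle of per-pixel scene parameters (sized here as if they were 3D Gaussian attributes, which is an assumption about the representation); it omits the matching, the rendering step, and everything else specific to the paper, and every layer size is arbitrary.

```python
import torch
import torch.nn as nn

class FeedForwardSceneHead(nn.Module):
    """Toy feed-forward head: sparse input views -> per-pixel scene parameters.

    Only a schematic of the "feed-forward NVS" idea; not the MVSplat360 architecture.
    """
    def __init__(self, in_ch=3, feat=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
        )
        # Per pixel: 3 (position offset) + 3 (scale) + 4 (rotation) + 1 (opacity) + 3 (color)
        self.head = nn.Conv2d(feat, 14, 1)

    def forward(self, views):                  # views: (B, V, 3, H, W)
        b, v, c, h, w = views.shape
        feats = self.encoder(views.reshape(b * v, c, h, w))
        return self.head(feats).reshape(b, v, 14, h, w)

views = torch.randn(1, 5, 3, 64, 64)           # 5 sparse input views
params = FeedForwardSceneHead()(views)
print(params.shape)                            # torch.Size([1, 5, 14, 64, 64])
```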

Convolutional Differentiable Logic Gate Networks

Papers with Code
By Naomi Wilson

Posted on: November 11, 2024

With the increasing inference cost of machine learning models, there is a growing interest in models with fast and efficient inference. Recently, an approach for learning logic gate networks directly via a differentiable relaxation was proposed. Logic gate networks are faster than conventional neura...

Read More
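
A differentiable logic gate network replaces each neuron's weighted sum with a learned choice among two-input logic gates, relaxed so gradients can flow. The sketch below shows only the relaxation idea on a dense toy layer with a reduced set of four gates (the paper's networks are convolutional and choose among the full set of two-input gates); the class and function names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Soft (probabilistic) relaxations of a few two-input logic gates.
# With inputs in [0, 1], each expression matches the gate's truth table at {0, 1}.
def soft_gates(a, b):
    return torch.stack([
        a * b,                # AND
        a + b - a * b,        # OR
        a + b - 2 * a * b,    # XOR
        1 - a * b,            # NAND
    ], dim=-1)

class DiffLogicLayer(nn.Module):
    """Each output node reads a random pair of inputs and learns a softmax
    mixture over the soft gates above; at inference the argmax gate can be kept."""
    def __init__(self, in_dim, out_dim, n_gates=4):
        super().__init__()
        self.idx_a = torch.randint(0, in_dim, (out_dim,))
        self.idx_b = torch.randint(0, in_dim, (out_dim,))
        self.logits = nn.Parameter(torch.zeros(out_dim, n_gates))

    def forward(self, x):                          # x in [0, 1], shape (B, in_dim)
        a, b = x[:, self.idx_a], x[:, self.idx_b]
        gates = soft_gates(a, b)                   # (B, out_dim, n_gates)
        weights = F.softmax(self.logits, dim=-1)   # (out_dim, n_gates)
        return (gates * weights).sum(dim=-1)       # (B, out_dim)

x = torch.rand(8, 16)                              # 8 samples, 16 binary-ish inputs
layer = DiffLogicLayer(16, 32)
print(layer(x).shape)                              # torch.Size([8, 32])
```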

Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation?

Papers with Code
By Naomi Wilson

Posted on: November 08, 2024

How can we test AI performance? This question seems trivial, but it isn't. Standard benchmarks often have problems such as in-distribution and small-size test sets, oversimplified metrics, unfair comparisons, and short-term outcome pressure. As a consequence, good performance on standard benchmarks ...

Read More
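
The Touchstone summary criticizes, among other things, oversimplified metrics. For concreteness only, the snippet below computes the most common segmentation metric, per-case Dice overlap, so that score distributions and worst cases can be inspected rather than a single mean; it illustrates metric bookkeeping, not what the benchmark itself measures.

```python
import numpy as np

def dice_score(pred, gt, eps=1e-8):
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

# Per-case scores let you inspect the whole distribution instead of one mean.
rng = np.random.default_rng(0)
cases = [(rng.random((64, 64, 64)) > 0.5, rng.random((64, 64, 64)) > 0.5) for _ in range(5)]
scores = [dice_score(p, g) for p, g in cases]
print(f"mean Dice = {np.mean(scores):.3f}, worst case = {np.min(scores):.3f}")
```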

Equivariant Graph Network Approximations of High-Degree Polynomials for Force Field Prediction

Papers with Code
By Kate Martin

Posted on: November 08, 2024

Recent advancements in equivariant deep models have shown promise in accurately predicting atomic potentials and force fields in molecular dynamics simulations. Using spherical harmonics (SH) and tensor products (TP), these equivariant networks gain enhanced physical understanding, like symmetries a...

Read More
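
For force fields, equivariance means that rotating the input coordinates rotates the predicted forces in the same way. The check below uses a deliberately simple pairwise model that is equivariant by construction (no spherical harmonics, tensor products, or learned parameters, so it is not an equivariant graph network); it only makes the property concrete and numerically verifiable.

```python
import numpy as np

def pairwise_forces(pos, charges):
    """Toy equivariant force model: forces point along pairwise direction vectors,
    with rotation-invariant (distance-based) magnitudes."""
    diff = pos[:, None, :] - pos[None, :, :]                 # (N, N, 3)
    dist = np.linalg.norm(diff, axis=-1) + np.eye(len(pos))  # avoid divide-by-zero
    mag = charges[:, None] * charges[None, :] / dist**2
    np.fill_diagonal(mag, 0.0)
    return (mag[..., None] * diff / dist[..., None]).sum(axis=1)  # (N, 3)

def random_rotation(rng):
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return q * np.sign(np.linalg.det(q))                     # proper rotation, det = +1

rng = np.random.default_rng(0)
pos, charges = rng.standard_normal((5, 3)), rng.random(5)
R = random_rotation(rng)

# Equivariance check: F(R x) == R F(x) up to numerical error.
f_of_rotated = pairwise_forces(pos @ R.T, charges)
rotated_f = pairwise_forces(pos, charges) @ R.T
print(np.allclose(f_of_rotated, rotated_f))                  # True
```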

Measuring short-form factuality in large language models

Papers with Code
By Javier Vásquez

Posted on: November 08, 2024

We present SimpleQA, a benchmark that evaluates the ability of language models to answer short, fact-seeking questions. We prioritized two properties in designing this eval. First, SimpleQA is challenging, as it is adversarially collected against GPT-4 responses. Second, responses are easy to grade,...

Read More
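
SimpleQA grades short answers into a small set of outcomes (correct, incorrect, not attempted); in the paper the grading is model-based rather than string matching. The sketch below keeps only the bookkeeping: a toy grading function, three placeholder items, and an accuracy-given-attempted style summary. None of it is the benchmark's own data or grader.

```python
from collections import Counter

def grade(model_answer, gold_answer):
    """Toy grader; a real grader would be a judge model, not substring matching."""
    if not model_answer.strip():
        return "not_attempted"
    return "correct" if gold_answer.lower() in model_answer.lower() else "incorrect"

examples = [  # (question, gold answer, model answer) -- placeholders
    ("Who wrote 'Dune'?", "Frank Herbert", "Frank Herbert wrote it."),
    ("Atomic number of neon?", "10", "It is 18."),
    ("Capital of Bhutan?", "Thimphu", ""),
]

counts = Counter(grade(ans, gold) for _, gold, ans in examples)
attempted = counts["correct"] + counts["incorrect"]
print(dict(counts))
print("accuracy given attempted:", counts["correct"] / max(attempted, 1))
```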

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Papers with Code
By Kate Martin

Posted on: November 08, 2024

Diffusion models have been proven highly effective at generating high-quality images. However, as these models grow larger, they require significantly more memory and suffer from higher latency, posing substantial challenges for deployment. In this work, we aim to accelerate diffusion models by quan...

Read More
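
The SVDQuant summary is truncated before the method, but the title points at the core idea: a small high-precision low-rank branch absorbs outliers so the residual quantizes cleanly to 4 bits. The NumPy sketch below demonstrates only that idea on a synthetic weight matrix; it ignores activation quantization and everything kernel-level, and all names are illustrative.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor 4-bit quantization (levels -8..7), then dequantize."""
    scale = np.abs(w).max() / 7.0 + 1e-12
    return np.clip(np.round(w / scale), -8, 7) * scale

def low_rank_plus_int4(w, rank=8):
    """Keep a small low-rank component in high precision (it absorbs the large
    outliers that dominate the quantization range), quantize only the residual."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return low_rank + quantize_int4(w - low_rank)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
w[0, :8] += 50.0                      # inject a few large outliers

err_plain = np.linalg.norm(w - quantize_int4(w))
err_lowrank = np.linalg.norm(w - low_rank_plus_int4(w))
print(f"plain int4 error: {err_plain:.1f}, low-rank + int4 error: {err_lowrank:.1f}")
```

With the outliers absorbed by the low-rank branch, the residual's dynamic range shrinks and the 4-bit reconstruction error drops noticeably compared with quantizing the raw matrix.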