Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation?
By Naomi Wilson
Posted on: November 08, 2024
How can we test AI performance? This question seems trivial, but it isn't. Standard benchmarks often have problems such as in-distribution and small-size test sets, oversimplified metrics, unfair comparisons, and short-term outcome pressure. As a consequence, good performance on standard benchmarks ...
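The "oversimplified metrics" critique is easiest to see with the field's workhorse score. Below is a minimal sketch of the Dice similarity coefficient, the standard overlap metric in medical segmentation; the toy masks are my own illustration, not data from the benchmark.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy 2D masks: a single overlap number says little about *where* a model
# fails, which is one of the benchmark-design concerns raised above.
pred = np.zeros((64, 64)); pred[10:40, 10:40] = 1
target = np.zeros((64, 64)); target[15:45, 15:45] = 1
print(f"Dice = {dice_coefficient(pred, target):.3f}")
```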
Equivariant Graph Network Approximations of High-Degree Polynomials for Force Field Prediction
By Kate Martin
Posted on: November 08, 2024
Recent advancements in equivariant deep models have shown promise in accurately predicting atomic potentials and force fields in molecular dynamics simulations. Using spherical harmonics (SH) and tensor products (TP), these equivariant networks gain enhanced physical understanding, like symmetries a...
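For readers unfamiliar with the equivariance these models encode: degree-1 real spherical harmonics are, up to normalization, just the unit direction vector, so rotating the input rotates the features by the same matrix. A tiny NumPy check of that property, purely illustrative and not the paper's network:

```python
import numpy as np

# Degree-1 real spherical harmonics are proportional to the unit direction,
# so rotating the input must rotate the features by the same matrix R:
#   Y_1(R x) = R Y_1(x)   (rotation equivariance)
def sh_degree1(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x)

theta = 0.7  # rotation about the z-axis by an arbitrary angle
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

x = np.array([0.3, -1.2, 0.8])
lhs = sh_degree1(R @ x)   # features of the rotated input
rhs = R @ sh_degree1(x)   # rotated features of the original input
assert np.allclose(lhs, rhs), "equivariance violated"
print("degree-1 SH equivariance holds")
```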
Measuring short-form factuality in large language models
By Javier Vásquez
Posted on: November 08, 2024
We present SimpleQA, a benchmark that evaluates the ability of language models to answer short, fact-seeking questions. We prioritized two properties in designing this eval. First, SimpleQA is challenging, as it is adversarially collected against GPT-4 responses. Second, responses are easy to grade,...
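"Easy to grade" typically means each response receives a discrete label and the metrics reduce to simple counting. A sketch of such scoring, with hypothetical grade labels; the exact label set and metrics here are assumptions, not quoted from the paper:

```python
from collections import Counter

# Hypothetical grades for a short-form QA eval; benchmarks in this style
# commonly label each response correct, incorrect, or not attempted.
grades = ["correct", "incorrect", "not_attempted", "correct", "correct"]

counts = Counter(grades)
n = len(grades)
attempted = counts["correct"] + counts["incorrect"]

overall_accuracy = counts["correct"] / n
# Accuracy conditioned on attempting rewards calibrated abstention.
accuracy_given_attempted = counts["correct"] / attempted if attempted else 0.0

print(f"overall accuracy:        {overall_accuracy:.2f}")
print(f"accuracy when attempted: {accuracy_given_attempted:.2f}")
```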
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
By Kate Martin
Posted on: November 08, 2024
Diffusion models have been proven highly effective at generating high-quality images. However, as these models grow larger, they require significantly more memory and suffer from higher latency, posing substantial challenges for deployment. In this work, we aim to accelerate diffusion models by quan...
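The title hints at the mechanism: split a weight matrix into a low-rank branch that absorbs the outliers plus a residual that quantizes cleanly to 4 bits. A rough NumPy sketch of that decomposition; the rank, scales, and injected outliers are illustrative assumptions rather than the paper's algorithm:

```python
import numpy as np

def fake_quant_4bit(w: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor 4-bit fake quantization (quantize, then dequantize)."""
    scale = np.abs(w).max() / 7.0  # symmetric int4 range: [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
W[0, :8] += 25.0  # a few outliers that would otherwise dominate the scale

# Low-rank branch via truncated SVD absorbs the large components ...
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 16
L = (U[:, :r] * S[:r]) @ Vt[:r]

# ... so the residual has a much smaller dynamic range and quantizes well.
W_hat_plain = fake_quant_4bit(W)
W_hat_split = L + fake_quant_4bit(W - L)

def rel_err(A):
    return np.linalg.norm(W - A) / np.linalg.norm(W)

print(f"4-bit only:       rel. error {rel_err(W_hat_plain):.4f}")
print(f"low-rank + 4-bit: rel. error {rel_err(W_hat_split):.4f}")
```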
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
By Naomi Wilson
Posted on: November 06, 2024
In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's su...
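The gap between 389B total and 52B activated parameters comes from sparse expert routing: each token only runs through its top-k experts. A toy PyTorch sketch of such a layer, with placeholder sizes and routing details rather than Hunyuan-Large's actual design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: only the top-k experts run per token,
    so activated parameters are a fraction of total parameters."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)
        weights, idx = torch.topk(logits, self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):  # dispatch tokens to their chosen experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

x = torch.randn(16, 64)
print(TinyMoE()(x).shape)  # torch.Size([16, 64])
```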
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
By Kate Martin
Posted on: November 06, 2024
Diffusion models are pivotal for generating high-quality images and videos. Following the success of OpenAI's Sora, the backbone of diffusion models is shifting from U-Net to Transformer, known as Diffusion Transformers (DiTs). However, generating high-quality content necessitates longer sequence ...
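One way to parallelize those long sequences is to shard the tokens across devices while each shard attends over the full gathered key/value set. Below is a single-process simulation of that idea; real engines use collectives such as all-gather, and this is an illustration of the general pattern, not xDiT's implementation:

```python
import torch
import torch.nn.functional as F

# Toy sequence parallelism: shard the token sequence across "devices" and
# compute attention per query shard against the full keys/values.
torch.manual_seed(0)
seq_len, d, world_size = 1024, 64, 4
q = torch.randn(seq_len, d)
k = torch.randn(seq_len, d)
v = torch.randn(seq_len, d)

reference = F.scaled_dot_product_attention(q[None], k[None], v[None])[0]

shards = []
for rank in range(world_size):
    sl = slice(rank * seq_len // world_size, (rank + 1) * seq_len // world_size)
    # Each rank owns a query shard; k/v are "all-gathered" (here already full).
    shards.append(F.scaled_dot_product_attention(q[sl][None], k[None], v[None])[0])

assert torch.allclose(torch.cat(shards), reference, atol=1e-5)
print("sharded attention matches full attention")
```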
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
By Javier Vásquez
Posted on: November 04, 2024
We introduce NoPoSplat, a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from unposed sparse multi-view images. Our model, trained exclusively with photometric loss, achieves real-time 3D Gaussian reconstruction during inference. To eliminate the need f...
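"Trained exclusively with photometric loss" means supervision comes only from comparing rendered pixels to ground-truth images, with no pose or depth labels. A minimal sketch using plain MSE; production pipelines often add SSIM or LPIPS terms, which is my assumption and not a claim about NoPoSplat:

```python
import torch

def photometric_loss(rendered: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pixel-space reconstruction loss between a rendered view and the
    ground-truth image; no camera poses or depth labels are needed."""
    return torch.mean((rendered - target) ** 2)

rendered = torch.rand(3, 256, 256, requires_grad=True)  # stand-in for a splat render
target = torch.rand(3, 256, 256)
loss = photometric_loss(rendered, target)
loss.backward()  # gradients flow back to the Gaussian parameters in a real model
print(f"loss = {loss.item():.4f}")
```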
Randomized Autoregressive Visual Generation
This paper presents Randomized AutoRegressive modeling (RAR) for visual generation, which sets a new state-of-the-art performance on the image generation task while maintaining full compatibility with language modeling frameworks. The proposed RAR is simple: during a standard autoregressive training...
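The randomization the abstract describes can be pictured as shuffling the token order of some training sequences while keeping the usual next-token objective. A toy sketch; the annealing schedule below is my guess at the general shape, not the paper's exact recipe:

```python
import torch

def randomized_ar_batch(tokens: torch.Tensor, permute_prob: float):
    """With probability `permute_prob`, shuffle the token order before building
    the usual next-token inputs/targets (raster order otherwise)."""
    if torch.rand(()) < permute_prob:
        tokens = tokens[torch.randperm(tokens.numel())]
    return tokens[:-1], tokens[1:]  # inputs, next-token targets

tokens = torch.arange(16)  # stand-in for image tokens in raster order
for step, total in [(0, 100), (50, 100), (99, 100)]:
    p = 1.0 - step / total  # e.g. anneal randomization toward pure raster order
    inp, tgt = randomized_ar_batch(tokens, p)
    print(step, inp[:6].tolist())
```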