Karatsuba Matrix Multiplication and its Efficient Custom Hardware Implementations
By Javier Vásquez
Posted on: January 17, 2025
While the Karatsuba algorithm reduces the complexity of large integer multiplication, the extra additions it requires diminish its benefits for smaller integers of more commonly used bitwidths. In this work, we propose the extension of the scalar Karatsuba multiplication algorithm to matrix multiplicat...
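The trade-off mentioned above (fewer multiplications, more additions) can be seen in the classic scalar Karatsuba algorithm. The following is a minimal Python sketch of that textbook scalar version only, not the matrix extension proposed in the paper. The `cutoff_bits` parameter is a hypothetical illustration of the point below which plain multiplication is used because the extra additions no longer pay off. The sketch handles non-negative integers only.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 64) -> int:
    """Multiply two non-negative integers with scalar Karatsuba.

    cutoff_bits is a hypothetical threshold: operands at or below this
    size are multiplied directly, since for small widths the extra
    additions outweigh the saved multiplication.
    """
    if x < 0 or y < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if cutoff_bits < 1:
        raise ValueError("cutoff_bits must be at least 1")
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y  # base case: ordinary multiplication

    # Split both operands at m bits: x = x1 * 2^m + x0, y = y1 * 2^m + y0.
    m = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << m) - 1
    x1, x0 = x >> m, x & mask
    y1, y0 = y >> m, y & mask

    # Three recursive multiplications instead of four.
    z2 = karatsuba(x1, y1, cutoff_bits)
    z0 = karatsuba(x0, y0, cutoff_bits)
    # (x1 + x0)(y1 + y0) - z2 - z0 = x1*y0 + x0*y1: the extra
    # additions and subtractions are the price of the saved multiplication.
    z1 = karatsuba(x1 + x0, y1 + y0, cutoff_bits) - z2 - z0

    # Recombine: z2 * 2^(2m) + z1 * 2^m + z0.
    return (z2 << (2 * m)) + (z1 << m) + z0


if __name__ == "__main__":
    a, b = 1234567891011121314, 9876543210987654321
    print(karatsuba(a, b, cutoff_bits=8) == a * b)
```

Each level of recursion replaces one multiplication of half-width operands with a handful of additions and shifts, which is why the scheme wins asymptotically for very large integers but can lose at ordinary word sizes.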
Practical Continual Forgetting for Pre-trained Vision Models
By Naomi Wilson
Posted on: January 17, 2025
Because of privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming increasingly evident. In real-world scenarios, erasure requests originate at any time from both users and model owners, and these requests usually form a sequence. Therefore, under such a...
Leveraging ASIC AI Chips for Homomorphic Encryption
By Naomi Wilson
Posted on: January 15, 2025
Cloud-based services are making the outsourcing of sensitive client data increasingly common. Although homomorphic encryption (HE) offers strong privacy guarantees, it requires substantially more resources than computing on plaintext, often leading to unacceptably large latencies in getting the resul...
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
By Javier Vásquez
Posted on: January 15, 2025
Image pyramids are widely adopted in top-performing methods to obtain multi-scale features for precise visual perception and understanding. However, current image pyramids use the same large-scale model to process multiple resolutions of images, leading to significant computational cost. To address ...
Cosmos World Foundation Model Platform for Physical AI
By Javier Vásquez
Posted on: January 08, 2025
Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. ...
X-ray image-based medical report generation has made significant progress in recent years with the help of large language models; however, these models have not fully exploited the effective information in visual image regions, resulting in reports that are linguistically sound but insufficient i...
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
By Kate Martin
Posted on: January 06, 2025
Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest recently. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities on certain challenging tasks, such as text loc...
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
By Javier Vásquez
Posted on: January 06, 2025
Compared to image-text pair data, interleaved corpora enable Vision-Language Models (VLMs) to understand the world more naturally, as humans do. However, such existing datasets are crawled from webpages, facing challenges like low knowledge density, loose image-text relations, and poor logical coherenc...