Research Posts

Karatsuba Matrix Multiplication and its Efficient Custom Hardware Implementations

Papers with Code
Reporter Javier Vásquez

By Javier Vásquez

Posted on: January 17, 2025

While the Karatsuba algorithm reduces the complexity of large integer multiplication, the extra additions it requires diminish its benefits for smaller integers of more commonly used bitwidths. In this work, we propose the extension of the scalar Karatsuba multiplication algorithm to matrix multiplicat...
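As background for the matrix extension the paper proposes, a minimal sketch of the scalar Karatsuba algorithm it builds on: one recursive multiply of the summed halves replaces two cross-term multiplies, at the cost of the extra additions mentioned above. The function name and threshold are illustrative, not from the paper.

```python
def karatsuba(x: int, y: int, threshold: int = 64) -> int:
    """Scalar Karatsuba multiplication: 3 recursive multiplies instead of 4."""
    if x < (1 << threshold) or y < (1 << threshold):
        return x * y  # fall back to the native multiply for small operands
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    z0 = karatsuba(x_lo, y_lo, threshold)  # low halves
    z2 = karatsuba(x_hi, y_hi, threshold)  # high halves
    # Single extra multiply recovers both cross terms via extra additions
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi, threshold) - z0 - z2
    return (z2 << (2 * half)) + (z1 << half) + z0
```

For operands below the threshold the extra additions outweigh the saved multiply, which is exactly the overhead the paper targets at common bitwidths.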

Practical Continual Forgetting for Pre-trained Vision Models

By Naomi Wilson

Posted on: January 17, 2025

Driven by privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming increasingly evident. In real-world scenarios, erasure requests may originate at any time from both users and model owners, and they usually form a sequence. Therefore, under such a...

Leveraging ASIC AI Chips for Homomorphic Encryption

Papers with Code
Reporter Naomi Wilson

By Naomi Wilson

Posted on: January 15, 2025

Cloud-based services are making the outsourcing of sensitive client data increasingly common. Although homomorphic encryption (HE) offers strong privacy guarantees, it requires substantially more resources than computing on plaintext, often leading to unacceptably large latencies in getting the resul...
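To make the core HE property concrete, here is a toy secret-key additive scheme, assuming only that we want to show a server summing ciphertexts without seeing plaintexts. This is an illustration of additive homomorphism, not the lattice-based schemes that ASIC accelerators actually target, and every name in it is ours.

```python
import secrets

MODULUS = 2**32  # toy plaintext/ciphertext space

def keygen() -> int:
    # One-time random pad per message (secret-key, single-use)
    return secrets.randbelow(MODULUS)

def encrypt(m: int, k: int) -> int:
    return (m + k) % MODULUS

def decrypt(c: int, k: int) -> int:
    return (c - k) % MODULUS

# Additive homomorphism: adding ciphertexts adds the hidden plaintexts,
# so an untrusted server can aggregate without decrypting.
k1, k2 = keygen(), keygen()
c_sum = (encrypt(20, k1) + encrypt(22, k2)) % MODULUS
assert decrypt(c_sum, (k1 + k2) % MODULUS) == 42
```

Real HE schemes support this kind of computation under a single reusable public key, which is where the heavy polynomial arithmetic, and the latency the abstract mentions, comes from.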

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

By Javier Vásquez

Posted on: January 15, 2025

Image pyramids are widely adopted in top-performing methods to obtain multi-scale features for precise visual perception and understanding. However, current image pyramids use the same large-scale model to process multiple resolutions of images, leading to significant computational cost. To address ...
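The "parameter-inverted" idea can be sketched numerically: pair the largest model with the lowest resolution and the smallest model with the highest resolution, so per-branch cost stays balanced instead of growing with every pyramid level. The parameter counts, resolutions, and cost proxy below are illustrative assumptions, not the paper's architecture.

```python
# Each branch pairs a model size with a pyramid resolution, inverted:
# big model <-> small image, small model <-> large image.
branches = [
    {"model_params_M": 300, "resolution": 224},   # big model, low resolution
    {"model_params_M": 100, "resolution": 448},
    {"model_params_M": 25,  "resolution": 896},   # small model, high resolution
]

def branch_cost(params_m: float, res: int) -> float:
    # Crude compute proxy: parameters x number of 16x16 input patches
    tokens = (res // 16) ** 2
    return params_m * tokens

costs = [branch_cost(b["model_params_M"], b["resolution"]) for b in branches]
```

Running the same 300M-parameter model at all three resolutions would instead make the 896-pixel branch dominate, which is the computational cost the abstract criticizes.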

Cosmos World Foundation Model Platform for Physical AI

By Javier Vásquez

Posted on: January 08, 2025

Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. ...

Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation

By Kate Martin

Posted on: January 08, 2025

X-ray image-based medical report generation has achieved significant progress in recent years with the help of large language models; however, these models have not fully exploited the effective information in visual image regions, resulting in reports that are linguistically sound but insufficient i...

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

By Kate Martin

Posted on: January 06, 2025

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest recently. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities on certain challenging tasks, such as text loc...

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

By Javier Vásquez

Posted on: January 06, 2025

Compared to image-text pair data, interleaved corpora enable Vision-Language Models (VLMs) to understand the world more naturally, as humans do. However, existing datasets of this kind are crawled from webpages and face challenges like low knowledge density, loose image-text relations, and poor logical coherenc...