OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs

ArXiv cs.AI · Tue, 09 Jun 2026 04:00:00 GMT

arXiv:2606.07577v1 Announce Type: new

Abstract: Audio-visual large language models (LLMs) hold strong promise for long-form video understanding, yet their long-video inference is fundamentally limited by the linear growth of video tokens and key-value (KV) caches. We present Omni
