Unveiling Infinite Context Windows: Leveraging LLMs in Streaming Apps with Attention Sinks

by Mike Young, October 4th, 2023

Too Long; Didn't Read

Researchers from MIT, Meta AI, and Carnegie Mellon recently proposed StreamingLLM, an efficient framework that enables infinite-length language modeling in LLMs. Their method exploits LLMs' tendency to use initial tokens as "attention sinks" that anchor the distribution of attention scores. By caching those initial tokens alongside a sliding window of recent ones, they achieved up to 22x faster decoding than prior techniques.
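
To make the caching idea concrete, here is a minimal sketch of the eviction policy the paper describes: the KV entries for the first few tokens are kept permanently as attention sinks, while everything else lives in a rolling window of the most recent tokens. This is not the authors' StreamingLLM code; the `AttentionSinkCache` class and its method names are hypothetical, and a real implementation would store per-layer key/value tensors rather than placeholder strings.

```python
# A minimal sketch of the attention-sink cache policy, assuming
# 4 sink tokens and a 1,020-token sliding window (1,024 KVs total).
from collections import deque


class AttentionSinkCache:
    """Keeps the first `num_sinks` tokens' KV entries forever, plus a
    rolling window of the most recent `window_size` tokens' entries."""

    def __init__(self, num_sinks: int = 4, window_size: int = 1020):
        self.num_sinks = num_sinks
        self.sinks = []                           # permanent sink-token KVs
        self.recent = deque(maxlen=window_size)   # rolling recent-token KVs

    def append(self, kv_entry):
        # The first few tokens become permanent "attention sinks";
        # later tokens enter the sliding window, where the deque
        # silently evicts the oldest entry once it is full.
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(kv_entry)
        else:
            self.recent.append(kv_entry)

    def effective_cache(self):
        # What the model attends to at each decoding step:
        # sink tokens + recent tokens, never the full history.
        return self.sinks + list(self.recent)


# Usage: stream 10,000 tokens through a cache capped at 1,024 KV entries.
cache = AttentionSinkCache(num_sinks=4, window_size=1020)
for t in range(10_000):
    cache.append(f"kv_{t}")

print(len(cache.effective_cache()))  # 1024, regardless of stream length
print(cache.effective_cache()[:5])   # ['kv_0', 'kv_1', 'kv_2', 'kv_3', 'kv_8980']
```

One detail worth noting from the paper: when adding positional information, StreamingLLM uses a token's position within the cache rather than its position in the original text, so the model always sees a contiguous positional range even after old tokens have been evicted.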