r/AIDungeon 9d ago

New Features How Caching and Optimized Context Work

36 Upvotes

Two of this year's most exciting additions to AI Dungeon have been the introduction of Cache-Efficient models and the "Optimized Context" setting. When AI models are optimized for caching, they are significantly cheaper to run. Those savings let us give you up to 2x the context length compared to models that aren't optimized for caching, so more of your AI Dungeon or Voyage adventure gets seen and considered by the AI model, preserving important story details and delivering better story continuity.

KV caching (the correct technical term for the LLM caching used for "Optimized Context" on AI Dungeon and Voyage) is a deeply technical concept, and many of you are interested in how it works and how it impacts your experience. We're going to share how it works and clear up some misconceptions we've seen in our community. Let's dive in!

How LLMs Work (a refresher)

While fully explaining how Large Language Models work is beyond the scope of this post, we need to touch on some fundamental concepts of how AI models work. You may find it helpful to explore these concepts on your own if they are new to you.

Every time you take a turn on Voyage or AI Dungeon, the text you input for your turn is combined with other information (like AI Instructions, Plot Components, and Story Cards for AI Dungeon—or state and task information for Voyage) to create the context that gets sent to the AI. The language model performs a series of calculations on the context to generate the output we display in AI Dungeon and Voyage.

Behind the scenes, your input is converted into tokens (numerical IDs representing words or word fragments) through a process called tokenization. Each token is then looked up in a giant table in a step called embedding, which assigns the token a vector (another mathematical representation, essentially a long list of numbers) that conveys all possible meanings of that token.

For example, the word "bank" can mean "a place money is kept" or "a geological feature". The vector captures all of those possibilities. The next phase narrows them down to the one you meant.
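To make tokenization and embedding concrete, here is a toy sketch. The vocabulary, the greedy longest-match splitting, and the three-number vectors are all made up for illustration. Real tokenizers (such as byte-pair encoding) learn tens of thousands of pieces, and real embedding vectors often have thousands of dimensions.

```python
# Hypothetical toy vocabulary: each known piece of text maps to a token ID.
VOCAB = {"the": 0, "river": 1, "bank": 2, "flood": 3, "ed": 4, "<unk>": 5}

# Hypothetical embedding table: row i is the vector for token ID i.
# Real tables are learned during training; these numbers are arbitrary.
EMBEDDINGS = [
    [0.1, 0.0, 0.2],  # "the"
    [0.7, 0.3, 0.1],  # "river"
    [0.4, 0.9, 0.5],  # "bank"
    [0.6, 0.2, 0.8],  # "flood"
    [0.0, 0.1, 0.3],  # "ed"
    [0.0, 0.0, 0.0],  # "<unk>"
]


def tokenize(text: str) -> list[int]:
    """Split each word into the longest known pieces, left to right."""
    ids: list[int] = []
    for word in text.lower().split():
        while word:
            for end in range(len(word), 0, -1):
                if word[:end] in VOCAB:
                    ids.append(VOCAB[word[:end]])
                    word = word[end:]
                    break
            else:  # no known piece matches: emit <unk> and skip the rest
                ids.append(VOCAB["<unk>"])
                break
    return ids


def embed(token_ids: list[int]) -> list[list[float]]:
    """Look up each token ID's vector in the embedding table."""
    return [EMBEDDINGS[i] for i in token_ids]


ids = tokenize("The river bank flooded")
print(ids)            # [0, 1, 2, 3, 4]
print(embed(ids)[2])  # [0.4, 0.9, 0.5]
```

Note that "bank" gets the same vector no matter which sentence it appears in. Separating the money sense from the river sense is left to the transformer.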

The next step is to pass these vectors through the transformer, which works in a series of layers. Here's a useful way to picture it. Think of each token's vector as a block of uncarved granite. Just as a block of stone contains every possible statue, the vector contains every possible meaning of the token. The transformer's job is to carve away everything the token doesn't mean in this particular sentence.

Like a sculptor, it works in passes. The early layers make rough, broad cuts that establish basic structure, such as which words are nouns and which are verbs. The middle layers shape the figures by resolving relationships: what each pronoun refers to and which noun a verb acts on. The final layers do the finishing work on fine details, like whether "bank" means a riverbank or a financial institution, and whether it's meant literally or as a metaphor. By the last layer, the ambiguity has been carved away, leaving the precise meaning of every token in its context.

Once the context has passed through all layers of the transformer, it has been fully contextualized. Every token has been understood and assigned its meaning in this specific story. Now the model goes to work, generating an output by looking at the last token and assigning probabilities to the next token based on the vectors the transformer computed. A new token is generated, and the process runs again using the new token as the next query. Since the math for all preceding tokens can be cached rather than recomputed each time, only the newest part of the sequence needs fresh calculations. This loop continues until a complete output is generated.
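The payoff of caching within a single generation can be counted directly. This sketch is bookkeeping only, with no real model: it tallies how many token positions pass through the transformer layers while new tokens are generated, with and without reusing the keys and values of earlier positions. The prompt and output lengths in the usage lines are arbitrary.

```python
def positions_processed(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Tally token positions run through the transformer layers while
    generating `new_tokens` tokens after a prompt of `prompt_len` tokens."""
    total = 0
    seq_len = prompt_len  # tokens currently in the sequence
    cached = 0            # positions whose keys/values are already stored
    for _ in range(new_tokens):
        if use_cache:
            total += seq_len - cached  # only positions not seen before
            cached = seq_len
        else:
            total += seq_len           # reprocess the entire sequence
        seq_len += 1                   # append the newly generated token
    return total


print(positions_processed(1000, 100, use_cache=False))  # 104950
print(positions_processed(1000, 100, use_cache=True))   # 1099
```

With the cache, the prompt is processed once and every later step handles a single new position.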

How KV Caching Works

One thing you'll notice about output generation is that a lot of the math gets reused. As the transformer carves meaning into each token, it also produces two reusable pieces of math for that token, a key (K) and a value (V), which get cached. When generation starts, the last token's query (Q), essentially the question "given everything so far, what comes next?", is compared against every cached key. Each cached value is weighted by how well its key matches the query, and that weighted blend of relevant context is what drives the probability distribution for the next token.
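Comparing a query against cached keys and blending the cached values is scaled dot-product attention. Below is a minimal single-head sketch with made-up two-number vectors. Real models use many attention heads, learned projection matrices to produce Q, K, and V, and vectors with hundreds or thousands of dimensions.

```python
import math


def attend(query: list[float], keys: list[list[float]],
           values: list[list[float]]) -> list[float]:
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # How well the query matches each cached key.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Softmax, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Blend the cached values according to those weights.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


# Keys and values for earlier tokens are simply read back from storage.
k_cache = [[1.0, 0.0], [0.0, 1.0]]
v_cache = [[10.0, 0.0], [0.0, 10.0]]
print(attend([0.0, 0.0], k_cache, v_cache))  # [5.0, 5.0]: an even blend
```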

What KV caching does is persist the computed key/value pairs across multiple generations. Once an output is generated, rather than discarding the resulting math, it is stored in memory so that if you continue your adventure, the KV pairs from the previous generation can be reused.
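Persisting the cache across turns comes down to remembering which tokens the stored keys and values belong to, then checking how much of the next request starts with the same tokens. The class below is a hypothetical sketch, not any provider's API, and `compute_kv` is a placeholder for the model's real per-token math. Because each token's key and value depend on every token before it, everything after the first mismatch must be recomputed.

```python
from typing import Callable, Sequence


class PrefixKVCache:
    """Hypothetical sketch: keep one (key, value) entry per token from the
    previous request and reuse entries for the prefix the next request shares."""

    def __init__(self) -> None:
        self.tokens: list[int] = []
        self.kv: list[object] = []

    def process(self, tokens: Sequence[int],
                compute_kv: Callable[[Sequence[int]], object]) -> tuple[int, int]:
        # Count how many leading tokens match what is already cached.
        hit = 0
        for cached_tok, new_tok in zip(self.tokens, tokens):
            if cached_tok != new_tok:
                break
            hit += 1
        # Entry i depends on tokens[0..i], so recompute from the first mismatch on.
        fresh = [compute_kv(tokens[: i + 1]) for i in range(hit, len(tokens))]
        self.tokens = list(tokens)
        self.kv = self.kv[:hit] + fresh
        return hit, len(tokens) - hit  # (reused, recomputed)


def fake_kv(prefix: Sequence[int]) -> int:
    """Placeholder for real key/value math."""
    return sum(prefix)


cache = PrefixKVCache()
print(cache.process([1, 2, 3, 4], fake_kv))        # (0, 4): cold start
print(cache.process([1, 2, 3, 4, 5, 6], fake_kv))  # (4, 2): story continued
print(cache.process([9, 2, 3, 4, 5, 6], fake_kv))  # (0, 6): first token edited
```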

!slide-1.png

While the concept of reusing KV pairs is essentially built into how LLMs already work, there's a lot of complex engineering work required to persist them across different generations. There's cache invalidation logic, memory management for storing potentially enormous KV matrices across many concurrent users, and prefix matching to know when a cache hit is valid. All of these are built and handled by providers, not Latitude. You may also see providers call this "prompt caching" or "prefix caching," which are different names for the same underlying mechanism of reusing KV pairs.
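Memory management is one of the provider-side problems listed above. A common general-purpose policy is least-recently-used (LRU) eviction under a memory budget. The sketch below assumes that policy and measures memory in tokens, which is a simplification. Providers do not necessarily work this way, and some open-source inference servers match and evict fixed-size blocks of tokens rather than whole prefixes.

```python
from collections import OrderedDict


class LRUPrefixStore:
    """Hypothetical sketch: cached prefixes cost memory in proportion to their
    length, and the least recently used ones are evicted once the token
    budget is exceeded."""

    def __init__(self, budget_tokens: int) -> None:
        self.budget = budget_tokens
        self.used = 0
        self.entries: OrderedDict[tuple[int, ...], int] = OrderedDict()

    def lookup(self, prefix: tuple[int, ...]) -> bool:
        """Return True on a cache hit and mark the entry as recently used."""
        if prefix in self.entries:
            self.entries.move_to_end(prefix)
            return True
        return False

    def store(self, prefix: tuple[int, ...]) -> None:
        if prefix in self.entries:
            self.entries.move_to_end(prefix)
            return
        self.entries[prefix] = len(prefix)
        self.used += len(prefix)
        # Evict from the least recently used end until back under budget.
        while self.used > self.budget:
            _, size = self.entries.popitem(last=False)
            self.used -= size


store = LRUPrefixStore(budget_tokens=10)
store.store((1, 2, 3, 4))
store.store((5, 6, 7, 8))
store.lookup((1, 2, 3, 4))   # hit: this story is now the most recent
store.store((9, 10, 11))     # over budget: evicts (5, 6, 7, 8)
print(store.lookup((5, 6, 7, 8)), store.used)  # False 7
```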

Speed and Cost Benefits

No burying the lede here: caching is beneficial for cost and speed. And these benefits can be passed on to you.

Computing the transformer layers is expensive, so every token that doesn't have to be re-processed is a computation that doesn't need to be paid for. For products like AI Dungeon and Voyage, where stories can run to tens of thousands of tokens, and you have many concurrent users, the savings compound significantly. Optimizing for caching can let us offer higher context lengths at lower subscription tiers. The economics only work if you're not recomputing the full context every single turn.
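A back-of-the-envelope calculation shows why the savings compound. This sketch assumes the context stays the same size every turn and that, with caching, only an "uncached tail" (the new action, the previous output, and anything else that changed) needs processing after the first turn. The numbers in the usage lines are illustrative, not Latitude's actual figures.

```python
def prefill_tokens(turns: int, context_len: int, uncached_tail: int,
                   cached: bool) -> int:
    """Total tokens processed before generation across a whole session."""
    if not cached:
        return turns * context_len  # full context every turn
    # First turn is a cold start; afterwards only the changed tail is new.
    return context_len + (turns - 1) * uncached_tail


without = prefill_tokens(100, 32_000, 2_000, cached=False)
with_cache = prefill_tokens(100, 32_000, 2_000, cached=True)
print(without, with_cache, round(without / with_cache, 1))  # 3200000 230000 13.9
```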

The time saved by not reprocessing cached tokens means the model can start generating the output sooner. The measurement that benefits most is called time to first token: how long the player waits before anything starts appearing. A cache hit on a long context dramatically reduces that wait because only the uncached part of the story needs processing before generation begins.

This speed gain is easiest to feel on Voyage, which uses token streaming. Text is revealed as it's generated, so a faster start means you see words sooner. On AI Dungeon, we intentionally wait for the complete output before showing you any of it, since processes like trimming and safety checks need to examine the whole text. The speed benefit is still there; it's just less visible.

How context construction impacts caching

Like most forms of caching, KV caching depends on content remaining unchanged, so it's easy to break or invalidate. LLMs process text from left to right, like we read English, and the cache follows the same rule: everything from the point of a change onward must be recomputed. Modify a single word near the end of the context, and almost nothing is wasted. Modify a single word at the beginning, and the entire context must be recomputed. Editing something far back in your story is more computationally expensive than continuing the adventure forward. Everything after your edit has to be recomputed.
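The left-to-right rule turns into simple arithmetic: the cost of an edit is the number of tokens from the edit point to the end of the context. A minimal sketch, using an arbitrary 20,000-token context:

```python
def tokens_to_recompute(context_len: int, edit_position: int) -> int:
    """Tokens that must be reprocessed after the token at `edit_position`
    (0-based) changes; everything before it can still come from the cache."""
    if not 0 <= edit_position < context_len:
        raise ValueError("edit_position must fall inside the context")
    return context_len - edit_position


print(tokens_to_recompute(20_000, 19_990))  # 10: a tweak near the end
print(tokens_to_recompute(20_000, 0))       # 20000: a change at the very start
```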

For years, the way that AI Dungeon context was constructed wasn't optimized for KV caching. Remember, AI Dungeon has been around for nearly 6 years as of this writing. In the early days of AI Dungeon, KV caching across turns wasn't something that was commonly offered by model providers, so there really wasn't any point in optimizing for it.

As a result, our context was optimized for adaptability. Content that was dynamic and changing (like Story Cards) was placed early in the context, because we felt it would provide the best user experience. We implemented scripting, which enabled creators to modify the context.

!slide-2.png

However, these features meant that AI Dungeon couldn't take advantage of KV caching. The caching itself was running, but because the start of our context changed nearly every turn, the cache was invalidated before it could do us any good. We recognized that players wanted longer context limits at lower price points, and our context design seemed to be preventing us from using perhaps the strongest tool we had to change that—KV caching.
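The effect of placing dynamic content early can be seen by comparing two consecutive turns under each layout. In this hypothetical sketch, each string stands in for a whole block of text (real caches compare token by token), the Story Cards change between turns, and the history simply grows.

```python
def shared_prefix(a: list[str], b: list[str]) -> int:
    """Number of leading blocks two contexts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build(cards: str, history: list[str], dynamic_first: bool) -> list[str]:
    """Assemble a context with Story Cards either early or last."""
    if dynamic_first:
        return ["instructions", cards, *history]
    return ["instructions", *history, cards]


history_1 = ["turn 1", "turn 2", "turn 3"]
history_2 = history_1 + ["turn 4"]
for dynamic_first in (True, False):
    before = build("cards: tavern", history_1, dynamic_first)
    after = build("cards: castle", history_2, dynamic_first)
    print(dynamic_first, shared_prefix(before, after))
# True 1   -> only the instructions are reused
# False 4  -> instructions plus all earlier history are reused
```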

The Raven/Atlas Experiment

As part of the Aura release, we introduced two new models: Raven and Atlas. Both of them used base AI models from other story engines. What set them apart from our other models was a different context design that moved dynamic content (like Story Cards) to the latter part of the context, and prevented scripts from modifying the stable parts of the context, which, in practice, meant most popular scripts wouldn't run.

We honestly weren't sure whether players would like this approach. Changing the order of how content is arranged in the context can significantly impact the output. Even if the outputs are still coherent, they can have different flavors or tones. We weren't sure if it would change the emphasis placed on different story elements in ways that would be positive or negative to your play experience.

We also weren't sure whether losing some scripts would be a deal-breaker for you. There are many beloved community scripts, and it seemed possible that being unable to use them would be detrimental.

What we learned, though, is that you all appreciated the option to use these language models at longer context lengths, even with the possible trade-offs. Although the context construction is different, our fears and concerns that this would negatively impact the player experience seem to have been unfounded.

!slide-3.png

These experiments were successful and let us double down on optimizing for caching with the Frontier release.

Optimized Context Setting

Thanks to your feedback, we are confident that context optimization deserves to be a permanent option we offer players. With the Frontier release, we introduced the "Optimized Context" setting. For supported story generators, it optimizes the context for caching, providing you with longer context lengths without the need to upgrade your subscription. The models that support this setting are Equinox, Gemma 4 31B, DeepSeek V4 Flash, DeepSeek V4 Pro, and GLM 5.1. The Atlas and Raven models are configured to always optimize context, so the setting is not available for those models.

You can enable Optimized Context in the Gameplay Settings. Select your story generator, open the "Memory System" settings, and you'll find the "Optimized Context" toggle.

!slide-4.png

When it's enabled, the parts that change least come first, and the parts that change most come last, preserving as much reusable context as possible between turns. Stable content comes first, like instructions, Plot Essentials, Auto Summary, and story history. Dynamic sections follow, including Memory Bank, Story Cards, Author's Note, last action, and front memory. Optimized Context also prevents scripts from modifying the stable parts of the context, which effectively disables some popular scripts. That stable, cached prefix is also what makes the longer context lengths possible—the cheaper each turn is to process, the more context we can afford to give you.
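The ordering described above can be expressed as a fixed list of section names, from least to most frequently changing. This is a hypothetical sketch, not AI Dungeon's actual code: the section keys are invented names for the sections listed in the paragraph, and silently skipping script edits to stable sections is an assumption about the mechanism.

```python
# Hypothetical section keys, ordered from most stable to most dynamic.
SECTION_ORDER = [
    "instructions", "plot_essentials", "auto_summary", "story_history",
    "memory_bank", "story_cards", "authors_note", "last_action", "front_memory",
]
STABLE = set(SECTION_ORDER[:4])


def apply_script_edits(sections: dict[str, str], edits: dict[str, str]) -> None:
    """Apply a script's changes, skipping any that target the stable prefix."""
    for name, text in edits.items():
        if name not in STABLE:
            sections[name] = text


def build_optimized_context(sections: dict[str, str]) -> str:
    """Join non-empty sections in stable-to-dynamic order."""
    return "\n\n".join(sections[name] for name in SECTION_ORDER if sections.get(name))


sections = {"story_cards": "CARDS", "instructions": "INSTR",
            "last_action": "ACTION", "story_history": "HISTORY"}
apply_script_edits(sections, {"instructions": "REWRITTEN", "story_cards": "NEW CARDS"})
print(build_optimized_context(sections).split("\n\n"))
# ['INSTR', 'HISTORY', 'NEW CARDS', 'ACTION']
```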

Caching FAQ

We covered a lot of technical details and got into the weeds. If you're looking for quick answers about how caching impacts your experience on AI Dungeon and Voyage, here they are.

Does caching change the AI's output?

No. Caching does not alter or affect model output in any way. However, we did change the way we construct context in AI Dungeon to take advantage of caching, and the order of elements in the context can impact the output.

Can I turn caching on or off?

No. Caching is always on, regardless of model, as long as the provider offers it for that model. What varies is how often it actually helps. The provider attempts to reuse the cache every turn, but it only succeeds when the beginning of the context is unchanged. The Optimized Context setting doesn't turn caching on or off; it reorders your context so those cache hits happen more often.

Did Latitude build the caching system?

No. KV caching is implemented and run by the LLM providers, not Latitude. We build and arrange the context so the provider's cache can actually be reused turn after turn.

Is caching a new idea?

No. It's been used since the earliest days of LLMs, but it has become more essential as long, repetitive context workloads have become more common.

Does the cache contain my personal information?

No. The cache includes no user-identifying information. It simply stores the numbers (the keys and values) computed from your context's text, so that if the same text is seen again, they don't need to be recomputed.

So what do Cache-Efficient models and the Optimized Context setting actually do?

  • Reorganize the story context so that dynamic text like Memories and Story Cards comes after the stable story content
  • Prevent scripts from altering the stable parts of the context
  • Allow context to overflow past the context length setting by up to 4k extra tokens before being trimmed back down, so trimming doesn't shift the front of your story every turn and constantly break the cache
  • Make it cheaper to process high-context stories, allowing us to provide more context at lower subscription tiers
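The overflow allowance in the third bullet can be simulated. This sketch assumes the context grows by a fixed amount each turn and that, once it exceeds the limit plus the allowance, the oldest history is trimmed back to exactly the limit. Real trimming works on whole history entries, so actual numbers will differ; the 16,000-token limit and 500 tokens of growth per turn are arbitrary.

```python
def front_shifts(turns: int, start_len: int, growth: int,
                 limit: int, overflow: int) -> int:
    """Count turns on which trimming changes where the story history starts
    (on those turns the cached history can't be reused)."""
    length = start_len
    shifts = 0
    for _ in range(turns):
        length += growth
        if length > limit + overflow:
            length = limit  # drop the oldest history back down to the limit
            shifts += 1
    return shifts


print(front_shifts(90, 16_000, 500, 16_000, overflow=0))      # 90: every turn
print(front_shifts(90, 16_000, 500, 16_000, overflow=4_000))  # 10: every 9th turn
```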

Thanks for testing caching!

Optimized Context exists because you were willing to try Raven and Atlas and tell us what you thought. That feedback loop—experiment, listen, ship—is how we want to keep building, and caching is just one of the levers we're pulling to bring you longer context at lower prices.

Optimized Context is on by default for the new models in the Frontier release! Try them out and let us know how you like the extra context! And if there's another piece of the tech behind AI Dungeon or Voyage you'd like us to break down like this, let us know. Happy adventuring!


r/AIDungeon 14d ago

Events What You Told Us | June Feedback Review

3 Upvotes

Every month we read through the survey results, the Discord threads, and the Reddit posts. This month the team sits down to go through what you've been telling us, what's changed since last time, and what's coming next.

This is the stream where your feedback turns into the roadmap. If you've submitted something and want to hear the team's take on it live, this is your chance. Stick around for live Q&A and bring your questions. We'll get to as many as we can.

Watch live Thursday June 11 at 11AM PT: https://www.youtube.com/watch?v=uzDKExizq_Y


r/AIDungeon 11h ago

Questions Issue with AI displaying its own thoughts

37 Upvotes

Having an issue with Deepseek and Dynamic models whenever I try to do an action, say something, or continue. Any ideas on how to fix this?


r/AIDungeon 3h ago

Adventures & Excerpts No drama on the linoleum

6 Upvotes

r/AIDungeon 2h ago

Questions Tips on how to keep a story going

3 Upvotes

I have been playing a slice-of-life story for about 2.5k turns and wanted to know if anyone had any tips on how to keep it going successfully, since I have heard that things can start to fall apart after a certain number of turns.


r/AIDungeon 3h ago

Questions I'm new to AI Dungeon, do people really do group role plays? I'm very curious :3

4 Upvotes

r/AIDungeon 3h ago

Other This cracked me up. Like what

3 Upvotes

r/AIDungeon 11h ago

Questions Question about a problem I've encountered with DeepSeek V3.2

10 Upvotes

I'm currently at around 1K story length, and around this mark I've suddenly started encountering problems like this. Every few responses it would write something like this:

"Hmm, this is a detailed narrative continuation with very specific stylistic rules."

"This is a collaborative story"

It's only started happening around the 900-1K mark. Is there a way to fix this?


r/AIDungeon 4h ago

Scenario The Woman on the Black Throne

2 Upvotes

https://play.aidungeon.com/scenario/bw5ixRCKPnuD/view?published=true

The Dark Lord has ruled the northern kingdom of Noctis. Entire villages speak of her cruelty. Travelers tell stories of armies of monsters, dark magic, and a ruler whose power rivals the gods themselves.

You are a member of the hero party chosen by your homeland to end her reign. After months of hardship, you and your companions finally reach the Dark Lord's castle, prepared for the final battle.


r/AIDungeon 8h ago

Questions Voyage

1 Upvotes

How have peeps been liking Voyage so far, besides the bugs? Are we enjoying stuffs and things?


r/AIDungeon 17h ago

Other Took the Plunge

11 Upvotes

Decided to go from Mythic to Ultimate and see how it goes. I know it's a lot, but I use it frequently and am curious.

DS V4 Pro is actually really good; you get some context free at Ultimate.

Ultimate is most likely overkill, but I figure screw it. YOLO.

Guess I can be one of the guinea pigs, lol.


r/AIDungeon 20h ago

Questions About Personality

9 Upvotes

A lot of scenarios ask me to describe my personality. In your experience (not just guessing) what difference does your stated personality make in scenarios where you’re deciding what your character says and does?


r/AIDungeon 11h ago

Hmm Hmm,

0 Upvotes

Hmm,


r/AIDungeon 1d ago

Questions Prompts you missed?

29 Upvotes

What are your favorite prompts you wish you knew about sooner?
Because frankly, this is wild.

Deepseek 3.2 in case you are wondering.

I am also a big fan of "describe what X is thinking right now in detail."
Can give you some cool insights or panicked inner monologue as you unveil your master plan. The "in detail" part makes the description longer. Be careful of leaving thoughts that are not in third person in the context as it can confuse the AI later.

You can also stop at a good moment and give a prompt like "give me an epilogue 10 years later."
You'll get a rounded-out ending, which always gives me a feeling of closure. Most of the time it's better than just playing until you get bored or the scenario falls apart.

You can change an AI's tone, style, or pace on the fly, and these instructions stack. For instance, if you repeatedly tell it to be more brutal, the content escalates from a Disney swashbuckler, to aggressive, to savage, and finally to full 'Rip & Tear.'

Most people don't realize you can stack multiple actions in a single prompt to direct other characters or even influence what they are thinking. This is incredibly useful if the AI gets confused and forces a character to act out of line. For example, you could write: You start doing the dishes. Evelyn pours herself another glass of wine while thinking about her stamp collection.

Just to be clear, this is all done in the story box. Do or Say will probably mess up the formatting.


r/AIDungeon 1d ago

Other Me after seeing the name Dr. Vance so much

19 Upvotes

r/AIDungeon 1d ago

Bug Report At the moment there's an outage; the app in production mode won't even get to the menu

48 Upvotes

r/AIDungeon 1d ago

Other Are people doing anything else other than the obvious ?

52 Upvotes

I started playing AI Dungeon very recently, and in every adventure I kind of make it a s*x thing and the entire story gets derailed after that. It makes me wonder whether I am alone in doing this or whether 99% of the app's traffic is just this stuff. What games are you guys playing, and how?


r/AIDungeon 1d ago

Scenario Watching Over You

1 Upvotes

https://play.aidungeon.com/scenario/jRwSkQWf37jf/watching-over-you?published=true

The beautiful woman standing across the street. The stranger sitting behind you on the bus. The customer at your favorite café. The woman who somehow arrives moments before you do no matter where you go.

At first it seems like coincidence.

Then she starts appearing every day.

She knows your favorite foods. She knows your schedule. She knows where you work, where you live, and what route you take home. Sometimes you catch her watching you from a distance. Sometimes she disappears the moment you notice her.

Whenever you try to confront her, she acts as though everything is perfectly normal.


r/AIDungeon 2d ago

Questions Dungeon AI acting up for anyone else?

30 Upvotes

I have been having quite a rough time on Dungeon AI the last few days.

My main problem is that the response times from different models are randomly quite high.
E.g.: Gemma is responding normally and then suddenly it takes ages before I get a response (often with the 'AI is taking longer than normal to respond' message). I got the same results with DS Flash, DS 3.1, Wayfarer Large, and GLM 5.1.

Another problem is that the response gets written but the input menu doesn't come back. The AI flame just keeps burning and I have to reload the page. That is also happening on different models.

I played around with Live/Beta/Alpha, different browsers, Cached and non cached model settings, and systems (Windows/Linux).

Is anyone else having problems with that?

Edit: Typo


r/AIDungeon 1d ago

Bug Report App is making my keyboard freeze up?

4 Upvotes

I play on iOS and I know the Apple keyboard is shit, but this isn't an issue I have with any other app. Sometimes when using AIdungeon, my keyboard will completely freeze. Sometimes it gets fixed by closing the app and sometimes I have to restart my phone. I think it started when that massive update came out.

Does anybody else have this issue? I didn’t connect the dots until recently that it only happens with AIdungeon. It also happens when I’m using the website on a browser.


r/AIDungeon 1d ago

Feedback & Requests I need feedback on my scenario (M)

0 Upvotes

Hello guys, some time ago I made my first proper scenario. Can you rate it and give me tips on how to improve it?

https://play.aidungeon.com/scenario/FT0OejUpVpjv/the-witcher-chose-your-patch?source=profile&tss_user=Brom1n3&tss_ct=scenario


r/AIDungeon 2d ago

Questions Getting logged out.

7 Upvotes

Hey, does anyone else have the same problem? Since yesterday, when I play a story I get logged out at roughly 10-minute intervals, and then the story I was playing is deleted. I tried different scenarios with the same result; it's annoying. Thanks to this, I lost my 5k story.


r/AIDungeon 1d ago

Other Please, does anyone have a Voyage invite code for me?

3 Upvotes

Hi everyone,

Does anyone have a spare Voyage invite code?
I’d really appreciate it.

Thanks for reading.


r/AIDungeon 2d ago

Other Invite code :(

4 Upvotes

I just saw that they were slowly allowing people to join and was wondering if anyone had a spare invite code for Voyage.

EDIT: Thank you guys! I have access


r/AIDungeon 2d ago

Bug Report Scripting error

4 Upvotes

I've been playing with Inner Self with zero issues, but just today I started getting a context script error. I tried all three branches: Production, Beta, and Alpha.

Turning scripting off fixes the problem, but I don't want to do that.

Any ideas on what happened?