LLM Token Pricing Explained: Input, Output, and Cache Hits

A practical guide to understanding token billing in large language models, covering input, output, and cache hit pricing for cost optimization.

Large language model (LLM) providers charge by token usage, but the billing structure can be complex. This guide breaks down the three main components: input tokens (the prompt), output tokens (the generated text), and cache hits (input tokens served from previously processed context). The distinctions matter to developers building AI applications because cached input tokens are typically billed at a discount to the regular input rate. For example, caching a frequently reused prompt prefix can cut the cost of that cached portion by up to 90%, depending on the provider. Knowing how each component is billed helps developers design more efficient systems and choose the right pricing plan for their use case.
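
To make the three billing components concrete, here is a minimal Python sketch of a per-request cost estimate. The `Pricing` class, the `request_cost` function, and every price in it are hypothetical placeholders chosen for illustration, not any provider's actual rates or API. The sketch assumes prices are quoted per million tokens and that cached tokens are a subset of the input tokens; it ignores any surcharge a provider might apply when a cache entry is first written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens (placeholders, not real rates)."""
    input_per_mtok: float          # regular (uncached) input tokens
    output_per_mtok: float         # generated output tokens
    cached_input_per_mtok: float   # input tokens served from cache


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumes cached_tokens counts the part of input_tokens that hit the cache,
    so only the remainder is billed at the regular input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * pricing.input_per_mtok
        + cached_tokens * pricing.cached_input_per_mtok
        + output_tokens * pricing.output_per_mtok
    )
    return total / 1_000_000


# Illustrative numbers only: cached input is priced at 10% of regular input,
# which corresponds to the "up to 90%" discount on the cached portion.
example_pricing = Pricing(input_per_mtok=3.00, output_per_mtok=15.00,
                          cached_input_per_mtok=0.30)

cost_without_cache = request_cost(example_pricing, input_tokens=10_000,
                                  output_tokens=500)
cost_with_cache = request_cost(example_pricing, input_tokens=10_000,
                               output_tokens=500, cached_tokens=8_000)

print(f"without cache: ${cost_without_cache:.4f}")
print(f"with cache:    ${cost_with_cache:.4f}")
```

With these placeholder prices, the request costs about $0.0375 without caching and about $0.0159 when 8,000 of the 10,000 input tokens hit the cache. The total falls by less than 90% because the discount applies only to the cached input portion; uncached input and output tokens are still billed at their full rates.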