Large Language Models (LLMs) operate with a defined limit on the number of tokens they can process at once, referred to as the context window. Exceeding this limit can have significant cost and performance implications. Therefore, it is essential to manage the size of the input sent to the LLM, particularly when using chat completion models. […]