When working with files, like PDFs, you’re likely to encounter text that exceeds your language model’s context window. To process this text, consider these strategies:
- Change LLM: Choose a different LLM that supports a larger context window.
- Brute Force: Chunk the document, and extract content from each chunk.
- RAG: Chunk the document, index the chunks, and only extract content from a subset of chunks that look “relevant”.
Keep in mind that these strategies have different trade-offs, and the best strategy likely depends on the application that you’re designing!
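As a rough illustration of how the brute force and RAG strategies differ, here is a minimal, library-free sketch. The helper names (`chunk_text`, `extract_brute_force`, `select_relevant`) and the default sizes are hypothetical, and the `extract` callable stands in for whatever LLM extraction call your application uses. The keyword-overlap ranking is only a placeholder for a real index (for example, an embedding-based vector store), not how a production RAG pipeline would score relevance.

```python
import re
from typing import Callable


def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the final chunk is not just a repeat of the overlap.
    last_start_bound = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start_bound, step)]


def extract_brute_force(chunks: list[str], extract: Callable[[str], list]) -> list:
    """Brute force: run the extractor on every chunk and merge the results."""
    results = []
    for chunk in chunks:
        results.extend(extract(chunk))
    return results


def select_relevant(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """RAG-style selection: keep only the k chunks that look most relevant.

    Relevance here is naive keyword overlap with the query (an assumption
    made for this sketch); a real system would query an index instead.
    """
    query_terms = set(re.findall(r"\w+", query.lower()))

    def score(chunk: str) -> int:
        return len(query_terms & set(re.findall(r"\w+", chunk.lower())))

    return sorted(chunks, key=score, reverse=True)[:k]
```

With brute force, every chunk is passed to `extract`, so cost grows with document length and the same item may be extracted twice from overlapping regions. With the RAG-style variant, you would call `extract_brute_force(select_relevant(chunks, query), extract)` instead, which is cheaper but can miss content in chunks that the ranking step did not select.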