When working with files, like PDFs, you’re likely to encounter text that exceeds your language model’s context window. To process this text, consider these strategies:

  1. Change LLM: Choose a different LLM that supports a larger context window.
  2. Brute Force: Chunk the document and extract content from each chunk (a sketch of this approach follows the list).
  3. RAG: Chunk the document, index the chunks, and extract content only from a subset of chunks that look “relevant” (see the second sketch below).
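
As a rough illustration of the brute-force strategy (option 2), the sketch below splits the raw text into chunks and runs an extraction chain over every one of them. RecursiveCharacterTextSplitter is a standard LangChain text splitter, but `long_text` and `extraction_chain` are hypothetical placeholders for the document text and the extraction chain you have already built; merging or de-duplicating the per-chunk results is left to your application.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the document text into overlapping chunks that fit the model's context window.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,    # characters per chunk; tune this to your model
    chunk_overlap=200,  # overlap reduces the chance of splitting an entity in half
)
chunks = text_splitter.split_text(long_text)  # `long_text`: the full text pulled from your PDF

# Run the (hypothetical) extraction chain over every chunk and collect the results.
results = []
for chunk in chunks:
    results.append(extraction_chain.invoke({"text": chunk}))
```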

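For the RAG strategy (option 3), a sketch is shown below: the chunks are embedded into a vector store, and only the top-scoring chunks for a query are passed to the extraction chain. FAISS and OpenAIEmbeddings are example choices only and assume the `faiss-cpu` and `langchain-openai` packages (plus an OpenAI API key) are available; `chunks`, the query string, and `extraction_chain` are again placeholders carried over from the previous snippet.

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Index the chunks from the previous snippet in a vector store.
vectorstore = FAISS.from_texts(chunks, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})  # keep only the 3 most relevant chunks

# Retrieve the chunks that look relevant to what you want to extract,
# then run the (hypothetical) extraction chain on those chunks only.
query = "key people and dates mentioned in the document"  # example query
relevant_docs = retriever.invoke(query)
results = [extraction_chain.invoke({"text": doc.page_content}) for doc in relevant_docs]
```

Note that this only processes the retrieved chunks, so retrieval can silently skip chunks that actually contain relevant content, which is one of the trade-offs mentioned below.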
Keep in mind that these strategies have different trade-offs, and the best strategy likely depends on the application that you’re designing!
