Retrieve detailed/all information on X from large text instead of generalized summary?
I am in the Humanities and have been experimenting with exploring monographs of around 500-800 pages (as PDFs and some as plain text files) with the help of GPT-4. I have tried various approaches, such as chain prompting and multi-step prompting, to get more detailed answers. An example would be taking a work on medieval book history (800 pages) and asking about the role of one patron in popularizing a specific genre, a topic that I know is discussed in various places throughout the book. GPT-4, GPT-3.5, and Claude 3 (Opus and Sonnet) will all give solid broad answers and outline the main information on the patron, but I cannot get more details out of any model without prompting it with pretty much exactly the information that I am trying to retrieve. Meaning I need to already know all the answers in order to guide it there. I have tried breaking the book down into smaller chunks, but then I really need to break it down into 20-page chunks and go through each one in its own chat, and that's just super impractical.

Does anyone have any ideas on how to approach this? Essentially, I want to be able to ask any model to retrieve all information about X from an uploaded document, not just the most general.

Tbh, I wonder if that's possible at all given the principle behind LLMs. In the end, they are designed to compute a probable response and that will always entail generalization by design, no?
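
For illustration only, the chunk-by-chunk approach described in the post could be partly automated rather than done by hand in separate chats. The sketch below is a minimal example, assuming the book is already available as plain text; the function names, the word-based chunk size, and the prompt wording are all hypothetical choices, and no model API is called. It splits the text into overlapping chunks, keeps only the chunks that mention the search terms, and builds one extraction prompt per remaining chunk.

```python
import re


def split_into_chunks(text, chunk_words=400, overlap_words=50):
    """Split text into overlapping chunks of roughly chunk_words words."""
    if overlap_words >= chunk_words:
        raise ValueError("overlap_words must be smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_words >= len(words):
            break
    return chunks


def chunks_mentioning(chunks, terms):
    """Return (index, chunk) pairs whose text contains any term (case-insensitive)."""
    if not terms:
        raise ValueError("terms must not be empty")
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return [(i, chunk) for i, chunk in enumerate(chunks) if pattern.search(chunk)]


def build_prompt(chunk, topic):
    """Build an extraction-style prompt for one chunk (wording is an assumption)."""
    return (
        f"List every statement about {topic} in the passage below. "
        "Quote or closely paraphrase each one; do not summarize.\n\n"
        f"{chunk}"
    )
```

Each prompt returned by `build_prompt` would then be sent to whichever model is in use, and the per-chunk answers collected into one list. A plain keyword filter like this misses passages that refer to the patron only indirectly (by title, pronoun, or a variant spelling), so the term list would need to include the known name variants.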

Comments

As you probably already know, GPT doesn't work very well with visually complex or long PDF documents. I use thepi.pe to get the data out & it compresses prompts exceeding the limit by token importance so you can retain a decent context for long docs.

Posted 9 months ago