Hi everyone! I recently open-sourced a relatively large project called "The Pipe," and I hope it can help anyone here trying to work with or learn about multimodal AI.
What it is:
The Pipe is a tool for feeding visually complex files (PDF, DOCX, PPTX, etc.) and web pages into vision-language models such as GPT-4. It is written entirely in Python, so hopefully this is the right place for anyone who wants to try it out or learn from the source code.
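To make "feeding files into a vision-language model" concrete, here is a minimal sketch of what an LLM-ready multimodal prompt looks like: extracted text plus a base64-encoded page image packaged into one OpenAI-style chat message. The helper function and inputs are hypothetical illustrations, not The Pipe's actual API; only the message shape follows OpenAI's vision chat format.

```python
import base64

def build_vision_message(text: str, image_bytes: bytes) -> dict:
    """Package extracted text and one page image into a single
    OpenAI-style multimodal chat message (hypothetical helper)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            },
        ],
    }

# Stand-in inputs for what a scraper would extract from a PDF page.
message = build_vision_message("Quarterly report, page 1", b"\x89PNG...")
print(message["content"][0]["type"])  # text
print(message["content"][1]["type"])  # image_url
```

A list of such messages can then be passed straight to a chat-completions endpoint, which is what "LLM-ready prompt format" means in practice.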
Why it exists:
I tried to build an application to chat with my documents and web pages. Sounds simple, right? Boy, was I wrong. I struggled for months (yes, MONTHS) building absurdly complex custom scrapers (for PDFs, PowerPoints, Word docs, websites, CSVs, git repos, slides, etc.), since traditional scrapers wouldn't give GPT high-quality text and visual data in an LLM-ready prompt format.
I have also seen an explosion of "Chat with your X" apps on this sub lately that use GPT on the backend, so I hope this helps those of you building similar things.
What it does not do:
It does not give you free GPT-4 usage. You must use your own GPT-4 API key.
Thank you! You're spot on about the reason PyTorch is a dependency. Also, if you want to scrape text only, you can use the text_only parameter ;)
Hi biglewbowskii, yes, you can use The Pipe with other LLMs via a lightweight library aptly named "LiteLLM". There are more details in the README :)
Post Details
- Posted: 9 months ago
- External URL: github.com/emcf/thepipe
Good question! I would recommend reading the Getting Started section of the README. It contains everything you need to start feeding whatever you want into GPT Vision.
If you're up for something more advanced, you can check out this guide to building a multimodal RAG system (a.k.a. a really smart "chat with your documents" app).
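For the curious, the core retrieval step of a RAG system can be sketched in a few lines. This toy version scores chunks by word overlap with the query; a real system would use vector embeddings and an index instead, and all names and sample chunks here are illustrative.

```python
def score(query: str, chunk: str) -> float:
    """Crude relevance score: fraction of query words found in the chunk.
    A real RAG pipeline would compare embedding vectors instead."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most relevant to the query."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

# Hypothetical chunks extracted from a scraped document.
chunks = [
    "Revenue grew 12% year over year.",
    "The chart on page 3 shows headcount by region.",
    "Appendix: glossary of terms.",
]
best = retrieve("what does the chart on page 3 show", chunks)
print(best[0])  # the headcount chunk scores highest
```

The retrieved chunks (text plus any page images) are then packed into the prompt alongside the user's question, which is the "augmented generation" half of RAG.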