Coming soon - Get a detailed view of why an account is flagged as spam!
view details

This post has been de-listed

It is no longer included in search results and normal feeds (front page, hot posts, subreddit posts, etc). It remains visible only via the author's post history.

57
Open source library to feed visually complex documents (PDFs, websites, images) into GPT-4
Post Flair (click to view more posts with a particular flair)
Comments

Hi everyone! I recently open sourced a relatively large project (called "The Pipe"), and I hope it can help out anyone on here trying to work with or learn about multimodal AI.

What it is:

The Pipe is a tool for feeding visually complex files (pdf, docx, pptx, etc) and web pages into vision-language models such as GPT-4. It is entirely written in Python, so hopefully I posted this on the right place for those who try it out for yourself or learn from the source code.

Why it exists:

I tried to make an application to chat with my documents and web pages. Sounds simple right? Boy was I wrong. I struggled for months (yes, MONTHS) building absurdly complex custom scrapers (for pdf, powerpoints, word docs, websites, csv, git repos, slides, etc), since traditional scrapers wouldn't give GPT high quality text visual data in an LLM-ready prompt format.

I have also seen an explosion in "Chat with your X" apps that use GPT on the backend on this sub lately, so I hope this will help with those of you trying to build similar things.

What it does not do:

It does not give you free access to GPT-4 usage. You must use your own GPT-4 API key.

[not loaded or deleted]

Good question! I would recommend reading the getting started section of the README. it contains everything you need to start feeding whatever you want into GPT Vision.

If you're feeling up to learning something even more advanced, you can check out this guide to help you build a multimodal RAG system (a.k.a. a really smart "chat with your documents" app)

[not loaded or deleted]

Thank you! You're spot on with the reason for PyTorch beinf a dependency. Also -- if you want to scrape text only, you can use the text_only parameter ;)

[not loaded or deleted]

Hi biglewbowskii, yes -- you can use The Pipe with other LLMs by using a lightweight library aptly named "LiteLLM". There are more details in the readme :)

Author
User Disabled
Account Strength
0%
Disabled 8 months ago
Account Age
9 years
Verified Email
Yes
Verified Flair
No
Total Karma
9,205
Link Karma
7,283
Comment Karma
1,805
Profile updated: 1 week ago
Posts updated: 8 months ago

Subreddit

Post Details

We try to extract some basic information from the post title. This is not always successful or accurate, please use your best judgement and compare these values to the post title and body for confirmation.
Posted
9 months ago