Hi guys. Long story short, my project is gaining a bit of traction on GitHub, and I'm looking for advice on how to make my data extractor more friendly for local LLMs like LLaVA or Qwen-VL.
The library contains heuristics for extracting data from different filetypes to feed into vision-language models. Currently, I output the extracted results in the following OpenAI-friendly format:
[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "..."
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "data:image/jpeg;base64,..."
        }
      }
    ]
  }
]
I'm assuming this won't work out-of-the-box with local models? What can I do to make my project less dependent on OpenAI?
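For what it's worth, several local runtimes (Ollama, llama.cpp's server, vLLM) expose OpenAI-compatible endpoints, so the same message format can often be reused by pointing the official client at a local base URL. A minimal sketch, assuming a local Ollama instance serving a LLaVA model (the base_url, placeholder api_key, and model name are assumptions, not something from the project itself):

from openai import OpenAI

# Point the standard OpenAI client at a local OpenAI-compatible server.
# Base URL and model name are assumptions (Ollama defaults shown here).
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

# The extractor's output can be passed through unchanged.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this document."},
            {
                "type": "image_url",
                "image_url": {"url": "data:image/jpeg;base64,..."},
            },
        ],
    }
]

response = client.chat.completions.create(model="llava", messages=messages)
print(response.choices[0].message.content)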
PS: I do mention LiteLLM in the readme since I've gotten it working with text-only inputs before, but I'm looking for something a bit less hacky.
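For comparison, a rough LiteLLM sketch with the same multimodal payload: litellm.completion accepts OpenAI-style message lists, while the ollama/llava model string and api_base below are assumptions about a local setup (whether image input actually works will depend on the backend and model):

import litellm

# Same OpenAI-style payload; only the model string and api_base change.
# "ollama/llava" and the URL are assumptions for a local Ollama setup.
response = litellm.completion(
    model="ollama/llava",
    api_base="http://localhost:11434",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this document."},
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,..."},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)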
Thanks!
I wouldn't say it is better than LangChain at the moment. It's just meant to remedy the problem of LangChain tools not being readily compatible with vision models :)
This is cool, thanks for sharing!