Any Open-Source Multimodal LLMs out there?

Question | Help

Post Body

So I am just goofing around with Stable Diffusion, and opted to get Gemini (formerly Bard) to generate some prompts. It was doing fairly well, then I realized Gemini is multimodal. So I asked it to create a prompt based on a picture. Of course, it came back refusing, pointing to "Google's AI Principles" about returning content that depicts people. Not mad or upset; that is the stance they are taking. BUT... I do have an RTX 4090 at my disposal and can build my own Gemini.

With that being said, are there any open-source multimodal LLMs out there that I can start messing with to get the results I am looking for?

(FYI - I am looking to build my own, not looking for some "site" to do the work for me. Only so many subscriptions I can pay for.)

Author

ebonydad

Subreddit

r/LocalLLaMA

Posted: 7 months ago
External URL: reddit.com/r/LocalLLaMA/...