Any Open-Source Multimodal LLMs out there?

So I was just goofing around with Stable Diffusion and opted to get Gemini (formerly Bard) to generate some prompts. It was doing fairly well, and then I realized Gemini is multimodal. So I asked it to create a prompt based on a picture. Of course, it came back saying that returning content that depicts people goes against Google's AI Principles. I'm not mad or upset; that is the stance they are taking. BUT... I do have an RTX 4090 at my disposal and can build my own Gemini.

With that being said, are there any open-source multimodal LLMs out there that I can start messing with to get the results I am looking for?

(FYI - I am looking to build my own, not looking for some "site" to do the work for me. Only so many subscriptions I can pay for.)

Posted
7 months ago