So I am just goofing around with Stable Diffusion, and opted to get Gemini (formerly Bard) to generate some prompts. It was doing fairly well, then I realized Gemini is multimodal. So I asked it to create a prompt based on a picture. Of course, it came back saying that it's against "Google's AI Principles to return content that depicts people." Not mad or upset. It is the stance that they are taking. BUT... I do have an RTX 4090 at my disposal and can build my own Gemini.
With that being said, are there any open-source multimodal LLMs out there that I can start messing with to get the results I am looking for?
(FYI - I am looking to build my own, not looking for some "site" to do the work for me. Only so many subscriptions I can pay for.)