I'm extremely interested in running a self-hosted version of Vicuna-13b. So far, I've been able to get it to run at a very reasonable level of performance in the cloud with a Tesla T4 and a V100 by using 4-bit and 8-bit quantization. I'd love to bring it home and build a private server. However, those cards are mind-numbingly expensive. Although the 3090 has come down in price lately, $700 is still pretty steep. I was doing some research, and it seems that a CUDA compute capability of 5 or higher is the minimum required. At around $70ish on eBay ($100ish after a blower shroud; I'm aware these are datacenter cards), the Tesla M40 meets that requirement at CC 5.2 and also has 24GB of VRAM. In theory it sounds like it'd be enough, right? Obviously I'm not going to be training or fine-tuning LLMs with the card, but it sounds like it'd be enough for performing inference on the cheap and generating output at four or five tokens per second. What do you all think? Worth investing a few hundred dollars in building a little M40 rig, or would it still be too slow to be worth the trouble?
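For reference, 13B parameters at 4-bit is roughly 6.5-7GB of weights, so 24GB leaves a lot of headroom for the KV cache. Here's roughly what I've been running on the cloud cards, a minimal sketch assuming a transformers + bitsandbytes setup (the model id and prompt are just examples, and whether bitsandbytes' kernels behave on a CC 5.2 card like the M40 is exactly the part I'm unsure about):

```python
# Rough sketch: load Vicuna-13b in 4-bit with transformers + bitsandbytes.
# The model id is just an example; point it at whichever Vicuna-13b weights you use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "lmsys/vicuna-13b-v1.5"  # example repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spills layers to CPU if VRAM runs out
)

prompt = "USER: What does CUDA compute capability mean?\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```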
Oh yeah, I totally get the age being a major factor. The overall goal here is just to have a sub-$500 rig that doesn't take fifteen minutes or more to finish a prompt.
Could you possibly do me a favor and try running Vicuna-13b, and tell me how many tokens per second you're able to get? This sounds pretty interesting.
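Something like this is what I'd use for a rough number, assuming the same transformers setup as in the snippet above (the prompt and token count are arbitrary):

```python
# Quick-and-dirty tokens/sec check; assumes `model` and `tokenizer`
# are already loaded as in the earlier snippet.
import time

prompt = "USER: Write a short paragraph about GPUs.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=200)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.2f} tok/s")
```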
I mean, my overall goal is three to five tokens per second; whether or not that requires a GPU is irrelevant. I really appreciate this! I'll take a look :)
What do you think would be the most cost-effective solution?
Seems the consensus is to experiment first before buying the hardware! Thankya w^
I don't want to be chained to the cloud, though. The whole point of a rig like this is a private, personal, self-hosted LLM. I don't want big corps to have access to it.