How to implement gpu/cpu offloading for text-generation-webui? [custom device_map]

Hello, I am trying to set up a custom device_map following Hugging Face's instructions:

https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu

I have this code inserted into my "server.py" file for text-generation-webui:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Set the quantization config with llm_int8_enable_fp32_cpu_offload set to True
quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)

device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}

model_path = "decapoda-research/llama-7b-hf"
model_8bit = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map=device_map,
    quantization_config=quantization_config,
)

However, there are two problems:

  1. It downloads a new copy of the model from Hugging Face rather than using my model directory.
  2. I get this error even after the download:
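For the first problem, from_pretrained accepts a local directory path in place of the Hub repo id. A minimal sketch of checking a folder before loading (the helper name and path here are my own examples, not from text-generation-webui):

```python
from pathlib import Path

def resolve_local_model(path):
    """Return the path as a string if it looks like a local HF model
    directory (i.e. it contains a config.json), else None.

    The returned string can be passed to AutoModelForCausalLM.from_pretrained
    in place of a repo id, which avoids re-downloading from the Hub.
    """
    p = Path(path)
    if p.is_dir() and (p / "config.json").exists():
        return str(p)
    return None
```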

File "C:\Windows\System32\text-generation-webui\server7b.py", line 33, in <module>
model_8bit = AutoModelForCausalLM.from_pretrained(
File "C:\Users\justi\miniconda3\envs\textgen\lib\site-packages\transformers\models\auto\auto_factory.py", line 471, in from_pretrained
return model_class.from_pretrained(
File "C:\Users\justi\miniconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 2643, in from_pretrained
) = cls._load_pretrained_model(
File "C:\Users\justi\miniconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 2966, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "C:\Users\justi\miniconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 662, in _load_state_dict_into_meta_model
raise ValueError(f"{param_name} doesn't have any device set.")
ValueError: model.layers.0.self_attn.q_proj.weight doesn't have any device set.
(textgen) C:\Windows\System32\text-generation-webui>

Does anyone know how to do CPU/GPU offloading for text-generation-webui?
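One thing I noticed: the ValueError names model.layers.0.self_attn.q_proj.weight, but my device_map uses BLOOM-style module names (transformer.word_embeddings, transformer.h, ...), while LLaMA checkpoints expose model.embed_tokens, model.layers.N, model.norm, and lm_head. A sketch of a map built with those names, purely as a guess (the helper and the layer split are my own assumptions):

```python
def make_llama_device_map(num_layers, gpu_layers):
    """Build a device_map keyed by LLaMA module names: the first
    `gpu_layers` decoder layers go to GPU 0, the rest to CPU."""
    device_map = {
        "model.embed_tokens": 0,
        "model.norm": "cpu",
        "lm_head": "cpu",
    }
    for i in range(num_layers):
        device_map[f"model.layers.{i}"] = 0 if i < gpu_layers else "cpu"
    return device_map

# llama-7b has 32 decoder layers; put the first 20 on GPU as an example split
device_map = make_llama_device_map(num_layers=32, gpu_layers=20)
```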


Posted: 1 year ago