from transformers import AutoTokenizer
from PIL import Image  # Needed to load the input image below
from vllm import LLM, SamplingParams
# List of image file paths
IMAGES = [
    "/root/ld/ld_project/MiniCPM-V/assets/airplane.jpeg",  # Local image path
]
# Change this to your quantized AWQ model path
MODEL_NAME = "/root/ld/ld_model_pretrained/Minicpmv2_6" # AWQ model path
# Open and convert the image
image = Image.open(IMAGES[0]).convert("RGB")
# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
# Initialize the language model
llm = LLM(model=MODEL_NAME,
          trust_remote_code=True,
          gpu_memory_utilization=1,  # Fraction of GPU memory to use (1 = all)
          max_model_len=2048)  # Adjust this value according to memory availability
# Build the conversation message
messages = [{'role': 'user', 'content': '(<image>./</image>)\n' + 'Please describe this picture'}]
# Apply the conversation template to the messages
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Stop tokens for MiniCPM-V 2.6; earlier model versions use different ids, e.g.:
# stop_token_ids = [tokenizer.eos_id]                    # MiniCPM-V 2.0
# stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]  # MiniCPM-Llama3-V 2.5
stop_tokens = ['<|im_end|>', '<|endoftext|>']
stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
sampling_params = SamplingParams(
    stop_token_ids=stop_token_ids,
    temperature=0.7,  # Example decoding settings; tune these for your use case
    max_tokens=1024,
)
# Run inference: vLLM accepts the prompt and the image together as one multimodal input
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"image": image},
}, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)