基于Unsloth微调的Qwen2.5-7B在Ollama无限循环,Transformers正常
Qwen2.5-7B微调后转GGUF格式通过Ollama部署出现无限循环问题
问题概述
使用Unsloth微调Qwen2.5-7B模型后,通过Transformers库测试生成完全正常,但转换为Q8_0 GGUF格式并通过Ollama部署时,模型生成响应后无法停止,陷入无限循环。相同流程处理Mistral-v0.3和Llama-3.1模型无异常,仅Qwen2.5出现该问题。
环境配置
- 基础模型:unsloth/Qwen2.5-7B
- 微调模板:Alpaca
- 量化格式:Q8_0 GGUF
- 部署环境:Ollama
正常运行验证
以下Transformers代码可正常生成合理响应:
inputs = tokenizer( [ alpaca_prompt.format( "Continue the fibonacci sequence.", # instruction "1, 1, 2, 3, 5, 8", # input "", # output - leave blank for generation ) ], return_tensors="pt" ).to("cuda") from transformers import TextStreamer text_streamer = TextStreamer(tokenizer) _ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128)
异常场景
转换为GGUF格式后,使用以下Modelfile通过Ollama运行时出现无限循环:
FROM /home/ilab/Desktop/ollama_model/unsloth.Q8_0.gguf TEMPLATE """{{ if .System }}{{ .System }}{{ else }}Below are some instructions that describe some tasks. Write responses that appropriately complete each request.{{ end }} USER: {{ .Prompt }} ASSISTANT: {{ .Response }}{{ if .Response }}<eos>{{ end }}""" PARAMETER stop "[toxicity=0]" PARAMETER stop "[@BOS@]" PARAMETER stop "<eos>" PARAMETER stop "<unused" PARAMETER stop " " PARAMETER stop " " PARAMETER stop " " PARAMETER stop " " PARAMETER temperature 1.5 PARAMETER min_p 0.1 SYSTEM "Below are some instructions that describe some tasks. Write responses that appropriately complete each request."
问题原因与解决方案
1. 停止词配置不匹配Qwen2.5原生Token
Qwen2.5的EOS token是<|endoftext|>,而非自定义的<eos>,且原Modelfile中的[toxicity=0]、[@BOS@]等属于无效停止词,重复的全角空格也会干扰停止逻辑。
2. 模板格式与微调时的Alpaca模板不一致
微调使用的是Alpaca的指令格式(包含### Instruction:、### Response:等标记),但Ollama模板用了USER:/ASSISTANT:的聊天格式,导致模型无法识别生成终止的边界。
修正后的Modelfile
FROM /home/ilab/Desktop/ollama_model/unsloth.Q8_0.gguf TEMPLATE """Below are some instructions that describe some tasks. Write responses that appropriately complete each request. ### Instruction: {{ .Prompt }} ### Response: {{ .Response }}{{ if .Response }}<|endoftext|>{{ end }}""" PARAMETER stop "<|endoftext|>" PARAMETER stop "###" PARAMETER temperature 1.5 PARAMETER min_p 0.1 SYSTEM "Below are some instructions that describe some tasks. Write responses that appropriately complete each request."
额外检查点
- 确保使用最新版本的llama.cpp转换GGUF格式,Qwen2.5是较新模型,旧版本可能存在Token映射或兼容性问题
- 转换GGUF时,确认指定了正确的tokenizer(使用微调时的Unsloth Qwen2.5 tokenizer),避免Token解析错误
内容的提问来源于stack exchange,提问作者AndreasSoul




