You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

使用Unsloth训练AI模型的GPU识别错误与依赖冲突求助

问题描述

两类核心问题

  • GPU识别异常:在Kaggle Colab、Google Colab及Unsloth Studio中训练时,系统误将GPU识别为CPU,提示需使用GPU
  • 依赖版本冲突:
    • Transformers要求Torch≥2.6.0,但安装Torch 2.6.0时,Unsloth抛出AttributeError: module 'torch' has no attribute 'int1'错误
    • 安装Unsloth适配的Torch 2.5.1时,Transformers无法正常运行
    • 同时存在xformers、fastai等包的版本不兼容问题

错误信息

错误信息1(安装Torch 2.6.0时)

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
xformers 0.0.29.post1 requires torch2.5.1, but you have torch 2.6.0 which is incompatible.
torchaudio 2.5.1+cu121 requires torch
2.5.1, but you have torch 2.6.0 which is incompatible.
torchvision 0.20.1+cu121 requires torch==2.5.1, but you have torch 2.6.0 which is incompatible.

错误信息2(安装Torch 2.10.0时)

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bigframes 2.31.0 requires google-cloud-bigquery-storage<3.0.0,>=2.30.0, which is not installed.
google-adk 1.21.0 requires google-cloud-bigquery-storage>=2.0.0, which is not installed.
cuda-python 12.9.5 requires cuda-bindings~=12.9.5, but you have cuda-bindings 12.9.4 which is incompatible.
torchaudio 2.9.0+cu126 requires torch==2.9.0, but you have torch 2.10.0 which is incompatible.
fastai 2.8.6 requires torch<2.10,>=1.10, but you have torch 2.10.0 which is incompatible.

训练代码

# 使用Unsloth预训练模型步骤(基于自定义数据)
# 步骤1:安装依赖
!pip install <package-name> --use-feature=2020-resolver
!pip install unsloth

!pip install --upgrade unsloth

!pip install --upgrade "google-cloud-bigquery-storage>=2.30.0,<3.0.0"
 
!pip install torch==2.9.0 torchaudio==2.9.0+cpu --extra-index-url https://pytorch.org

!pip install fastai==2.8.6
!pip install google-cloud-bigquery-storage
!pip install --force-reinstall transformers
# 步骤2:准备JSONL格式数据,建议100-500行以获得更好效果

from huggingface_hub import login
import os

# 从Kaggle Secrets获取token
hf_token = os.environ.get('kaggleUnslothModelTK')

# 登录Hugging Face
login(token=hf_token)



# 步骤3:加载数据
from datasets import load_dataset
# 从本地JSONL文件加载数据

dataset_path = "/kaggle/input/datasets/samuelantwi/testdataset updated jsonl",
Type = alpaca,
dataset_prepared_path ="/kaggle/working/last_run_prepared",
val_set_size = 0.05,
# 步骤4:使用Unsloth加载模型和tokenizer(4-bit模式加速)

from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# 加载模型和tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/tinyllama-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)



# 定义格式化函数
def formatting_prompts_func(examples):
    convos = examples["instruction"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text": texts, }

# 继续数据集加载和SFTTrainer配置



# 应用LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # 秩
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0, # 优化值为0
    bias = "none",
    use_gradient_checkpointing = "unsloth", # 减少显存占用
)
# 指令调优格式定义
alpaca_prompt = """ 以下是描述任务的指令,请撰写合适的响应完成请求。

### 指令:
{}

### 响应:
{}"""



tokenizer = get_chat_template(
    tokenizer,
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
    chat_template="chatml",
)

def apply_template(examples):
    messages = examples["instruction"]
    text = [tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False) for message in messages]
    return {"text": text}

dataset = load_dataset("mlabonne/FineTome-100k", split="train")
dataset = dataset.map(apply_template, batched=True)

from transformers import AutoModel
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import FastLanguageModel, is_bfloat16_supported

trainer=SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=True,
    args=TrainingArguments(
        learning_rate=3e-4,
        lr_scheduler_type="linear",
        per_device_train_batch_size=0.01,
        gradient_accumulation_steps=1,
        num_train_epochs=1,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        warmup_steps=1,
        output_dir="/kaggle/working",
        seed=0,
    ),
)

trainer.train()

model.save_pretrained_merged("model", tokenizer, save_method="merged_4bit")
model.push_to_hub_merged("Cirsam/kaggleUnslothModel", tokenizer, save_method="merged_4bit")

model.push_to_hub("Cirsam/kaggleUnslothModel", token = "")
tokenizer.push_to_hub("Cirsam/kaggleUnslothModel", token = "")
# 保存模型
# 保存LoRA适配器(快速)
model.save_pretrained("lora_model")

# 保存为4bit格式(用于GGUF转换或Ollama)
model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit")
model.push_to_hub("Cirsam/kaggleUnslothModel")
tokenizer.push_to_hub("Cirsam/kaggleUnslothModel")
解决方案

一、GPU识别问题修复

  1. 确认GPU配置:在Colab中通过Runtime > Change runtime type选择GPU加速;Unsloth Studio中确保实例为GPU类型
  2. 验证GPU可用性:执行以下代码检查PyTorch是否识别到GPU:
import torch
print(torch.cuda.is_available())
print(torch.cuda.device_count())
  1. 重置运行时:若识别异常,重启Colab/Unsloth Studio运行时后重新安装依赖

二、版本冲突解决

核心方案:使用Unsloth官方兼容依赖组合

Unsloth官方已适配对应环境的依赖版本,优先使用官方安装脚本:

  1. 清理冲突依赖
!pip uninstall -y torch torchaudio torchvision transformers unsloth fastai xformers
  1. 安装兼容依赖
# 安装Unsloth(自动适配Torch版本)
!pip install "unsloth[colab-new] @ git+https://github.com/unsloth/unsloth.git"
# 安装兼容的Transformers版本
!pip install --upgrade transformers==4.45.2
# 安装适配fastai版本(若需要)
!pip install fastai==2.8.5
  1. 验证版本匹配
import torch
import transformers
import unsloth
print(f"Torch版本: {torch.__version__}")
print(f"Transformers版本: {transformers.__version__}")
print(f"Unsloth版本: {unsloth.__version__}")

额外冲突处理

  • xformers:Unsloth安装时会自动匹配对应Torch版本,无需手动指定
  • google-cloud-bigquery-storage:安装指定版本解决bigframes冲突:
!pip install "google-cloud-bigquery-storage>=2.30.0,<3.0.0"

三、训练代码优化

  1. 移除手动Torch安装命令:删除代码中!pip install torch==2.9.0...行,依赖Unsloth自动安装的兼容版本
  2. 修正batch_size错误per_device_train_batch_size=0.01为无效值,改为整数(如4,根据GPU显存调整)
  3. 统一模型保存逻辑:删除重复的push_to_hub调用,保留一次即可

内容的提问来源于stack exchange,提问作者cirsam

火山引擎 最新活动