Optimization
Optimum Intel can be used to apply popular compression techniques such as quantization, pruning and knowledge distillation.
Post-training optimization
Post-training compression techniques such as dynamic and static quantization can be easily applied to your model using our `INCQuantizer`. Note that quantization is currently only supported for CPUs (only CPU backends are available), so we will not be utilizing GPUs / CUDA in the following examples.
Dynamic quantization
You can easily apply dynamic quantization to your model with the following command line:
optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output quantized_distilbert
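The same quantization can also be applied programmatically; here is a minimal sketch using the `INCQuantizer` API shown in the examples below (the model and output directory mirror the CLI call above):
from transformers import AutoModelForQuestionAnswering
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer

# Load the model to quantize
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
# A dynamic post-training quantization configuration with default settings
quantization_config = PostTrainingQuantConfig(approach="dynamic")
quantizer = INCQuantizer.from_pretrained(model)
# Apply dynamic quantization and save the resulting model
quantizer.quantize(quantization_config=quantization_config, save_directory="quantized_distilbert")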
When applying post-training quantization, you can also specify an accuracy tolerance together with an adapted evaluation function in order to find a quantized model meeting the specified constraint. This applies to both dynamic and static quantization.
import evaluate
from optimum.intel import INCQuantizer
from datasets import load_dataset
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
from neural_compressor.config import AccuracyCriterion, TuningCriterion, PostTrainingQuantConfig
model_name = "distilbert-base-cased-distilled-squad"
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
eval_dataset = load_dataset("squad", split="validation").select(range(64))
task_evaluator = evaluate.evaluator("question-answering")
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)
def eval_fn(model):
    qa_pipeline.model = model
    metrics = task_evaluator.compute(model_or_pipeline=qa_pipeline, data=eval_dataset, metric="squad")
    return metrics["f1"]
# Set the accepted accuracy loss to 5%
accuracy_criterion = AccuracyCriterion(tolerable_loss=0.05)
# Set the maximum number of trials to 10
tuning_criterion = TuningCriterion(max_trials=10)
quantization_config = PostTrainingQuantConfig(
    approach="dynamic", accuracy_criterion=accuracy_criterion, tuning_criterion=tuning_criterion
)
quantizer = INCQuantizer.from_pretrained(model, eval_fn=eval_fn)
quantizer.quantize(quantization_config=quantization_config, save_directory="dynamic_quantization")
Static quantization
In the same manner, we can apply static quantization, for which we also need to generate a calibration dataset in order to perform the calibration step.
from functools import partial
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The directory where the quantized model will be saved
save_dir = "static_quantization"
def preprocess_function(examples, tokenizer):
    return tokenizer(examples["sentence"], padding="max_length", max_length=128, truncation=True)
# Load the quantization configuration detailing the quantization we wish to apply
quantization_config = PostTrainingQuantConfig(approach="static")
quantizer = INCQuantizer.from_pretrained(model)
# Generate the calibration dataset needed for the calibration step
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_function, tokenizer=tokenizer),
    num_samples=100,
    dataset_split="train",
)
# Apply static quantization and save the resulting model
quantizer.quantize(
    quantization_config=quantization_config,
    calibration_dataset=calibration_dataset,
    save_directory=save_dir,
)
Specifying quantization recipes
The SmoothQuant methodology can be used for post-training quantization. Compared to other post-training static quantization methods, it usually improves model accuracy. It does so by migrating the quantization difficulty from the activations to the weights through a mathematically equivalent transformation.
- quantization_config = PostTrainingQuantConfig(approach="static")
+ recipes = {"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5, "folding": True}}
+ quantization_config = PostTrainingQuantConfig(approach="static", backend="ipex", recipes=recipes)
Please refer to the INC documentation and the list of models quantized with this methodology for more details.
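Putting the recipe together with the static quantization flow shown above, a minimal sketch could look as follows (it reuses the `quantizer` and `calibration_dataset` from the previous example, assumes intel-extension-for-pytorch is installed for the ipex backend, and the output directory name is our own choice):
# Enable the SmoothQuant recipe; alpha controls how much quantization
# difficulty is migrated from the activations to the weights
recipes = {"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5, "folding": True}}
quantization_config = PostTrainingQuantConfig(approach="static", backend="ipex", recipes=recipes)
# Static quantization still requires the calibration step
quantizer.quantize(
    quantization_config=quantization_config,
    calibration_dataset=calibration_dataset,
    save_directory="smooth_quantization",
)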
Distributed accuracy-aware tuning
One challenge of model quantization is identifying the best configuration that balances accuracy and performance. Distributed tuning speeds up this time-consuming process by parallelizing it across multiple nodes, resulting in a linear speedup of the tuning process.
To use distributed tuning, set `quant_level` to `1` and run it with `mpirun`:
- quantization_config = PostTrainingQuantConfig(approach="static")
+ quantization_config = PostTrainingQuantConfig(approach="static", quant_level=1)
mpirun -np <number_of_processes> <RUN_CMD>
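As an illustration, assuming the quantization code above is saved in a hypothetical script named `run_quantization.py`, the tuning could be distributed over four processes with:
mpirun -np 4 python run_quantization.py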
Optimization during training
The `INCTrainer` class provides an API to train your model while combining different compression techniques such as knowledge distillation, pruning and quantization. The `INCTrainer` is very similar to the 🤗 Transformers `Trainer`, which it can replace with minimal changes to your code.
Quantization
To apply quantization during training, you only need to create the appropriate configuration and pass it to the `INCTrainer`.
import evaluate
import numpy as np
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, default_data_collator
- from transformers import Trainer
+ from optimum.intel import INCModelForSequenceClassification, INCTrainer
+ from neural_compressor import QuantizationAwareTrainingConfig
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("glue", "sst2")
dataset = dataset.map(lambda examples: tokenizer(examples["sentence"], padding=True, max_length=128, truncation=True), batched=True)
metric = evaluate.load("glue", "sst2")
compute_metrics = lambda p: metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)
# The directory where the quantized model will be saved
save_dir = "quantized_model"
# The configuration detailing the quantization process
+ quantization_config = QuantizationAwareTrainingConfig()
- trainer = Trainer(
+ trainer = INCTrainer(
    model=model,
+   quantization_config=quantization_config,
    args=TrainingArguments(save_dir, num_train_epochs=1.0, do_train=True, do_eval=False),
    train_dataset=dataset["train"].select(range(300)),
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)
train_result = trainer.train()
metrics = trainer.evaluate()
trainer.save_model()
- model = AutoModelForSequenceClassification.from_pretrained(save_dir)
+ model = INCModelForSequenceClassification.from_pretrained(save_dir)
Pruning
In the same manner, pruning can be applied by specifying a pruning configuration detailing the desired pruning process. To know more about the different supported methodologies, you can refer to the Neural Compressor documentation. At the moment, pruning is applied on both the linear and the convolutional layers, but not on other layers such as the embeddings. It is important to mention that the pruning sparsity defined in the configuration is applied on these layers only, and thus will not translate into the same global model sparsity (the sketch after the example below shows one way to check this).
- from transformers import Trainer
+ from optimum.intel import INCTrainer
+ from neural_compressor import WeightPruningConfig
# The configuration detailing the pruning process
+ pruning_config = WeightPruningConfig(
+     pruning_type="magnitude",
+     start_step=0,
+     end_step=15,
+     target_sparsity=0.2,
+     pruning_scope="local",
+ )
- trainer = Trainer(
+ trainer = INCTrainer(
    model=model,
+   pruning_config=pruning_config,
    args=TrainingArguments(save_dir, num_train_epochs=1.0, do_train=True, do_eval=False),
    train_dataset=dataset["train"].select(range(300)),
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)
train_result = trainer.train()
metrics = trainer.evaluate()
trainer.save_model()
model = AutoModelForSequenceClassification.from_pretrained(save_dir)
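To verify the sparsity actually reached in the pruned layers, here is a minimal sketch (the `linear_sparsity` helper is hypothetical, not part of Optimum or Neural Compressor):
import torch

# Hypothetical helper: fraction of zeroed weights across the linear layers,
# which is where the configured (local) sparsity is applied
def linear_sparsity(model):
    zeros, total = 0, 0
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            zeros += int((module.weight == 0).sum())
            total += module.weight.numel()
    return zeros / total

print(f"Linear layer sparsity: {linear_sparsity(model):.2%}")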
Knowledge distillation
Knowledge distillation can be applied in the same manner. To know more about the different supported methodologies, you can refer to the Neural Compressor documentation.
- from transformers import Trainer
+ from optimum.intel import INCTrainer
+ from neural_compressor import DistillationConfig
+ teacher_model_id = "textattack/bert-base-uncased-SST-2"
+ teacher_model = AutoModelForSequenceClassification.from_pretrained(teacher_model_id)
+ distillation_config = DistillationConfig(teacher_model=teacher_model)
- trainer = Trainer(
+ trainer = INCTrainer(
    model=model,
+   distillation_config=distillation_config,
    args=TrainingArguments(save_dir, num_train_epochs=1.0, do_train=True, do_eval=False),
    train_dataset=dataset["train"].select(range(300)),
    eval_dataset=dataset["validation"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)
train_result = trainer.train()
metrics = trainer.evaluate()
trainer.save_model()
model = AutoModelForSequenceClassification.from_pretrained(save_dir)
Loading a quantized model
To load a quantized model hosted locally or on the 🤗 Hub, you must instantiate your model using our `INCModelForXxx` classes.
from optimum.intel import INCModelForSequenceClassification
model_name = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
model = INCModelForSequenceClassification.from_pretrained(model_name)您可以在 Intel 組織下的 Hub 上載入更多量化模型,請點選這裡。
Inference with Transformers pipeline
The quantized model can then easily be used to run inference with the Transformers pipelines.
from transformers import AutoTokenizer, pipeline
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe_cls = pipeline("text-classification", model=model, tokenizer=tokenizer)
text = "He's a dreadful magician."
outputs = pipe_cls(text)
[{'label': 'NEGATIVE', 'score': 0.9880216121673584}]
For more advanced usage, check out the `examples` directory.