4-bit quantization

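QLoRA is a finetuning method that quantizes a model to 4 bits and adds a set of low-rank adaptation (LoRA) weights, which are trained by backpropagating gradients through the frozen quantized weights. In addition to the standard Float4 data type (LinearFP4), the method introduces a new data type, 4-bit NormalFloat (LinearNF4). LinearNF4 is a quantization data type designed for normally distributed data, such as pretrained model weights, and can improve performance.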
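The forward pass described above can be illustrated with a minimal pure-Python sketch (no bitsandbytes, no GPU). The frozen base weight is stored as 4-bit integers and dequantized on the fly, and only the small LoRA matrices `A` and `B` would receive gradient updates. The per-tensor integer quantizer and the helper names (`quantize_int4`, `lora_forward`, `alpha`, `r`) are hypothetical simplifications; QLoRA itself uses blockwise FP4/NF4 quantization.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def quantize_int4(W):
    """Per-tensor symmetric absmax quantization to integers in [-7, 7].
    Hypothetical simplification: QLoRA itself uses blockwise NF4/FP4 codes."""
    absmax = max(abs(w) for row in W for w in row) or 1.0
    q = [[round(w / absmax * 7) for w in row] for row in W]
    return q, absmax


def dequantize_int4(q, absmax):
    """Map the integer codes back to approximate weights."""
    return [[v * absmax / 7 for v in row] for row in q]


def lora_forward(x, q, absmax, A, B, alpha=16, r=2):
    """y = dequant(W_q) x + (alpha / r) * B (A x).
    q and absmax stay frozen; only A (r x in) and B (out x r) would be trained."""
    base = matvec(dequantize_int4(q, absmax), x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


W = [[0.5, -1.0, 0.25], [1.0, 0.0, -0.5]]  # frozen base weight, out=2, in=3
q, absmax = quantize_int4(W)
A = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]     # r=2 x in=3
B = [[0.0, 0.0], [0.0, 0.0]]               # out=2 x r=2, zero-initialized
x = [1.0, 2.0, 3.0]
y = lora_forward(x, q, absmax, A, B)
```

Because `B` is zero-initialized, the adapter starts as a no-op and `y` equals the output of the dequantized base layer; training would change only `A` and `B`.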

Linear4bit

class bitsandbytes.nn.Linear4bit


(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_type='fp4', quant_storage=torch.uint8, device=None)

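This class is the base module for the 4-bit quantization algorithm presented in QLoRA. QLoRA 4-bit linear layers use blockwise k-bit quantization under the hood, and the 4-bit data type can be selected through quant_type ('fp4' or 'nf4').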
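Blockwise quantization means each block of consecutive weights gets its own scaling constant, so a single outlier only degrades the precision of its own block. A minimal sketch, assuming absmax scaling and a block size of 64 (the `blocksize` default in the `Params4bit` signature); the evenly spaced 16-entry codebook is a stand-in for the actual FP4/NF4 tables, and the function names are hypothetical:

```python
import random


def nearest_code(codebook, x):
    """Index of the codebook entry closest to x."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))


def quantize_blockwise(values, codebook, blocksize=64):
    """Blockwise absmax quantization: each block of `blocksize` values is
    divided by its own absmax (mapping it into [-1, 1]) and every value is
    replaced by the index of the nearest codebook entry."""
    codes, absmaxes = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        absmax = max(abs(v) for v in block) or 1.0  # guard all-zero blocks
        absmaxes.append(absmax)
        codes.extend(nearest_code(codebook, v / absmax) for v in block)
    return codes, absmaxes


def dequantize_blockwise(codes, absmaxes, codebook, blocksize=64):
    """Look up each code and rescale it by the absmax of its block."""
    return [codebook[c] * absmaxes[i // blocksize] for i, c in enumerate(codes)]


# 16 evenly spaced levels in [-1, 1]: a stand-in for the FP4/NF4 code tables.
codebook = [-1 + 2 * i / 15 for i in range(16)]

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(256)]
codes, absmaxes = quantize_blockwise(weights, codebook)
restored = dequantize_blockwise(codes, absmaxes, codebook)
```

Only the 4-bit index per weight plus one constant per block needs to be stored. Swapping `codebook` for a different 16-entry table changes the data type without changing the algorithm.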

To quantize a linear layer, first load the original fp16 / bf16 weights into the Linear4bit module, then call quantized_module.to("cuda"); the fp16 / bf16 weights are quantized during this call.

Example

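import torch
import torch.nn as nn

from bitsandbytes.nn import Linear4bit

# Reference model holding the original half-precision weights
fp16_model = nn.Sequential(
    nn.Linear(64, 64),
    nn.Linear(64, 64)
).to(torch.float16)

# Same architecture built from 4-bit layers
quantized_model = nn.Sequential(
    Linear4bit(64, 64),
    Linear4bit(64, 64)
)

# Load the unquantized weights first; they are not quantized yet
quantized_model.load_state_dict(fp16_model.state_dict())
quantized_model = quantized_model.to(0)  # Quantization happens here (device 0 is the first CUDA GPU)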
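Why quantization is deferred until `.to(...)` can be illustrated with a toy parameter class that keeps full-precision values until it leaves the CPU and then quantizes them once. `LazyQuantizedParam` is hypothetical and is not how `Params4bit` is implemented; the `bnb_quantized` argument in the `Params4bit` signature suggests a comparable flag, but that is an assumption.

```python
class LazyQuantizedParam:
    """Hypothetical stand-in for a 4-bit parameter (not a bitsandbytes class):
    it keeps full-precision values until it is moved off the CPU, then
    quantizes them exactly once."""

    def __init__(self, values, blocksize=64):
        self.values = list(values)
        self.blocksize = blocksize
        self.quantized = False  # "already quantized?" flag
        self.device = "cpu"
        self.codes = None
        self.absmax = None

    def _quantize(self):
        # Symmetric integer grid [-7, 7] per block; illustrative only.
        self.codes, self.absmax = [], []
        for s in range(0, len(self.values), self.blocksize):
            block = self.values[s:s + self.blocksize]
            m = max(abs(v) for v in block) or 1.0
            self.absmax.append(m)
            self.codes.extend(round(v / m * 7) for v in block)
        self.values = None  # the full-precision copy is released
        self.quantized = True

    def to(self, device):
        """Move the parameter; the first move off the CPU triggers quantization."""
        if device != "cpu" and not self.quantized:
            self._quantize()
        self.device = device
        return self


p = LazyQuantizedParam([0.1 * i - 3.0 for i in range(128)])
p.to("cuda")                 # the move triggers quantization
first_codes = list(p.codes)
p.to("cuda")                 # a second move does not re-quantize
```

In this sketch, whatever values are loaded before the move are what get quantized, which mirrors the order used in the example: `load_state_dict` first, then `.to(0)`.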

__init__


(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_type='fp4', quant_storage=torch.uint8, device=None)

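Parameters

input_features (int): Number of input features of the linear layer.
output_features (int): Number of output features of the linear layer.
bias (bool, defaults to True): Whether the linear layer also uses a bias term.
compute_dtype (torch.dtype, optional): Data type used for the matrix multiplication. If None, the input's data type is used.
compress_statistics (bool, defaults to True): Whether to also quantize the per-block quantization constants (nested, or "double", quantization).
quant_type (str, defaults to 'fp4'): The 4-bit data type, either 'fp4' or 'nf4'.
quant_storage (torch.dtype, defaults to torch.uint8): Data type of the tensor that holds the packed 4-bit values.
device (optional): Device on which the layer's parameters are created.

Initializes a Linear4bit module.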
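`compress_statistics` enables what the QLoRA paper calls double quantization: the per-block absmax constants are themselves quantized. A simplified sketch, assuming 64-weight blocks, 256-constant second-level blocks, and mean-centering of the constants as described in the paper; unlike the paper, which uses 8-bit floats for the second level, it uses plain int8 absmax quantization. All function names are hypothetical.

```python
def overhead_bits_per_param(blocksize=64, double_quant=False, blocksize2=256):
    """Bits per weight spent on quantization constants (absmax values)."""
    if not double_quant:
        return 32 / blocksize  # one fp32 absmax per block of weights
    # one 8-bit code per block, plus one fp32 scale per group of absmax codes
    return 8 / blocksize + 32 / (blocksize * blocksize2)


def quantize_absmax_8bit(absmaxes, blocksize2=256):
    """Simplified second-level quantization of the per-block absmax values:
    subtract their mean, then absmax-quantize each group of `blocksize2`
    constants to integers in [-127, 127]."""
    mean = sum(absmaxes) / len(absmaxes)
    centered = [a - mean for a in absmaxes]
    codes, scales = [], []
    for s in range(0, len(centered), blocksize2):
        group = centered[s:s + blocksize2]
        m = max(abs(c) for c in group) or 1.0
        scales.append(m)
        codes.extend(round(c / m * 127) for c in group)
    return codes, scales, mean


def dequantize_absmax_8bit(codes, scales, mean, blocksize2=256):
    """Recover approximate absmax values from the int8 codes."""
    return [c / 127 * scales[i // blocksize2] + mean for i, c in enumerate(codes)]


print(overhead_bits_per_param())                   # 0.5
print(overhead_bits_per_param(double_quant=True))  # 0.126953125
```

The arithmetic matches the figure reported in the paper: constant overhead drops from 0.5 to about 0.127 bits per parameter.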

LinearFP4

class bitsandbytes.nn.LinearFP4


(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_storage=torch.uint8, device=None)

Implements the FP4 data type.
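For intuition about what a 4-bit float can represent, the sketch below decodes all 16 codes of the generic E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1). This is an assumption for illustration: the exact bit layout and normalization bitsandbytes uses for its FP4 code table may differ, and `e2m1_value` is a hypothetical helper.

```python
def e2m1_value(code):
    """Decode a 4-bit code laid out as [sign | 2 exponent bits | 1 mantissa bit],
    exponent bias 1, with subnormals when the exponent field is 0."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:
        magnitude = man * 0.5                         # subnormal: 0.m * 2^(1 - bias)
    else:
        magnitude = (1 + man * 0.5) * 2 ** (exp - 1)  # normal: 1.m * 2^(exp - bias)
    return sign * magnitude


# +0 and -0 compare equal, so the set keeps 15 distinct values
values = sorted({e2m1_value(c) for c in range(16)})
print(values)
```

The values are spaced more finely near zero. Dividing by the largest magnitude, 6, maps them into [-1, 1], the range a blockwise absmax quantizer expects.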

__init__


(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_storage=torch.uint8, device=None)

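Parameters

input_features (int): Number of input features of the linear layer.
output_features (int): Number of output features of the linear layer.
bias (bool, defaults to True): Whether the linear layer also uses a bias term.

The remaining arguments are passed on to Linear4bit, with quant_type fixed to 'fp4'.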

LinearNF4

class bitsandbytes.nn.LinearNF4


(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_storage=torch.uint8, device=None)

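Implements the NF4 data type.

Constructs a quantization data type in which each bin has equal area under the standard normal distribution N(0, 1), with the values normalized into the range [-1, 1].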
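The equal-area construction can be sketched with `statistics.NormalDist` from the Python standard library. The structure (8 positive quantiles, 7 negative quantiles and an exact zero, giving 16 values with 0 exactly representable) is modeled on `create_normal_map`; the `offset` default of 0.9677083 and the helper names are assumptions, and the output is not guaranteed to match the bitsandbytes NF4 table exactly. The outermost level uses probability `offset` rather than 1 because the normal quantile at 1 is infinite.

```python
from statistics import NormalDist


def linspace(start, stop, num):
    """Evenly spaced points from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def normal_map(offset=0.9677083):
    """Build a 16-value NF4-style table from quantiles of N(0, 1)."""
    ppf = NormalDist().inv_cdf  # inverse CDF (quantile function)
    # 8 positive levels: drop the last point, which is the median (0.5)
    pos = [ppf(p) for p in linspace(offset, 0.5, 9)[:-1]]
    # 7 negative levels, mirrored
    neg = [-ppf(p) for p in linspace(offset, 0.5, 8)[:-1]]
    values = sorted(neg + [0.0] + pos)
    m = max(abs(v) for v in values)
    return [v / m for v in values]  # normalize into [-1, 1]


nf4_like = normal_map()
print([round(v, 4) for v in nf4_like])
```

Because the levels are quantiles of N(0, 1), they are dense near zero, where most pretrained weights lie, and sparse in the tails.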

For more information, read the paper: QLoRA: Efficient Finetuning of Quantized LLMs (https://arxiv.org/abs/2305.14314)

The bitsandbytes implementation of the NF4 data type can be found in the `create_normal_map` function in `functional.py`: https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/functional.py#L236

__init__


(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_storage=torch.uint8, device=None)

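Parameters

input_features (int): Number of input features of the linear layer.
output_features (int): Number of output features of the linear layer.
bias (bool, defaults to True): Whether the linear layer also uses a bias term.

The remaining arguments are passed on to Linear4bit, with quant_type fixed to 'nf4'.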

Params4bit

class bitsandbytes.nn.Params4bit


(data: typing.Optional[torch.Tensor] = None, requires_grad=False, quant_state: typing.Optional[bitsandbytes.functional.QuantState] = None, blocksize: int = 64, compress_statistics: bool = True, quant_type: str = 'fp4', quant_storage: dtype = torch.uint8, module: typing.Optional[ForwardRef('Linear4bit')] = None, bnb_quantized: bool = False)

__init__

(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.
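With `quant_storage=torch.uint8` (the default in the `Params4bit` signature), two 4-bit codes share one byte of storage. A minimal pure-Python sketch of that packing; putting the first code in the high nibble is an assumption, not necessarily the order bitsandbytes uses, and the function names are hypothetical.

```python
def pack_nibbles(codes):
    """Pack 4-bit codes (0..15) two per byte, first code in the high nibble."""
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("4-bit codes must be in the range 0..15")
    codes = list(codes)
    if len(codes) % 2:
        codes.append(0)  # pad an odd-length input with a zero code
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(packed, n):
    """Recover the first `n` 4-bit codes from packed bytes."""
    out = []
    for b in packed:
        out.extend((b >> 4, b & 0x0F))
    return out[:n]


codes = [1, 15, 0, 7, 9]
packed = pack_nibbles(codes)
```

Storing `n` weights therefore takes `ceil(n / 2)` bytes for the codes, plus whatever the quantization state (such as the per-block constants) requires.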
