About the tool
Axolotl is an open-source framework for post-training and fine-tuning. Version v0.8.x is production-ready. A single YAML config covers the whole pipeline (preprocessing, training, evaluation, quantization, inference). Latest supported models: Qwen3 Next, Qwen2.5-VL, Qwen3-VL, Granite 4, HunYuan, Magistral 2509, Apertus, Seed-OSS. It also offers quantization-aware training (QAT), sequence parallelism (long context), GRPO (reasoning), and RLHF reward modeling.
Features 2026 (v0.8.x)
- Quantization-aware training (QAT).
- Sequence parallelism for long-context models.
- GRPO for reasoning training.
- Full reward modeling support for RLHF pipelines.
- Production-ready at scale.
Additional features
▶Latest Models 2026
Qwen3 Next, Qwen3, Qwen3MoE, Qwen2.5-VL, Qwen3-VL, Granite 4, HunYuan, Magistral 2509, Apertus, Seed-OSS. Day-0 support for all frontier open-source releases in 2026.
▶Multimodal VLMs
Vision Language Models: LLaMA-Vision, Pixtral, LLaVA, SmolVLM2, GLM-4.6V, InternVL 3.5, Gemma 3n. Full support for fine-tuning multimodal models.
▶Audio (Voxtral)
Support for audio language models: Voxtral. Lets you fine-tune audio-to-text and text-to-audio models on your own datasets for speech applications.
▶Training Methods (Full FT/LoRA/QLoRA/GPTQ/QAT)
The full spectrum: full fine-tuning, LoRA, QLoRA, GPTQ (post-training quantization), and QAT (quantization-aware training). One of the broadest ranges of techniques in the open-source ecosystem.
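To make the difference between full fine-tuning and LoRA concrete, the sketch below counts trainable parameters for a single linear layer. It assumes the standard LoRA formulation, in which the frozen weight matrix is augmented by two low-rank factors A and B; the layer size and rank are illustrative values, not Axolotl defaults.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for LoRA: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from any specific model).
    d_in = d_out = 4096
    rank = 16
    full = full_params(d_in, d_out)
    lora = lora_params(d_in, d_out, rank)
    print(f"full: {full:,}  lora: {lora:,}  share: {lora / full:.2%}")
```

QLoRA keeps the same adapter arithmetic but stores the frozen base weights in 4-bit precision, so the saving shown here applies to the trainable part in both cases.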
▶Preference Tuning (DPO/IPO/KTO/ORPO)
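DPO, IPO (Identity Preference Optimization), KTO, ORPO: all the modern preference tuning techniques. Quality is competitive with classic RLHF while the implementation is much simpler, because these methods train directly on preference data without a separate reward model or RL loop.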
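As an illustration of why preference tuning is simpler than a full RL loop, here is a minimal sketch of the DPO loss for one preference pair. It follows the published DPO objective and is not Axolotl's code; the function name, the argument names, and the default `beta` are assumptions chosen for the example. Inputs are summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single (chosen, rejected) pair."""
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is log 2; the loss falls as the policy raises the chosen response relative to the rejected one.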
▶GRPO reasoning + GDPO
GRPO (Group Relative Policy Optimization) for reasoning training and GDPO (Group DPO): the latest RL techniques for LLMs. Critical for training reasoning models in the style of o1 and DeepSeek R1.
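The "group relative" part of GRPO can be sketched in a few lines: several completions are sampled for the same prompt, each gets a scalar reward, and each completion's advantage is its reward normalized against the group. This is a simplified illustration, not Axolotl's implementation; the function name and `eps` are assumptions, and it uses the population standard deviation where implementations may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one prompt; 1.0 = correct, 0.0 = wrong (made-up values).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because the baseline comes from the group itself, no separate value network is needed, which is one reason the method is popular for reasoning tasks with verifiable rewards.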
▶Reward Modelling (RM/PRM)
Reward Modelling (RM) and Process Reward Modelling (PRM): full support for training reward models. A PRM scores every reasoning step, which is key for step-by-step problem solving.
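The contrast between the two can be sketched as follows. An outcome reward model is commonly trained with a pairwise (Bradley-Terry) loss on whole responses, while a process reward model yields one score per step that must then be aggregated. Both functions are illustrative assumptions rather than Axolotl APIs, and taking the minimum step score is only one of several aggregation choices.

```python
import math


def pairwise_rm_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry loss: push the chosen response's score above the rejected one's."""
    diff = score_chosen - score_rejected
    # -log(sigmoid(diff)), computed in a numerically stable way.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def prm_solution_score(step_scores: list[float]) -> float:
    """Aggregate per-step scores; here a solution is only as good as its weakest step."""
    return min(step_scores)
```

With per-step scores, a single faulty step in an otherwise plausible chain lowers the whole solution's score, which an outcome-only reward cannot localize.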
▶QAT (NEW v0.8.x)
Quantization-aware training: during training the model learns how it will behave after quantization. This typically gives better results than post-training quantization for low-bit deployments.
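The core mechanism behind QAT is "fake quantization": in the forward pass, weights are rounded to the low-bit grid and immediately converted back to floats, so the loss already reflects quantization error. The sketch below shows that step for symmetric per-tensor quantization in plain Python. It is a conceptual illustration under those assumptions, not the code path Axolotl uses, and the function name is hypothetical.

```python
def fake_quantize(weights: list[float], bits: int = 4) -> list[float]:
    """Quantize to a signed integer grid, then dequantize back to floats."""
    qmax = 2 ** (bits - 1) - 1        # e.g. 7 for 4 bits
    qmin = -(2 ** (bits - 1))         # e.g. -8 for 4 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    scale = max_abs / qmax            # one scale for the whole tensor
    out = []
    for w in weights:
        q = max(qmin, min(qmax, round(w / scale)))  # snap to the integer grid
        out.append(q * scale)                       # back to float for the forward pass
    return out


if __name__ == "__main__":
    print(fake_quantize([0.7, -0.2, 0.13, 0.0], bits=4))
```

In an actual training loop the rounding step has zero gradient almost everywhere, so frameworks usually pass gradients through it unchanged (the straight-through estimator), letting the full-precision weights keep learning.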
▶Sequence Parallelism (NEW)
Sequence parallelism for long-context models distributes the sequence dimension across GPUs. This makes it possible to train on very long contexts (100K+ tokens) without OOM errors.
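The data-layout side of this idea can be sketched as splitting one long token sequence into contiguous shards, one per GPU rank. The helper below is a hypothetical illustration, not an Axolotl function. It shows only the partitioning; real implementations also exchange keys and values between ranks so that attention still covers the full sequence, and they often pad so the length divides evenly.

```python
def shard_sequence(token_ids: list[int], world_size: int) -> list[list[int]]:
    """Split one sequence into contiguous shards, one per rank."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    chunk = -(-len(token_ids) // world_size)  # ceiling division
    return [token_ids[r * chunk:(r + 1) * chunk] for r in range(world_size)]


if __name__ == "__main__":
    # A 100K-token sequence across 8 ranks leaves 12,500 tokens per GPU.
    shards = shard_sequence(list(range(100_000)), world_size=8)
    print([len(s) for s in shards])
```

Each rank then holds activations for only its shard, which is where the memory saving for long contexts comes from.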
▶YAML Single Config
The full pipeline in a single YAML config: dataset preprocessing → training → evaluation → quantization → inference. You declare intent instead of writing Python scripts, which keeps runs reproducible and shareable.
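To give a feel for what such a config contains, the sketch below builds a small QLoRA config as a Python dictionary and renders it as YAML with a tiny hand-written emitter (so it needs no third-party packages). The keys follow commonly used Axolotl options, but the model ID, dataset, and hyperparameter values are assumptions for illustration; check the key names against the config reference for your installed version.

```python
# Illustrative values only: model ID, dataset, and hyperparameters are assumptions.
CONFIG = {
    "base_model": "Qwen/Qwen3-8B",
    "load_in_4bit": True,
    "adapter": "qlora",
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "lora_target_linear": True,
    "datasets": [{"path": "tatsu-lab/alpaca", "type": "alpaca"}],
    "val_set_size": 0.05,
    "sequence_len": 2048,
    "micro_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "num_epochs": 1,
    "learning_rate": 0.0002,
    "optimizer": "adamw_torch",
    "lr_scheduler": "cosine",
    "bf16": True,
    "output_dir": "./outputs/qlora-out",
}


def _fmt(value) -> str:
    """Render a scalar the way YAML expects (lowercase booleans)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_yaml(cfg: dict) -> str:
    """Minimal emitter for flat keys plus lists of flat dicts."""
    lines = []
    for key, value in cfg.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            for item in value:
                for i, (k, v) in enumerate(item.items()):
                    prefix = "  - " if i == 0 else "    "
                    lines.append(f"{prefix}{k}: {_fmt(v)}")
        else:
            lines.append(f"{key}: {_fmt(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_yaml(CONFIG))
```

Saved to a file such as `config.yml` (a hypothetical name), the output would typically be passed to the Axolotl CLI, for example `axolotl train config.yml`.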
✓ Advantages
Supported models (2026)
- Latest: Qwen3 Next, Qwen3, Qwen3MoE, Qwen2.5-VL, Qwen3-VL.
- Granite 4, HunYuan, Magistral 2509, Apertus, Seed-OSS.
- Multimodal VLMs: LLaMA-Vision, Pixtral, LLaVA, SmolVLM2, GLM-4.6V, InternVL 3.5, Gemma 3n.
- Audio: Voxtral.
Pricing
- Open-source.
- $0.
- GitHub: axolotl-ai-cloud/axolotl.
API and integrations
- Python (pip install axolotl).
- Orchestrates the HuggingFace ecosystem.
- Native use of PEFT, transformers, and accelerate.
- Multi-GPU and DeepSpeed support.
Training methods
- Full fine-tuning.
- LoRA, QLoRA, GPTQ, QAT (quantization-aware training).
- Preference Tuning: DPO, IPO, KTO, ORPO.
- RL: GRPO (reasoning), GDPO.
- Reward Modelling (RM) / Process Reward Modelling (PRM).
YAML configuration
- Single YAML config for the full pipeline: dataset preprocessing → training → evaluation → quantization → inference.
- Declare intent instead of writing Python training scripts.
- Reuse the same config across the full fine-tuning pipeline.
