

Use Serverless SFT, now in public preview, to fine-tune LLMs with supervised learning on curated datasets. W&B provisions the training infrastructure (on CoreWeave) for you while leaving you full flexibility in how you set up your environment. You get instant access to a managed training cluster that elastically auto-scales to handle your training workloads. Serverless SFT is ideal for tasks like:
  • Distillation: Transferring knowledge from a larger, more capable model into a smaller, faster one
  • Teaching output style and format: Training a model to follow specific response formats, tone, or structure
  • Warmup before RL: Pre-training a model with supervised examples before applying reinforcement learning for further refinement
Serverless SFT trains low-rank adapters (LoRAs) to specialize a model for your specific task. W&B automatically stores the LoRAs you train as artifacts in your account. You can also save them locally or to a third party for backup. W&B Inference also automatically hosts models that you train through Serverless SFT. See the ART Serverless SFT docs to get started.

Why Serverless SFT?

Supervised fine-tuning (SFT) is a training technique where a model learns from curated input-output examples. Serverless SFT on W&B provides the following advantages:
  • Lower training costs: By multiplexing shared infrastructure across many users, skipping per-job setup, and scaling your GPU costs down to zero when you’re not actively training, Serverless SFT significantly reduces training costs.
  • Faster training time: By immediately provisioning training infrastructure when you need it, Serverless SFT speeds up your training jobs and lets you iterate faster.
  • Automatic deployment: Serverless SFT automatically deploys every checkpoint you train, so you do not need to manually set up hosting infrastructure. You can access and test trained models immediately in local, staging, or production environments.
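To illustrate testing a deployed checkpoint, the sketch below builds an OpenAI-compatible chat completions request against W&B Inference. The endpoint URL and model identifier here are assumptions for illustration; check the W&B Inference docs for the exact values your account uses:

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible W&B Inference endpoint (placeholder for illustration).
INFERENCE_URL = "https://api.inference.wandb.ai/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a chat completions request for a trained checkpoint.

    `model` is a placeholder checkpoint identifier from your training run.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        INFERENCE_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('WANDB_API_KEY', '')}",
        },
    )

# Example (requires a valid WANDB_API_KEY):
# req = build_chat_request("my-team/my-lora:latest", "Summarize SFT in one sentence.")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, an OpenAI client SDK pointed at the same base URL would work equally well.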

How Serverless SFT uses W&B services

Serverless SFT uses a combination of the following W&B components to operate:
  • Inference: To run your models
  • Models: To track performance metrics during the LoRA adapter’s training
  • Artifacts: To store and version the LoRA adapters
  • Weave (optional): To gain observability into how the model responds at each step of the training loop
Serverless SFT is in public preview. During the preview period, W&B charges you only for inference usage and artifact storage; adapter training is free.