What is the Unsloth Dataset Builder?
Fine-tuning a language model with Unsloth starts with a clean JSONL dataset. This builder helps you create one without writing scripts: pick an export format (Alpaca instruction rows, chat messages, ShareGPT conversations, OpenAI-style tool-calling rows with messages and tools schemas, input/output pairs, or raw text for continued pretraining), add rows manually or import CSV/JSON/JSONL, validate each row against the target schema, split train and eval sets, and download ready-to-use files. Imports are parsed in batches so large files stay responsive, and exports are built in chunks to avoid memory spikes. Everything runs in your browser, so proprietary training data never uploads to a server.
How to use the Unsloth Dataset Builder
- Choose your Unsloth export format (Alpaca, messages, conversations, tool calls, input/output, or raw text).
- Add rows manually, paste JSONL, or upload a CSV/JSON/JSONL file.
- For tool-calling SFT, use the tool calls format and include assistant tool_calls plus tool role responses.
- Review validation stats and fix any invalid rows in the paginated editor.
- Set the train/eval split ratio if you want a separate eval.jsonl.
- Download train.jsonl (and eval.jsonl), then use the copied Python snippet with Unsloth.
Common use cases
- Building an Alpaca-style SFT dataset from a CSV of prompts and answers
- Converting existing JSONL into Unsloth-ready chat messages with a system prompt
- Creating function-calling training data with tool_calls, tool responses, and tools schemas
- Splitting a dataset into train.jsonl and eval.jsonl before running Unsloth
Frequently asked questions
- What JSONL format does Unsloth expect?
- For supervised fine-tuning, common shapes are Alpaca ({instruction, input?, output}), chat ({messages: [{role, content}]}), ShareGPT ({conversations}), or tool-calling ({messages with tool_calls and tool roles, tools schema array}). For continued pretraining, use {text}. This tool exports any of these.
- Is there a limit on dataset size?
- No fixed row cap. Very large files are limited only by your device's available memory. Imports and exports use chunked processing to stay responsive.
- Is my training data uploaded anywhere?
- No. Parsing, validation, and export all happen locally in your browser.