Overview

Small Language Models (SLMs) are relatively lightweight language models designed to run efficiently in resource-constrained environments (e.g., laptops, smartphones, embedded/edge devices, or low-power servers).

Compared to frontier Large Language Models (LLMs), SLMs typically have fewer parameters (commonly ~1B to ~10B) and a smaller runtime footprint, while still supporting core NLP capabilities such as text generation, summarization, translation, and question answering.
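
As a rough back-of-the-envelope sketch of what a "smaller runtime footprint" means, weight memory is roughly parameter count times bytes per parameter (this ignores the KV cache, activations, and framework overhead, so real usage is somewhat higher):

    # Rough weight-memory estimate: parameter count x bytes per parameter.
    # Ignores the KV cache, activations, and framework overhead, so real usage is higher.
    BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

    def weight_memory_gb(num_params: float, precision: str) -> float:
        return num_params * BYTES_PER_PARAM[precision] / 1e9

    # A 1.5B-parameter SLM: ~6 GB in fp32, ~3 GB in fp16, ~0.75 GB in int4.
    for precision in BYTES_PER_PARAM:
        print(f"1.5B params @ {precision}: {weight_memory_gb(1.5e9, precision):.2f} GB")

By the same arithmetic, a 70B-parameter model needs on the order of 140 GB of weight memory in fp16, which is why frontier-scale models do not fit on typical laptops or phones.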

Terminology

Some practitioners dislike the term “Small Language Model” because a billion parameters is not “small” in an absolute sense. Alternatives like “small LLM” exist, but “SLM” is widely used in practice.

Why Use SLMs?

  • Lower inference cost (memory, compute, power)
  • Lower latency (especially on-device)
  • Easier deployment in constrained environments (see the local-inference sketch after this list)
  • Potentially improved privacy when running locally (depending on the application)
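
A minimal local-inference sketch, assuming the Hugging Face transformers library and the SmolLM2 1.7B Instruct checkpoint listed under Examples below (substitute any similarly sized instruction-tuned SLM):

    # Minimal local text-generation sketch with Hugging Face transformers.
    # The model ID is an assumption; any ~1-2B instruction-tuned SLM works the same way.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="HuggingFaceTB/SmolLM2-1.7B-Instruct",
        device_map="auto",  # needs the accelerate package; drop it to run on CPU
    )

    prompt = "In one paragraph, explain why small language models are useful on-device."
    print(generator(prompt, max_new_tokens=128)[0]["generated_text"])

Because the weights fit in a few gigabytes (less with quantization), a script like this can run on a recent laptop without a discrete GPU, which is where the latency and privacy benefits above come from.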

Trade-offs

  • Lower ceiling on capability compared to larger models (reasoning, long-context, tool use)
  • More sensitivity to data quality and training recipe
  • Smaller models may require tighter prompting, fine-tuning, or retrieval to match task requirements (a retrieval-augmented prompting sketch follows this list)
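
To make the last point concrete, here is a toy retrieval-augmented prompting sketch; the documents and the keyword-overlap scorer are invented for illustration, and a real system would use embedding search and then pass the assembled prompt to an SLM:

    # Toy retrieval-augmented prompt assembly for a small model.
    # The documents and the keyword-overlap scorer are illustrative only; a real
    # system would use embedding search and pass the finished prompt to an SLM.
    DOCS = [
        "SLMs in the 1-4B range can run on a laptop CPU once quantized.",
        "Retrieval supplies facts in-context so a small model need not memorize them.",
        "Frontier LLMs are typically served from data-center GPUs.",
    ]

    def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
        query_terms = set(query.lower().split())
        return sorted(docs, key=lambda d: -len(query_terms & set(d.lower().split())))[:k]

    def build_prompt(query: str) -> str:
        context = "\n".join(f"- {doc}" for doc in retrieve(query, DOCS))
        return (
            "Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer:"
        )

    print(build_prompt("Why pair retrieval with a small model?"))

Grounding the prompt in retrieved context shifts the burden from the model's parametric memory to its context window, which is one common way to close the gap to a larger model on a narrow task.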

Examples

Examples of commonly cited SLMs (~1–4B parameters):

  • Llama 3.2 1B (Meta)
  • Qwen 2.5 1.5B (Alibaba)
  • DeepSeek-R1-Distill-Qwen-1.5B (DeepSeek; distilled from Qwen 2.5)
  • SmolLM2 1.7B (HuggingFaceTB)
  • Phi-3.5 Mini 3.8B (Microsoft)
  • Gemma (e.g., 4B-class variants)
  • minimind

Other small but strong models sometimes mentioned include Mistral 7B, Gemma 2 9B, and Phi-4 14B (depending on your definition of “small”).
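
As a hedged sketch of loading one of the checkpoints above with 4-bit quantization to shrink its footprint further (assuming a CUDA GPU plus the bitsandbytes package; the Qwen 2.5 1.5B Instruct model ID is taken from the Hugging Face Hub):

    # 4-bit quantized load of a listed SLM (Qwen 2.5 1.5B) to cut weight memory
    # roughly 4x versus fp16. Requires a CUDA GPU and the bitsandbytes package;
    # the model ID is an assumption based on the Hugging Face Hub naming.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "Qwen/Qwen2.5-1.5B-Instruct"
    quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant, device_map="auto"
    )

    inputs = tokenizer("Translate to French: small models, big reach.", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

In 4-bit, a 1.5B-parameter model's weights take roughly 0.8–1 GB, versus about 3 GB in fp16, at the cost of some accuracy.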
