
llama.cpp: LLM inference in C/C++

Quantized LLM inference with hardware-accelerated CPU/GPU backends.

Live rankings (06:51 AM, steady): Overall #24 · AI & ML #14
Stars: 92.7K · Forks: 14.4K · 7-day stars: +190 · 7-day forks: +26

Learn more about llama.cpp

llama.cpp is a C/C++ library and command-line tool for executing large language model inference without external dependencies. It implements quantization support ranging from 1.5-bit to 8-bit integer formats and includes hardware-specific optimizations via ARM NEON, Accelerate, Metal, AVX/AVX2/AVX512, CUDA, HIP, Vulkan, and SYCL backends. The project supports numerous model architectures including LLaMA variants, Mistral, Mixtral, Falcon, and others, with capabilities for both CPU and GPU acceleration as well as hybrid inference modes. Common deployment scenarios include local inference on consumer hardware, cloud-based inference services, and integration into applications requiring on-device language model execution.


1. Zero External Dependencies

The pure C/C++ implementation requires no external libraries at compile time or runtime, which simplifies deployment across embedded systems, servers, and consumer devices without dependency management.

2. Multi-Backend Hardware Acceleration

A single codebase supports ARM NEON, Metal, AVX/AVX2/AVX-512, CUDA, HIP, Vulkan, and SYCL backends, automatically leveraging whatever acceleration the hardware offers, across CPUs, GPUs, and specialized accelerators, without code changes.

3. Flexible Quantization Formats

Integer quantization from 1.5-bit to 8-bit, with runtime format selection, lets engineers balance model size, memory footprint, and inference speed against target hardware constraints.
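The size/precision trade-off these formats expose can be ballparked with simple arithmetic. In the sketch below, the effective bits-per-weight figures are rough approximations chosen for illustration, not exact format specifications:

```python
def approx_model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-storage estimate: parameters x bits, ignoring
    metadata, KV cache, and activation memory."""
    return n_params * bits_per_weight / 8 / 1e9

n = 7e9  # a 7B-parameter model
# Approximate effective bits per weight (illustrative values only)
for label, bits in [("F16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.8), ("IQ1_S", 1.6)]:
    print(f"{label:8s} ~{approx_model_size_gb(n, bits):.1f} GB")
```

For a 7B model this works out to roughly 14 GB at F16 versus about 4 GB at a 4-bit format, which is the difference between needing a workstation GPU and fitting comfortably on a consumer laptop.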


# Load a quantized model and generate 128 tokens
#   -m: model path
#   -p: prompt text
#   -n: number of tokens to generate
# (recent releases ship this binary as llama-cli)
./main -m models/llama-2-7b.Q4_K_M.gguf -p "Hello, my name is" -n 128
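The model file in the command above uses the GGUF container format. As a minimal sketch, assuming the GGUF v2+ prelude layout (4-byte `GGUF` magic, then little-endian uint32 version, uint64 tensor count, uint64 metadata key/value count), a header check looks like:

```python
import struct

GGUF_MAGIC = b"GGUF"  # 4-byte magic at the start of every GGUF file

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF prelude (assumed v2+ layout):
    magic, version, tensor count, metadata KV count."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {"version": version,
            "tensor_count": tensor_count,
            "metadata_kv_count": kv_count}

# Synthetic header for illustration: version 3, 2 tensors, 5 metadata entries
header = b"GGUF" + struct.pack("<IQQ", 3, 2, 5)
print(read_gguf_header(header))
```

This only reads the fixed prelude; the metadata key/value pairs and tensor descriptors that follow are variable-length and are left out of the sketch.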

b7053, b7052, b7051

Release notes for these builds do not specify breaking changes, new requirements, or new capabilities.

  • No migration steps, deprecations, or configuration changes are documented for these releases.
  • Review the commit history or changelog to assess impact, and test thoroughly, before upgrading production systems.
