
Ollama: Run large language models locally

Go-based CLI for local LLM inference and management.

Live rankings: Overall #44 · AI & ML #25 (30-day trend: steady)
Stars: 159.1K · Forks: 14.1K · Downloads: 320.0K
7-day stars: +214 · 7-day forks: +28

Learn more about ollama

Ollama is a Go-based command-line application that runs and manages large language models locally on consumer-grade hardware. The system downloads pre-trained models in the quantized GGUF format for memory efficiency and runs a local HTTP inference server that handles model loading and request processing. Users can customize model behavior through declarative configuration files called Modelfiles, which specify the base model, sampling parameters such as temperature, and system prompts without requiring code changes. The architecture supports integration with external applications through its REST API, allowing Python and JavaScript clients to communicate with locally hosted models. This design prioritizes privacy and offline capability by eliminating dependencies on cloud-based inference services, at the cost of lower throughput than distributed GPU clusters.
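
As a concrete sketch of the client side, the official Python package (`ollama` on PyPI) wraps the same REST endpoints; the model name below is only an example and assumes it has already been downloaded with `ollama pull`, with the server on its default port:

import ollama

# Chat with a locally hosted model through the Ollama server (default port 11434).
# Assumes `pip install ollama` and that `ollama pull llama2` has already completed.
response = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Explain GGUF quantization in one sentence.'}],
)
print(response['message']['content'])

The same request can also be made with plain HTTP, as the `requests` example further down this page shows.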


1. GGUF Format Support

Natively imports quantized GGUF and Safetensors models for efficient inference on consumer hardware. Quantization reduces memory requirements by roughly 4-8x compared to full-precision weights while largely preserving output quality.
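
A small sketch of inspecting a pulled or imported model's format and quantization level over the REST API; the request and response fields shown ('model', 'details', 'format', 'quantization_level') follow the documented show endpoint and should be treated as assumptions for any given version:

import requests

# Ask the local Ollama server to describe a model it already has on disk.
resp = requests.post('http://localhost:11434/api/show', json={'model': 'llama2'})
details = resp.json()['details']
# Typical output: format 'gguf' with a quantization level such as 'Q4_0'.
print(details['format'], details['quantization_level'])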

2. Modelfile Customization

Define parameters, system prompts, and configurations in declarative Modelfiles to create custom model variants. Build and version modified models locally without altering base weights.
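
A minimal Modelfile sketch; the base model, parameter values, and the variant name used to build it are illustrative, not prescribed by the project:

# Derive a variant from an existing local model (illustrative values).
FROM llama2
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM "You are a concise technical assistant."

Building it with `ollama create my-assistant -f Modelfile` registers the variant locally, and `ollama run my-assistant` starts it; the FROM layer reuses the existing base weights rather than duplicating them.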

3. Multi-Platform Distribution

Native installers for macOS and Windows, shell scripts for Linux, and official Docker images ensure consistent deployment. Run identical models across development laptops, servers, and containerized environments.
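
For the containerized path, a typical invocation, assuming the official ollama/ollama image and the default API port, looks roughly like this; the REST API is the same regardless of how the server was installed, as the Python example below illustrates:

# Start the Ollama server in a container, persisting models in a named volume.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Run a model interactively inside the running container.
docker exec -it ollama ollama run llama2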


import requests

# Query the local Ollama server's generate endpoint (default port 11434).
response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'llama2',
    'prompt': 'Why is the sky blue?',
    'stream': False  # return the full completion as a single JSON object
})

print(response.json()['response'])

v0.12.11-rc1

Adds logprobs support to both the native Ollama API and the OpenAI-compatible API (a request sketch follows the notes below), fixes tool-calling bugs, and enables opt-in Vulkan acceleration.

  • Set OLLAMA_VULKAN=1 to opt in to the Vulkan backend; Ollama now prefers dedicated GPUs over integrated GPUs when scheduling models.
  • Tool definitions now correctly omit the 'required' field when it is unspecified, and the missing 'tool_call_id' in OpenAI-compatible API responses is fixed.
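
A minimal sketch of requesting logprobs through the OpenAI-compatible endpoint; the 'logprobs' and 'top_logprobs' fields follow the OpenAI chat-completions schema and are assumptions here, since the release notes do not spell out the request format:

import requests

# OpenAI-compatible chat completions served by a local Ollama instance.
resp = requests.post('http://localhost:11434/v1/chat/completions', json={
    'model': 'llama2',
    'messages': [{'role': 'user', 'content': 'Name a primary color.'}],
    'logprobs': True,    # assumed to mirror the OpenAI field
    'top_logprobs': 3    # alternatives per token, if supported
})
print(resp.json()['choices'][0].get('logprobs'))
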
v0.12.10

Adds embedding model support to `ollama run` and fixes critical hangs from CPU discovery; no breaking changes reported.

  • Run embedding models directly with `ollama run embeddinggemma "text"` or pipe input via stdin for vector generation; the REST equivalent is sketched after these notes.
  • Update if you hit CPU discovery hangs or need tool call IDs from `/api/chat`; fixes qwen3-vl:235b errors and stale VRAM reads.
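
A sketch of generating an embedding over the REST API rather than the CLI; the endpoint and field names ('/api/embeddings', 'prompt', 'embedding') follow the long-standing documented API and are assumptions with respect to this particular release:

import requests

# Request an embedding vector from a locally served embedding model.
resp = requests.post('http://localhost:11434/api/embeddings', json={
    'model': 'embeddinggemma',
    'prompt': 'The sky is blue because of Rayleigh scattering.'
})
vector = resp.json()['embedding']
print(len(vector))  # dimensionality of the embedding
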
v0.12.9

Fixes a performance regression, introduced in v0.12.8, that affected CPU-only systems.

  • Upgrade if running on CPU-only hardware to restore prior inference performance.
  • Release notes do not specify the root cause or affected workloads beyond CPU systems.


