
Ollama: Run large language models locally

Go-based CLI for local LLM inference and management.

OVERALL RANK: #62 · AI & ML RANK: #32
STARS: 163.5K (+809 in 7 days) · FORKS: 14.7K (+91 in 7 days)

Learn more about Ollama

Ollama is a Go-based command-line application that enables local execution and management of large language models on consumer-grade hardware. It downloads pre-trained models in the quantized GGUF format for memory efficiency and runs a local HTTP inference server that handles model loading and request processing. Users can customize model behavior through declarative configuration files called Modelfiles, which specify the base model, sampling parameters such as temperature, and system prompts without requiring code changes. The architecture supports integration with external applications through a REST API, allowing Python and JavaScript clients to communicate with locally hosted models. This design prioritizes privacy and offline capability by eliminating dependencies on cloud-based inference services, accepting the trade-off of reduced throughput compared to distributed GPU clusters.
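
As a minimal sketch of that client path (assuming the official ollama Python package, installed with pip install ollama, and a model already pulled locally):

import ollama

# Chat with a locally hosted model through the official client,
# which talks to the local server on its default port (11434)
response = ollama.chat(model='llama2', messages=[
    {'role': 'user', 'content': 'Summarize what Ollama does.'}
])
print(response['message']['content'])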

1. GGUF Format Support

Natively imports quantized GGUF and Safetensors models for efficient inference on consumer hardware. Reduces memory requirements by roughly 4-8x compared to full-precision weights, with only modest quality loss.
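
As a sketch of the import path (the GGUF filename below is a placeholder), a one-line Modelfile pointing at a local file is enough to build and run it:

# Modelfile: point at a local quantized GGUF file (filename is illustrative)
FROM ./mistral-7b.Q4_K_M.gguf

# Build and run the imported model
ollama create my-mistral -f Modelfile
ollama run my-mistral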

2. Modelfile Customization

Define parameters, system prompts, and configurations in declarative Modelfiles to create custom model variants. Build and version modified models locally without altering base weights.
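
For illustration, a minimal Modelfile along these lines (the base model, parameter values, and prompt are examples) layers sampling settings and a system prompt over an existing model:

# Modelfile: derive a custom variant from a base model
FROM llama2

# Sampling parameters for the variant
PARAMETER temperature 0.7
PARAMETER num_ctx 4096

# System prompt baked into every session
SYSTEM """You are a concise assistant for technical questions."""

Building it with ollama create techbot -f Modelfile registers the variant locally without duplicating the base weights.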

3. Multi-Platform Distribution

Native installers for macOS and Windows, shell scripts for Linux, and official Docker images ensure consistent deployment. Run identical models across development laptops, servers, and containerized environments.
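
As a sketch of the containerized route (volume and container names are arbitrary), the official ollama/ollama image runs the same server:

# Start the server, persisting downloaded models in a named volume
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull and chat with a model inside the running container
docker exec -it ollama ollama run llama2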


import requests

# Ask the local Ollama server (default port 11434) for a one-shot completion
response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'llama2',
    'prompt': 'Why is the sky blue?',
    'stream': False  # return a single JSON object rather than streamed chunks
})

print(response.json()['response'])
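
For comparison, a streamed variant of the same call is sketched below; with 'stream' set to true, the endpoint emits newline-delimited JSON chunks that can be printed as they arrive:

import json
import requests

# Stream the completion piece-by-piece instead of waiting for the full reply
with requests.post('http://localhost:11434/api/generate', json={
    'model': 'llama2',
    'prompt': 'Why is the sky blue?',
    'stream': True
}, stream=True) as response:
    for line in response.iter_lines():
        if line:  # skip keep-alive blank lines
            chunk = json.loads(line)
            # each chunk carries a fragment in 'response'; 'done' marks the end
            print(chunk.get('response', ''), end='', flush=True)
print()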

v0.15.6

Bug fixes for the ollama launch command, covering context limits, missing model downloads, and image handling.

  • Fixed context limits when running ollama launch droid
  • ollama launch will now download missing models instead of erroring
  • Fixed bug where ollama launch claude would cause context compaction when providing images
v0.15.5

New models Qwen3-Coder-Next and GLM-OCR, plus enhanced ollama launch support for arguments and subagents.

  • Qwen3-Coder-Next: a coding-focused language model from Alibaba's Qwen team, optimized for agentic coding workflows and local development
  • GLM-OCR: a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture
  • ollama launch can now be provided arguments, for example ollama launch claude -- --resume
  • ollama launch will now run subagents when using ollama launch claude
  • Ollama will now set context limits for a set of models when using ollama launch opencode
v0.15.4

Improved OpenClaw integration with automatic onboarding flow when launching for the first time.

  • ollama launch openclaw will now enter the standard OpenClaw onboarding flow if this has not yet been completed


