
Llama: Inference code for language models

PyTorch inference for Meta's Llama language models.

Live rankings: overall #176 · AI & ML #73
Stars: 59.0K · Forks: 9.8K · Downloads: 17
Past 7 days: +22 stars, +4 forks

Learn more about llama

Llama is reference inference code for Meta's openly released Llama language models, which range from 7 billion to 70 billion parameters. It is built on PyTorch and supports distributed inference through torchrun, allowing parallel execution across multiple GPUs. The codebase includes model loading utilities, tokenizer integration, and example scripts for text and chat completion. The repository has since been deprecated in favor of specialized downstream projects that handle model distribution, safety, tooling, and agentic systems separately.

llama

1. Distributed inference support
Uses torchrun for multi-GPU inference with configurable model parallelism, allowing users to adjust nproc_per_node to match the model size (a launch sketch follows this list).

2. Minimal reference implementation
Designed as a lightweight example rather than a comprehensive framework, with basic utilities for model loading and tokenization that can be extended or integrated into other systems.

3. Direct model access
Provides download scripts and integration with Hugging Face for accessing model weights and tokenizers after license approval, with support for multiple model variants (a Hugging Face loading sketch also follows this list).
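
To make item 1 concrete, the following is a minimal launch sketch, assuming a locally downloaded 70B checkpoint and an illustrative script name (multi_gpu_completion.py); the repository's own example scripts follow the same pattern. The --nproc_per_node value must match the number of model-parallel shards in the checkpoint (1 for 7B, 2 for 13B, 8 for 70B).

# multi_gpu_completion.py -- illustrative name; launch with torchrun so each
# process owns one model-parallel shard, e.g. for the 70B checkpoint:
#
#   torchrun --nproc_per_node 8 multi_gpu_completion.py
#
from llama import Llama

generator = Llama.build(
    ckpt_dir="llama-2-70b/",           # assumed checkpoint directory (8 shards)
    tokenizer_path="tokenizer.model",
    max_seq_len=512,
    max_batch_size=4
)

results = generator.text_completion(["Model parallelism works by"], max_gen_len=32)
print(results[0]["generation"])

For item 3, the weights can also be pulled from the Hugging Face Hub once the license request is approved. The sketch below uses the separate transformers library (with accelerate for device placement) rather than this repository; the hub id, dtype, and prompt are illustrative.

# Hugging Face route (outside this repo): requires accepting the Llama license
# on the model page and authenticating, e.g. with `huggingface-cli login`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,         # half precision to fit a single GPU
    device_map="auto"                  # requires the accelerate package
)

inputs = tokenizer("The future of AI is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))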


# Basic text completion with the 7B checkpoint, following the pattern of the
# repo's example_text_completion.py. Even on a single GPU the script must be
# launched with torchrun so the distributed model-parallel setup is initialized:
#
#   torchrun --nproc_per_node 1 your_script.py
#
from llama import Llama

# Build the generator from a local checkpoint directory and the SentencePiece
# tokenizer shipped with the weights.
generator = Llama.build(
    ckpt_dir="llama-2-7b/",
    tokenizer_path="tokenizer.model",
    max_seq_len=128,
    max_batch_size=4
)

prompts = ["The future of AI is"]
results = generator.text_completion(prompts, max_gen_len=64, temperature=0.6)
print(results[0]['generation'])
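
The repository also ships a chat interface; below is a minimal sketch in the style of its example_chat_completion.py script, assuming a locally downloaded chat-tuned checkpoint (the dialog content is illustrative). Like the text completion example above, it must be launched with torchrun.

from llama import Llama

# Launch with: torchrun --nproc_per_node 1 your_chat_script.py
generator = Llama.build(
    ckpt_dir="llama-2-7b-chat/",       # assumed chat-tuned checkpoint directory
    tokenizer_path="tokenizer.model",
    max_seq_len=512,
    max_batch_size=4
)

# Each dialog is a list of {"role", "content"} turns; roles are "system",
# "user", and "assistant".
dialogs = [[
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What does torchrun do in this repository?"},
]]

results = generator.chat_completion(dialogs, max_gen_len=128, temperature=0.6, top_p=0.9)
print(results[0]["generation"]["content"])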


[ EXPLORE MORE ]

Related Repositories

Discover similar tools and frameworks used by developers