tiktoken: BPE tokenizer for OpenAI models

Fast BPE tokenizer for OpenAI language models.

Stars: 17.4K (+102 in 7 days) · Forks: 1.4K (+8 in 7 days)

Learn more about tiktoken

tiktoken is a tokenization library that implements byte pair encoding (BPE), an algorithm that originated in data compression and is used here to convert text into sequences of integer tokens. It is distributed as a Python package whose core is written in Rust, and it provides both the standard encodings used by OpenAI models and an extension mechanism for custom encodings. Tokenization is lossless and reversible, works on arbitrary text, and compresses the input by mapping it to subword units: on average, a token corresponds to about 4 bytes of text. Common uses include counting tokens to estimate API cost, preparing text for language models, and implementing custom tokenization schemes.
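To make the BPE idea concrete, here is a minimal pure-Python sketch of byte-level BPE. It is an illustration only, not tiktoken's implementation: the function names (`train_bpe`, `merge_pair`, `encode`, `decode`) are hypothetical, and the sketch skips the regex pre-splitting and rank-based merging that the real library performs.

```python
from collections import Counter


def merge_pair(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(corpus: str, num_merges: int) -> dict[tuple[int, int], int]:
    """Learn up to `num_merges` merges over the UTF-8 bytes of `corpus`.

    Returns {(left_id, right_id): new_id}, in the order the merges were learned.
    """
    ids = list(corpus.encode("utf-8"))
    merges: dict[tuple[int, int], int] = {}
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # nothing left worth merging
            break
        new_id = 256 + len(merges)  # ids 0..255 are reserved for raw bytes
        merges[pair] = new_id
        ids = merge_pair(ids, pair, new_id)
    return merges


def encode(text: str, merges: dict[tuple[int, int], int]) -> list[int]:
    """Start from raw bytes and apply the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids: list[int], merges: dict[tuple[int, int], int]) -> str:
    """Expand each token id back to its bytes, then decode as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")
```

Because every token id expands back to a fixed byte string, `decode(encode(text, merges), merges)` returns the original text for any input, including text not seen during training. That is the lossless, reversible property described above.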

1

Rust-Backed Performance

The core is written in Rust and exposed through Python bindings rather than implemented in pure Python. The project's README reports throughput roughly 3-6x that of a comparable open-source tokenizer (Hugging Face's GPT2TokenizerFast in its benchmark), which keeps overhead low when processing large volumes of text in production.

2

Pre-Built Model Encodings

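Ships the encodings used by OpenAI models, such as o200k_base and cl100k_base, and can look up the right encoding from a model name such as gpt-4o, so token counts match what the model sees when estimating API usage. An educational submodule (tiktoken._educational) provides BPE visualization tools for understanding how tokenization works.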
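The model-name lookup can be sketched as a small token-counting helper. `tiktoken.encoding_for_model` and `tiktoken.get_encoding` are part of the library; the `count_tokens` helper, its optional `encoding` override, and the choice of o200k_base as a fallback are assumptions made for this illustration.

```python
from typing import Optional, Protocol


class SupportsEncode(Protocol):
    """Anything with an `encode(text) -> list of token ids` method."""

    def encode(self, text: str) -> list: ...


def count_tokens(
    text: str,
    model: str = "gpt-4o",
    encoding: Optional[SupportsEncode] = None,
) -> int:
    """Count the tokens in `text` for `model`.

    `encoding` is a hypothetical override hook: pass an encoding you have
    already loaded to skip the lookup (also handy in tests).
    """
    if encoding is None:
        import tiktoken  # third-party; imported lazily so the override path needs no install

        try:
            # Maps a model name such as "gpt-4o" to its encoding.
            encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            # Unknown model name: fall back to an assumed default encoding.
            encoding = tiktoken.get_encoding("o200k_base")
    return len(encoding.encode(text))
```

With tiktoken installed, `count_tokens("Hello, how are you doing today?")` should return the token count under the encoding that gpt-4o maps to. For chat-style API requests, the billed total can include per-message overhead beyond the raw text tokens, so treat the result as an estimate.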

3

Plugin-Based Extensibility

Supports custom encodings through a plugin mechanism. Proprietary model tokenizers or modified encoding schemes can be registered without forking the core library, which makes it practical to experiment with new tokenization approaches.
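As a rough sketch of what such a plugin might look like, the module below follows the extension convention described in tiktoken's README: a module inside a `tiktoken_ext` namespace package exposes an `ENCODING_CONSTRUCTORS` dictionary that maps encoding names to zero-argument functions returning keyword arguments for `tiktoken.Encoding`. The file path, the encoding name `my_byte_level`, the split pattern, and the tiny vocabulary are all hypothetical and chosen only for illustration.

```python
# Hypothetical file: my_tiktoken_extension/tiktoken_ext/my_encodings.py
# (the tiktoken_ext directory is a namespace package, so it has no __init__.py)


def my_byte_level() -> dict:
    """Return keyword arguments for tiktoken.Encoding (illustrative toy vocabulary)."""
    # Every single byte needs a rank so that arbitrary input can be encoded.
    mergeable_ranks = {bytes([i]): i for i in range(256)}
    # A couple of example merges; lower rank means the merge is applied earlier.
    mergeable_ranks[b"th"] = 256
    mergeable_ranks[b"the"] = 257

    return {
        "name": "my_byte_level",
        # Assumed simple split pattern: runs of word characters, whitespace, or other symbols.
        "pat_str": r"\w+|\s+|[^\w\s]+",
        "mergeable_ranks": mergeable_ranks,
        # Special token ids must not collide with the mergeable ranks.
        "special_tokens": {"<|endoftext|>": 258},
    }


# The name tiktoken's registry looks for when it scans tiktoken_ext modules.
ENCODING_CONSTRUCTORS = {"my_byte_level": my_byte_level}
```

Once a package with this layout is installed, `tiktoken.get_encoding("my_byte_level")` should be able to find the encoding by name. A real plugin would load a trained vocabulary instead of the toy ranks used here.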


import tiktoken

# Load the cl100k_base encoding by name
encoding = tiktoken.get_encoding("cl100k_base")

# Encode a string into a list of integer token ids
text = "Hello, how are you doing today?"
tokens = encoding.encode(text)

print(f"Token count: {len(tokens)}")
print(f"Tokens: {tokens}")

