RWKV Tokenizer Visualization

The RWKV tokenizer is a subword-level tokenizer used across RWKV language models to convert text to integer token IDs and back. It segments text into frequently occurring sub-words or characters, mapping them to a finite vocabulary of around 50,000 tokens (in case of RWKV World Tokenizer used in RWKV 5 and newer).

This web application visualise the tokenization using the RWKV World Tokenizer. It runs entirely in the web browser thanks to the WebAssembly version of the original Rust RWKV Tokeniser.