Tokenization Demo

This interactive demo showcases the process of tokenization, a fundamental technique used in natural language processing (NLP) and generative AI.

Enter any text into the input field below...

As you type, your sentence is split into words, the way we humans tend to see and read them:


But how does a machine see them? Click the button below to tokenize your text; this converts your words into token IDs from a given vocabulary.

These are the token IDs that the tiktoken library assigned to your words. This is closer to how ChatGPT and other large language models (LLMs) see your text when you write a prompt in natural language: