Tokenization


What Does Tokenization Mean?

Tokenization is the act of breaking up a sequence of characters into pieces such as words, keywords, phrases, symbols and other elements, called tokens. Tokens can be individual words, phrases or even whole sentences. During tokenization, some characters, such as punctuation marks, may be discarded. The tokens then become the input for another process, such as parsing or text mining.


In computer science, tokenization plays a large part in lexical analysis, the stage at which a compiler or interpreter converts source text into tokens for the parser.

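In the crypto world, the term has a different meaning: representing assets as digital tokens on a blockchain. In this sense, tokenization’s modern roots trace back to blockchain technology and standards like Ethereum’s ERC-20 (fungible tokens) and ERC-721 (non-fungible tokens), which standardized interoperable tokens.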

Initially, tokens were mainly utility tokens for accessing blockchain services. However, the concept evolved to include security tokens that represent real-world assets and non-fungible tokens (NFTs) that represent unique digital items, driven by the need for secure, transparent, and efficient digital asset management and trading.

Techopedia Explains Tokenization

Tokenization relies mostly on simple heuristics to separate tokens, typically following a few rules:

  • Tokens or words are separated by whitespace, punctuation marks or line breaks.
  • Whitespace and punctuation marks may or may not be kept as tokens, depending on the need.
  • All characters within a contiguous string are part of the token. Tokens can consist of alphabetic characters only, numeric characters only, or a mix of both (alphanumeric).
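
The rules above can be illustrated with a minimal Python sketch. It is only one possible reading of these heuristics, not a standard implementation: the function name `tokenize` and the `keep_punctuation` flag are hypothetical names chosen for this example, and "word characters" here means whatever Python's `\w` matches (letters, digits and the underscore).

```python
import re

def tokenize(text, keep_punctuation=False):
    """Split text into tokens on whitespace, line breaks and punctuation.

    Hypothetical helper for illustration only.
    """
    if keep_punctuation:
        # A token is either a run of word characters
        # or a single non-space, non-word symbol.
        pattern = r"\w+|[^\w\s]"
    else:
        # Only runs of word characters count as tokens;
        # whitespace and punctuation are discarded.
        pattern = r"\w+"
    return re.findall(pattern, text)

print(tokenize("Hello, world! It is 2024."))
# ['Hello', 'world', 'It', 'is', '2024']
print(tokenize("Hello, world!", keep_punctuation=True))
# ['Hello', ',', 'world', '!']
```

Whether punctuation is kept depends on the downstream task: a text-mining pipeline may drop it, while a parser usually needs it.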

Tokens themselves can also act as separators. For example, in most programming languages, identifiers can be written next to arithmetic operators without any whitespace between them. Although such a sequence looks like a single word, the grammar of the language treats the operator, which is itself a token, as a separator, so tokens that are bunched together can still be told apart.
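
As a rough sketch of this idea, the Python snippet below splits an expression that contains no whitespace at all. The token pattern and the function name `lex` are assumptions made for this example and cover only identifiers, integers and a handful of operators; a real lexer for any particular language would be considerably more detailed.

```python
import re

# Hypothetical token pattern: identifiers, integers, or a single operator/parenthesis.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|[-+*/=()]")

def lex(source):
    """Return the tokens found in source, in order.

    Characters that match none of the alternatives (such as spaces)
    are silently skipped in this simplified sketch.
    """
    return TOKEN_PATTERN.findall(source)

print(lex("total=price*qty+1"))
# ['total', '=', 'price', '*', 'qty', '+', '1']
```

Here the operators `=`, `*` and `+` are tokens in their own right, yet they also mark where one identifier ends and the next begins.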


Margaret Rouse
Senior Editor

Margaret is an award-winning technical writer and teacher known for her ability to explain complex technical subjects to a non-technical business audience. Over the past twenty years, her IT definitions have been published by Que in an encyclopedia of technology terms and cited in articles by the New York Times, Time Magazine, USA Today, ZDNet, PC Magazine, and Discovery Magazine. She joined Techopedia in 2011. Margaret's idea of a fun day is helping IT and business professionals learn to speak each other’s highly specialized languages.