Bag of Words (BoW)


What Does Bag of Words Mean?

Bag of Words (BoW) is a natural language processing (NLP) strategy for converting a text document into numbers that a computer program can work with. In Python, BoW is often implemented as a dictionary in which each key is a word and each value is the number of times that word appears in the document.
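A minimal sketch of the dictionary representation described above, in plain Python. It assumes a deliberately naive tokenizer (lowercase the text, strip ASCII punctuation, split on whitespace); the function name `bag_of_words` is hypothetical, and real NLP libraries tokenize more carefully.

```python
import string
from collections import Counter


def bag_of_words(text: str) -> dict[str, int]:
    """Map each word in `text` to the number of times it appears.

    Assumption: a naive tokenizer that lowercases the text, removes ASCII
    punctuation and splits on whitespace.
    """
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return dict(Counter(cleaned.split()))


print(bag_of_words("The cat sat on the mat. The mat was red."))
# {'the': 3, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 2, 'was': 1, 'red': 1}
```

`collections.Counter` does the counting; converting it to a plain `dict` gives exactly the word-to-count mapping the definition describes.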


The BoW model is one of the simplest and most common ways to convert text into input for machine learning algorithms. In this context, individual words are referred to as tokens, and splitting text into tokens is known as tokenization. Representing a sentence as a bag-of-words vector (a list of numbers, one per word in the vocabulary) is usually called vectorization.
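In code, the process can be split into two steps: first split the sentence into tokens, then count those tokens against a fixed vocabulary so that every sentence maps to a vector of the same length. This is a minimal sketch; the regex-based tokenizer, the five-word vocabulary and the function names `tokenize` and `vectorize` are illustrative assumptions.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (naive regex-based tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def vectorize(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Return one count per vocabulary word, in vocabulary order.

    Tokens that are not in the vocabulary are ignored.
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for token in tokens:
        if token in index:
            vector[index[token]] += 1
    return vector


vocabulary = ["cat", "dog", "mat", "sat", "the"]  # hypothetical fixed vocabulary
tokens = tokenize("The cat sat on the mat.")
print(tokens)                         # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(vectorize(tokens, vocabulary))  # [1, 0, 1, 1, 2]
```

Note that "on" is dropped because it is not in the vocabulary; fixing the vocabulary in advance is what makes vectors from different sentences comparable position by position.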

Techopedia Explains Bag of Words

BoW models record whether a known word (one from a fixed vocabulary) occurs in a document and how many times it occurs, but not the order in which words appear or the context around them. BoW plays an important role in natural language processing, information retrieval and document classification.
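A short sketch of both properties, assuming a naive whitespace tokenizer: two sentences containing the same words in a different order produce identical bags, and a binary variant keeps only whether each word occurs. The helper names `counts` and `presence` are hypothetical.

```python
from collections import Counter


def counts(text: str) -> Counter:
    """Bag of words as word counts (naive whitespace tokenizer)."""
    return Counter(text.lower().split())


def presence(text: str) -> set[str]:
    """Binary bag of words: records only whether each word occurs."""
    return set(text.lower().split())


a = "dog bites man"
b = "man bites dog"
print(counts(a) == counts(b))                        # True: word order is discarded
print(counts("dog dog bites man")["dog"])            # 2
print(presence("dog dog bites man") == presence(a))  # True: counts are discarded too
```

The first comparison also shows the cost of ignoring order: two sentences with opposite meanings become indistinguishable.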

How Bag of Words Works

BoW is used to extract feature sets from text during the data pre-processing phase. The strategy involves breaking a document down into a list of individual words and noting how many times each word is used in the document.
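Applied to a whole collection of documents, this pre-processing step produces a document-term matrix: one row per document and one column per vocabulary word. A minimal sketch, assuming a tiny hypothetical corpus, whitespace tokenization and a vocabulary sorted alphabetically; the function names are illustrative.

```python
from collections import Counter


def build_vocabulary(docs: list[list[str]]) -> list[str]:
    """Collect every distinct token in the corpus, sorted for a stable column order."""
    return sorted({token for doc in docs for token in doc})


def document_term_matrix(docs: list[list[str]], vocabulary: list[str]) -> list[list[int]]:
    """One row per document, one column per vocabulary word; cells hold counts."""
    rows = []
    for doc in docs:
        word_counts = Counter(doc)
        # Counter returns 0 for words missing from this document.
        rows.append([word_counts[word] for word in vocabulary])
    return rows


corpus = ["the cat sat", "the dog sat", "the cat saw the dog"]  # hypothetical corpus
docs = [text.split() for text in corpus]  # naive whitespace tokenization
vocab = build_vocabulary(docs)
matrix = document_term_matrix(docs, vocab)
print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
for row in matrix:
    print(row)
# [1, 0, 1, 0, 1]
# [0, 1, 1, 0, 1]
# [1, 1, 0, 1, 2]
```

Each row is a feature vector that could be fed to a classifier. In practice, scikit-learn's `CountVectorizer` performs the same job with configurable tokenization.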

The name ‘Bag of Words’ is thought to have been inspired by the popular word game, Scrabble. The value of each tile in a Scrabble bag was determined by how frequently a specific letter appeared on the front page of the New York Times in 1938.



Margaret Rouse

Margaret is an award-winning technical writer, teacher and lecturer. She is known for her ability to explain complex technical concepts in simple terms to a business audience. For twenty years, her IT definitions have been published by Que in an encyclopedia of technology terms and cited in articles in the New York Times, Time magazine, USA Today, ZDNet, PC Magazine and Discovery Magazine. Margaret joined the Techopedia team in 2011. Margaret enjoys helping business and IT professionals find a common language. In her work, as she puts it, she builds bridges between these two domains, in this…