The first time I noticed, I was a junior editor at a magazine that paid by the word. I filed a piece at exactly 1,200 words — according to my Mac. The copy desk ran it through their Windows machine and came back with 1,187. My editor opened it in the web CMS: 1,205. The payroll system, for reasons nobody could explain, settled on 1,193. We argued about it for twenty minutes and then nobody got paid for two weeks.

Word count looks like arithmetic. It isn't. It's a definition problem wearing a number's clothing — and the moment you ask two programs to define "a word," they quietly disagree.

What is a word, anyway?

There are at least four reasonable definitions in use right now. Each produces a different number on the same paragraph.

  • Whitespace-delimited spans. The Unix wc definition: any run of non-whitespace characters. Fast, simple, and wrong about "don't" (one word) vs "do not" (two).
  • Unicode word boundaries. The UAX #29 (Unicode Text Segmentation) definition Google Docs uses. Splits on punctuation boundaries and treats contractions as one word. For Chinese, Japanese, and Thai, which don't put spaces between words, implementations add per-language dictionary segmentation on top of the annex's rules.
  • Hyphenated compounds. Microsoft Word counts "mother-in-law" as one word. Google Docs counts it as three. Neither is wrong; they're answering different questions.
  • Numeric literals. Is "1,200" one word, two words, or zero? Word says one. Docs says one. The AP Stylebook, if you ask nicely, says it depends.
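The first two definitions can be compared with a minimal Python sketch. `count_whitespace` follows wc's rule via `str.split()`. `count_boundary_style` is a rough regex approximation of UAX #29-style segmentation, not a conforming implementation. It keeps letters joined by an apostrophe together, keeps digits joined by commas or periods together, and splits on everything else, including hyphens. Both function names are hypothetical.

```python
import re

# Rough stand-in for UAX #29-style word segmentation (NOT a conforming
# implementation): a "word" is a run of letters, optionally joined by
# apostrophes (don't), or a run of digits, optionally joined by commas
# or periods (1,200). Hyphens, dashes, and other punctuation split words.
_BOUNDARY_WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*|\d+(?:[.,]\d+)*")


def count_whitespace(text: str) -> int:
    """wc-style count: any run of non-whitespace characters is one word."""
    return len(text.split())


def count_boundary_style(text: str) -> int:
    """Approximate boundary-style count: hyphenated compounds split apart."""
    return len(_BOUNDARY_WORD.findall(text))


if __name__ == "__main__":
    for sample in ["don't", "do not", "mother-in-law", "1,200"]:
        print(f"{sample!r:18} whitespace={count_whitespace(sample)} "
              f"boundary={count_boundary_style(sample)}")
```

On the samples "don't", "do not", "mother-in-law", and "1,200", the whitespace counter returns 1, 2, 1, 1 and the boundary-style counter returns 1, 2, 3, 1. These results match the Google Docs numbers in the list.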

"Counting words is the easiest kind of measurement — until you try to do it." — Bernstein, The Careful Writer (1965)

The 12% case study

I ran the same 500-word essay through eight counters. The spread was 452 to 509 — a range of 12.6%. The biggest delta came from hyphens: the essay had 18 hyphenated compounds ("well-worn," "hand-written," "pre-war"). Word's count was 452. wc -w's count was 506.
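The 12.6% figure is the gap between the highest and lowest counts, divided by the lowest count. A short Python helper makes the arithmetic explicit. `spread_percent` is a hypothetical name, and the inputs are the counts from the paragraph above.

```python
from typing import Optional


def spread_percent(low: int, high: int, baseline: Optional[int] = None) -> float:
    """Return (high - low) as a percentage of `baseline`, defaulting to `low`."""
    base = low if baseline is None else baseline
    return (high - low) / base * 100


# Counts reported for the eight-counter experiment.
print(round(spread_percent(452, 509), 1))       # 12.6, measured against the low count
print(round(spread_percent(452, 509, 500), 1))  # 11.4, measured against the nominal 500
print(round(spread_percent(452, 506), 1))       # 11.9, Word vs wc -w alone
```

If you measure against the nominal 500 words instead, the same spread is 11.4%. Either baseline is defensible as long as you say which one you used.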

Neither is lying. They're measuring different things. Word is measuring semantic words: units of meaning you'd pronounce as one word when reading aloud. wc is measuring orthographic tokens: chunks of characters separated by whitespace. For a contract, you want the first. For an API rate limit, you want the second.

What WordInstant does

WordInstant uses the same definition as Microsoft Word in its default mode. Hyphenated compounds count as one word. Two words joined by an em-dash with no spaces around it also count as one word, which is wrong by strict grammar but right by how the text reads.
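The rule above fits in a few lines of Python if you assume it can be approximated by splitting on whitespace. This is a hypothetical sketch, not WordInstant's actual code. `count_like_wordinstant` is an illustrative name. The sketch also skips spans that contain no letters or digits, so a spaced em-dash standing on its own isn't counted. That filter is an assumption of the sketch, not documented behavior.

```python
def count_like_wordinstant(text: str) -> int:
    """Hypothetical sketch: count whitespace-delimited spans as words.

    Hyphenated compounds ("well-worn") and words joined by an unspaced
    em-dash stay one span, so each counts once. Assumption: spans with no
    letters or digits at all (a lone, spaced em-dash) are not counted.
    """
    return sum(1 for span in text.split() if any(ch.isalnum() for ch in span))


if __name__ == "__main__":
    print(count_like_wordinstant("a well-worn path"))  # 3
    print(count_like_wordinstant("wait\u2014what"))    # 1: unspaced em-dash joins the words
    print(count_like_wordinstant("wait \u2014 what"))  # 2: the lone dash span is skipped
```

Without the letters-or-digits filter, the spaced version would count as three words, because the dash is itself a whitespace-delimited span.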

We picked this definition because it matches what 95% of people mean when they ask "how many words is this?" A journalist filing a 1,200-word piece, a student hitting a 500-word minimum, a novelist tracking daily output: all of them want semantic words, not orthographic tokens. If you need the other kind, pipe your text through wc -w, but don't expect the two numbers to agree. On the hyphen-heavy essay above, Word and wc -w came out 54 words (about 12%) apart.

The one case where we're definitely wrong

We don't do language detection. If you paste Chinese or Japanese text, our word count will be roughly the number of paragraphs, because those scripts don't put spaces between words. Thai uses spaces between phrases rather than words, so its count comes out closer to the number of phrases. For those languages, use Google Docs instead: its word segmentation handles them, and it's free.
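A quick Python demonstration of that failure mode follows. A whitespace split has nothing to split on inside a Chinese paragraph, so each paragraph collapses into a single token. The sample text is illustrative.

```python
# Two short Chinese paragraphs separated by a newline. Chinese puts no
# spaces between words, so a whitespace split sees one "word" per paragraph.
text = "我喜欢读书。\n今天天气很好。"

tokens = text.split()
print(len(tokens))  # 2: one token per paragraph, not per word
print(tokens)
```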

Which number should you trust?

Pick one tool and stick with it. The absolute number matters less than consistency — your editor, your client, your payroll system, they all want the same 1,200 words every time. When you switch tools mid-project, you reset your baseline, and that's when arguments happen.

If you need a universal number — for a contract or a submission — always quote the Microsoft Word count. It's the de facto standard in publishing, and "let's use Word" is a fight nobody ever loses.


I eventually got paid for that 1,200-word piece. The accountant rounded up to 1,200 out of pity. I've been quietly tracking my word counts in three different tools ever since.