The science of word recognition
The legibility of a typeface should not be evaluated on its ability to generate a good word shape.
This presentation, made at the 2003 ATypI conference in Vancouver, provoked much interest and debate.
Evidence from the last twenty years of work in cognitive psychology indicates that we use the letters within a word to recognise the word. Many typographers and text enthusiasts insist that words are recognised by the outline made around the word shape. Some have used the term ‘bouma’ as a synonym for word shape. (The term bouma, which comes from papers written in the 1970s by H. Bouma, appears in Paul Saenger’s 1997 book Space Between Words: The Origins of Silent Reading.)
My paper is written from the perspective of a reading psychologist. The data from dozens of experiments all come from peer-reviewed journals where the experiments are well specified, so that anyone can reproduce the experiment and expect to achieve the same result. My goal is to review the history of why psychologists moved from a word shape model of word recognition to a letter recognition model, and to help others to come to the same conclusion.
I will start by describing three major categories of word recognition models: the word shape model, the serial model, and the parallel model of letter recognition. I will present data that was used as evidence to support each model, and will evaluate the models in terms of their ability to account for the data. Finally I will describe some recent developments in word recognition and a more detailed model that is currently popular among psychologists.
Model no. 1. Word shape
The word recognition model that says words are recognised as complete units is the oldest model in the psychological literature. The idea is that we see words as complete patterns, rather than as the sum of their letter parts.
In 1886, James Cattell was the first psychologist to propose the word shape model of word recognition. He presented letter and word stimuli to study participants for a very brief period of time (five to ten milliseconds), and found that people were more accurate at recognising the words than the letters. This finding is now called the Word Superiority Effect. Cattell concluded that we use whole words for word recognition because of their advantage over individual letters.
The second piece of experimental data that supports the word shape model is that lowercase text is read faster than uppercase text. On average people read lowercase text five to ten per cent faster than uppercase text. This supports the word shape model because lowercase text has unique patterns of ascending, descending and neutral characters, while uppercase text has less variance in text size and shape.
The pattern of errors that are missed while proof-reading text provides the third key piece of experimental evidence to support the word shape model. Study participants were asked to read a passage of text for comprehension and to mark any misspelling they found. The passage had some misspellings that were consistent with word shape, and some that were inconsistent with the intended word’s shape. Haber and Schindler (1981) found that misspellings consistent with word shape were twice as likely to be missed as misspellings inconsistent with word shape. The word shape model predicted that consistent word shape misspellings would be caught less often, because of their similarity with the intended word.
Model no. 2. Serial letter recognition
The serial letter recognition model says that recognising a word is analogous to looking up a word in a dictionary. You start off by finding the first letter, then the second, and so on until you recognise the word. This model is appealing because it is easier to test than the word shape model. However it was quickly discarded because it couldn’t explain the Word Superiority Effect.
Evidence for the serial letter recognition model comes from letter recognition speed and the word length effect. Letters can be recognised one at a time at a rate of ten to twenty milliseconds per letter, which is consistent with a typical reading rate of 300 words per minute (wpm). The serial letter recognition model also correctly predicts that words with fewer letters are recognised more quickly than words with many letters, while the word shape model expects longer words with more unique patterns to be easier to recognise.
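The arithmetic behind this consistency claim can be checked with a short sketch. The average word length of five letters is an assumption introduced here for illustration; it is not a figure from the studies cited, and the function names are hypothetical.

```python
# Rough check: is letter-by-letter recognition fast enough for 300 wpm?
MS_PER_MINUTE = 60_000
AVG_WORD_LENGTH = 5  # assumption: illustrative average word length in letters


def ms_per_word(words_per_minute: float) -> float:
    """Time available for each word at a given reading rate."""
    return MS_PER_MINUTE / words_per_minute


def serial_time_ms(n_letters: int, ms_per_letter: float) -> float:
    """Time to recognise a word if its letters are processed one at a time."""
    return n_letters * ms_per_letter


budget = ms_per_word(300)                      # time per word at 300 wpm
fastest = serial_time_ms(AVG_WORD_LENGTH, 10)  # ten ms per letter
slowest = serial_time_ms(AVG_WORD_LENGTH, 20)  # twenty ms per letter

print(f"budget per word: {budget} ms")
print(f"serial recognition: {fastest} to {slowest} ms")

# Word length effect: under a serial model, shorter words finish sooner.
print(serial_time_ms(3, 10) < serial_time_ms(8, 10))
```

Under these assumptions a five-letter word takes 50 to 100 milliseconds to recognise serially, which fits within the 200 milliseconds available per word at 300 wpm.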
Model no. 3. Parallel letter recognition
The model that most psychologists currently accept as the most accurate is the parallel letter recognition model. This model says that the letters within a word are recognised simultaneously, and the letter information is used to recognise the word. This is an active area of research and there are many specific models that fit into this general category of model.
Much of the evidence for the parallel letter recognition model comes from the eye movement literature. With the advent of fast eye trackers and computers, a great deal has now been learned about how we read. We now have the ability to make changes to text in real time while people read, which has provided useful insights into the reading process that weren’t previously possible.
It has been known for more than 100 years that when we read our eyes don’t move smoothly across the page. Rather, they make discrete jumps from word to word. We fixate on a word for a period of time, roughly 200-250 milliseconds, then make a ballistic movement to another word. These movements are called saccades and usually take 20-35 milliseconds. Most saccades are forward movements from seven to nine letters, but ten to fifteen per cent of all saccades are regressive or backwards movements. Most readers are completely unaware of the frequency of regressive saccades while reading.
The location of the fixation is not random. Fixations never occur between words, and usually occur just to the left of the middle of a word. Not all words are fixated; short words, and particularly function words, are frequently skipped.
During a single fixation there is a limit to the amount of information that can be recognised. The fovea, the clear centre point of our vision, can only see three to four letters to the left and right of fixation at normal reading distances. Visual acuity decreases quickly in the parafovea, which extends out as far as fifteen to twenty letters to the left and right of the fixation point.
Eye movement studies indicate that there are three zones of visual identification. Readers collect information from all three zones during the span of a fixation. Closest to the fixation point is where word recognition takes place. This zone is usually large enough to capture the word being fixated and often includes smaller function words directly to the right of the fixated word. The next zone extends a few letters past the word recognition zone, and readers gather preliminary information about the next letters in this zone. The final zone extends out to fifteen letters past the fixation point. Information gathered out this far is used to identify the length of upcoming words and to identify the best location for the next fixation point.
There are two experimental methodologies that have been critical for understanding the fixation span: the moving window paradigm and the boundary study paradigm. These methodologies make it possible to study readers while they’re engaged in ordinary reading. Both rely on fast eye trackers and computers to perform text manipulations while a reader is making a saccade. While making a saccade, the reader is functionally blind. He or she will not perceive that text has changed, if the change is completed before the saccade has finished.
Moving window study
In the moving window technique we restrict the amount of text that is visible to a certain number of letters around the fixation point, and replace all of the other letters on a page with the letter ‘x’. The reader’s task is simply to read the page of text.
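The text manipulation itself is simple to sketch. The following is a minimal illustration, not the software used in the original experiments. The function name and its parameters are hypothetical, the window is measured in character positions on either side of the fixation point, and the choice to leave spaces and punctuation visible outside the window is an assumption.

```python
def moving_window(text: str, fixation: int, half_width: int, mask: str = "x") -> str:
    """Mask every letter outside a window centred on the fixation index.

    Assumption: spaces and punctuation stay visible outside the window,
    so only alphabetic characters are replaced by the mask letter.
    """
    first_visible = fixation - half_width
    last_visible = fixation + half_width
    masked = []
    for index, char in enumerate(text):
        inside_window = first_visible <= index <= last_visible
        if inside_window or not char.isalpha():
            masked.append(char)
        else:
            masked.append(mask)
    return "".join(masked)


line = "the quick brown fox"
# Fixating the 'i' of 'quick' (index 6) with two characters visible on each side.
print(moving_window(line, fixation=6, half_width=2))
```

In a real experiment the window would be redrawn during each saccade, with the eye tracker supplying the new fixation position.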
McConkie and Rayner (1975) examined how many letters around the fixation point are needed to provide a normal reading experience.
From this study we learned that our perceptual span is roughly fifteen letters. This is interesting, as the average saccade length is seven to nine letters, or roughly half our perceptual span. This indicates that, while readers are recognising words closer to the fovea, we’re using additional information further out to guide our reading.
The moving window study demonstrates the importance of letters in reading, but it is not airtight. The word shape model of reading would also expect that reading speed would decrease as word shape information disappears. The word shape model would make the additional prediction that reading would be significantly improved if information on the whole word shape were always retained. This turns out to be false.
A further study showed that the reading rate when three letters are available is roughly equivalent to the reading rate when the fixated word is entirely there. That is true even though the entire word has 0.7 more letters available on average. When the fixated word and the following word are entirely available, reading rate is equivalent to when nine letters are available. Reading rate is also equivalent when three words or fifteen letters are available.
This means that reading is not necessarily faster when entire subsequent words are available; similar reading speeds can be found when only a few letters are available.
The boundary study (Rayner, McConkie and Zola, 1980) is another innovative paradigm that eye trackers and computers made possible. With this we can examine what information the reader is using inside the perceptual span (fifteen letters), but outside of the word that is being fixated.
The critical word in this study is presented in different conditions, including an identical control condition (‘chart’), a dissimilar word shape with some letters in common (‘chyft’), and similar word shape with no letters in common (‘ebovf’). Readers were faster when some letters were in common than when the word shape was similar. This demonstrates that letter information is being collected within the fixation span, even when the entire word is not being recognised.
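The two properties that distinguish these conditions, word shape and shared letters, can be made concrete with a small sketch. The classification of lowercase letters into ascenders, descenders and neutral characters below is an assumption made for illustration (for example, ‘t’ is treated as an ascender), and the function names are hypothetical.

```python
# Assumed letter classes for lowercase text; everything else counts as neutral.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def word_shape(word: str) -> str:
    """Return the outline pattern: A = ascender, D = descender, x = neutral."""
    pattern = []
    for letter in word:
        if letter in ASCENDERS:
            pattern.append("A")
        elif letter in DESCENDERS:
            pattern.append("D")
        else:
            pattern.append("x")
    return "".join(pattern)


def letters_in_common(first: str, second: str) -> int:
    """Count positions at which two words have the same letter."""
    return sum(1 for a, b in zip(first, second) if a == b)


target = "chart"
for preview in ("chart", "chyft", "ebovf"):
    same_shape = word_shape(preview) == word_shape(target)
    shared = letters_in_common(preview, target)
    print(preview, word_shape(preview), "same shape:", same_shape, "shared letters:", shared)
```

With these letter classes, ‘ebovf’ has the same outline as ‘chart’ but shares no letters with it, while ‘chyft’ has a different outline but shares three letters.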
Rayner, McConkie and Zola (1980) further investigated what happens with a capitalised form of the critical word CHART. This eliminates the role of word shape, but retains perfect letter information. They found that the fixation times were the same as the control condition. This demonstrates that it isn’t visual information about either word shape or letter shape that is being retained from saccade to saccade, but rather abstracted information about which letters are coming up.
Evidence for word shape revisited
What about the evidence that supported the word shape model? Does any of it disprove the parallel letter recognition model?
The strongest evidence for the word shape model was the Word Superiority Effect, which showed that letters can be more accurately recognised in the context of a word than in isolation. This was a logical claim until McClelland and Johnson (1977) demonstrated that pseudowords also show a Word Superiority Effect. For example, pseudowords such as ‘mave’ and ‘rint’ are not words in the English language and should not have a familiar word shape, but they do have the phonetic regularity that makes them easily pronounceable. Therefore, the reason for the Word Superiority Effect isn’t the recognition of word shapes, but rather the existence of regular letter combinations.
The weakest evidence in support of word shape is that lowercase text is read faster than uppercase text. This is entirely a practice effect. Most readers spend the bulk of their time reading lowercase text, and are therefore more proficient at it. When readers are forced to read large quantities of uppercase text, their reading speed will increase to the rate of lowercase text.
Earlier I reported that readers were twice as likely to fail to notice a misspelling that is consistent with word shape. Unfortunately this study confounded word and letter shape, comparing misspellings with similar letter and word shape to misspellings with different letter and word shape. Paap, Newsome and Noel (1984) determined the relative contribution of word shape and letter shape, and found that the entire effect is driven by letter shape. There are many more errors when the replacement letter has the same basic shape (substituting ‘n’ for ‘h’) than when the replacement letter retains the same pattern of ascenders and descenders (substituting ‘d’ for ‘h’).
Further examination of the evidence used to support the word shape model has demonstrated that the case for the word shape model was not as strong as it seemed. In the next section I will describe an active area of research: using computers to model parallel letter recognition.
Computer models of parallel letter recognition
The human brain is made up of billions of neurons, each with 4000 synapses on average. Each of these neurons is a summation machine that adds (and subtracts) input from other neurons. We can use computer simulations to create biologically plausible models that explain complex behaviours with simple mechanisms.
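The idea of a neuron as a summation machine can be sketched in a few lines. This is a generic illustration under stated assumptions, not the implementation of any particular published model: positive weights stand for excitatory connections, negative weights for inhibitory ones, and the firing threshold and all names are hypothetical.

```python
def net_input(inputs: list[float], weights: list[float]) -> float:
    """Sum the incoming signals, each scaled by its connection weight.

    Positive weights add to the total (excitation);
    negative weights subtract from it (inhibition).
    """
    return sum(signal * weight for signal, weight in zip(inputs, weights))


def fires(inputs: list[float], weights: list[float], threshold: float = 0.0) -> bool:
    """Assumed rule: the unit is active when its summed input exceeds a threshold."""
    return net_input(inputs, weights) > threshold


signals = [1.0, 1.0, 1.0]
weights = [0.5, 0.25, -0.5]  # two excitatory connections, one inhibitory

print(net_input(signals, weights))
print(fires(signals, weights))
```

Networks built from many such units, connected in layers, are the building blocks of the simulations described in this section.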
The first well known computer simulation of reading was McClelland and Rumelhart’s Interactive Activation model (1981).
The most important benefit of neural network modelling is that it is specific enough to be programmed into a computer and tested. The Interactive Activation model is able to explain many human behaviours that it was not specifically designed for.
There has been great progress in developing neural network models of reading that can account for more human reading behaviours (see Plaut, McClelland, Seidenberg and Patterson, 1996). Neural network models are now able to account for how reading develops and generate correct word pronunciations without the use of specific word nodes.
Given that all the reading research psychologists I know support a version of the parallel letter recognition model of reading, how is it that all the typographers I know say we read by matching whole word shapes?
It appears to be a grand misunderstanding. The paper by Bouma that is most frequently cited does not support a word shape model of reading. Bouma (1973) presented words and unpronounceable letter strings to subjects away from the fixation point and measured their ability to name the first and last letters. He found that:
A) Subjects are more successful at naming letters to the right of fixation than to the left of fixation.
B) When distance to the right of the fixation point is controlled, subjects are better able to recognise the last letter of a word than the first letter of a word. This is why we tend to fixate just to the left of the middle of a word.
Bouwhuis and Bouma (1979) extended the Bouma (1973) paper by not only finding the probability of recognising the first and last letters of a word, but also the middle letters. They used this data to develop a model of word recognition based on the probability of recognising each of the letters within a word. They conclude that ‘word shape … might be satisfactorily described in terms of the letters in their positions.’ This model of word recognition clearly influenced the McClelland and Rumelhart neural network model discussed earlier, which also used letters in their positions to probabilistically recognise words.
Word shape is no longer a viable model of word recognition. The bulk of scientific evidence says that we recognise a word’s component letters, then use that visual information to recognise a word. In addition to perceptual information, we also use contextual information to help recognise words during ordinary reading, but that has no bearing on the word shape versus parallel letter recognition debate.
I hope that it is clear that the readability and legibility of a typeface should not be evaluated on its ability to generate a good word shape.