Using the Knuth-Plass algorithm to help control widow and orphan lines

Robert Rundell
Seattle, WA
Play (32min) Download: MP4 | MP3

(Due to operator error, the speaker video for this talk was not recorded — apologies.)

The Knuth-Plass line-breaking algorithm is one of the many exceptional features of TEX, taking a paragraph of text and converting it to a vertical list of well-proportioned lines. Through glue and penalty markers TEX gives the user almost complete control over the spacing and look of the paragraph.

However, in some instances TEX does not provide the user quite as rich a set of options to control the vertical list as in other areas. In particular, eliminating widow and orphan lines can require inserting forced break points into the text, break points that can only be found from previous passes of TEX. Subsequent changes to the document can require changes to some or all of these manually inserted line or page breaks.

In AML, an experimental typesetting program under development, the Knuth-Plass algorithm is enhanced to find not only the optimal line-break points for a paragraph, but also to give alternate mappings of the paragraph into different numbers of lines (where possible). AML stores these different sets of break points and uses this information, along with natural page break points, to automatically eliminate widow and orphan lines in many cases. Once a bad page break point is detected, AML will backtrack and adjust previous paragraphs to create better page breaks. With far greater memory and processing capabilities than were available at TEX’s creation, multiple pages can be examined and processed before a final page break needs to be finalized, allowing the overall document layout to be improved. The combination of keeping multiple pages and also keeping alternative paragraph linebreaking sets in memory allows AML to automate and improve this aspect of document typesetting.