A formal model for databases of structured text

B. Lowe, J. Zobel, and R. Sacks-Davis

Documents have a natural hierarchical structure, implicit in most texts and made explicit by markup languages such as SGML. In this paper we propose a formal model for the representation of hierarchically structured documents, to be used as the basis for document query languages. The model uses a redundant representation of the document elements to simplify the expression of common queries. To illustrate the power of the model, we show how queries might be expressed and outline how such queries might be evaluated in a practical system.