from zope.interface import Interface


class ITextGranulator(Interface):
    """Provides methods to granulate a document into
    chapters and paragraphs"""

    def getParagraphItemList(file):
        """Returns the list of paragraphs in the form
        of (id, class) where class may have a special
        meaning to define TOC / TOI
        """

    def getParagraphItem(file, paragraph_id):
        """Returns the paragraph in the form
        of (text, class)
        """

    def getChapterItemList(file):
        """Returns the list of chapters in the form
        of (id, level)
        """

    def getChapterItem(file, chapter_id):
        """Returns the chapter in the form
        of (title, level)
        """
Paragraphs can be extracted from a text document. Each paragraph is identified by an ID, which should preferably also exist as an HTML anchor in the HTML conversion of the document. Each paragraph has a class, which maps either to a CSS class or to a special item such as the “Table Of Contents” (TOC) or the “Table of Images” (TOI). Paragraphs that act as chapters, sections and subsections can be listed separately, each with its level.
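To make the four methods concrete, here is a minimal sketch of a granulator for plain text. It is not the actual implementation: the class name `PlainTextGranulator`, the `p0`, `p1`, ... ID scheme, the `heading` / `normal` class names, and the convention that a single line starting with `#` marks a chapter (with the number of `#` giving the level) are all assumptions made for illustration. It also assumes `file` is a seekable text file object, and it does not declare the zope interface so that it stays dependency-free.

```python
import io
import re

# Assumed convention: a one-line block starting with '#' is a chapter title.
HEADING = re.compile(r"^(#+)\s+(.+)$")


class PlainTextGranulator:
    """Hypothetical granulator following the ITextGranulator method names."""

    def _parse(self, file):
        # Re-read from the start so every call sees the whole document.
        file.seek(0)
        items = []
        # Paragraphs are blocks separated by one or more blank lines.
        for block in re.split(r"\n\s*\n", file.read()):
            block = block.strip()
            if not block:
                continue
            paragraph_id = "p%d" % len(items)  # assumed ID scheme
            match = HEADING.match(block)
            if match:
                level = len(match.group(1))
                items.append((paragraph_id, match.group(2), "heading", level))
            else:
                items.append((paragraph_id, block, "normal", None))
        return items

    def getParagraphItemList(self, file):
        # List of (id, class) for every paragraph, chapters included.
        return [(pid, cls) for pid, _, cls, _ in self._parse(file)]

    def getParagraphItem(self, file, paragraph_id):
        # (text, class) of one paragraph; raises KeyError if the ID is unknown.
        for pid, text, cls, _ in self._parse(file):
            if pid == paragraph_id:
                return (text, cls)
        raise KeyError(paragraph_id)

    def getChapterItemList(self, file):
        # List of (id, level) for paragraphs that act as chapters.
        return [(pid, level) for pid, _, cls, level in self._parse(file)
                if cls == "heading"]

    def getChapterItem(self, file, chapter_id):
        # (title, level) of one chapter; raises KeyError if the ID is unknown.
        for pid, text, cls, level in self._parse(file):
            if pid == chapter_id and cls == "heading":
                return (text, level)
        raise KeyError(chapter_id)


document = io.StringIO(
    "# Intro\n\nFirst paragraph.\n\n## Details\n\nSecond paragraph.\n"
)
granulator = PlainTextGranulator()
print(granulator.getParagraphItemList(document))
print(granulator.getChapterItemList(document))
```

Because chapters are ordinary paragraphs with a particular class, the chapter methods in this sketch reuse paragraph IDs, so the same ID could serve as an HTML anchor for both views of the document.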