
IGranulator (follow-up)

# Assumption: Interface is the zope.interface base class
# (interface methods are declared without self).
from zope.interface import Interface


class ITextGranulator(Interface):
    """Provides methods to granulate a document into
    chapters and paragraphs"""

    def getParagraphItemList(file):
        """Returns the list of paragraphs in the form
        of (id, class) where class may have special
        meaning to define TOC / TOI
        """

    def getParagraphItem(file, paragraph_id):
        """Returns the paragraph in the form
        of (text, class)
        """

    def getChapterItemList(file):
        """Returns the list of chapters in the form
        of (id, level)
        """

    def getChapterItem(file, chapter_id):
        """Returns the chapter in the form
        of (title, level)
        """

Notes:

Paragraphs can be extracted from a text document. Each paragraph is identified by an ID, which should preferably also exist as an HTML anchor in the HTML conversion of the document. Each paragraph has a class, which relates either to a CSS class or to a special item such as the “Table Of Contents” (TOC) or the “Table of Images” (TOI). Paragraphs that play the role of chapters, sections and subsections can be listed separately, each with its level.
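
As an illustration of the notes above, here is a minimal sketch of a class that follows the ITextGranulator method names. It is not the actual implementation. It assumes that `file` is the HTML conversion of the document passed as a string, that paragraph IDs are `id` attributes on `<p>` and `<h1>` to `<h6>` elements, and that heading elements play the role of chapters with the heading number as their level. The names `HTMLGranulatorSketch` and `_BlockCollector` are hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser


class _BlockCollector(HTMLParser):
    """Hypothetical helper: collects <p> and <h1>..<h6> elements that carry an id."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.paragraphs = {}  # id -> (text, class), in document order
        self.chapters = {}    # id -> (title, level), in document order
        self._current = None  # (tag, id, class, text parts) while inside a block

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        block_id = attrs.get("id")
        is_block = tag == "p" or tag in self.HEADINGS
        if self._current is None and block_id and is_block:
            self._current = (tag, block_id, attrs.get("class") or "", [])

    def handle_data(self, data):
        # Text of nested inline elements (<b>, <a>, ...) is collected as well.
        if self._current is not None:
            self._current[3].append(data)

    def handle_endtag(self, tag):
        if self._current is None or tag != self._current[0]:
            return
        _, block_id, css_class, parts = self._current
        text = "".join(parts).strip()
        # Assumption: headings are paragraphs too, and are also listed as chapters.
        self.paragraphs[block_id] = (text, css_class)
        if tag in self.HEADINGS:
            self.chapters[block_id] = (text, self.HEADINGS[tag])
        self._current = None


class HTMLGranulatorSketch:
    """Hypothetical class following the ITextGranulator method names."""

    def _parse(self, file):
        collector = _BlockCollector()
        collector.feed(file)
        return collector

    def getParagraphItemList(self, file):
        items = self._parse(file).paragraphs.items()
        return [(pid, css_class) for pid, (_, css_class) in items]

    def getParagraphItem(self, file, paragraph_id):
        return self._parse(file).paragraphs[paragraph_id]

    def getChapterItemList(self, file):
        items = self._parse(file).chapters.items()
        return [(cid, level) for cid, (_, level) in items]

    def getChapterItem(self, file, chapter_id):
        return self._parse(file).chapters[chapter_id]


sample = (
    '<h1 id="c1" class="title">Intro</h1>'
    '<p id="p1" class="body">Hello <b>world</b></p>'
    '<h2 id="c2">Details</h2>'
)
granulator = HTMLGranulatorSketch()
print(granulator.getParagraphItemList(sample))
print(granulator.getChapterItem(sample, "c2"))
```

The sketch re-parses the document on every call to stay short. A real implementation would probably cache the parsed result, and would need its own rule for mapping a class to the TOC or TOI.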