java.lang.Object
gate.util.AbstractFeatureBearer
gate.creole.AbstractResource
gate.creole.AbstractLanguageResource
gate.DocumentFormat
gate.corpora.TextualDocumentFormat
The format of Documents. Subclasses of DocumentFormat know about particular MIME types and how to unpack the information in any markup or formatting they contain into GATE annotations. Each MIME type has its own subclass of DocumentFormat, e.g. XmlDocumentFormat, RtfDocumentFormat, MpegDocumentFormat. These classes register themselves with a static index residing here when they are constructed. Static getDocumentFormat methods can then be used to get the appropriate format class for a particular document.
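The self-registration scheme described above can be sketched as follows. This is a minimal illustration, not the GATE implementation: the names FormatSketch, PlainTextFormatSketch, forMimeType and describe are hypothetical, and the real index fields are the inherited maps listed under DocumentFormat.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a format class; this is NOT the gate.DocumentFormat API.
abstract class FormatSketch {
    // Static index shared by all subclasses, keyed by lower-cased MIME type.
    // A plain HashMap is used for brevity, so this sketch is not thread-safe.
    private static final Map<String, FormatSketch> INDEX = new HashMap<>();

    // Called from subclass constructors so that each format registers itself.
    protected FormatSketch(String mimeType) {
        INDEX.put(mimeType.toLowerCase(Locale.ROOT), this);
    }

    // Static lookup; returns null when no format is registered for the MIME type.
    static FormatSketch forMimeType(String mimeType) {
        return INDEX.get(mimeType.toLowerCase(Locale.ROOT));
    }

    abstract String describe();
}

// Hypothetical subclass that handles one MIME type.
class PlainTextFormatSketch extends FormatSketch {
    PlainTextFormatSketch() {
        super("text/plain");
    }

    @Override
    String describe() {
        return "plain text";
    }
}
```

Constructing a PlainTextFormatSketch once is enough for later calls to FormatSketch.forMimeType("text/plain") to return it; unknown MIME types yield null in this sketch.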
| Field Summary | |
private static boolean |
DEBUG
Debug flag |
| Fields inherited from class gate.DocumentFormat |
element2StringMap, isGateXmlDocument, magic2mimeTypeMap, markupElementsMap, mimeString2ClassHandlerMap, mimeString2mimeTypeMap, suffixes2mimeTypeMap |
| Fields inherited from class gate.creole.AbstractLanguageResource |
dataStore, lrPersistentId |
| Fields inherited from class gate.creole.AbstractResource |
name |
| Constructor Summary | |
TextualDocumentFormat()
Default construction |
|
| Method Summary | |
void |
annotateParagraphs(Document aDoc,
int startOffset,
int endOffset,
String annotSetName)
This method annotates paragraphs in a GATE document. |
DataStore |
getDataStore()
Get the data store that this LR lives in. |
Resource |
init()
Initialise this resource, and return it. |
private void |
removeExtraNewLine(Document doc)
Delete '\r' where it occurs in a CRLF or LFCR combination in the document content |
protected void |
setNewLineProperty(Document doc)
Check the new line sequence and set document property. |
void |
unpackMarkup(Document doc)
Unpack the markup in the document. |
void |
unpackMarkup(Document doc,
RepositioningInfo repInfo,
RepositioningInfo ampCodingInfo)
|
| Methods inherited from class gate.creole.AbstractLanguageResource |
cleanup, getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync |
| Methods inherited from class gate.creole.AbstractResource |
checkParameterValues, getBeanInfo, getName, getParameterValue, getParameterValue, removeResourceListeners, setName, setParameterValue, setParameterValue, setParameterValues, setParameterValues, setResourceListeners |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface gate.LanguageResource |
getLRPersistenceId, getParent, isModified, setDataStore, setLRPersistenceId, setParent, sync |
| Methods inherited from interface gate.Resource |
cleanup, getParameterValue, setParameterValue, setParameterValues |
| Methods inherited from interface gate.util.NameBearer |
getName, setName |
| Field Detail |
private static final boolean DEBUG
| Constructor Detail |
public TextualDocumentFormat()
| Method Detail |
public Resource init()
throws ResourceInstantiationException
init in interface Resource
init in class AbstractResource
Throws: ResourceInstantiationException
public void unpackMarkup(Document doc)
throws DocumentFormatException
unpackMarkup in class DocumentFormat
Throws: DocumentFormatException
public void unpackMarkup(Document doc,
RepositioningInfo repInfo,
RepositioningInfo ampCodingInfo)
throws DocumentFormatException
unpackMarkup in class DocumentFormat
Throws: DocumentFormatException
protected void setNewLineProperty(Document doc)
private void removeExtraNewLine(Document doc)
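The two new-line helpers above operate on a GATE Document, and their bodies are not shown on this page. As a rough sketch of the underlying string handling only, assuming plain String content and hypothetical names (NewLineSketch, detectNewLine), the logic might look like this. Pairing CR and LF greedily from left to right is an assumption of the sketch.

```java
// Hypothetical helper working on plain strings; NOT the GATE implementation.
final class NewLineSketch {
    private NewLineSketch() {}

    // Returns the first line-break sequence found ("\r\n", "\n\r", "\n" or "\r"),
    // or null when the text contains no line break.
    static String detectNewLine(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\r' && c != '\n') {
                continue;
            }
            char other = (c == '\r') ? '\n' : '\r';
            boolean pair = i + 1 < text.length() && text.charAt(i + 1) == other;
            return pair ? "" + c + other : String.valueOf(c);
        }
        return null;
    }

    // Drops '\r' only where it is part of a CRLF or LFCR pair; a lone '\r' is kept.
    static String removeExtraNewLine(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            char next = (i + 1 < text.length()) ? text.charAt(i + 1) : '\0';
            if ((c == '\r' && next == '\n') || (c == '\n' && next == '\r')) {
                out.append('\n');
                i += 2; // consume both characters of the pair
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

Removing characters shifts every later offset, which is presumably why the three-argument unpackMarkup accepts RepositioningInfo arguments; the sketch does not track offsets.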
public void annotateParagraphs(Document aDoc,
int startOffset,
int endOffset,
String annotSetName)
throws DocumentFormatException
Parameters:
aDoc - the GATE document on which paragraph detection is performed. If it is null, or its content is null, the method simply returns without doing anything.
startOffset - the index in the document content from which paragraph detection will start.
endOffset - the offset where the detection will end.
annotSetName - the name of the annotation set in which paragraph annotations are created. The annotation type created is "paragraph".
Throws: DocumentFormatException
public DataStore getDataStore()
Description copied from interface: LanguageResource
Get the data store that this LR lives in.
getDataStore in interface LanguageResource
getDataStore in class AbstractLanguageResource
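For annotateParagraphs, this page gives the offset range and the "paragraph" annotation type but not the detection rule. The sketch below assumes, purely for illustration, that a paragraph is a run of non-blank lines separated by '\n' and delimited by blank lines. ParagraphSketch and findParagraphs are hypothetical names, and the method returns offset pairs instead of creating GATE annotations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical paragraph finder over plain strings; NOT the GATE implementation.
final class ParagraphSketch {
    private ParagraphSketch() {}

    // Returns {start, end} offset pairs (end exclusive) for each paragraph found
    // between startOffset and endOffset. Assumption: blank lines separate paragraphs.
    static List<int[]> findParagraphs(String content, int startOffset, int endOffset) {
        List<int[]> spans = new ArrayList<>();
        if (content == null) {
            return spans; // mirrors the documented "do nothing" behaviour for null content
        }
        int from = Math.max(0, startOffset);
        int to = Math.min(content.length(), endOffset);
        int paraStart = -1; // start of the paragraph currently open, or -1
        int paraEnd = -1;   // end of the last non-blank line in that paragraph
        int lineStart = from;
        while (lineStart < to) {
            int nl = content.indexOf('\n', lineStart);
            int lineEnd = (nl < 0 || nl > to) ? to : nl;
            boolean blank = content.substring(lineStart, lineEnd).trim().isEmpty();
            if (!blank) {
                if (paraStart < 0) {
                    paraStart = lineStart;
                }
                paraEnd = lineEnd;
            } else if (paraStart >= 0) {
                spans.add(new int[] {paraStart, paraEnd});
                paraStart = -1;
            }
            lineStart = lineEnd + 1;
        }
        if (paraStart >= 0) {
            spans.add(new int[] {paraStart, paraEnd});
        }
        return spans;
    }
}
```

In a real format class each returned span would presumably become one annotation of type "paragraph" in the annotation set named by annotSetName.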