comet.html
The html module provides utility functions for working with input and output HTML text.
Methods
- comet.html.getRawText(input)
Converts an HTML input string to raw text that no longer contains formatting tags.
All HTML entities and tags are removed. The input must be valid HTML.
- Parameters:
input (str) – The input HTML text
- Returns:
The raw text content
- Return type:
str
- Raises:
TypeError – When parameter types are invalid
CometError – On internal error
- Available:
InDesign® comet_pdf® Illustrator®
- CScript:
- Examples:
Remove all HTML tags from an input text.
#!py
#pragma plain
import comet

def main():
    text = 'This is a <b>formatted</b> text. Colors are <span style="color: rgb(255, 0, 0);">applied</span> as well.'
    rawText = comet.html.getRawText(text)
    comet.showMessage(rawText)
    # Shows 'This is a formatted text. Colors are applied as well.'
    return 0
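To preview what tag stripping produces without a running InDesign® or comet_pdf® instance, the behavior can be approximated with the Python standard library. This is a minimal sketch and not the priint:comet implementation: `strip_html` and `_TextCollector` are hypothetical names, and the sketch decodes entities such as `&amp;` into characters, which may differ from how `getRawText` treats them.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and drops every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities before handle_data is called
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        return ''.join(self._chunks)


def strip_html(html_text):
    """Hypothetical stand-in for previewing raw text outside of comet."""
    if not isinstance(html_text, str):
        raise TypeError('html_text must be a str')
    collector = _TextCollector()
    collector.feed(html_text)
    collector.close()  # flush any buffered text
    return collector.text()


sample = 'This is a <b>formatted</b> text. Colors are <span style="color: rgb(255, 0, 0);">applied</span> as well.'
print(strip_html(sample))
```

Inside a comet script, `comet.html.getRawText` remains the function to call; the stand-in is only useful for unit tests or quick checks on a development machine.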
- comet.html.toTagged(input, options)
Converts an HTML input string to InDesign® TaggedText.
Input must be valid XHTML.
See here for a description of supported HTML attributes.
- Parameters:
input (str) – The input HTML text
options (dict) – The conversion options. Keys must be str. The following options are available:
kCSSUnescapeMode:
Value type: int
Default: -1
How should the escaping of unsupported characters in style names from an export be reverted?
* -1 = Anything
* 0 = Hex Mode
* 1 = Slash Mode
See here.
kListAware:
In 100 - ε percent of cases (ε -> 0), a list (<ul>, <ol>) is preceded by 'normal' continuous text. Lists in the generated TaggedText therefore always start automatically with a new paragraph.
This option controls the ε case in which:
the HTML text begins with a list (<ul>, <ol>)
the generated tagged text is to be inserted at the beginning of a text frame (and would then create an additional empty paragraph there).
False: The ε case does not occur for me.
True: If the HTML text begins with a <ul> or <ol> list (and only then), the pseudo tag <OptionalParaStyle:> is inserted instead of the <ParaStyle:> required for the paragraph separator.
The priint:comet functions for inserting and appending text recognize this tag and
remove it at text position 0
convert it to a normal <ParaStyle:> at all other text positions.
See here for more information.
- Returns:
The converted text content
- Return type:
str
- Raises:
TypeError – When parameter types are invalid
When parameter input is empty
When parameter options contains invalid values
CometError – On internal error
- Available:
InDesign® comet_pdf®
- CScript:
- Examples:
Convert formatted HTML text to tagged text.
#!py
#pragma plain
import comet

def main():
    text = 'This is a <b>formatted</b> text. Colors are <span style="color: rgb(255, 0, 0);">applied</span> as well.'
    tagged = comet.html.toTagged(text)
    comet.wlog(f'toTagged Result:\n{tagged}')

    # The result in the logfile:
    # %!TT<cTypeface:><cFont:>This is a <cTypeface:@Weight700NormalStretchUnknown>formatted<cTypeface:><cTypeface:><cFont:> text. Colors are <cColor:COLOR\:RGB\:Process\:1.00000000\,0.00000000\,0.00000000>applied<cColor:> as well.
    return 0
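The options described for toTagged, and the rule applied to the <OptionalParaStyle:> pseudo tag, can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the priint:comet implementation: `build_to_tagged_options` and `resolve_optional_para_style` are hypothetical helper names, the sample tagged string is made up for illustration, and the sketch assumes the pseudo tag occurs at most once in the text.

```python
OPTIONAL_TAG = '<OptionalParaStyle:>'
PARA_TAG = '<ParaStyle:>'


def build_to_tagged_options(unescape_mode=-1, list_aware=False):
    """Hypothetical helper: assemble an options dict for toTagged."""
    # bool is a subclass of int, so reject it explicitly for the int option
    if isinstance(unescape_mode, bool) or not isinstance(unescape_mode, int):
        raise TypeError('unescape_mode must be an int')
    if unescape_mode not in (-1, 0, 1):  # anything, hex mode, slash mode
        raise ValueError('unescape_mode must be -1, 0 or 1')
    if not isinstance(list_aware, bool):
        raise TypeError('list_aware must be a bool')
    return {'kCSSUnescapeMode': unescape_mode, 'kListAware': list_aware}


def resolve_optional_para_style(tagged, text_position):
    """Emulate the documented rule for the pseudo tag (illustration only)."""
    # Position 0: drop the tag. Any other position: make it a real <ParaStyle:>.
    replacement = '' if text_position == 0 else PARA_TAG
    return tagged.replace(OPTIONAL_TAG, replacement, 1)


options = build_to_tagged_options(list_aware=True)
print(options)

sample = '%!TT<OptionalParaStyle:>First list item'  # made-up tagged text
print(resolve_optional_para_style(sample, 0))
print(resolve_optional_para_style(sample, 42))
```

In a comet script, a dict like the one returned by `build_to_tagged_options(list_aware=True)` would be passed as the second argument of `comet.html.toTagged`. The pseudo tag itself is resolved by the priint:comet insert and append functions, so scripts normally do not need to do this themselves.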
- comet.html.exportText(input, options)
Export the content of a text frame as HTML.
Style information is written to a separate .css file in a subfolder of the target folder.
More information can be found here.
- Parameters:
input – The source text to export. The parameter type can be:
CFrame: A text frame. This will export the entire text inside the frame's chain.
CTextModel: A text model. This exports the entire text inside the model.
tuple[CTextModel, int, int]: A text model with start and length. Length may be -1 (to the end).
options (dict[str, int | bool | str]) – Options for the export. Keys must be str. The following options are available:
- Returns:
When exporting to a string: the resulting HTML string.
When exporting to a file: None
- Return type:
str | None
- Raises:
TypeError – When parameter types are invalid
When parameter options contains invalid values
CometError – On internal error
- Available:
InDesign® comet_pdf®
- CScript:
- Examples:
Export the script frame as an HTML file to a folder on the desktop.
#!py
#pragma plain
import comet
import os

def main():
    if not comet.gFrame:
        # Nothing to export
        return 0

    outFolder: str = comet.uncurtain('$DESKTOP/HTMLExport')
    outName: str = f'{comet.gDocument.getName()}_{comet.gFrame.getUID()}'

    comet.wlog(f'Exporting frame {comet.gFrame.getUID()} as HTML to {outFolder}{os.path.sep}{outName}.html')

    comet.html.exportText(
        comet.gFrame,
        options = {
            'kOutputFolder' : outFolder,
            'kOutputName' : outName,
            'kDocTitle' : outName,
            'kBodyOnly' : True
        }
    )

    return 0
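Because exportText documents its options as dict[str, int | bool | str] and raises TypeError for invalid parameter types, it can help to check a dict before handing it over. The following is a minimal sketch: `check_export_options` is a hypothetical helper, the option keys are the ones used in the example above, and the folder and name values are placeholders.

```python
ALLOWED_VALUE_TYPES = (int, bool, str)


def check_export_options(options):
    """Hypothetical pre-check mirroring the documented option types."""
    if not isinstance(options, dict):
        raise TypeError('options must be a dict')
    for key, value in options.items():
        if not isinstance(key, str):
            raise TypeError(f'option key {key!r} is not a str')
        if not isinstance(value, ALLOWED_VALUE_TYPES):
            raise TypeError(
                f'option {key!r} has unsupported type {type(value).__name__}'
            )
    return options


options = check_export_options({
    'kOutputFolder': '/tmp/HTMLExport',  # placeholder path
    'kOutputName': 'document_123',       # placeholder name
    'kDocTitle': 'document_123',
    'kBodyOnly': True,
})
print(sorted(options))
```

The check only covers key and value types. Whether a given key or value is accepted is still decided by exportText itself.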