comet.html

The html module provides utility functions for working with input and output HTML text.

Methods

comet.html.getRawText(input: str) -> str

Converts an HTML input string to raw text that no longer contains formatting tags.

All HTML entities and tags are removed. The input must be valid HTML.

Parameters:

input (str) – The input HTML text

Returns:

The raw text content

Return type:

str

Raises:
Available:

InDesign® comet_pdf® Illustrator®

CScript:

html::raw_text

Examples:

Remove all HTML tags from an input text.

#!py
#pragma plain

import comet

def main():
    text = 'This is a <b>formatted</b> text. Colors are <span style="color: rgb(255, 0, 0);">applied</span> as well.'

    rawText = comet.html.getRawText(text)

    comet.showMessage(rawText)      # Shows 'This is a formatted text. Colors are applied as well.'

    return 0

comet.html.toTagged(input: str, options: dict = {}) -> str

Converts an HTML input string to InDesign® TaggedText.

Input must be valid XHTML.

See here for a description of supported HTML attributes.

Parameters:
  • input (str) – The input HTML text

  • options (dict) –

    The conversion options.

    Keys must be str.

    The following options are available:

    ’kCSSUnescapeMode’:

    • Value type: int

    • Default: -1

    How should the escaping of unsupported characters in style names from an export be reverted?

    • -1: Anything

    • 0: Hex Mode

    • 1: Slash Mode

    See here.

    ’kPrefix’:

    • Value type: str

    • Default: ‘%!TT’

    Prefix to prepend to the result.

    See here.

    ’kCharStyleAware’:

    Whether to insert TT Jokers into the text

    See here.

    ’kListAware’:

    In almost all cases (100 - ε percent, ε -> 0), a list (<ul>, <ol>) is preceded by ‘normal’ continuous text. Lists in the generated TaggedText therefore always start with a new paragraph automatically.

    This option controls the remaining ε case, in which:

    • the HTML text begins with a list (<ul>, <ol>), and

    • the generated tagged text is to be inserted at the beginning of a text frame (and would then create an additional empty paragraph there).

    • False: The ε case does not occur in my workflow.

    • True: If the HTML text begins with a <ul> or <ol> list (and only then), the pseudo tag <OptionalParaStyle:> is inserted instead of the <ParaStyle:> required for the paragraph separator.

    The priint:comet functions for inserting and appending text recognize this tag and

    • remove it at text position 0

    • convert it to a normal <ParaStyle:> at all other text positions.

    See here for more information.
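
The handling of the pseudo tag described above can be pictured with a small pure-Python sketch. It only illustrates the stated rule and is not the priint:comet implementation: the function name resolve_optional_para_style and its insert_position parameter are hypothetical, and it assumes that the pseudo tag may carry a style name between the colon and the closing bracket. Escaped characters inside the tag are ignored for simplicity.

```python
import re

# Matches the pseudo tag, e.g. '<OptionalParaStyle:>' or (as an assumption)
# '<OptionalParaStyle:Some Style>'. Escaped '>' characters are not handled.
_OPTIONAL_TAG = re.compile(r'<OptionalParaStyle:([^>]*)>')


def resolve_optional_para_style(tagged: str, insert_position: int) -> str:
    """Hypothetical helper that mimics the rule described above."""
    if insert_position == 0:
        # At text position 0 an extra paragraph is unwanted: drop the tag.
        return _OPTIONAL_TAG.sub('', tagged)
    # At all other positions the tag becomes a normal paragraph separator.
    return _OPTIONAL_TAG.sub(r'<ParaStyle:\1>', tagged)
```

In practice the priint:comet insert and append functions perform this step themselves, so a script normally only needs to set ’kListAware’ to True.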

Returns:

The converted text content

Return type:

str

Raises:
  • TypeError – When parameter types are invalid

  • ValueError

    • When parameter input is empty

    • When parameter options contains invalid values

  • CometError – On internal error

Available:

InDesign® comet_pdf®

CScript:

html::to_tagged

Examples:

Convert formatted HTML text to tagged text.

#!py
#pragma plain

import comet

def main():
    text = 'This is a <b>formatted</b> text. Colors are <span style="color: rgb(255, 0, 0);">applied</span> as well.'

    tagged = comet.html.toTagged(text)

    comet.wlog(f'toTagged Result:\n{tagged}')

    #The result in the logfile:
    #%!TT<cTypeface:><cFont:>This is a <cTypeface:@Weight700NormalStretchUnknown>formatted<cTypeface:><cTypeface:><cFont:> text. Colors are <cColor:COLOR\:RGB\:Process\:1.00000000\,0.00000000\,0.00000000>applied<cColor:> as well.

    return 0
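
Convert HTML that begins with a list, passing explicit options. This is a minimal sketch rather than a reference implementation: the helper make_tagged_options is hypothetical and only collects the documented keys ’kCSSUnescapeMode’, ’kPrefix’ and ’kListAware’ in one dict. The comet module is imported inside main() (unlike the examples above) purely so that the helper can be tried outside the host application.

```python
#!py
#pragma plain

def make_tagged_options(list_aware: bool = False, prefix: str = '%!TT', unescape_mode: int = -1) -> dict:
    """Hypothetical helper: builds an options dict for comet.html.toTagged."""
    # Documented values: -1 = Anything, 0 = Hex Mode, 1 = Slash Mode
    if unescape_mode not in (-1, 0, 1):
        raise ValueError('kCSSUnescapeMode must be -1, 0 or 1')

    return {
        'kCSSUnescapeMode': unescape_mode,
        'kPrefix': prefix,
        'kListAware': list_aware,
    }


def main():
    # Only available inside a priint:comet host application
    import comet

    text = '<ul><li>First item</li><li>Second item</li></ul>'

    # The HTML begins with a list, so request the <OptionalParaStyle:> pseudo tag
    tagged = comet.html.toTagged(text, make_tagged_options(list_aware = True))

    comet.wlog(f'toTagged Result:\n{tagged}')

    return 0
```

Checking the option values before the call mirrors the documented ValueError for invalid options, but reports the problem earlier and with a more specific message.
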

comet.html.exportText(input: CFrame | CTextModel | tuple[CTextModel, int, int], options: dict[str, int | bool | str]) -> str | None

Export the content of a text frame as HTML.

Style information is written to a separate .css file in a subfolder of the target folder.

More information can be found here

Parameters:
  • input

    The source text to export.

    The parameter type can be:

    • CFrame:

      A text frame. This will export the entire text inside the frame’s chain.

    • CTextModel:

      A text model. This exports the entire text inside the model.

    • tuple [CTextModel, int, int]

      A text model with a start position and a length. A length of -1 means until the end of the text.

  • options (dict[str, int | bool | str]) –

    Options for the export.

    Keys must be str.

    The following options are available:

    ’kOutputFolder’

    • Value type: str

    Target folder.

    Required when exporting to a file

    When this option is provided, the function returns None.

    Also requires the additional option ‘kOutputName’.

    ’kOutputName’

    • Value type: str

    Name of the output file (without extension).

    Required when exporting to a file

    When this option is provided, the function returns None.

    Also requires the additional option ‘kOutputFolder’.

    ’kStartPosition’

    • Value type: int

    • Default value: 0

    Start index in the text model

    When parameter input is CTextModel, this is relative to the text model!

    ’kLength’

    • Value type: int

    • Default value: -1

    Length in the text model (-1 = until the end).

    When parameter input is CTextModel, this is relative to the text model!

    ’kDocTitle’

    • Value type: str

    • Default value: [Filename]

    Title of the HTML document

    ’kCopyImages’

    Link or copy images? (False = link, True = copy)

    ’kExportUnsupported’

    Export images that are not supported in HTML as .png?

    ’kExportMissing’

    Export missing images from previews as .png?

    ’kWriteCSS’

    Write CSS?

    ’kInputCSS’

    • Value type: str

    • Default value: ‘’

    Alternative CSS.

    When this parameter is set, the input is used as CSS instead of the generated one.

    The input can be a path to a CSS file or a CSS definition.

    ’kCSSEscapeMode’

    • Value type: int

    • Default value: 0

    Which escape style should be used for unsupported characters in style names?

    • 0: Hex Mode

    • 1: Slash Mode

    See here.

    ’kBodyOnly’

    Export the complete HTML including the <html> and <body> tags, or only the contents of <body>?

    • False: Complete HTML

    • True: Only contents of <body>

    ’kEscapeBrackets’

    Create XML-conformant output?

    • False: No

    • True: Yes, the following replacements are done:

      • < to &lt;

      • > to &gt;

      • & to &amp;

    ’kHexColors’

    Export CSS color values in hexadecimal format?

Returns:

  • When exporting to a string: the result HTML string.

  • When exporting to a file: None

Return type:

str | None

Raises:
Available:

InDesign® comet_pdf®

CScript:

html::export_frame

Examples:

Export the script frame as an HTML file to a folder on the desktop.

#!py
#pragma plain

import comet
import os

def main():
    if not comet.gFrame:
        #Nothing to export
        return 0

    outFolder: str = comet.uncurtain('$DESKTOP/HTMLExport')
    outName: str = f'{comet.gDocument.getName()}_{comet.gFrame.getUID()}'

    comet.wlog(f'Exporting frame {comet.gFrame.getUID()} as HTML to {outFolder}{os.path.sep}{outName}.html')

    comet.html.exportText(
        comet.gFrame,
        options = {
            'kOutputFolder' : outFolder,
            'kOutputName' : outName,
            'kDocTitle' : outName,
            'kBodyOnly' : True
        }
    )

    return 0
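
Export the script frame as an HTML string instead of a file. This is a sketch under stated assumptions: the helper make_export_options is hypothetical and merely enforces the documented rule that ’kOutputFolder’ and ’kOutputName’ must be given together. When both are omitted, the function is documented to return the HTML as a string. As a deviation from the example above, comet is imported inside main() so that the helper can be tried outside the host application.

```python
#!py
#pragma plain

def make_export_options(out_folder: str = '', out_name: str = '', **extra) -> dict:
    """Hypothetical helper: builds an options dict for comet.html.exportText."""
    # 'kOutputFolder' and 'kOutputName' are only valid as a pair
    if bool(out_folder) != bool(out_name):
        raise ValueError('kOutputFolder and kOutputName must be given together')

    options = dict(extra)
    if out_folder:
        options['kOutputFolder'] = out_folder
        options['kOutputName'] = out_name

    return options


def main():
    # Only available inside a priint:comet host application
    import comet

    if not comet.gFrame:
        #Nothing to export
        return 0

    # No output folder and name: the HTML is returned as a string
    html = comet.html.exportText(
        comet.gFrame,
        options = make_export_options(kBodyOnly = True, kCSSEscapeMode = 1)
    )

    comet.wlog(f'Exported HTML:\n{html}')

    return 0
```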