comet.strutils

The comet.strutils module provides comet-specific string manipulation functionality.

For common string operations, use Python's built-in str functionality.

Methods

comet.strutils.getNetweight(source, convertUniTags=False, replaceTypos=False)

Determine the net value of a string. The net value is calculated as follows:

  • Empty strings are replaced by an invisible space (Unicode 0x200B).

  • UTF-8 characters are uniformly translated into <0xXXXX> tags.

  • If the text begins with %!TT, the ParaStyle tag immediately following it (if present) is removed.

  • All other ParaStyles are replaced by paragraph separators.

  • All <nl:> are replaced by paragraph separators.

  • All other TaggedText tags are removed from the text.

  • All double quotation marks are replaced by “ and all single quotation marks by ‘.

  • All types of spaces (Unicode 0x2000 - 0x200F) are replaced by blanks.

  • All types of separators (Unicode 0x2010 - 0x2016) are each replaced by a minus sign.
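Some of the rules above can be approximated in plain Python outside of comet. The following is a minimal sketch, not the comet implementation: the helper name normalize_typography is hypothetical, it covers only the empty-string, space and separator rules, and it assumes that "blank" means an ASCII space and "minus sign" an ASCII hyphen-minus.

```python
import re

# The "invisible space" (Unicode 0x200B) named in the first rule.
ZERO_WIDTH_SPACE = "\u200b"


def normalize_typography(text: str) -> str:
    """Hypothetical helper approximating three of the rules listed above."""
    # Rule: empty strings are replaced by an invisible space.
    if text == "":
        return ZERO_WIDTH_SPACE
    # Rule: all types of spaces (0x2000 - 0x200F) become blanks
    # (assumed here to be an ASCII space).
    text = re.sub("[\u2000-\u200f]", " ", text)
    # Rule: all types of separators (0x2010 - 0x2016) become a minus sign
    # (assumed here to be an ASCII hyphen-minus).
    text = re.sub("[\u2010-\u2016]", "-", text)
    return text


# An em space (0x2003) and an en dash (0x2013) are standardized.
print(normalize_typography("a\u2003b\u2013c"))  # prints "a b-c"
```

The early return for the empty string matters in this sketch, because 0x200B itself lies inside the space range and would otherwise be turned into a blank. The order in which comet applies its rules is not stated here.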

Parameters:
  • source (str) – The string to get the net value for.

  • convertUniTags (bool) – Should Unicode tags of the form <0x200B> be replaced automatically?

  • replaceTypos (bool) – Should quotation marks, spaces and separators be standardized?

Returns:

The netweight version of parameter source.

Return type:

str

Available:

InDesign® comet_pdf® Illustrator®

CScript:

Examples:

Convert German umlauts to uniform tags and show them in a dialog.

#!py
#pragma plain

import comet

def main():
    source: str = 'ÄÖÜ'
    netWeight: str = comet.strutils.getNetweight(source)

    comet.dialog.showMessage(netWeight)  # Shows "<0x00C4><0x00D6><0x00DC>"

    return 0

comet.strutils.escapeTagged(source)

Replace all non-ASCII characters with TaggedText markers.

Returns:

The version of parameter source where non-ASCII characters have been replaced.

Return type:

str

Available:

InDesign® comet_pdf® Illustrator®

CScript:

string::escape_tagged
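
As a rough illustration of what such an escape does, the following sketch replaces non-ASCII characters in plain Python. It assumes the markers have the <0xXXXX> form shown in the getNetweight example. The function name escape_tagged_sketch is hypothetical, and comet's handling of characters above 0xFFFF may differ.

```python
def escape_tagged_sketch(text: str) -> str:
    """Hypothetical stand-in: replace non-ASCII characters with <0xXXXX> markers."""
    return "".join(
        # ASCII characters pass through unchanged; everything else becomes
        # a marker holding the code point as at least four hex digits.
        ch if ord(ch) < 128 else f"<0x{ord(ch):04X}>"
        for ch in text
    )


print(escape_tagged_sketch("Größe"))  # prints "Gr<0x00F6><0x00DF>e"
```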

comet.strutils.unescapeTagged(source)

Replace TaggedText markers with the corresponding UTF-8 characters.

Returns:

The version of parameter source where TaggedText markers have been replaced.

Return type:

str

Available:

InDesign® comet_pdf® Illustrator®

CScript:

string::unescape_tagged
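
The reverse direction can be sketched in plain Python as well. This is an approximation under the assumption that the markers have the <0xXXXX> form shown in the getNetweight example, not comet's actual implementation. The name unescape_tagged_sketch is hypothetical.

```python
import re

# Assumed marker form: "<0x" followed by 4 to 6 hex digits and ">".
_TAG = re.compile(r"<0x([0-9A-Fa-f]{4,6})>")


def unescape_tagged_sketch(text: str) -> str:
    """Hypothetical stand-in: turn <0xXXXX> markers back into characters."""

    def _decode(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # Leave markers alone that do not name a valid Unicode code point.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return _TAG.sub(_decode, text)


print(unescape_tagged_sketch("Gr<0x00F6><0x00DF>e"))  # prints "Größe"
```

Other tags in the text are left untouched by this sketch, since only the <0x...> pattern is matched.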