As of version 4.0.5, Comet 4 can export text frames as HTML documents. Style information is written to a CSS file. The goal is to represent the InDesign® content as exactly as possible in the HTML document.

Since Comet 4.1 R20505, HTML-formatted text can also be imported into InDesign® documents.

The Comet Plugins export the formatted text of a frame. Style information of the document is collected and stored in a separate CSS file in the "resources" subfolder of the target folder. The CSS style information results from paragraph formats, character formats, table formats and cell formats. Basically all supported styles of the document are exported.

Please note: The HTML export does not generate HTML pages of the InDesign® document pages.

Since HTML has a text structure similar to that of InDesign®, the text structure is translated as follows:

Local style changes are written directly to the style attribute of the respective element.

Attribute CSS Info
Font family font-family
Font face font-weight, font-style, font-stretch

In HTML/CSS a font face cannot be specified directly. Instead, it is expressed through weight (font-weight), style (font-style) and stretch (font-stretch). The values of these attributes are calculated using fontDB.

Fonts not defined in fontDB

If a font is not described in fontDB, the attributes are derived from the font name.

Font names do not follow any fixed rules, so deriving the font attributes from the font name is imprecise at best. You should therefore make sure that all fonts used are described in fontDB.

The following parts of the font name are recognized:

Name part font-weight font-style font-stretch
Extra Bold 900
Bold, Heavy 700
Demibold 600
Light 100
ExtraLight 100
Italic italic
Condensed condensed
Semicondensed semicondensed
Line spacing line-height
Case text-transform Only all caps is supported: text-transform: uppercase
Position vertical-align
InDesign® CSS
Normal
Superscript super
Subscript sub
Underline text-decoration text-decoration: underline
Line-through text-decoration text-decoration: line-through
Vertical align vertical-align Not compatible with "Position"
Alignment text-align
InDesign® CSS
Left left
Center center
Right right
Justify justify
Justify, last left aligned
Justify, last right aligned
Justify, last centered
List type: Bullet Only normal CSS/HTML bullet characters for <ul> are possible.
List type: Numbered list-style-type
InDesign® CSS
1, 2, 3, 4,... decimal
01, 02, 03, 04,.. decimal-leading-zero
001, 002, 003, 004,...
0001, 0002, 0003, 0004,..
I, II, III, IV,... upper-roman
i, ii, iii, iv,... lower-roman
A, B, C, D,.. upper-alpha
a, b, c, d,.. lower-alpha
List type: Numbered, start mode "Continue numbering" and "Begin with" are both supported
Character color color: rgb(r, g, b)
color: #FF0000
Color names (Swatches) are not supported
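The font-name fallback described in the table above can be pictured with a minimal Python sketch. It is not the Comet implementation: the function name, the substring matching, and the defaults (400, normal, normal) are assumptions for illustration; only the name parts and their values come from the table.

```python
# Illustrative sketch only: maps parts of a font name to CSS font attributes.
# Longer name parts come first so "ExtraLight" is not mistaken for "Light".
WEIGHT_PARTS = [
    ("extralight", 100),
    ("extra bold", 900),
    ("demibold", 600),
    ("heavy", 700),
    ("bold", 700),
    ("light", 100),
]
STRETCH_PARTS = [
    ("semicondensed", "semicondensed"),
    ("condensed", "condensed"),
]


def attributes_from_font_name(font_name):
    """Guess font-weight, font-style and font-stretch from a font name."""
    name = font_name.lower()
    # Assumed defaults when no known name part is found.
    weight = next((value for part, value in WEIGHT_PARTS if part in name), 400)
    stretch = next((value for part, value in STRETCH_PARTS if part in name), "normal")
    style = "italic" if "italic" in name else "normal"
    return {"font-weight": weight, "font-style": style, "font-stretch": stretch}


print(attributes_from_font_name("Myriad Pro Bold Italic"))
```

The sketch also shows why this approach is fragile: any font whose name uses other wording falls back to the defaults, which is the reason to describe all fonts in fontDB.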

Character styles are exported as <span> elements in HTML. The supported parameters are the same as for paragraph styles.

Tables are exported as <table>-elements in HTML. They support header and footer rows, merged cells and the following table format attributes:

Attribute CSS Info
Cell format All settings are supported
Table border, weight border-left-width
border-right-width
border-top-width
border-bottom-width
Table border, color border-left-color
border-right-color
border-top-color
border-bottom-color
Table border, type border-left-style, border-right-style, border-top-style, border-bottom-style
InDesign® CSS
Solid solid
Thick-Thick double
Thick-Thin
Thick-Thin-Thick
Thin-Thin
Thin-Thick-Thin
Dashed(3 and 2) dashed
Dashed(4 and 4)
Dotted dotted
Japanese dots
Fill, alternating pattern

table.Tablename tr or td:nth-child(Blocksize + Skip first row/column):nth-last-child(n + Skip last row/column), fill color

Translating this attribute to CSS requires two style definitions, one for each fill color. CSS cannot express a block size directly, so each row or column index within the block needs its own selector.

Example: the first two rows are cyan, the next three rows are magenta (block size = 5), the first three rows and the last four rows are skipped:

table.priint_KTabelle tr:nth-child(5n + 4):nth-last-child(n + 5), tr:nth-child(5n + 5):nth-last-child(n + 5) { background-color: rgba(0, 158, 227, 1.00); }
table.priint_KTabelle tr:nth-child(5n + 6):nth-last-child(n + 5), tr:nth-child(5n + 7):nth-last-child(n + 5), tr:nth-child(5n + 8):nth-last-child(n + 5) { background-color: rgba(229, 0, 125, 1.00); }
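The way the selector offsets in this example follow from block size, skipped rows and color counts can be sketched in Python. The function and parameter names are hypothetical, and unlike the exported CSS shown above, the sketch prefixes every selector with the table class; that scoping is an assumption about the intent, not documented Comet behavior.

```python
# Illustrative sketch only: builds :nth-child selectors for an alternating fill.
def alternating_fill_selectors(table_class, block_size, first_count,
                               skip_first, skip_last, element="tr"):
    """Return (selectors for first color, selectors for second color)."""
    # ":nth-last-child(n + k)" matches everything except the last k - 1 items.
    tail = f":nth-last-child(n + {skip_last + 1})"

    def selector(index):
        # index is the 1-based position inside one block.
        offset = skip_first + index
        return f"table.{table_class} {element}:nth-child({block_size}n + {offset}){tail}"

    first = [selector(i) for i in range(1, first_count + 1)]
    second = [selector(i) for i in range(first_count + 1, block_size + 1)]
    return first, second


first, second = alternating_fill_selectors("priint_KTabelle", 5, 2, 3, 4)
print(",\n".join(first) + " { background-color: rgba(0, 158, 227, 1.00); }")
print(",\n".join(second) + " { background-color: rgba(229, 0, 125, 1.00); }")
```

With block size 5, two rows in the first color, three skipped leading rows and four skipped trailing rows, the offsets are 4 and 5 for the first color and 6 to 8 for the second, as in the example above.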

html_01.gif

Table cells are exported as <td>-elements. The following format options are supported:

Attribute CSS Info
Paragraph style Set the paragraph style of the content
Text rotation transform: rotate(%ddeg) Applies to the <p> element inside the cell, otherwise the cell will rotate.
Cell stroke, weight border-left-width
border-right-width
border-top-width
border-bottom-width
Cell stroke, type border-left-style, border-right-style, border-top-style, border-bottom-style
InDesign® CSS
Solid solid
Thick-Thick double
Thick-Thin
Thick-Thin-Thick
Thin-Thin
Thin-Thick-Thin
Dashed(3 and 2) dashed
Dashed(4 and 4)
Dotted dotted
Japanese dots
Cell stroke, color border-left-color
border-right-color
border-top-color
border-bottom-color
Cell fill, color background-color: rgb(r, g, b)
background-color: #00FF00
Cell insets padding-left
padding-top
padding-right
padding-bottom

Some InDesign® control characters are either not available in HTML or have a different meaning there. These characters are exported as follows:

<?ACE HEXCODE ?>, e.g. <?ACE 8 ?> for the right-aligned TAB.

The following characters are treated like this:

Hexcode Normal meaning InDesign® meaning
0003 End of text End nested style here
0004 End of transmission Footnote
0007 Bell Indent to here
0008 Backspace Right-aligned tab
0016 Synchronous idle Table anchor
0017 End of transmission block Table continuation
0018 Cancel Page number
0019 End of medium Paragraph name
001A Substitute "Non-roman special glyph"

On import, these characters are converted back.
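The round trip between control characters and the <?ACE ... ?> pseudo tag can be sketched in Python as follows. The function names are hypothetical, and writing the hex code in uppercase without leading zeros is an assumption based on the <?ACE 8 ?> example; the set of codes is taken from the table above.

```python
import re

# Illustrative sketch only: control characters listed in the table above.
ACE_CODES = {0x03, 0x04, 0x07, 0x08, 0x16, 0x17, 0x18, 0x19, 0x1A}

ACE_PATTERN = re.compile(r"<\?ACE\s+([0-9A-Fa-f]+)\s*\?>")


def encode_ace(text):
    """Replace each listed control character with an <?ACE HEXCODE ?> tag."""
    return "".join(
        f"<?ACE {ord(ch):X} ?>" if ord(ch) in ACE_CODES else ch
        for ch in text
    )


def decode_ace(html_text):
    """Convert <?ACE HEXCODE ?> tags back to the original characters."""
    return ACE_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), html_text)


exported = encode_ace("Price\x08 9.99")
print(exported)
print(decode_ace(exported) == "Price\x08 9.99")
```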

To define TaggedText directly in HTML, use the pseudo tag

<?IDTT ?>

The content of these tags is included in the result directly, without further conversion. Whitespace between the text and the ? is ignored.

Inline frames are exported as separate HTML documents and linked via the iframe element. The filename is the UID of the frame + ".html". Inline text frames and picture frames are supported. In the case of an image frame, there is an option to link the image only or copy it to the "resources" subfolder of the destination folder. If a linked image no longer exists, a PNG of the image will be exported and also placed in the "resources" subfolder. The image name corresponds to the original name of the image, with the extension ".png" if the original image was not a PNG.

InDesign® distinguishes between two types of style hierarchy: the first defines which style a style is based on, the second defines which style folder a style belongs to. Both types are taken into account in the HTML export.

html_workflow1

To maintain the style folder hierarchy, styles are noted in the CSS as follows:

p.FormatGroup1.ParagraphFormat2 { }

It is applied in the document as follows:

<p class="FormatGroup1 ParagraphFormat2"></p>

This way you can track which style was in which style folder.

CSS has no direct way to base one style on another. Comet solves this by adding the selectors of all derived styles to the rule of the parent style. Each style's own rule then contains only the properties that differ from its parent, and the HTML element only needs the class of the lowest style in the hierarchy chain.

In the following example, "paragraphFormat1" changes the typeface, and "paragraphFormat2" is based on "paragraphFormat1" and changes the font size.

html_workflow2

The CSS notation is as follows:

p.paragraphFormat1, p.paragraphFormat2 { font-family: 'Calibri'; }
p.paragraphFormat2 { font-size: 14pt; }

It is applied in the document as follows:

<p class="paragraphFormat2"></p>
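How such rules can be derived from a "based on" hierarchy is sketched below in Python. The data structure and function name are hypothetical and only illustrate the principle; this is not the Comet exporter.

```python
# Illustrative sketch only: each style stores its parent and the properties
# that differ from that parent.
def css_for_hierarchy(styles, element="p"):
    """styles maps style name -> (parent name or None, {property: value})."""

    def with_descendants(name):
        # A rule applies to the style itself and to every style based on it.
        names = [name]
        for child, (parent, _) in styles.items():
            if parent == name:
                names.extend(with_descendants(child))
        return names

    rules = []
    for name, (_, properties) in styles.items():
        if not properties:
            continue
        selectors = ", ".join(f"{element}.{n}" for n in with_descendants(name))
        body = " ".join(f"{key}: {value};" for key, value in properties.items())
        rules.append(f"{selectors} {{ {body} }}")
    return rules


styles = {
    "paragraphFormat1": (None, {"font-family": "'Calibri'"}),
    "paragraphFormat2": ("paragraphFormat1", {"font-size": "14pt"}),
}
for rule in css_for_hierarchy(styles):
    print(rule)
```

Because the parent rule already lists the derived style as a selector, the element only carries the class of the derived style and still receives the inherited properties.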

In InDesign®, style names can be chosen with hardly any restrictions. HTML and CSS are more restrictive. In CSS, the following characters have a special meaning:

! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ ` { | } ~

In InDesign®, inheritance is part of the style definition itself, and spaces are a normal part of the style name. In HTML, however, a space in the class attribute separates class names, so spaces in style names must be encoded, e.g. as %20. The available encodings are described below.

Furthermore CSS names must not start with a number or with a hyphen followed by a number. For this reason, each style name gets the prefix "priint_".

As of Comet 4.1 R21800, there are two different ways to export style names. The mode is selected with the parameter "kCSSEscapeMode" of the cScript function html::export_.

The following modes are available:

Hex mode:
In this mode, each of the characters listed above is replaced by a hex escape sequence: 0x + hex code of the character, for example 0x002B for the plus sign. Avoid such sequences in your original style names, because the names can then no longer be translated back unambiguously. The advantage of this mode is that style names in CSS and HTML are exactly the same.

Examples:

Style name CSS & HTML
Hello World priint_Hello0x0020World
x> and %20 priint_x0x003E0x0020and0x00200x002520
AAA% %BBB priint_AAA0x00250x00200x0025BBB
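A minimal Python sketch of hex mode and its reversal is shown below. The function names are hypothetical; treating the space as an escaped character is an assumption taken from the examples, since it is not part of the character list above.

```python
import re

# Illustrative sketch only: the special characters listed above, plus the space.
SPECIAL_CHARS = set("!\"#$%&'()*+,-./:;<=>?@[\\]^`{|}~ ")
PREFIX = "priint_"


def hex_escape(style_name):
    """Return the class name used in both CSS and HTML in hex mode."""
    escaped = "".join(
        f"0x{ord(ch):04X}" if ch in SPECIAL_CHARS else ch
        for ch in style_name
    )
    return PREFIX + escaped


def hex_unescape(class_name):
    """Reverse hex_escape: strip the prefix and decode 0xHHHH sequences."""
    name = class_name[len(PREFIX):] if class_name.startswith(PREFIX) else class_name
    return re.sub(r"0x([0-9A-Fa-f]{4})", lambda m: chr(int(m.group(1), 16)), name)


print(hex_escape("Hello World"))
print(hex_unescape("priint_AAA0x00250x00200x0025BBB"))
```

The reversal also shows the ambiguity mentioned above: a style that is literally named "A0x0041" would be decoded to "AA".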

Slash mode:
This mode places a backslash in front of each unsupported character in the CSS definition, but not where the style is applied in the HTML text. This makes the style names easier to read, but the CSS definition and the HTML usage differ, which makes editing by e.g. text substitution more difficult. In addition, some characters are handled separately:

Character CSS HTML
Space \%20 %20
% \%25 %25
< \< &lt;
> \> &gt;
" \" &quot;
& \& &amp;

Examples:

Style Name CSS HTML
Hello World priint_Hello\%20World priint_Hello%20World
x> and %20 priint_x\>\%20and\%20\%2520 priint_x&gt;%20and%20%2520
AAA% %BBB priint_AAA\%25\%20\%25BBB priint_AAA%25%20%25BBB
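Slash mode can be sketched in Python along the same lines. The function name is hypothetical and the handling of characters that are not shown in the examples (a plain backslash in CSS, unchanged in HTML) follows the description above; treat it as an illustration rather than the exporter's exact logic.

```python
# Illustrative sketch only: special characters listed earlier in this section.
SLASH_SPECIAL = set("!\"#$%&'()*+,-./:;<=>?@[\\]^`{|}~")

# Characters with separate handling: (CSS form, HTML form).
SLASH_SEPARATE = {
    " ": ("\\%20", "%20"),
    "%": ("\\%25", "%25"),
    "<": ("\\<", "&lt;"),
    ">": ("\\>", "&gt;"),
    '"': ('\\"', "&quot;"),
    "&": ("\\&", "&amp;"),
}


def slash_escape(style_name):
    """Return (name for the CSS definition, name for the HTML class attribute)."""
    css_parts, html_parts = [], []
    for ch in style_name:
        if ch in SLASH_SEPARATE:
            css_form, html_form = SLASH_SEPARATE[ch]
        elif ch in SLASH_SPECIAL:
            css_form, html_form = "\\" + ch, ch
        else:
            css_form = html_form = ch
        css_parts.append(css_form)
        html_parts.append(html_form)
    return "priint_" + "".join(css_parts), "priint_" + "".join(html_parts)


css_name, html_name = slash_escape("Hello World")
print(css_name)
print(html_name)
```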

The HTML import is currently under development and is not covered by the support provided by Werk!

Please note that only a limited set of HTML and CSS features can be supported.

The following tags are supported:

Tag Meaning Info
<p> Paragraph
<span> Character style or local change of supported text attributes
<table> Table Either: Only <tr> subnodes. Or: <thead>, <tbody>, and <tfoot> with <tr> subnodes.
<thead> Table header Direct under <table>, in conjunction with <tbody> and (optional) <tfoot>
<tbody> Table body Direct under <table>, in conjunction with (optional) <thead> and (optional) <tfoot>
<tfoot> Table foot Direct under <table>, in conjunction with <tbody> and (optional) <thead>
<tr> Table row
<td> Table cell Supporting node for tables - is added underneath <tr>.
<ul> Bulleted list
<ol> Numbered list
<li> List element Supporting node for lists - is added to <ol> or <ul>.
<?ACE ...?> InDesign® control character Tag for inserting InDesign® control characters.
<i>, <em> Italic The fontDB is used to determine the required font face of the current font family. If no font family is specified in the HTML text, the font family at the insert position in the InDesign® text is used.
<b>, <strong> Bold
<del>, <strike> Line through Overridden by the CSS attribute text-decoration: line-through.
<u> Underline Overridden by the CSS attribute text-decoration: underline.
<br> Soft return \n
<sup> Super script
<sub> Sub script
<image> Image

Other tags like <div> or comments (<!-- ... -->) are ignored.

Currently, the CSS style definitions written by the export are not imported!

As in the HTML export, the style of an HTML node is determined by its "class" attribute. The "class" attribute of the following nodes maps to the following style types:

<p>, <li> Paragraph style
<span> Character style
<table> Table style
<td> Cell style

"priint_"-Prefixes in style names are removed. Characters reserved in HTML that have been replaced by an escape sequence are unescaped.

The import of CSS attributes is currently only supported as a style attribute of an HTML node (e.g. <span style="font-size:20pt">).

The following applies: The "lowest" attribute always has priority, i.e. the attribute that is closest to the content in the hierarchy. For example, the text attribute "font-size" will be overwritten on a <p> node if a <span> node sets the same attribute below it.
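The priority rule can be pictured as merging the style attributes from the outermost node to the innermost one, so that later (lower) values replace earlier ones. The following Python sketch uses hypothetical helper names and a deliberately simple parser for style strings.

```python
# Illustrative sketch only: "lowest attribute wins" as a dictionary merge.
def parse_style(style):
    """Split a style attribute such as 'font-size: 20pt; color: #FF0000'."""
    result = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            result[name.strip().lower()] = value.strip()
    return result


def effective_style(styles_outer_to_inner):
    """Merge style attributes; nodes closer to the content override outer ones."""
    merged = {}
    for style in styles_outer_to_inner:
        merged.update(parse_style(style))
    return merged


# <p style="font-size: 12pt; color: #FF0000"><span style="font-size: 20pt">...</span></p>
print(effective_style(["font-size: 12pt; color: #FF0000", "font-size: 20pt"]))
```

In this example the text inside the <span> ends up with font-size 20pt, while the color set on the <p> node is kept.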

The following attributes are supported:

CSS Attribute Supported values / units Info
font-family Only family-name values, not generic-family. All values must be enclosed in single quotes, e.g. style="font-family: 'Minion Pro'"

font-size int or float,
e.g. font-size: 24pt;

Font size in points

[since v4.1 R23700] The following units are allowed:

pt
px
in
mm
cm
pc

Relative sizes are not supported!

font-weight 100, 200, ..., 900
400 corresponds to normal
The fontDB is used to determine the required font face of the current font family. If no font family is specified in the HTML text, the font family at the insert position in InDesign® is used.
font-style italic
oblique
font-stretch ultra-condensed
extra-condensed
condensed
semi-condensed
normal
semi-expanded
expanded
extra-expanded
ultra-expanded
color rgb(int, int, int)
#0000FF
MyColor
'My Color'
Font color

Support of named document swatches since v4.3 R34050
text-decoration underline
line-through
  • underline overrides the <u> node.
  • line-through overrides the <s>, <del> and <strike> nodes.
CSS Attribute Supported values / units Info
text-align left, right, center, justify

Text alignment. For the value justify, the tag <pTextAlignment:JustifyFull> is used. Other justification variants are not supported.

CSS Attribute Supported values / units Info
width,
height
int or float

Column width in points. If the cells of a column specify conflicting widths, the largest one is used.

[since v4.1 R23700] The following units are allowed:

pt
px
in
mm
cm
pc

Relative sizes and direct attributes like <td width="60pt">Title</td> are not supported!

Here is an example:

<td style="width:60pt">Title</td>
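For orientation, the absolute units accepted for font-size, width and height can be converted to points as in the following Python sketch. The function name is hypothetical; the px factor (0.75 pt, i.e. 96 px per inch) and the treatment of a missing unit as points are assumptions, since the conversion used by Comet is not stated here.

```python
import re

# Illustrative sketch only: points per unit. The px value assumes 96 px per inch.
POINTS_PER_UNIT = {
    "pt": 1.0,
    "px": 0.75,
    "in": 72.0,
    "mm": 72.0 / 25.4,
    "cm": 72.0 / 2.54,
    "pc": 12.0,
}

SIZE_PATTERN = re.compile(r"\s*([0-9]*\.?[0-9]+)\s*([a-z]*)\s*")


def to_points(value):
    """Convert a CSS size such as '60pt' or '2.5cm' to points."""
    match = SIZE_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    number = float(match.group(1))
    unit = match.group(2) or "pt"  # assumption: no unit means points
    if unit not in POINTS_PER_UNIT:
        # Relative units such as em or % are not supported.
        raise ValueError(f"unsupported unit: {unit!r}")
    return number * POINTS_PER_UNIT[unit]


print(to_points("60pt"))
print(to_points("1in"))
```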

HTML entities in decimal (e.g. &#160;) and hexadecimal notation (e.g. &#xA0;) are supported and replaced by the corresponding characters (e.g. for html::to_tagged or direct import into Illustrator). Named HTML entities (e.g. &nbsp;) are supported as well. You can find the complete list at https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
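The three notations refer to the same character. The following Python snippet is independent of Comet and uses the standard library function html.unescape purely to illustrate the equivalence:

```python
import html

# Decimal, hexadecimal and named notation of the non-breaking space (U+00A0).
for entity in ("&#160;", "&#xA0;", "&nbsp;"):
    character = html.unescape(entity)
    print(entity, "->", f"U+{ord(character):04X}")
```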

There are cScript functions available for importing and exporting HTML:

Export:

Import:

Misc.:

Simple export of a text frame:

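int main()
{
	String 		folder 		= string::alloc();
	String 		docName 	= string::alloc("HTML Export");
	int 		err 		= 0;

	// Ask the user for the target folder; stop if no folder was selected
	err = file::select_folder(folder);
	if (err)
	{
		string::release(folder);
		string::release(docName);
		return 0;
	}

	// Use the document name as the name of the exported HTML file
	document::name(docName);

	// Export the current frame to the selected folder
	html::export_frame( gFrame,
				"kOutputFolder", folder,
				"kOutputName", docName,
				"kDocTitle", "Hello World",
				"kCSSEscapeMode", 0);

	// Free the strings allocated above
	string::release(folder);
	string::release(docName);

	return 0;
}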