
Section 3 `Associated Files', carrying LaTeX and MathML views of mathematical content

There are several ways in which file attachments may be associated with specific portions of a PDF document, using the `Associated Files' technique [pdfA3, Annex E]. A file is first embedded (attached) and then associated, by one of the following methods, with:

  1. the document as a whole [pdfA3, § E.3], [PDF20, §14.13.2], e.g. the full LaTeX source, or a preamble file used when converting snippets of mathematical content into a MathML presentation of the same content;
  2. a specific page within the document [pdfA3, § E.4], [PDF20, §14.13.3], or a (perhaps larger) logical document part using PDF 2.0 [PDF20, §14.13.7];
  3. graphic objects in a content stream [pdfA3, § E.5], [PDF20, §14.13.4], although this is not the preferred method when structure is available;
  4. a structure node [pdfA3, § E.7], [PDF20, §14.13.5] such as /Figure, /Formula, /Div, etc.;
  5. an /XObject [pdfA3, § E.6], [PDF20, §14.13.6] such as an included image of a formula or other mathematical/technical/diagrammatic content;
  6. an annotation [pdfA3, § E.8], although this method can be problematic with regard to validation against the PDF/A [pdfA3, §6.3] and PDF/UA [PDF-UA1, §7.18] standards.

Fig. \subfigref{2a} shows how attachments are presented within a separate panel of a browser window, using information from an array of filenames; see Fig. \subfigref{2b}. This is independent of the page being displayed, so the array must be referenced from the document level. This is seen in Fig. \subfigref{3c} using the /Names key of the /Catalog dictionary, which references object 2080, whose /EmbeddedFiles key then references the filenames array (object 1860 in Fig. \subfigref{2b}). One can also see in Fig. \subfigref{2b} how each filename precedes an indirect reference to the /Filespec dictionary [PDF20, §7.11.3, Tables 44 and 45] for the named file; see Fig. \subfigref{2c}. This dictionary contains a short description (/Desc) of the type of content as well as the filename to use on disk, and a link via the /EF key to the actual EmbeddedFile stream object.
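
The chain of references just described can be sketched as a small lookup. The following Python is a minimal illustration only, not a PDF parser: it models indirect objects as entries in a dictionary keyed by object number. The object numbers 2081, 2080, 1860, 27 and 29 come from the figures; the filenames, descriptions and the stream object numbers 26 and 28 are hypothetical values invented for the example.

```python
# Toy object table: object number -> contents (filenames and stream
# object numbers 26 and 28 are hypothetical, chosen for illustration).
objects = {
    2081: {"Type": "Catalog", "Names": 2080},
    2080: {"EmbeddedFiles": 1860},
    # Flat array in which each filename precedes its /Filespec reference.
    1860: ["inline-1.tex", 27, "inline-1.xml", 29],
    27: {"Type": "Filespec", "F": "inline-1.tex", "Desc": "LaTeX source",
         "AFRelationship": "Source", "EF": {"F": 26}},
    29: {"Type": "Filespec", "F": "inline-1.xml", "Desc": "MathML description",
         "AFRelationship": "Supplement", "EF": {"F": 28}},
}


def embedded_files(objects, catalog_num):
    """Map each attached filename to its /Filespec dictionary."""
    catalog = objects[catalog_num]
    names = objects[catalog["Names"]]
    array = objects[names["EmbeddedFiles"]]
    # Even positions hold filenames, odd positions hold references.
    pairs = zip(array[0::2], array[1::2])
    return {name: objects[ref] for name, ref in pairs}


files = embedded_files(objects, 2081)
for name, spec in files.items():
    print(name, "->", spec["Desc"], "/ stream object", spec["EF"]["F"])
```

Because the lookup starts at the /Catalog, it is independent of any page, which is the reason the array must be referenced from the document level.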

That a file is `Associated' is indicated by the /AFRelationship key, whose value is a PDF name indicating how the file is related to visible content. Options here are /Source, as used with the LaTeX source coding, or /Supplement, as used with the MathML description. Other possibilities are /Data (e.g., for tabular data) and /Alternative for other representations such as audio, a movie, projection slides or anything else that may provide an alternative representation of the same content. /Unspecified is also available as a non-specific catch-all.
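
As a hedged illustration of how a PDF writer might emit such a dictionary, the sketch below builds the source text of a /Filespec with a checked /AFRelationship value. The function name `filespec`, the exact key layout, the filename and the stream object number 26 are assumptions made for the example; only the five relationship names are taken from the text above.

```python
# Relationship names listed in the paragraph above.
AF_RELATIONSHIPS = {"Source", "Supplement", "Data", "Alternative", "Unspecified"}


def filespec(filename, description, relationship, stream_obj):
    """Return PDF source text for a /Filespec dictionary (illustrative layout).

    Simplification: filename and description are assumed to contain no
    parentheses or backslashes, so no string escaping is performed.
    """
    if relationship not in AF_RELATIONSHIPS:
        raise ValueError(f"unknown /AFRelationship value: {relationship!r}")
    return (
        "<< /Type /Filespec\n"
        f"   /F ({filename}) /UF ({filename})\n"
        f"   /Desc ({description})\n"
        f"   /AFRelationship /{relationship}\n"
        f"   /EF << /F {stream_obj} 0 R >>\n"
        ">>"
    )


# Hypothetical filename and stream object number.
print(filespec("inline-1.tex", "LaTeX source", "Source", 26))
```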

Not all attachments need be `Associated', and conversely not all `Associated Files' need be displayed in the `Attachments' panel, so there is another array (object 1859), shown in Fig. \subfigref{3c}, which is referenced from the /MarkInfo sub-dictionary of the /Catalog dictionary. Files associated with the document as a whole, as in method 1 above, are linked via the /AF key in the /Catalog dictionary (see Fig. \subfigref{3c}).

For the LaTeX source of a mathematical expression, method 4 is preferred, provided structure tagging is present within the PDF. This is discussed below in Sect. 3.1. Method 3 also works, provided the expression is built from content confined to a single page. This is described in Sect. 3.2.

As `Associated Files' have only been part of published PDF/A standards [pdfA3] since late 2012, it may be some time before PDF readers provide a good interface for them, beyond using the `Attachments' pane. Such an interface ought to allow viewing the contents of attached files, searching within them, and making their contents available to assistive technology. One possible way to display the association is apparent in earlier work [TUG2002], whereby a bounding rectangle appears as the mouse enters the appropriate region.

\subfiglabel{3a}
/AF /inline-1 BDC
1 0 0 1 158.485 0 cm
 ...
/mi </Alt(  k  )
>>BDC
 ...
 ...
EMC
1 0 0 1 10.303 0 cm
/mi </Alt(  real numbers  )
>>BDC
BT
/F42 10.9091 Tf
 [(R)]TJ
ET
EMC
...
EMC

(a) A portion of the PDF content stream associating content with embedded files for the mathematical expression indicated in Fig. \subfigref{2a}, using a marked content tag /AF to refer to a named resource /inline-1. All content down to the final EMC is associated.

\subfiglabel{3b}
20 0 obj
<< /inline-1 [27 0 R 29 0 R] /inline-2 [31 0 R 33 0 R] /display-1 [35 0 R 37 0 R] 
  /inline-3 [39 0 R 41 0 R] /inline-4 [43 0 R 45 0 R] /inline-5 [47 0 R 49 0 R] 
 ... /inline-31 [1475 0 R 1477 0 R] >>
90 0 obj
<<
 /Properties 20 0 R
 /Font << /F75 97 0 R /F79 100 0 R
   /F77 102 0 R  /F45 105 0 R /F78 106 0 R 
  ... >>
/XObject << /Im1 25 0 R >>
/ProcSet [ /PDF /Text ]
>>
endobj
5 0 obj
<<
/Type /Page
/Contents 91 0 R
/Resources 90 0 R
/MediaBox [0 0 595.276 841.89]
/Tabs/S
/Parent 773 0 R
/StructParents 0
>>
endobj

(b) A portion of the /Properties dictionary (upper, object 20) which is linked to a /Page object (object 5) via its /Resources key (object 90). Thus a name (such as /inline-1) is associated with an array of /Filespec references (viz. [27 0 R 29 0 R]), which lead to the LaTeX and MathML files seen in Fig. \subfigref{2c}.

\subfiglabel{3c}
2080 0 obj
<<
/Dests 2079 0 R
/EmbeddedFiles 1860 0 R
>>
endobj
2081 0 obj
<<
/Type /Catalog
/Pages 773 0 R
/Names 2080 0 R
/ViewerPreferences <> 
/OutputIntents [ << /Type /OutputIntent 
 /S/GTS_PDFA1 
 /DestOutputProfile 1 0 R /OutputConditionIdentifier 
 (sRGB_IEC61966-2-1_no_black_scaling)  /Info
 (sRGB IEC61966 v2.1 without black scaling) >> ]
/Metadata 2 0 R/Lang (en-US)
/PageMode/UseOutlines
/MarkInfo <>
/AF [ 22 0 R 24 0 R]
/PageLabels<> ... ]>>
/OpenAction 4 0 R
/StructTreeRoot 95 0 R
>>
endobj
1859 0 obj
[ 27 0 R 29 0 R 31 0 R 33 0 R 35 0 R 37 0 R 39 0 R 41 0 R 43 0 R 45 0 R 47 0 R 49 0 R 51 0 R 
 53 0 R 55 0 R 57 0 R 59 0 R 61 0 R 63 0 R 65 0 R 67 0 R 69 0 R 71 0 R 73 0 R 75 0 R 77 0 R 
... 1850 0 R]

(c) The document's /Catalog (object 2081) indicates presence of embedded files via the /Names key (object 2080). This references the array (object 1860 in Fig. \subfigref{2b}) to establish the correspondence between filenames and /Filespec dictionaries. Embedded files which are `Associated' to content portions are also listed in an array (object 1859) referenced from the /AF key in the /MarkInfo dictionary.

Figure 3.1 Embedded files associated with specific content.

Subsection 3.1 Embedded files associated with structure

Given an understanding of how structure tagging works (Sect. 2.1), associating files with structure is simply a matter of including an /AF key in the structure node's dictionary, as shown in Figures \subfigref{1a} and \subfigref{2b}. The value for this key is an array of indirect references to /Filespec objects for the relevant files.

There is nothing in the content stream in Fig. \subfigref{1b} to indicate that there is a file associated with this structure node. Rather, the browser, knowing that `Associated' files are present, needs to have done some pre-processing: first locating the node (if any) with which each file is associated, then tracing down the structure tree to the deepest child nodes (objects 114, 116, 118). Their /K entries (viz., 9, 10, 11 resp.) are the /MCID numbers by which the relevant marked content in the page's content stream is located.
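
The pre-processing described above amounts to a tree walk. The Python below is a minimal sketch under stated assumptions: the structure tree is modelled as a dictionary of nodes, with a hypothetical `Ref` type standing in for an indirect reference. The object numbers 112, 120, 114, 116, 118, the file references 27 and 29 and the /MCID values 9, 10, 11 follow the text and figures, but the direct nesting of the three leaves under object 120, their tag names, and the omission of the /accesstag nodes are simplifications made for the example.

```python
from collections import namedtuple

# Stands in for an indirect reference "num 0 R" (hypothetical helper type).
Ref = namedtuple("Ref", "num")

# Simplified structure tree; the nesting below object 120 is assumed.
struct = {
    112: {"S": "Formula", "AF": [Ref(27), Ref(29)], "K": [Ref(120)]},
    120: {"S": "math", "K": [Ref(114), Ref(116), Ref(118)]},
    114: {"S": "mi", "K": [9]},
    116: {"S": "mo", "K": [10]},
    118: {"S": "mi", "K": [11]},
}


def mcids_below(struct, num):
    """Collect, in document order, the MCIDs of all leaves under a node."""
    found = []
    for kid in struct[num]["K"]:
        if isinstance(kid, Ref):
            found.extend(mcids_below(struct, kid.num))
        else:
            found.append(kid)  # an integer kid is an /MCID
    return found


def associated_content(struct):
    """Map each node carrying /AF to (file references, MCIDs beneath it)."""
    return {
        num: ([r.num for r in node["AF"]], mcids_below(struct, num))
        for num, node in struct.items()
        if "AF" in node
    }


result = associated_content(struct)
print(result)  # {112: ([27, 29], [9, 10, 11])}
```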

Subsection 3.2 Embedded files associated with content

Given an understanding of how content tagging works (Sect. 2.1), and the fact that marked content operators may be nested, associating files with content is also quite simple. One uses an /AF tag within the page's content stream, with BDC ... EMC surrounding the content to be marked, as shown in Fig. \subfigref{3a}. This employs the named resource variant (here /inline-1) to indicate the array of `Associated' files. Fig. \subfigref{3b} shows how this name is used as a key (in dictionary object 20) having as value an array of indirect references to /Filespec objects (27 and 29). These resources can be specific to a particular page dictionary (object 5), but in the example document the named resources are actually made available to all pages, which avoids including multiple copies of files when a mathematical expression is used repeatedly.
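
How a reader might recognise such a wrapper can be sketched as follows. This Python is an illustration under simplifying assumptions, not a content-stream parser: the stream is a shortened excerpt modelled on Fig. \subfigref{3a}, tokens are split on whitespace (a real lexer must handle strings and dictionaries properly), and the `properties` mapping reproduces only the /inline-1 entry of object 20. The function name `af_spans` is hypothetical.

```python
# From object 20: /inline-1 [27 0 R 29 0 R]
properties = {"inline-1": [27, 29]}

# Shortened, illustrative content stream with nested marked content.
stream = """
/AF /inline-1 BDC
/mi <</Alt(k)>> BDC
BT /F30 10.9091 Tf [(k)]TJ ET
EMC
/mi <</Alt(R)>> BDC
BT /F42 10.9091 Tf [(R)]TJ ET
EMC
EMC
"""


def af_spans(stream, properties):
    """Return (tokens, spans); each span is (file_refs, start, end), the
    token indices of a BDC and its matching EMC for an /AF wrapper."""
    tokens = stream.split()
    stack = []  # one entry per open BDC: (index, resource name or None)
    spans = []
    for i, tok in enumerate(tokens):
        if tok == "BDC":
            is_af = i >= 2 and tokens[i - 2] == "/AF"
            name = tokens[i - 1][1:] if is_af else None
            stack.append((i, name))
        elif tok == "EMC":
            start, name = stack.pop()  # EMC closes the innermost open BDC
            if name is not None:
                spans.append((properties[name], start, i))
    return tokens, spans


tokens, spans = af_spans(stream, properties)
for refs, start, end in spans:
    print("files", refs, "cover tokens", start, "to", end)
```

The stack is what makes nesting work: the inner /mi sequences open and close without disturbing the outer /AF entry, so all content down to the final EMC is attributed to the files named by /inline-1.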

Finally, Fig. \subfigref{3c} shows the coding required when embedded files, some of which may also be associated with content or structure, are present within a PDF document. One sees that the array (object 1859) of indirect object references in the lower part of Fig. \subfigref{3c} refers to the same /Filespec objects (27 and 29) as the named resources (object 20) in the upper part of Fig. \subfigref{3b}. The /AF keys seen in Fig. \subfigref{1b} and Fig. \subfigref{2b} use these same references, which point to the objects themselves in Fig. \subfigref{2c}.

This mechanism makes it easier for a PDF reader to determine that there are files associated to a particular piece of content, by simply encountering the /AF tag linked with a named resource. This should work perfectly well with a PDF file that is not fully tagged for structure. However, if the content is extended (e.g., crosses a page-boundary) then it may be harder for a PDF writer to construct the correct content stream, properly tagging two or more portions.

\subfiglabel{4a} (a) Selection across mathematical content
\subfiglabel{4b} (b) Pasted text from the selection
\subfiglabel{4c} (c) Access-tags selected in the `Tags' tree, to show `fake spaces'.
Figure 3.2.1 This shows how the selection in \subfigref[(a)]{4a}, when copied and pasted into a text file, recovers the LaTeX source \subfigref[(b)]{4b} that was used to specify the visual appearance of the mathematical content. In \subfigref[(c)]{4c} we see structure within a /Formula node (see also Fig. \subfigref{5b}), with leaf-nodes of /accesstag structure nodes being marked content of type /AccessTag. This consists of a single space character carrying an /ActualText attribute which holds the replacement text, as seen explicitly in the coding shown in Fig. \subfigref{5a}. The `fake spaces' are very narrow; when selected they can be seen very faintly in \subfigref[(c)]{4c} within the ovals indicated, at the outer edge of the bounding rectangles of the outermost math symbols.
\subfiglabel{5a}
/AF /inline-1 BDC
1 0 0 1 51.508 0 cm
/AccessTag <\015k \134in \134RR
  \015
  \015\015)
>>BDC
BT
/F79 1 Tf
 [( )]TJ
ET
EMC
/mi <
 /Alt(  k  )
>>BDC
BT
/F30 10.9091 Tf
 [(k)]TJ
ET
EMC
1 0 0 1 6.023 0 cm
/mo <
 /Alt(  as element of  )
>>BDC
1 0 0 1 3.03 0 cm
BT
/F33 10.9091 Tf
 [(2)]TJ
ET
EMC
1 0 0 1 10.303 0 cm
/mi <
 /Alt(  real numbers  )
>>BDC
BT
/F42 10.9091 Tf
 [(R)]TJ
ET
EMC
1 0 0 1 7.879 0 cm
/AccessTag <\015)
>>BDC
BT
/F79 1 Tf
 [( )]TJ
ET
EMC
EMC

(a) Complete portion of the content stream corresponding to the mathematics shown as selected in Fig. \subfigref{4a}. This is the same content as in Figures \subfigref{1b} and \subfigref{3a} but with the ` ...' parts there now showing the /AccessTag coding of a `fake space' with /ActualText attribute. The /AF ... BDC ... EMC wrapping of Fig. \subfigref{3a} is also shown. Being part of the document's content, these space characters are also assigned /MCID numbers to be linked to structure nodes, as in \subfigref[(b)]{5b} below.

\subfiglabel{5b}
122 0 obj
<<
/K [ 12 ]
/Pg 5 0 R
/P 112 0 R
/Type/StructElem/S/accesstag
>>
endobj
  ...
  ...
120 0 obj
<<
/K [ 121 0 R ]
/P 112 0 R
/Type/StructElem/S/math 
/A<>
>>
endobj
   ...
   ...
113 0 obj
<<
/K [ 8 ]
/Pg 5 0 R
/P 112 0 R
/Type/StructElem/S/accesstag
>>
endobj
112 0 obj
<<
/K [
113 0 R
120 0 R
122 0 R
]
/P 109 0 R
/Type/StructElem/S/Formula 
/ID(Math0.1)/T(InlineMath 0.1)
/AF [27 0 R 29 0 R] /A <>
>>
endobj

(b) Portion of the structure tree as in Fig. \subfigref{1a}, but now showing how the `fake spaces' can be linked to structure nodes, here /accesstag. The missing portions of Fig. \subfigref{1a}, indicated there by ` ...', are now filled in, but leaving out other parts whose purpose has already been explained. Fig. \subfigref{4c} shows the tagging opened out within the `Tags' navigation panel, with the /accesstag structure nodes selected.

Figure 3.2.2 File content included as /ActualText for a `fake space', which itself can be tagged as marked content linked to an /accesstag structure node.
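
The octal escapes visible in the /AccessTag strings of Fig. \subfigref{5a} can be decoded mechanically, which shows why pasting recovers LaTeX source. The following Python is a minimal sketch covering only `\ddd` octal escapes in a PDF literal string; other escape forms (such as `\n` or escaped parentheses) are deliberately not handled, and the function name `decode_pdf_octal` is hypothetical. The input is the first line of the /ActualText string shown in the figure.

```python
import re


def decode_pdf_octal(literal):
    """Replace \\ddd octal escapes (1 to 3 digits) with the character they encode."""
    return re.sub(r"\\([0-7]{1,3})", lambda m: chr(int(m.group(1), 8)), literal)


raw = r"\015k \134in \134RR"  # as it appears in the content stream
decoded = decode_pdf_octal(raw)
print(repr(decoded))  # '\rk \\in \\RR'
```

Here `\015` is a carriage return and `\134` is a backslash, so the decoded text is a line break followed by the source `k \in \RR`.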