Section 3 `Associated Files', carrying LaTeX and MathML views of mathematical content
There are several ways in which file attachments may be associated with specific portions of a PDF document, using the `Associated Files' technique [pdfA3, Annex E]. The file is first embedded (attached) and then associated, by one of the following methods, with:
(i) the document as a whole [pdfA3, § E.3], [PDF20, §14.13.2], e.g. the full LaTeX source, or the preamble file used when converting snippets of mathematical content into a MathML presentation of the same content;
(ii) a specific page within the document [pdfA3, § E.4], [PDF20, §14.13.3], or, using PDF 2.0, a (perhaps larger) logical document part [PDF20, §14.13.7];
(iii) graphic objects in a content stream [pdfA3, § E.5], [PDF20, §14.13.4], although this is not the preferred method when structure is available;
(iv) a structure node [pdfA3, § E.7], [PDF20, §14.13.5], such as /Figure, /Formula or /Div;
(v) an /XObject [pdfA3, § E.6], [PDF20, §14.13.6], such as an included image of a formula or of other mathematical, technical or diagrammatic content;
(vi) an annotation [pdfA3, § E.8], although this method can be problematic with respect to validation against the PDF/A [pdfA3, §6.3] and PDF/UA [PDF-UA1, §7.18] standards.
Fig. \subfigref{2a} shows how attachments are presented in a separate panel of a browser window, using information from an array of filenames (see Fig. \subfigref{2b}). Since this panel is independent of the page being displayed, the array must be referenced at the document level. Fig. \subfigref{3c} shows this: the /Names key of the /Catalog dictionary references object 2080, whose /EmbeddedFiles key in turn references the filenames array (object 1860 in Fig. \subfigref{2b}). Fig. \subfigref{2b} also shows how each filename precedes an indirect reference to the /Filespec dictionary [PDF20, §7.11.3, Tables 44 and 45] for the named file (see Fig. \subfigref{2c}). This dictionary contains a short description (/Desc) of the type of content, the filename to use on disk, and a link via the /EF key to the actual EmbeddedFile stream object.
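To make the chain from filename to embedded stream concrete, the following Python sketch serialises a filenames array and a /Filespec dictionary as PDF object text. The helper names (pdf_string, filespec, names_array), the file names and the object numbers are hypothetical, and only ASCII filenames are handled. The sketch relies on the rule that keys in a PDF name tree are kept in sorted order.

```python
def pdf_string(s: str) -> str:
    """Return a PDF literal string, escaping backslashes and parentheses."""
    escaped = s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"

def filespec(filename: str, desc: str, stream_obj: int) -> str:
    """A /Filespec dictionary: short description (/Desc), the filename to use
    on disk (/F and /UF), and /EF linking to the EmbeddedFile stream object."""
    return (f"<< /Type /Filespec /F {pdf_string(filename)} "
            f"/UF {pdf_string(filename)} /Desc {pdf_string(desc)} "
            f"/EF << /F {stream_obj} 0 R >> >>")

def names_array(entries: dict[str, int]) -> str:
    """Filenames array for the /EmbeddedFiles name tree: each filename
    precedes an indirect reference to its /Filespec object, keys sorted."""
    parts = [f"{pdf_string(name)} {entries[name]} 0 R" for name in sorted(entries)]
    return "[" + " ".join(parts) + "]"
```

For example, names_array({"formula-1.tex": 27, "formula-1.mml": 29}) lists the .mml file first, because the keys are emitted in sorted order.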
That a file is `Associated', and how, is indicated by the /AFRelationship key in its /Filespec dictionary, whose value is a PDF name describing how the file relates to the visible content. Here /Source is used for the LaTeX source coding and /Supplement for the MathML description. Other possibilities are /Data (e.g. for tabular data) and /Alternative for other representations, such as audio, a movie, projection slides or anything else that provides an alternative representation of the same content. /Unspecified is also available as a catch-all.
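The choice of /AFRelationship value can be expressed as a small lookup, sketched below in Python. The enumeration lists only the five values mentioned above (PDF 2.0 defines further values, not covered here). The mapping from kinds of attachment to values, and the names KIND_TO_RELATIONSHIP and relationship_entry, are illustrative assumptions rather than part of any standard.

```python
from enum import Enum

class AFRelationship(Enum):
    """The /AFRelationship values mentioned in the text."""
    SOURCE = "Source"
    DATA = "Data"
    ALTERNATIVE = "Alternative"
    SUPPLEMENT = "Supplement"
    UNSPECIFIED = "Unspecified"

# Hypothetical mapping from kind of attachment to relationship, following the
# usage described above (LaTeX source -> /Source, MathML -> /Supplement).
KIND_TO_RELATIONSHIP = {
    "latex": AFRelationship.SOURCE,
    "mathml": AFRelationship.SUPPLEMENT,
    "csv": AFRelationship.DATA,
    "audio": AFRelationship.ALTERNATIVE,
}

def relationship_entry(kind: str) -> str:
    """Return the /AFRelationship entry for a /Filespec dictionary, falling
    back to /Unspecified for kinds not in the mapping."""
    rel = KIND_TO_RELATIONSHIP.get(kind, AFRelationship.UNSPECIFIED)
    return f"/AFRelationship /{rel.value}"
```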
Not all attachments need be `Associated', and conversely not all `Associated Files' need be displayed in the `Attachments' panel, so there is another array (object 1859), shown in Fig. \subfigref{3c}, linked to the /MarkInfo sub-dictionary of the /Catalog dictionary. Files associated with the document as a whole, as in method (i) above, are linked via the /AF key in the /Catalog dictionary (see Fig. \subfigref{3c}).
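A minimal sketch of the two document-level routes, assuming a hypothetical helper catalog(): /Names leads (via /EmbeddedFiles) to the files listed in the Attachments panel, while /AF lists only the files associated with the document as a whole. Other required /Catalog entries, such as /Pages, are omitted, and the /AF object numbers are placeholders.

```python
def refs(objs: list[int]) -> str:
    """Format object numbers as an array of indirect references."""
    return "[" + " ".join(f"{n} 0 R" for n in objs) + "]"

def catalog(names_obj: int, doc_af: list[int]) -> str:
    """Partial /Catalog dictionary. /Names points to the dictionary holding
    /EmbeddedFiles (the panel list); /AF holds document-level associations.
    The two lists are independent and need not coincide."""
    return f"<< /Type /Catalog /Names {names_obj} 0 R /AF {refs(doc_af)} >>"
```

Here catalog(2080, [40]) would associate only object 40 with the whole document, however many files the panel lists.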
For the LaTeX source of a mathematical expression, method (iv) is preferred, provided structure tagging is present within the PDF; this is discussed in Sect. 3.1. Method (iii) also works, provided the expression is built from content confined to a single page; this is described in Sect. 3.2.
As `Associated Files' have only been part of published PDF/A standards [pdfA3] since late 2012, it may be some time before PDF readers provide a good interface for them, beyond listing them in the `Attachments' pane. Such an interface ought to let users view the contents of attached files and search within them, and should make those contents available to assistive technology. One possible way to display the association appears in earlier work [TUG2002], where a bounding rectangle is shown as the mouse enters the appropriate region.
Subsection 3.1 Embedded files associated with structure
With an understanding of how structure tagging works, as in Sect. 2.1, associating files with structure is simply a matter of including an /AF key in the structure node's dictionary, as shown in Figures \subfigref{1a} and \subfigref{2b}. The value of this key is an array of indirect references to the /Filespec objects for the relevant files.
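As a sketch, the Python helper below (struct_elem, a hypothetical name) writes a structure element dictionary for a /Formula node carrying an /AF array. The child objects 114, 116 and 118 echo those discussed in this section; the parent (100) and the /Filespec numbers (27, 29) are placeholders.

```python
def struct_elem(tag: str, parent: int, kids: list[int], af: list[int]) -> str:
    """Structure element dictionary whose /AF value is an array of indirect
    references to /Filespec objects; /K lists the child elements."""
    k = " ".join(f"{n} 0 R" for n in kids)
    a = " ".join(f"{n} 0 R" for n in af)
    return f"<< /Type /StructElem /S /{tag} /P {parent} 0 R /K [{k}] /AF [{a}] >>"
```

Nothing else in the tree, or in the content stream, needs to change for the association to exist.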
There is nothing in the content stream in Fig. \subfigref{1b} to indicate that a file is associated with this structure node. Rather, a browser that knows `Associated' files are present must first pre-process the document to locate the node (if any) with which each file is associated, then trace down the structure tree to the deepest child nodes (objects 114, 116 and 118). Their /K entries (9, 10 and 11, respectively) are /MCID numbers, which locate the relevant marked content in the page's content stream.
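The pre-processing described above can be sketched in two steps: find the structure nodes whose /AF array refers to a given /Filespec, then descend to the deepest children and collect their MCIDs. This Python sketch assumes a simplified, already-parsed tree in which each /K is either an integer MCID or a list of child object numbers; real /K entries may also be marked-content reference dictionaries or mixed arrays, which it does not handle. Object 112 and the /Span tags are hypothetical.

```python
# Hypothetical, already-parsed structure tree: object number -> element dict.
# Each "K" is either an int (an MCID) or a list of child object numbers.
Tree = dict[int, dict]

def nodes_with_file(tree: Tree, filespec: int) -> list[int]:
    """Pre-processing: structure nodes whose /AF array refers to `filespec`."""
    return [obj for obj, elem in tree.items() if filespec in elem.get("AF", [])]

def mcids_below(tree: Tree, obj: int) -> list[int]:
    """Descend to the deepest children of `obj` and return the MCIDs in
    their /K entries, in document order."""
    kids = tree[obj]["K"]
    if isinstance(kids, int):
        return [kids]
    mcids: list[int] = []
    for child in kids:
        mcids.extend(mcids_below(tree, child))
    return mcids

tree: Tree = {
    112: {"S": "Formula", "K": [114, 116, 118], "AF": [27, 29]},
    114: {"S": "Span", "K": 9},
    116: {"S": "Span", "K": 10},
    118: {"S": "Span", "K": 11},
}
```

With this tree, [mcids_below(tree, n) for n in nodes_with_file(tree, 27)] yields [[9, 10, 11]], the MCIDs to look for in the page's content stream.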
Subsection 3.2 Embedded files associated with content
With an understanding of how content tagging works, as in Sect. 2.1, and given that marked-content operators may be nested, associating files with content is also quite simple. One uses an /AF tag within the page's content stream, with BDC … EMC surrounding the content to be marked, as shown in Fig. \subfigref{3a}. This employs the named-resource variant (here /inline-1) to indicate the array of `Associated' files. Fig. \subfigref{3b} shows how this name is used as a key (in dictionary object 20) whose value is an array of indirect references to /Filespec objects (27 and 29). These resources can be specific to a particular page dictionary (object 5), but in the example document\Exfootmark the named resources are made available to all pages, since this avoids including multiple copies of a file when a mathematical expression is used repeatedly.
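The writer's side can be sketched in Python as follows, assuming the content is available as uncompressed operator text. The helper names are hypothetical; the inner /Span sequence in the usage below illustrates nesting, and /F1 is an assumed font resource.

```python
def wrap_with_af(content: str, resource_name: str) -> str:
    """Surround existing content-stream operators with an /AF marked-content
    sequence whose property list is a named resource."""
    return f"/AF /{resource_name} BDC\n{content}\nEMC"

def properties_entry(resource_name: str, filespecs: list[int]) -> str:
    """Entry for the named-resource dictionary (object 20 in the text): the
    name maps to an array of indirect references to /Filespec objects."""
    refs = " ".join(f"{n} 0 R" for n in filespecs)
    return f"/{resource_name} [{refs}]"

inner = "/Span <</MCID 9>> BDC\nBT /F1 10 Tf 72 700 Td (x) Tj ET\nEMC"
stream = wrap_with_af(inner, "inline-1")
entry = properties_entry("inline-1", [27, 29])
```

Since the same entry can serve every page, a repeated expression reuses /inline-1 rather than embedding its files again.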
Finally, Fig. \subfigref{3c} shows the coding required when embedded files, some of which may also be associated with content or structure, are present within a PDF document. The array (object 1859) of indirect object references in the lower part of Fig. \subfigref{3c} refers to the same /Filespec objects (27 and 29) as the named resources (object 20) in the upper part of Fig. \subfigref{3b}. These are also the objects, shown in Fig. \subfigref{2c}, that are referenced by the /AF keys seen in Fig. \subfigref{1b} and Fig. \subfigref{2b}.
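Because each /Filespec object is written once and merely referenced from several places, a writer or validator can check this sharing by indexing references by object number. The sketch below (referrers is a hypothetical name) uses the two arrays named above, object 1859 and the /inline-1 entry of object 20, each referring to objects 27 and 29.

```python
from collections import defaultdict

def referrers(sources: dict[str, list[int]]) -> dict[int, list[str]]:
    """Map each /Filespec object number to the places that reference it.
    A file embedded once but used from several places appears once, with
    several referrers, rather than as several separate objects."""
    index: dict[int, list[str]] = defaultdict(list)
    for place, refs in sources.items():
        for obj in refs:
            index[obj].append(place)
    return dict(index)

index = referrers({
    "array 1859": [27, 29],
    "object 20, /inline-1": [27, 29],
})
```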
This mechanism makes it easier for a PDF reader to determine that files are associated with a particular piece of content: it need only encounter the /AF tag linked to a named resource. This should work perfectly well with a PDF file that is not fully tagged for structure. However, if the content is extended (e.g. crosses a page boundary), it may be harder for a PDF writer to construct the correct content streams, since two or more portions must each be tagged properly.
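From the reader's side, the mechanism can be sketched as a scan of a page's content stream for /AF /name BDC. The scan tracks nesting so that each sequence is closed by its matching EMC, and resolves the name through an already-parsed named-resource dictionary. This Python sketch is deliberately naive: it assumes decompressed content, handles only the named-resource variant, and assumes that no string literal contains the text BDC, BMC or EMC. associated_spans is a hypothetical name.

```python
import re

# "/AF /name BDC" (group 1 = resource name), any other BDC/BMC, or EMC.
TOKEN = re.compile(r"/AF\s+/([^\s/\[\]<>()]+)\s+BDC|\bB[DM]C\b|\bEMC\b")

def associated_spans(content: str, properties: dict[str, list[int]]):
    """Return (filespec object numbers, start, end) for each /AF marked-content
    sequence; content[start:end] is the marked content between BDC and EMC."""
    stack = []  # one entry per open sequence; None for non-/AF sequences
    found = []
    for m in TOKEN.finditer(content):
        if m.group(1) is not None:      # opening "/AF /name BDC"
            stack.append((m.group(1), m.end()))
        elif m.group(0) == "EMC":       # closes the innermost open sequence
            if not stack:
                raise ValueError("EMC without matching BDC/BMC")
            opened = stack.pop()
            if opened is not None:
                name, start = opened
                found.append((properties[name], start, m.start()))
        else:                           # some other BDC or BMC opener
            stack.append(None)
    if stack:
        raise ValueError("unclosed marked-content sequence")
    return found
```

A sequence split across two pages would appear as two separate, balanced sequences, one per page's content stream, which is where the writer's extra work arises.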