The last project I worked on at Freedom Scientific was a proof of concept for adding support for external file formats. It began with writing up a design document. I looked into a library called Aspose.Words for .NET and started playing around with it. It seemed to do exactly what we needed: it provided open and save for Microsoft OpenXML documents (.docx), HTML, RTF, and more, plus saving to EPUB. It also let us dispense with another library (Inso) that supported only a few formats and was becoming a bit ossified.
The requirements were that the plug-in formats be completely removable and add dependencies (e.g., on the aforementioned Aspose.Words) only when they were actually being used. The plug-ins would be used by OpenBook and WYNN, which were built around the same code base. Architecturally, it made sense to have plug-in DLLs with a well-known entry point (Rosetta_RegisterPlugins). The entry point would be given a registration interface, which it would call once per supported file format, passing in an interface (Rosetta::IFileFormat) for that format. This interface supported a few methods (following my minimalist design):
- fetch the file format description and extension;
- whether open was supported, and open method;
- whether save was supported, and save method.
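To make the shape of that contract concrete, here is a minimal sketch in modern C++. Only Rosetta_RegisterPlugins, Rosetta::IFileFormat, Rosetta::IBuilder, and Rosetta::ITraverse are names from the actual project; the method names, IRegistrar, TestFormat, and CollectingRegistrar are assumptions made up for illustration, and the real interfaces almost certainly looked different in detail.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace Rosetta {

struct IBuilder;   // receives document content during open (declared only here)
struct ITraverse;  // walks the host document during save (declared only here)

// One object per supported file format. Method names are hypothetical.
struct IFileFormat {
    virtual ~IFileFormat() = default;
    virtual std::wstring Description() const = 0;
    virtual std::wstring Extension() const = 0;
    virtual bool CanOpen() const = 0;
    virtual bool Open(const std::wstring& path, IBuilder& builder) = 0;
    virtual bool CanSave() const = 0;
    virtual bool Save(const std::wstring& path, ITraverse& source) = 0;
};

// Hypothetical registration interface handed to the plug-in by the host.
struct IRegistrar {
    virtual ~IRegistrar() = default;
    virtual void Register(IFileFormat* format) = 0;
};

} // namespace Rosetta

// A stand-in format that claims to support open but not save.
class TestFormat : public Rosetta::IFileFormat {
public:
    std::wstring Description() const override { return L"Test document"; }
    std::wstring Extension() const override { return L"test"; }
    bool CanOpen() const override { return true; }
    bool Open(const std::wstring&, Rosetta::IBuilder&) override { return true; }
    bool CanSave() const override { return false; }
    bool Save(const std::wstring&, Rosetta::ITraverse&) override { return false; }
};

// The well-known entry point. In a real DLL it would also be exported,
// e.g. with __declspec(dllexport) on Windows.
extern "C" void Rosetta_RegisterPlugins(Rosetta::IRegistrar* registrar) {
    static TestFormat format;  // lives until the process exits
    registrar->Register(&format);
}

// Host-side registrar that simply collects whatever the plug-in offers.
class CollectingRegistrar : public Rosetta::IRegistrar {
public:
    void Register(Rosetta::IFileFormat* format) override {
        assert(format != nullptr);
        formats.push_back(format);
    }
    std::vector<Rosetta::IFileFormat*> formats;
};
```

The host only ever sees abstract interfaces, so a plug-in's heavy dependencies stay inside its own DLL and are loaded only when that DLL is.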
An existing "parser" interface was leveraged (use what you have), and became the base for Rosetta::IBuilder, adding on page navigation; pattern-wise, it was a builder anyway. This interface was passed to the open method and allowed for building a native document. For example, with Aspose.Words, their DocumentVisitor interface was implemented and became essentially a straight translation, although WYNN supports only a very small subset of the formatting of the various import formats (Aspose.Words seems to store documents internally much like Word documents, or at least preserves a Word-compatible interface).
For save, the same IBuilder interface was used, except this time the plugin file format object implemented it rather than consumed it, and passed its implementation to the Rosetta::ITraverse interface it received in the save method. Internally, WYNN visits all pages and the elements in each page and invokes the appropriate IBuilder methods, allowing the implementation to output the content in its own way. For Aspose.Words, its DocumentBuilder was used to build up a document.
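The save direction can be sketched as follows. This is an illustration under assumptions, not the original code: the three IBuilder methods, the Traverse method, HostDocument, and PlainTextWriter are invented names, and the real IBuilder covered far more than pages and text.

```cpp
#include <string>
#include <vector>

namespace Rosetta {

// Hypothetical, heavily reduced builder interface.
struct IBuilder {
    virtual ~IBuilder() = default;
    virtual void BeginPage() = 0;
    virtual void AddText(const std::string& text) = 0;
    virtual void EndPage() = 0;
};

// Hypothetical traversal interface the host passes to the save method.
struct ITraverse {
    virtual ~ITraverse() = default;
    virtual void Traverse(IBuilder& sink) = 0;
};

} // namespace Rosetta

// Host side: a document is a list of pages, each a list of paragraphs.
class HostDocument : public Rosetta::ITraverse {
public:
    std::vector<std::vector<std::string>> pages;

    // Visit every page and every element, driving the plug-in's builder.
    void Traverse(Rosetta::IBuilder& sink) override {
        for (const auto& page : pages) {
            sink.BeginPage();
            for (const auto& paragraph : page) {
                sink.AddText(paragraph);
            }
            sink.EndPage();
        }
    }
};

// Plug-in side: implements IBuilder to emit its own output format.
// Here that is plain text with a form feed between pages.
class PlainTextWriter : public Rosetta::IBuilder {
public:
    std::string output;

    void BeginPage() override {
        if (pageCount_ > 0) output += "\f";
        ++pageCount_;
    }
    void AddText(const std::string& text) override { output += text + "\n"; }
    void EndPage() override {}

private:
    int pageCount_ = 0;
};

// What a save method might do with the ITraverse it is handed.
std::string SaveAsPlainText(Rosetta::ITraverse& source) {
    PlainTextWriter writer;
    source.Traverse(writer);
    return writer.output;
}
```

The same interface serves both directions: on open the plug-in calls a host-provided IBuilder, and on save the host calls a plug-in-provided one.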
Originally, the plan was to use COM interop to talk to Aspose.Words (a .NET library), but that proved extremely difficult because of the need to pass in .NET streams and to use constructors that have no equivalent on the COM side. Eventually I tried out C++/CLI, and was pleasantly surprised that, even in ancient Visual Studio 2005, it "just worked" in ways that few technologies do.
At first I wrote a test plugin, supporting .test files that were just text files with a header and some formatting to allow for multiple pages. This allowed me to work through any possible issues and then move to the Aspose.Words plugin. As mentioned, I used their DocumentVisitor and DocumentBuilder, and a single implementation worked for all the formats needed, only changing the IFileFormat class to have appropriate extensions and descriptions and Aspose save/open enumeration values for the file type.
Only EPUB needed extra work: Aspose does not support opening EPUB files, just saving them. So I wrote some glue code that used the Xerces XML parser to read the "spine", then used Aspose to open the XHTML content files and built up the native document from them with IBuilder as usual. The XML side reused a wrapper I had already written for our Notecards file format, which was general enough to be used almost exactly as is.
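For readers unfamiliar with EPUB: the package document lists every file in a manifest (item id to href) and gives the reading order in a spine (a sequence of idref values pointing back into the manifest). Once the XML is parsed, resolving the spine is a simple lookup. The sketch below assumes the parsing has already happened; EpubPackage and ContentFilesInReadingOrder are hypothetical names, and the Xerces-based code in the real plugin is not shown.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, already-parsed view of an EPUB package document.
struct EpubPackage {
    std::map<std::string, std::string> manifest;  // item id -> href
    std::vector<std::string> spine;               // idref values in reading order
};

// Resolve the spine against the manifest and return the content file
// hrefs in reading order. Spine entries with no manifest item are skipped.
std::vector<std::string> ContentFilesInReadingOrder(const EpubPackage& package) {
    std::vector<std::string> files;
    for (const std::string& idref : package.spine) {
        auto item = package.manifest.find(idref);
        if (item != package.manifest.end()) {
            files.push_back(item->second);
        }
    }
    return files;
}
```

Each returned href would then be opened as an ordinary XHTML file and fed through the same open path as any other format.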
Plugin discovery is simple: we check a Rosetta folder in the same directory as the main executable for *.dll, open the matching files, and try to register them. Plugins are only unloaded when the application exits; we do not unload them dynamically, although that could be added in the future if it became necessary. Each registered IFileFormat interface is queried for the extension it supports, assigned a dynamic file type, and added to a map that the functions mapping extensions to file types consult later. There is a function that creates IIO interfaces from an extension, and an ExternIO class provides the last piece of the puzzle, bridging from WYNN's file I/O to IFileFormat.
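The extension map is the easiest part to illustrate. The sketch below is a guess at the general idea rather than the real code: FileTypeRegistry, its methods, and the choice of starting value for dynamic type IDs are all assumptions, as is the case-insensitive matching.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical registry that hands out dynamic file type IDs per extension.
class FileTypeRegistry {
public:
    static constexpr int kUnknown = -1;

    // firstDynamicType should sit above the host's built-in file type IDs.
    explicit FileTypeRegistry(int firstDynamicType) : next_(firstDynamicType) {
        assert(firstDynamicType >= 0);
    }

    // Assign a file type to an extension, or return the one already assigned.
    int Register(const std::string& extension) {
        const std::string key = Normalize(extension);
        auto found = types_.find(key);
        if (found != types_.end()) return found->second;
        types_[key] = next_;
        return next_++;
    }

    // Map an extension back to its file type, or kUnknown if unregistered.
    int Lookup(const std::string& extension) const {
        auto found = types_.find(Normalize(extension));
        return found == types_.end() ? kUnknown : found->second;
    }

private:
    // Drop a leading dot and lower-case, so ".DOCX" and "docx" match.
    static std::string Normalize(std::string extension) {
        if (!extension.empty() && extension[0] == '.') extension.erase(0, 1);
        std::transform(extension.begin(), extension.end(), extension.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return extension;
    }

    std::map<std::string, int> types_;
    int next_;
};
```

Because the IDs are assigned at registration time, the host needs no compile-time knowledge of which formats exist; removing a plugin DLL simply means its extension never gets an ID.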