Barnsley fern fractal

Thoughts on software architecture and development, and methods and techniques for improving the quality thereof.

David B. Robins (home)


Like Code Visions on Facebook.

Code Visions: Improving software quality
Rosetta: plug-in file format architecture

By David B. Robins tags: C++, Development, Architecture Friday, April 12, 2013 10:22 EST (link)

The last project I worked on at Freedom Scientific was a proof of concept to add support for external file formats. It began as writing up a design document; I looked into a library called Aspose.Words for .NET, and started playing around with it; it seemed to do exactly what we need, in that it provided open and save for: Microsoft OpenXML documents (.docx), EPUB, HTML, RTF, and more; it also let us dispense with another library (Inso) that only supported a few formats and was becoming a bit ossified.

The requirements were that the plug-in formats be completely removable and only add dependencies (e.g., to the aforementioned Aspose.Words) when they were actually being used. These would be used by OpenBook and WYNN, which were built around the same code base. Architecturally, it made sense to have plug-in DLLs with a well-known entry point (Rosetta_RegisterPlugins) which would be given a registration interface that it could call and pass in an interface (Rosetta::IFileFormat) for each file format supported. This interface supported a few methods (following my minimalist design):

  • fetch the file format description and extension;
  • whether open was supported, and open method;
  • whether save was supported, and save method.

An existing "parser" interface was leveraged (use what you have), and became the base for Rosetta::IBuilder, adding on page navigation; pattern-wise, it was a builder anyway. This interface was passed to the open method and allowed for building a native document. For example, with Aspose.Words, their DocumentVisitor interface was implemented and became essentially a straight translation, although WYNN supports only a very small subset of the formatting of the various import formats (Aspose.Words seems to store documents internally much like Word documents, or at least preserves a Word-compatible interface).

For save, the same IBuilder interface was used, except this time, the plugin file format object implements it rather than uses it, and passes its implementation to a Rosetta::ITraverse interface passed to the save method. Internally, WYNN visits all pages and elements in each page and invokes the appropriate IBuilder methods, allowing the implementation to appropriately output the content in its own way. For Aspose.Words, its DocumentBuilder was used to build up a document.

Originally, the plan was to use COM interop to talk to Aspose.Words (a .NET library); but that was extremely difficult due to the need to pass in .NET streams and use constructors with no equivalent on the COM side. Eventually I tried out C++/CLI, and was pleasantly surprised that, even in ancient Visual Studio 2005, it "just worked" in ways that few technologies do.

At first I wrote a test plugin, supporting .test files that were just text files with a header and some formatting to allow for multiple pages. This allowed me to work through any possible issues and then move to the Aspose.Words plugin. As mentioned, I used their DocumentVisitor and DocumentBuilder, and a single implementation worked for all the formats needed, only changing the IFileFormat class to have appropriate extensions and descriptions and Aspose save/open enumeration values for the file type.

Only EPUB needed extra work: Aspose does not support opening EPUB files, just saving; so I wrote some glue code making use of the Xerces XML parser (and a wrapper I had already written for our Notecards file format, which could be used almost exactly as is since it was sufficiently general) to read the "spine" and then used Aspose to open the XHTML content files and build up the native document from them with IBuilder as usual.

Plugin discovery works simply: we check a Rosetta folder in the same directory as the main executable for *.dll, open matching files, and try to register them. Plugins are only unloaded when the application exits; we do not dynamically unload, although that could be an option for the future if it were necessary. Registered IFileFormat interfaces are queried for the extensions they support and a dynamic file type is assigned and they are added to a map for future reference by the functions that map extensions to file types. There is a function that creates IIO interfaces from an extension; an ExternIO class was created that provides the last piece in the puzzle to bridge from WYNN's file I/O to IFileFormat.

The Tombstone Installer

By David B. Robins tags: Windows, Architecture Friday, March 8, 2013 21:22 EST (link)

We have two Windows installers (WiX projects), which are components of two released products. (What they install isn't important.) Unfortunately (the first in a train), each installs the component as a different (major) version (it stamps the product's own version); let's say 9.x and 11.x. Fortunately, the component was subsequently broken out into its own MSI, using version 12.x (because we needed a new major version greater than the highest in use, 11.x).

An MSI file is composed internally of a series of tables (like a database); the WiX toolset provides a declarative XML syntax for populating those tables. So, in the case of upgrades, there is the MSI table, and WiX elements such as UpgradeVersion. Usually one can get by with reading the WiX documentation; sometimes one has to go to MSDN. The Orca editor is very handy for viewing these tables in a built MSI file.

Speaking of upgrades, the idea is that if an MSI installer detects a newer version, it is supposed to abort (it shouldn't, all else being equal, install over an installed newer version), and it should uninstall an older version then install itself in place of it (with the caveat that if the Product Codes match, it will crash and burn horribly, although I think one is supposed to use patch functionality here to make a "minor upgrade"…).

Unfortunately (ah, here's the rub), the 9.x older version was set up to only consider versions greater than it (e.g., 9.0.1234) and lower than 9.1 as "newer" (that's right: it will cheerfully install right over 12.x). The 11.x version is even worse: it only considers version between it (e.g., 11.0.1234) and 9.1 as "newer" (i.e., empty set—it'll install itself over anything). Certain combinations of installing newer then older whole products containing different component installers could, one can see, end up with an old version of the component installed (or both), a bad situation.

Fortunately, there is a way to ensure an old 9.x component won't install itself over a newer 12.x component, and that is by providing what I call a "tombstone" component—another MSI—which advertises itself with version 9.0.9999, which is within the valid "newer" version range of the old component and thus will lead to it refusing to install, which is what we want; but the main component is still version 12.x. The tombstone MSI acts as a marker that this version series is dead: hence the term. The tombstone MSI gets the old UpgradeCode; the new component gets a new one. Unfortunately (!) this (correct) failure to install leads to an abort of the install; but fortuantely, it will work if a silent install is selected (via command-line parameter; we had to issue a TSN, Technical Support Note).

Unfortunately, the 11.x version recognizes no superiors, and thus no tombstone can work against it. Of course we'll uninstall it if found, but if the old first product (with the 11.x component) is installed after a new second product (with the 12.x component and tombstone), the 11.x component will install on top (leading to both versions being installed, which "works", sort of, but is rather strange). At this point the user can uninstall the old one (if they know to do so).

(I am not claiming this is a good user experience all around, although the case of installing old over new isn't terribly common; people rarely use both products, although it happens and we have to support it, just presenting a method of dealing with a problem that we couldn't fix directly since the problem was in the released installer. We're trying to make the best of a bad situation, and point out an interesting technique.)

NAnt: fundamental problems

By David B. Robins tags: Build, Architecture Wednesday, February 13, 2013 13:40 EST (link)

We are in the process of switching our builds from an internal cobbled-together hack (not in a good way) to NAnt, a .NET version of the Java-based (Apache) Ant build tool. The home-grown horror is not too surprising: from a Slashdot article about questions to ask a prospective employer, "Antique Geekmeister" suggests:

"What build system do you use?" Every single workplace that I've seen that built their own from scratch, usually by some brilliant developer frustrated with build systems they never fully understood, spent hundreds or thousands of man-hours on learning it and using it, only to see it fall apart as the developer ran into the same problems "make" and "autoconf" solved more than a decade ago.

I confess, I have done the same thing, although it was by request and at least tiny and got out of the way. And I am not asserting that everyone should stick with make; but at least first understand it. NAnt is looking like "out of the frying pan and into the fire", for several reasons. Some of these may be due to our implementation, but even those I think are the fault of what the build system affords doing, and standard practices with it.

  • NAnt seems to organize around “tasks to run” rather than “targets to build” (dependency support is very weak). I don’t think it’s even possible, for example, to create a traditional rule like “*.wixobj is created by running candle on *.wxs”. This seems like a bad model; it’s procedural rather than dependency-driven. There are better procedural languages if that’s all we need.

  • In traditional build systems, I would expect targets to be based on what is being built, e.g., if I ask for an MSI (Windows installer .msi file) to be built, it should depend on the x86 and x64 MSI files (which would be targets on their own), which in turn should depend on whatever their contents are (whatever binaries need to be included, WiX sources, etc.) and have a rule to build the MSI that updates only out-of-date .wixobj files and links if necessary. On top of that, a setup package target would depend on the MSIs it needs. The usual thing in NAnt seems to be to write a task to build something in a series of steps, rather than organizing it as set of dependencies which self-organize into a tree.

  • Incremental build in this sort of model does too much work; running a build twice should do nothing the second time, for instance, because it would notice everything was up to date. It might run msbuild (which would itself do nothing), for example, but should not redo work already done to create targets that are already up to date. I believe there are some tasks that move files when they should be either built in the right location to begin with or copied—moving a target makes the build system recreate it next time.

  • Possibly a local foible, but dependencies are not properly set up: it’s possible to run targets and be missing dependencies, when they should be dependencies of the target and be built automatically. I can't think of a case where this shouldn't happen. If there must be such internal targets, they should follow a naming convention (e.g., an underscore prefix) and (standardized across builds) “public interface” targets should be created that includes the dependencies (perhaps also delineated in the .build file help or comments).

What would I recommend in its place? Well, you can't really go terribly wrong with make, especially at the level we'd be using it (i.e., very coarse-grained; msbuild does the project build work); I have also mentioned that I'm a fan of SCons. CMake has been recommended to me, and KDE and Clang use it, and it has good cross-platform support; but I don't have much personal experience with it. Are there any other great—powerful yet not overcomplicated—build systems out there (that work on Windows, and don't center around an IDE)?

Parsing a bank statement

By David B. Robins tags: Development, Python, Architecture Sunday, February 10, 2013 15:50 EST (link)

I track expenses in a local database (with a web interface to add receipts), and then I download statements from my bank and synchronize the two using a Perl program I wrote a while ago (mostly matching on dates and amounts since that works better than the name of the account, which can get recorded very differently in the debit entry). I got a bit behind in doing this, and wasn't able to fetch some older statements in the usual Quicken "semi-XML" format that I was used to; I had to fetch text statements.

At first I thought of writing a program to convert the text statements into Quicken (.qfx) format; but then I decided to skip the unnecessary format and parse step and emit YAML, which can be emitted easily by the Python converter and read with only minor tweaks to the Perl reconciliation program.

The text statement parser was built around a stack of line processor objects: each object's main interface was a Process method which could either return a continue (line processed, keep sending), an abort (this processor is done; pop it off the stack and continue with the parent), or a new processor which would be pushed and take over reading lines. This architecture lent itself well to the format of the statement being effectively a set of nested sections and contexts:

  • (Top level)
    • Statement Period (start, end)
    • Checking Activity
      • Withdrawals
        • Checks
        • Purchases

Each higher level calls a static Match method in the child processors, and return the result (a new instance) if it matches, and it begins processing. Multiple sections of each type are possible (in this particular case, each page of purchases is a new Purchases section). They return their data up to their parent objects and eventually to the top level, where it is written out as a YAML document. I wrote up a short SCons (I am a convert) recipe and builder to run the converter.

The original Perl verify script needed very few changes to handle the converted data, and it managed to successfully reconcile with the information in the database.

Standard CPN

By David B. Robins tags: Development Thursday, January 24, 2013 17:33 EST (link)

Recall from the previous entry that CPN is Consistent Prefix Notation, essentially Applications ("Apps") Hungarian.

I was going to present an anatomy of CPN and some standard prefixes and suffixes; but on re-examining the primary source materials I can recommend no better than reading:

  1. Simonyi's original paper, which also contains a few more arguments in favor of CPN, and
  2. this other by Doug Klunder.

(Oh, that's funny: I just saw one of the erroneous examples used in The Art of Readable Code in Simonyi's paper; they only had to copy its meaning, and couldn't even manage that.) As well as explaining the building blocks, those two papers contain plenty of examples of correct naming.

Begin by understanding the structure of a name—[2] uses <prefix> <base type> <qualifier> to describe it, where, since both of the first two are collectively a prefix, another term it uses, <constructor> could replace <prefix>. While CPN is usually thought of as a variable naming convention, it in fact also applies to constants and function names, rules for which are outlined in the papers and should be understood. For example, functions should be CamelCased and their first syllable should be the return type, with (noteworthy) arguments at the end. Conversion functions should be of the form "XFromY", e.g., "RcFromPts".

The only danger point, and perhaps what lead to creation of the Systems Hungarian beast, is to fail to read through the papers completely and with attention to detail. For example, in [1], skipping to table 2 will show many examples such as "pX" and "dX". Without understanding that the "X" is a CPN kind—like a "doc", making "pdoc" (a pointer to some sort of document class), or "ifoo" (index to foo, "pifoo" hence pointer to index to foo) and that "pointer" or "difference" are effectively meta-kinds, a person might imagine that a "pointer" is a CPN kind in itself. This leads to incorrect usages such as "pNext" (pointer to the next what?) or "dWidth" (delta what unit? and if it's an x-coordinate delta, such as a "dxa", width is already implied). Paper [2] groups these "meta-kinds" explicitly (2.1.2), separating "base types" (such as "ch" or "sz") and "construtors" like "p", "rg", or "h".

Another possible confusion are entries for "b" (byte), "w" (word), and "l" (long), and to a lesser extent, "r"/"fl" (float) and "d"/"dbl" (double), which refer to storage types and not usage: however, the authors are careful to point out that "w" on up should be rarely used and are intended for generic routines, e.g., for memory manipulation. C++'s templating ability makes these even less necessary, although one might legitimately wonder what prefixes to use for template arguments since the original documents were written primarily with C in mind (see below for recommendations). "b" gets much more frequent use since it is legitimate for buffers ("rgb", "pb") and their sizes ("cb") even though it should not be used for general quantities. The other prefixes of this sort should be avoided. Note also that although "b" has been abused as Boolean (a chief culprit is MFC), the correct CPN is "f" (mnemonic: flag).

In particular, do not skip the tables of suffixes, which are almost as important as the prefixes. (Yes, technically they wouldn't fit within "CPN", but they are a part of it nonetheless.) Inconsistently naming indexes into an array or list is an excellent way to create off-by-one bugs; so be careful about "Lim", "Mac", and "Max", and so on, and refer to the handy relationship note in [2] until they're burned into your brain: Min ≤ Mic ≤ First ≤ Last ≤ Most < Lim ≤ Mac ≤ Max.

Variables, even arrays, should be named in the singular: "rgch", or "rgchName", or "vcempManager" (vector of employees that will store managers; see below), not "rgcchs" (definitely not), "rgchNames", or "vcempManagers". The kind of thing is (e.g.) vector of X, and the qualifier qualifies this singular whole.

Finally, I will suggest a few standard names that do not appear in the original papers but apply to common structures or C++ (especially STL) types and usages that I hope will be useful:

NameMeaningExample
sstd::string, CString—local standard string class; since a string is practically a primary type, Huffman allows it to snag the short "s" prefix; "sz" etc. as used for C-style strings are not appropriate. s, sFilename, vcs
vcXstd::vector<X>; rg could also be used since a emulates an array; it is of "array kind" vcfoo
stXstd::set<X>; Pascal strings are pretty rare these days so we take back "st"stfoo, sts
lXstd::list<X>; rehabiltiate "l" from the depths of "long"; and lists are common enough to merit a single character lfoo, lsName
tXtemplate parameter TX template <typename T> … TContainedItem &titem, etc.
xp, yppixel coordinatesxpLeft, dypSpace
xa, yapoint coordinatesyaTop, dxa
cscritical section objectm_cs, csTask
ptpoint, e.g. Windows POINT structptClick
rcrectangle, e.g., Windows RECT structrcDrag, vcrcInval

Regarding templates, an uppercase "T" prefix for template parameters and lowercase for variables is consistent with use for classes suggested in the previous post ("class FormattedSaver // CPN: fmtsav", and when used, e.g., "FormattedSave *pfmtsav"). Depending on context and scope, as with classes, it may make sense to use long names for template parameters ("TContainedItem"), like class names, but use a standard prefix for instances ("titem").

I have seen "r" used as a prefix for references, but would strongly discourage it: something being a reference is about type, not kind. One may make the same argument about pointers, perhaps, but it does help to be reminded of the required indirection. In the past, "lp" and "hp" (long and huge pointer) were used for different types of pointer; and they were also about type to a large degree but also had reminder value (I believe the compiler would have warned if they were used in the wrong place, though). "lp", of course, still plagues MSDN entries for the Windows API to this day, even though it only remains for historical reasons. A handle ("hX"), though, is a matter of kind, since it is meant to be an opaque reference to an object which can usually only be handled through specific APIs. Handles in Windows are also frequently of the same compiler type ("void *"), so denoting different kinds has the same positive effect as noting kinds of numeric quantities.

The following scope prefixes can be useful, too (these go before the kind prefix). Local variables and parameters should, following the principle that most common gets the shortest prefix (Huffman style), never have a scope prefix.

NameMeaningExample
m_Xmember variable (private or protected; public members of structs (or classes, but don't do that) should not have a prefix) m_fInited, m_sPasswordHash, m_vcsName
s_Xmodule static variables_log, s_szProviderUrl
g_Xglobal variable (not a constant; constants have no special prefix).g_cchMaxPath, g_cs
t_Xthread-local variablet_vctask
LXlabel (for goto, if unavoidable)LExit, LRet, LSkip
Rehabilitating Hungarian notation: CPN

By David B. Robins tags: Development Wednesday, January 23, 2013 21:30 EST (link)

Many developers' only experience of Hungarian notation is the Windows API documentation on MSDN, which is replete with "dw"s and "n"s and "lp"s and the like, most of which are actually Systems Hungarian. That is, they tell the compile-time type of a variable, and nothing about its use: "dw" is used for a huge variety of bitwise flags (all different), a count of stack frames, thread ID, buffer lengths, Boolean values, time in milliseconds, coordinates, indexes, you name it. Charles Simonyi's naming convention was never meant to do this. His ideas got corrupted moving from Office to Windows, and so his actual concept is now distinguished by the name Applications Hungarian. Joel Spolsky writes about the distinction in Making Wrong Code Look Wrong, which anyone that is criticizing Hungarian should have read at a minimum. Most critics just go around strawmanning Hungarian using Systems Hungarian as their straw dummy: "LOL, prepending type information! That's silly, my editor can tell me the type of any variable." Joel points out:

In Simonyi's version of Hungarian notation, every variable was prefixed with a lower case tag that indicated the kind of thing that the variable contained. I'm using the word kind on purpose, there, because Simonyi mistakenly used the word type in his paper, and generations of programmers misunderstood what he meant.

Precept #1: forget Systems Hungarian; learn about Apps Hungarian, and please, for the love of all that is holy, never use "dw", "n", or "lp" as a prefix for anything ever again. You probably want "c" (count), "i" (index), "d" (delta), etc., followed by the thing being indexed or counted or which you are storing a difference of: "cb" (count of bytes), "ifoo" (index into a foo collection), "dxa" (delta-x in twips). "p" is not a complete prefix: it indicates a pointer to something, and, like "i" (index) should be followed by a kind (the thing pointed to), although in small scopes "i" could be used alone. Simiarly, "h", handle, is a "kind prefix" not a kind in itself. If a document class is represented by "doc", then you can have a "pdoc" or "hdoc", or if you need to distinguish, "pdocOpen" and "pdocTemplate", but not "pDoc" ("doc" should be part of the kind, rather than describing the thing of that kind). This error is frequently seen with Windows handles ("hInstDll" instead of "hinstDll", for instance, or "hCurrentThread" instead of "hthreadCur").

C and even C++ do not have strong typing for numeric types—you can typedef XA and YA and use them religiously but the compiler won't stop or warn you about adding them to a plain integer. Perhaps if it did, such naming would be less important, at least for the purposes of not mixing the wrong kinds. Precept #2: consistent prefix notation helps you separate distinct quantities.

Even having understood that Systems Hungarian is a corruption and Apps Hungarian was always the intent, some people still have trouble with the concept; and then they also strawman it, complaining that Hungarian dictates that people use alphabet soup naming which obscures rather than clarifies code; and they come up with gargantuan monstrosities that they claim "must" be used, which is no more true than the existence of awkward and cumbersome English phrasings necessitate their use, even though they are part of the language. Precept #3: notation is meant to serve man, not man notation. Shorten as convenient: call something ‘mp' instead of ‘mpfoobar' if it makes sense in context. It's meant to work for the development team, not chain it.

Another benefit of a prefix naming convention is consistent names for things. Given a text element class, prefix naming would call for putting a comment with its Hungarian prefix next to the class—if it's really fundamental to the application, "el" would be appropriate, following the idea from Huffman coding that most-used objects should have the shortest names. A pointer to an element is of course "pel", and if not clear from context, more specifically "pelNext", "pelMatch", etc. Yet instead one might see a variety of names (many containing Hungarian done badly): "pElement", "pEl", "ptrEl", "elementPtr", "pCElement", "l_pElement", etc. That's obviously not too terrible to understand but multiply it by thousands of classes and it starts to take time, both in reading and writing code. Precept #4: set standard name prefixes for classes and kinds. Prefix notation is a way to achieve consistent naming; and it makes excellent sense to put the kind first followed by an optional descriptor appropriate for the context.

In Word, classes used to be named with their Hungarian name, e.g., something like "RTPRN" for "Render Target (printer)", with an instance being "rtprn". This made it easier to know the correct name for variables of this kind. Before I left we were moving away from that to use a longer name, perhaps "PrinterRenderTarget" in this case, but with the Hungarian name as a comment next to it. Source Insight, the editor many of us used, would show the code next to a declaration in a lower window when it was typed, so it was easy to see the comment when one was declaring a variable of a given type. This is a "best of both worlds" case where classes can be named verbosely, yet instance variables can be named in a more compact and readable manner. Precept #5: Using consistent prefix notation doesn't mean sacrificing readability.

In The Art of Readable Code (Boswell and Foucher, 2012), the authors almost get it: they suggest useful prefixes and suffixes for kinds (e.g., units, or Joel's example of safe/unsafe strings) but not any sort of consistency; and a suffix is entirely the wrong place to put something this important. For alignment with like kinds, and because of the fundamental importance of the kind to the variable, it must be first. Their "Hungarian" example is pretty bad, too; only one of four examples is correct Apps Hungarian ("cch"). Yet the fundamental is there: clearly and consistently indicate the kind of object or quantity in its name.

"Hungarian" is pretty well known as a term, yet all the wrong things are known about it. Along with rehabilitating its use, we could rehabilitate the name: call it "prefix notation" or even "consistent prefix notation" (CPN). Hungarian notation, that's a mess! But prefix notation? That's just a logical way to name things:

  1. Apps Hungarian is about kinds, Systems Hungarian is about types.
  2. Consistent prefix notation helps separate distinct quantities.
  3. Prefix notation should serve the development team, not control it.
  4. Standardize class and kind prefixes at the declaration point.
  5. Consistent prefix notation doesn't require sacrificing readability.
Boggle solver

By David B. Robins tags: C++, Architecture Monday, January 21, 2013 01:02 EST (link)

I got back to my Boggle™ solver today; I thought about it while playing it with my family over the Christmas holidays and just now got back to designing and coding up an algorithm. Naturally, a trie works well for storing the dictionary, as the possible words can easily be advanced one cube (representing either one letter or occasionally a 'Qu' pair) at a time. Each node in the trie can be marked as terminal if it ends a word, and also has a map of letters to nodes.

The board is stored as a graph: each node has a character ('q' representing the 'Qu' cube mentioned earlier) and a vector of adjacent nodes (order doesn't matter, so we walk the grid of nodes and add everything within the board space (4x4 is normal, but we will accept any size via command-line input) that is in [-1,+1] in the horizontal and vertical direction, except, of course, for (0,0) which is the node itself). Each board node can also be marked as "used" or not in the current traversal.

This set-up, as Fred Brooks points out ("Show me your tables, and I won't usually need your flowchart; it'll be obvious."), is enough to suggest an algorithm: visit each node in the grid, and then its neighbors, and follow the dictionary trie in parallel, stopping when the trie ceases to yield valid prefixes and outputting words (to a set to collapse duplicates) when at a terminal point in the trie. This works and is quite fast. C++11 auto and the new foreach construct was used a great deal for iteration. Honestly, I don't think I'll tweak it much more since it's a sound design and does what I set out to do, so getting it faster at the expense of complexity isn't a goal at this point.

I finished it out by sorting by word length (when the dictionary, from /usr/share/dict, is added, no words under the minimum length are added to the trie, so they are already at least as long as required) and then alphabetically, scoring each word, and adding up the total. Some random real boards (perhaps at some point it could generate its own board using the standard frequencies) have generated a number of 8-letter words and scored over 200 points. Much depends on the dictionary; the present one includes some invalid words (abbreviations, mainly). I used the one I had, and dropped anything that wasn't all lowercase letters (no apostrophes, hyphens, or capitals).

Also for consideration: how difficult would it be to reconstruct the Boggle™ board from a list of words? Certainly there would be many boards for, say, one or two words only; and even if the entire list of possible words were available there may be too few on particular boards (or the dictionary too limited). But I expect that with most regular games one could narrow down to one board (to rotation/reflection of course). There's not much online about it; I just found a thread about it for a 3x3 board that didn't discuss algorithms at all. One would have to gather data about possible boards (e.g., a 3-letter word starts at any position and the following letter may be in any surrounding square and the last in any square surrounding any of those), reducing the possibilities as more words are added.

Source: boggle.cc.

Your build shouldn't move files

By David B. Robins tags: Build Wednesday, January 16, 2013 18:50 EST (link)

If your build process (make, SCons, Ant, NAnt, CMake, whatever you use) moves files, it's almost certainly doing the wrong thing.

First, the file should just have been either placed in the correct location by whatever built it, or be read from the target location.

But beyond that, moving the product of a build (let's not even get into static files), an indicator and result of work done such as compilation, linking, calculation, etc. tells the build system that the work hasn't been done, and that next time it is invoked, it should redo the work. Better to just copy the file, if you really need it to be somewhere else and can't produce it there (or create a link).

Removing (and moving is removing on one end) a built file subverts the build system. In a properly designed and working build system, if two builds are run consecutively (without a clean step), the second should do no work at all: everything is already built, so it just needs to verify a few timestamps or checksums. Yet there are way too many builds out there based around ignorance (I don't just mean old technology, because make has been around for ages so that's no excuse): batch files and other brute recreation.

Doing no work in a second build is just a special case of doing as little work as possible to maintain a target, making for quick incremental builds.

Please: don't subvert your build system.

Get a user's temporary files folder given a token

By David B. Robins tags: C++, Windows Monday, December 10, 2012 22:06 EST (link)

The service I'm working on, running as the SYSTEM account, needs to write files to a temporary location. It was using the SYSTEM user's temporary folder, via GetTempPath, which usually resolves to C:\Windows\Temp; but (as I found out) users don't typically have access to that folder. So, given a token (HANDLE) I had to find that user's temporary folder, i.e., something like C:\Users\UserName\AppData\Local\Temp, although it varies a little by Windows version.

At first I looked at the SHGetKnownFolderPath family of functions, although I could only get the local application data path and would have to append Temp to the result, and I wasn't sure how portable that would be (if the folder always existed, always existed there, or if the name was localized). But it didn't work anyway: even though I succesfully invoked ImpersonateLoggedOnUser, it still returned the SYSTEM account's temporary folder. I had a hunch it had something to do with "mounting the user's registry hive", which led me to LoadUserProfile, but that was a dead end: in my particular case, the process didn't have the necessary privileges (SE_RESTORE_NAME and SE_BACKUP_NAME), so it spat back ERROR_PRIVILEGE_NOT_HELD.

Fortunately, I started looking at various related functions, came across GetUserProfileDirectory, rejected it because it only returned the base path (C:\Users\UserName), but found the nearby ExpandEnvironmentStringsForUser, fed it %TEMP%, and got back what I needed—without even needing any impersonation. Some related posts suggest to make sure the token used (from OpenProcessToken or however obtained) has TOKEN_IMPERSONATE, TOKEN_QUERY, and TOKEN_DUPLICATE for this to succeed.

Celery: it does what it says on the tin

By David B. Robins tags: Python, Architecture Friday, December 7, 2012 18:00 EST (link)

I've been very favorably impressed with Celery, a "distributed task queue" for Python (also happy that it supports Python 3). It is built on top of various queue providers—I use RabbitMQ, and offers high level interfaces and servers. For example, functions can be annotated with @celery.task and they become callable through the distributed API. Modules providing a (user-defined) API function as both an interface (much like a COM IDL file) and as the implementation, run under the Celery daemon or another server. Celery handles parameter transport: changing a function call to a queued request and de-queuing it.

So far, I haven't had to look at the source. That's really great. Many Python modules require examining the source to see how they "really" work; documentation is poor (Perl has a definite edge here with CPAN's online documentation and perldoc.perl.org). It just works; I followed the tutorial, read the API docs, and have a server I can easily call from the Python shell; yet I can still get to advanced routing and server control via well-defined configuration files and options. Kudos to the developers; I looked at a number of (more direct) queue APIs, and this has the combination of using the robust/high throughput RabbitMQ yet providing a pythonic wrapper enabling higher-level design.

/ previous entries /

Content on this site is licensed under a Creative Commons Attribution 3.0 License and is copyrighted by the authors.