
Thoughts on software architecture and development, and methods and techniques for improving the quality thereof.

David B. Robins (home)

Code Visions: Improving software quality
I'd like to see my son, please

By David B. Robins tags: Web, Architecture Sunday, November 12, 2017 13:29 EST (link)

We had a large network changeover at work recently, relating to moving from the network of the first company that acquired us to that of the company that merged with them, so all sites are on one network without address conflicts. Unfortunately, the new company's network blocks port 81, and I had been serving static content such as images on that port. With the block, all the images and styles for the site disappear—something a friend had complained about in the past, presumably because his work had the same block. Lately, most of those images are of our two-month-old son, David Geoffrey Robins. I also pull them to a digital photo frame I made from a Raspberry Pi 3B and Raspberry Pi touchscreen.

I'd write an entry about the digital photo frame, but there isn't much more to it than the Pi itself (which I had for my DCC train-automation work, now on hold due to the baby), running Arch Linux, connected to the standard touchscreen I added recently, running qiv (Quick Image Viewer) and some scripts that download and add images from my site when new posts are made.

Anyway, it was time to make an update to serve all content on standard web port 80. (I should note here I'm not trying to circumvent any security measures. All changes are on my end, and they only allow me to view my site, which worked fine for the last several years before the block.) I considered merging the two Apache servers, but a faster method still preserving separate servers was to use a proxy, like I do for A Voluntaryist Wiki, or specifically, a reverse proxy, using Apache's mod_proxy. This is nothing like the kind of proxy used to access general websites that might be blocked (e.g., like Facebook is in China); it's specific to my site, which isn't blocked (just port 81, which is blocked for all sites). I set it up so that any URL beginning with /s would redirect to the static server.

In the past, an image lived at a port-81 URL on the static server; now the same image lives at the same path under the /s prefix on port 80.

After that change I had to fix a few places in my site's code that had or generated links with the former scheme, and we were back in business… or almost. The Apache ProxyPass directive takes the URL prefix and the backend server (/s and localhost:81 respectively), and there's also a ProxyPassReverse directive that fixes up redirects, so that if the static server sends a redirect, the Location gets the /s prefix. I originally used localhost there too, but ProxyPassReverse instead needs the qualified hostname that is to be fixed up. I also used mod_remoteip to translate the (proxy-only) X-Forwarded-For header back to a source IP for some controls that depend on the origin.
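The relevant directives, as a hypothetical sketch (the hostnames are placeholders, not my actual configuration), look something like this:

```apache
# Front-end server (port 80): forward /s to the static server.
ProxyPass        /s http://localhost:81
# Fix up redirects from the backend; needs the qualified hostname
# that appears in the backend's Location headers.
ProxyPassReverse /s http://static.example.com:81

# Static server (port 81): recover the real client IP from the proxy.
RemoteIPHeader        X-Forwarded-For
RemoteIPInternalProxy 127.0.0.1
```

The RemoteIP lines belong in the static (backend) server's configuration, since that is where the origin-dependent controls run.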

Flash emergency and aftermath

By David B. Robins tags: Tools, Embedded Sunday, September 3, 2017 17:25 EST (link)

Last Christmas we drove up to my parents' place in Canada for the holidays—we alternate years (so the first year with Wee Tiny goes to the Hedricks). Of course, I brought my camera (Nikon D300; I've been Nikon since film and my F90X, and have the lenses to show for it), but it had been a while since I'd used the external flash (an SB-800) and I found the batteries had leaked and corroded the contacts to the point it no longer functioned—horrors!

I did not know if it was simply a break in the circuit, and I was a long way from most of my tools, but I think I had a multimeter with me; I bought a set of batteries I knew worked (eliminating one variable), but… nothing. I looked around, and sliced up a Coke can with a set of tin snips to use as a conductor between the battery +/- ends instead of the corroded connectors on the battery door. Still didn't work. Looking around online, I discovered that aluminum has a tendency to oxidize and not conduct, roughed it up a little, and finally got power. I wasn't looking for a long-term fix yet, just trying to find out whether anything would fix it or whether I was looking at replacing an expensive flash unit. As it turned out, the Coke-can shim worked for the rest of the holiday, and I was careful to remove the batteries before we headed back to the US.

When we got home, I thought about a more long-term fix, since every time I wanted to use the flash I had to finagle the aluminum shims in place (not wanting to leave batteries in again). This would matter more when I upgraded to the D850, which had no pop-up flash; and the pop-up is terrible for indoor lighting anyway.

So I had a couple of ideas: find a replacement battery door—which I did, on Amazon, and although the source looked a little sketchy it seemed like the right component, but it wasn't cheap—or solder short wires joining the batteries' + and - terminals pairwise (it takes 4 x AA) so they could be inserted quickly. I read a little online about soldering to batteries (and learned of the existence of batteries with pre-manufactured terminals), mainly to ensure it was safe. I tried with a dead battery; it worked out poorly: the solder didn't want to flow onto the battery. I didn't have any flux handy, but filing the terminal rough allowed me to deposit some solder and then connect a tinned wire. So that would have been fine.

But an easier solution presented itself: clean out the corrosion (first with the file, then with Q-tips dipped in vinegar), and if that removed too much material, I could deposit some solder to use as a cap. The solder wasn't necessary, and the simple act of cleaning it out well restored the flash to use.

Startup crash in Windows dynamic loader

By David B. Robins tags: C++, Windows, Tools Saturday, September 2, 2017 11:21 EST (link)

>	ntdll.dll!LdrProcessRelocationBlockLongLong()	Unknown
 	ntdll.dll!LdrRelocateImageWithBias()	Unknown
 	ntdll.dll!LdrpProtectAndRelocateImage()	Unknown
 	ntdll.dll!LdrpRelocateImage()	Unknown
 	ntdll.dll!LdrpCompleteMapModule()	Unknown
 	ntdll.dll!LdrpMapDllWithSectionHandle()	Unknown
 	ntdll.dll!LdrpMapDllNtFileName()	Unknown
 	ntdll.dll!LdrpMapDllRetry()	Unknown
 	ntdll.dll!LdrpProcessWork()	Unknown
 	ntdll.dll!LdrpDrainWorkQueue()	Unknown
 	ntdll.dll!LdrpInitializeProcess()	Unknown
 	ntdll.dll!_LdrpInitialize()	Unknown
 	ntdll.dll!LdrpInitialize()	Unknown
 	ntdll.dll!LdrInitializeThunk()	Unknown

Not much of an entry after so long, but I couldn't find the symptoms of this problem anywhere else (although this had some similar elements, and remained unsolved).

I was doing some testing for a new product on Windows 7 (since we needed to support it, and so far I'd tested only on my dev machine, running Windows 10), and it kept crashing on startup in the NTDLL loader code (call stack above). It also happened only with release builds, and I narrowed it to one of two DLLs we packaged (not counting OS/runtime DLLs); call it x.dll. I copied over a built debug x.dll, and all was fine (of course there are other problems with mixing debug and release binaries, but I was just trying to get past this problem on load). Then I copied the release x.dll from the build location (it gets copied to a staging location to be packaged into the installer), and—also fine.

Then I recalled that for the Linux build, we strip binaries (using install -s) to remove unnecessary symbols and sections, and since the Windows build uses as much of the same makefile as possible (via Cygwin), it also tries to strip Windows DLLs—which does the wrong thing there and causes the crash. Interestingly, whatever strip does to a Windows DLL, it does not change the file size.
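A guard in the shared makefile would avoid this; the following is a hypothetical sketch (the variable and target names are made up, not our actual build files):

```make
# Hypothetical: skip stripping on Windows/Cygwin, where GNU strip
# damages PE DLLs without even changing their size.
ifeq ($(OS),Windows_NT)
  INSTALL := install        # no -s: don't strip Windows DLLs
else
  INSTALL := install -s     # strip ELF binaries as before
endif

install: $(TARGET)
	$(INSTALL) $(TARGET) $(DESTDIR)/
```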

Mystery solved, easy fix, on to bigger problems….

DCC operations mode address change

By David B. Robins tags: C++, Development, Trains Sunday, December 4, 2016 16:18 EST (link)

I've reorganized the ddcc code significantly to move even small but distinct parts of the code into their own modules with separate interfaces: the bit queue used to assemble DCC packets, the command parser, packets and packet factories (don't worry, I haven't gone full Javtard; they are just functions bundling parameters into a packet, or more precisely, a std::shared_ptr<Packet>). I had some responsiveness issues with the air-horn function, which I fixed, and the idle loop now re-sends the last speed command. By default new commands are repeated 5x (still < 25 ms total), which works well.

Changing address proved a little tricky. Neither of the decoders I have will allow an immediate address change of the type in use, and I suspect that's general to mobile decoders; i.e., if it's currently using a short address you can't switch to another short address, and same for extended (a.k.a. 2-digit and 4-digit addresses). So address change sends a number of commands (all repeated individually 5x as noted above; even CV17-18 are not alternated).

First, I should note how I handle the two address types: I use a single address space, where 0-127 (0 being broadcast) are short and 128+ are long, so "extended address 3" is represented by 128 + 3 = 131 internally. The largest extended address allowed is 0x27ff, which in my single space is 0x27ff + 128 = 0x287f. When I refer to addresses without prefixing "short" or "extended", assume a single-space address.
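As a sketch (my illustration here, not the actual ddcc code), the mapping is:

```python
# Unified address space: 0 is broadcast, 1-127 short, 128+ extended.
EXTENDED_BASE = 128
MAX_EXTENDED = 0x27FF   # largest extended address DCC allows

def to_single_space(addr, extended):
    """Map a raw DCC address to the unified space."""
    if extended:
        assert 0 <= addr <= MAX_EXTENDED
        return EXTENDED_BASE + addr
    assert 0 <= addr <= 127
    return addr

def from_single_space(unified):
    """Return (address, is_extended) for a unified address."""
    if unified >= EXTENDED_BASE:
        return unified - EXTENDED_BASE, True
    return unified, False
```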

The sequence to change an address in operations mode (the alternate is "service mode" which is a broadcast and so requires either a separate track or removal of other locomotives from the track):

  1. set unused address type (CV #1 for short, CVs #17 and #18 for extended) to temporary address (127 for short, 0x287f for extended, i.e., CV #1 = 127 if the current address type is extended)
  2. switch address type (set bit 5 of CV #29 appropriately)
  3. set desired address type to new address (remember to send to the temp address!)
  4. switch address type (set bit 5 of CV #29 back to previous value)

Using the CV write-single-bit instruction to write CV #29 worked better for me than sending the whole byte. Of course, if the new address is of the opposite type to current, it's only necessary to set the unused type to new and flip bit 5 of CV #29.
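The whole sequence can be sketched as follows; write_cv and write_cv_bit are hypothetical helpers standing in for the real repeated-packet sends, not the ddcc API:

```python
# Hedged sketch of the four-step operations-mode address change.
TEMP_SHORT, TEMP_EXT = 127, 0x27FF   # temporary addresses for step 1

def set_address(write_cv, to, addr, extended):
    if extended:
        # CV 17's top two bits must be 11 per S-9.2.2, hence the 0xC0.
        write_cv(to, cv=17, value=0xC0 | (addr >> 8))
        write_cv(to, cv=18, value=addr & 0xFF)
    else:
        write_cv(to, cv=1, value=addr)

def change_address(write_cv, write_cv_bit, cur_addr, cur_is_ext, new_addr):
    """Change to a new address of the same type as the current one."""
    temp = TEMP_SHORT if cur_is_ext else TEMP_EXT
    # 1. set the unused address type to the temporary address
    set_address(write_cv, cur_addr, temp, not cur_is_ext)
    # 2. switch address type (bit 5 of CV 29: 1 = extended)
    write_cv_bit(cur_addr, cv=29, bit=5, value=int(not cur_is_ext))
    # 3. set the desired type to the new address -- sent to the temp address!
    set_address(write_cv, temp, new_addr, cur_is_ext)
    # 4. switch the address type back
    write_cv_bit(temp, cv=29, bit=5, value=int(cur_is_ext))
```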

I'm still not reading/verifying CVs. While the LMD18200 H-bridge does report current draw via a scaled current-sense output, the Raspberry Pi doesn't have an ADC. I may obtain an SPI ADC IC like the MCP3008 to do this. (The circuit will also need a resistor to convert microamps to volts.)
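The read-out itself would reduce to a little scaling arithmetic. This sketch assumes the LMD18200 datasheet-typical 377 µA/A sense ratio and example resistor/reference values I haven't actually chosen yet (the SPI transaction to the MCP3008 is elided):

```python
# Convert a raw ADC reading into H-bridge load current. Values are
# assumptions for illustration, not a finished design.
SENSE_A_PER_A = 377e-6   # LMD18200 sense output: ~377 uA per amp of load
R_SENSE = 2700           # ohms: converts the sense current to a voltage
VREF = 3.3               # ADC reference voltage
ADC_MAX = 1023           # MCP3008 is a 10-bit ADC

def adc_to_amps(raw):
    """H-bridge load current in amps for a raw MCP3008 reading."""
    volts = raw * VREF / ADC_MAX
    return volts / R_SENSE / SENSE_A_PER_A
```

With these values, full scale comes out around 3.2 A, conveniently covering the LMD18200's 3 A rating.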

DCC Train Control with Raspberry Pi and a Joystick

By David B. Robins tags: C++, Development, Embedded, Debugging Sunday, November 20, 2016 16:46 EST (link)

The DCC conversion I planned two weeks ago is going well. (The DC with PWM did not go as well; one locomotive stalled and pulled 1.5 A through the small H-bridge rated for 1 A, with predictable results: magic smoke let out. Fortunately I had a few more, but I decided to focus on the DCC controller.)

I am using the booster circuit here (scroll down), which is built using a Raspberry Pi to control a TI LMD18200 3 A, 55 V H-bridge (with plenty of over-current and other protection built in), and appropriate smoothing capacitors and so forth. I started with the pcDuino V2 rather than the Pi, but the incident with the H-bridge took it out too, and while it ran for a little while it eventually was no more, and I picked up a Raspberry Pi 3 B at Fry's and also installed Arch Linux ARM on it.

The design uses a daemon, which I call dccd, that sends commands to the track; it listens for higher-level commands over a local network socket (for now, speed values from -127 to +127, 0 to brake, sent to default device 3). It uses the pigpio library to handle GPIO, and Poco (after evaluating a few libraries in the space) for sockets and threading. I was originally using nanosleep to handle timing, but the DCC decoder (Digitrax DH126D) didn't like the signal at all. I took it to work, hooked it up to the 'scope, and saw why: the pulses were all way too long. S-9.1, the DCC Electrical Standard, says that each part of a 1-bit should be 58 µs ± a couple and a 0-bit 100 µs ± a lot, and mine were 60+ µs off. It turns out that with nanosleep the kernel can do its own thing for a while, and for precise waits I needed something like pigpio's gpioDelay. (I had also observed a persistent ringing on the scope, but that was due to unsynced grounds: I had used a single probe and connected the ground clip to one H-bridge output and the probe to the other; much better using the board ground, two probes, and plotting the difference. I'm sure someone with an EE background would never have thought of doing such a thing….)
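The timing requirement itself is simple to express; here is a sketch (my illustration, not the dccd code) of the half-periods to emit per bit:

```python
# DCC bit timing per S-9.1: each bit is two half-periods of equal
# length, nominally 58 us per half for a 1 and 100 us per half for a 0.
HALF_ONE_US = 58
HALF_ZERO_US = 100

def half_periods(bits):
    """Return (level, microseconds) half-periods to drive the H-bridge."""
    out = []
    for b in bits:
        half = HALF_ONE_US if b else HALF_ZERO_US
        out.extend([(1, half), (0, half)])
    return out
```

On the Pi, each entry becomes a GPIO level change followed by a precise busy-wait (pigpio's gpioDelay) rather than a nanosleep.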

With the square wave going beautifully, the DCC decoder was almost happy, except I had inserted a 5 ms gap between packets, which was unnecessary; I had my idle loop send the idle packet or re-send speeds instead. Now I was able to hook up the decoder to the H-bridge outputs and a voltmeter to its motor output leads and see reasonable voltage changes corresponding to speed settings—except it never got beyond about 12.5 V after about half speed (on either the 28- or 128-speed scale). I figured it was also doing PWM and my meter wasn't averaging it correctly, hooked that up to the 'scope too, and saw reasonable increases (duty-cycle changes) all the way to top speed. Now I was ready to install it in a locomotive.

I have three locomotives, two steam and one diesel. One of the steam locomotives had been opened before and opened easily to access the motor, so I decided to convert it first. It was in fact a lot of trouble, because the motor, as with many older locos, wasn't properly isolated: it connected to the pickups through the bottom of the chassis at several points. I covered a couple with electrical tape and that was fine, but I needed to connect to the wheel pickups somehow. The last connector was a brass screw that passed through a hole in the motor frame and connected through the floor plate to the wheels. I played with that for a while: I needed the brass screw to give me a connection to a wire I'd attach to it, but not to the motor frame it passed through. Much plumber's tape was expended in an attempt to make this work, and it did for a short while, but was unstable. Nylon screws are a recommended solution too, but I couldn't find any in the right size (M1.4?), and I'd still need to connect the decoder to that side's wheel pickups another way. Eventually it must have shorted and blown the decoder, which never came back (so I'll be taking advantage of Digitrax's excellent "no worries warranty", which explicitly includes "accidental customer damage"). I have on order a newer Hornby motor that I think will fit, an X8259 from Peter's Spares; unsurprisingly, the shipping (from the UK) is about twice the price of the motor. The current motor still works—and I can run it through DCC with long overhead wires like a tram!—but it would be far easier with an isolated motor (of the right size with the right worm gear).

The next conversion went much better: it was my British Rail diesel engine, 37 130, end identification "8H22", and this video was very helpful. My model had some differences: instead of the whole floor plate separating, floor sections at the front and back with wheels (and motor on one) separate from the top/middle piece. The key was separating one metal motor terminal from a track pickup, and then I was able to solder the decoder leads to left/right pickups and motor terminals. Another difference from the video is that he had a separate (accessory—light) pickup lead he could solder to for the one separated from the motor; I had to loop a wire around where the motor terminal had been attached. I used heat-shrink tubing around each of the connections, and when I was done there was plenty of inside space for the decoder (unlike the steam loco where it was a very tight fit). It worked the first time and kept working. The only issue with that loco is that one set of wheels has become stiff and has trouble around curves; a colleague at work suggested graphite and I'll give that a try (plastic on plastic, so oil isn't the best option).

Finally, telnet'ing to the DCC daemon, dccd, wasn't an ideal way to control the train, so I did some work on the other component, Dave's DCC controller (ddcc): I fired up evtest and looked at joystick (Logitech Extreme 3D Pro) events, and wrote a short program to read events from the Y-axis, scale them to -31 to 32 (for now, full speed is unnecessary) and pass them as speed requests to dccd. This worked exactly as planned and I was very happy with it. I may use the joystick's throttle control for speed later, so one doesn't have to hold it in position to keep it moving. It also needs a GUI to be a little more elegant; perhaps Qt (Tk is so ugly)?
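The scaling step is the interesting part; here is a sketch under assumed axis ranges (the evdev event reading itself is elided, and the raw range is an assumption about this joystick, not measured here):

```python
# Scale a raw joystick Y-axis value to a dccd speed request.
AXIS_MIN, AXIS_MAX = 0, 1023
SPEED_MIN, SPEED_MAX = -31, 32

def axis_to_speed(value):
    """Map a raw Y-axis reading to a speed in [-31, 32]."""
    frac = (value - AXIS_MIN) / (AXIS_MAX - AXIS_MIN)
    # pushing the stick forward reads low on most joysticks, so invert
    return round(SPEED_MAX - frac * (SPEED_MAX - SPEED_MIN))
```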

PWM motor control on the pcDuino (V2)

By David B. Robins tags: Development, Embedded, Debugging Sunday, November 6, 2016 13:35 EST (link)

It's been a year since my last public post here; way too long; but so it goes. It's certainly not from lack of material; perhaps I'll backfill some.

We are starting to build an HO-scale model railroad layout; it really is quite early, but rather than starting from scratch I'm starting from a box of ~25-year-old supplies, including a diesel and two steam locomotives, various cars, track, a power supply, and various scenery elements. We purchased a 4' x 8' x 1/2" birch plywood board and have assembled a simple oval and can drive the engines around it (much better after cleaning the track and wheels with isopropyl alcohol).

Eventually I'll do a DCC (Digital Command Control) conversion, and to that end I've ordered DCC decoders for the locomotives, and dusted off a pcDuino V2 I had in storage and installed Arch Linux ARM on it (with LXDE) so I can use it as a controller. This weekend I thought I'd set up the pcDuino to control the existing DC circuit, since I had H-bridges (TI's SN754410) on hand (used earlier to test automating solenoid turnouts; worked fine, best with 12V+). (Since Arrow has had free shipping for October and November, I've been buying whatever I could from them, although I much prefer Octopart's search engine—which can be set to show parts from Arrow only.)

The old DC controller puts out a little over 14V DC at max throttle, so it became the output supply (Vcc2) for the H-bridge, which I set up on a breadboard. GPIO pins from the pcDuino for motor control (1A, 2A) and enable, outputs (1Y, 2Y) to the track, ground and 5V from the pcDuino, checked passthrough, all good. I used the /sys/class/gpio interface for the GPIO pins; one gotcha was that it uses the Sun4i pin numbering; see extension headers, and calculate (letter - 'A') * 32 + number, e.g., J11 header pin 3 is PH7, which is ('H' - 'A' = 7) * 32 + 7 = 224 + 7 = 231; echo 231 > /sys/class/gpio/export creates /sys/class/gpio/gpio231 and echo out > /sys/class/gpio/gpio231/direction makes it an output, and so forth. There was supposed to be a /sys/class/pwm with a similar interface—but it wasn't there!
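The pin-number calculation as a helper (illustration only):

```python
# Sun4i sysfs GPIO numbering: (port letter - 'A') * 32 + pin.
def sun4i_gpio_number(port, pin):
    """Sun4i GPIO number for a port letter and pin, e.g. PH7 -> 231."""
    return (ord(port) - ord('A')) * 32 + pin
```

So PH7 is exported with echo 231 > /sys/class/gpio/export, as above.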

That began a long hunt for how to make the PWM device work. The relevant kernel module was pwm-sun4i, but loading it didn't help. That led me down the rabbit-hole of device trees, and I found I had to make some changes to the .dts file. On Arch, /boot/dtbs stores the .dtb files, the compiled forms of .dts files; dtc can "decompile" them. Since Arch Linux ARM didn't have a pcDuino V2 boot loader, I was using the V1's (only V3 is listed on the page, but the V1 files exist in a "pcduino" directory next to "pcduino3"). I override the .dtb file in boot.txt (compiled with mkscr) using setenv fdtfile pcduino2.dtb. I produced that .dtb by decompiling /boot/dtbs/sun4i-a10-pcduino2.dtb, editing the .dts, and recompiling it. First I removed status = "disabled"; from pwm@01c20e00, and that gave me files in /sys/class/pwm, but it still didn't work. Further reading in the A10 manual and other places suggested that the pin needed to be put into PWM mode, and wasn't. That led me to determine I needed to add pinctrl-names = "default"; and pinctrl-0 = < &pwm0 &pwm1 >; to that section too (and I also added status = "okay"; just in case), with pwm0: and pwm1: labels placed in front of pwm1@0 and pwm1@1 respectively. I had added some debug logging to the relevant kernel modules to ensure that the "pinmux" commands to set the pin mode were being sent. Building a kernel on that machine is very slow, and I used distcc to offload some of the building to a cross-compiler on another machine.
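Reconstructed from the description above, the .dts edits looked roughly like this (node contents abridged; the exact paths come from the decompiled sun4i-a10-pcduino2.dtb and may differ):

```dts
/* Hedged reconstruction, not the literal diff. */
pwm@01c20e00 {
    /* was: status = "disabled"; */
    status = "okay";
    pinctrl-names = "default";
    pinctrl-0 = < &pwm0 &pwm1 >;
    ...
};

/* labels added so pinctrl-0 can reference the pin groups */
pwm0: pwm1@0 { ... };
pwm1: pwm1@1 { ... };
```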

The last surprise was that setting the PWM duty_cycle to 0 actually meant full 100% power all the time. I read that a 1 kHz period was good for a DC motor, which means echo 1000000 > period (since it's in ns). This mostly worked; slow speeds are a bit dubious and I may experiment with different period values. Also it is problematic to change direction while the enable is being pulsed; I think this is what killed my H-bridge (loud pop, reset the DC source's transformer), but fortunately I had more.
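Assuming the inverted behavior holds on this driver (duty_cycle = 0 meaning 100% power), a small helper computes the value to write:

```python
# sysfs PWM values are in nanoseconds; on this driver duty_cycle is
# inverted (0 = full power), so write the complement of the usual value.
PERIOD_NS = 1_000_000   # 1 kHz period

def duty_cycle_ns(percent, period_ns=PERIOD_NS):
    """Value to write to .../duty_cycle for the requested power level."""
    assert 0 <= percent <= 100
    return period_ns * (100 - percent) // 100
```

For example, half power with the 1 ms period means echo 500000 > duty_cycle.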

I want to control the eventual DCC controller with a joystick, a mostly-unused Logitech that I played TIE Fighter with once upon a time; so I'll set up a basic user interface for the current DC controller to learn how the joystick events work.

Asix (Ethernet over USB) Linux driver patches

By David B. Robins Saturday, October 17, 2015 12:19 EST (link)

Exacq's UNA (preview video posted to LinkedIn and Twitter) is a new video recorder with 8 or 16 built-in PoE ports for IP camera connection. Combined with auto-discovery and addressing, it makes for a great "plug and play" experience for businesses looking for video security.

Unfortunately, the driver for the Ethernet-over-USB device used, Asix's AX88772C, crashed our Linux test system during an "extreme debugging" session, leaving only a stack trace (photographed before the watchdog reset the system), and we haven't been able to reproduce it since. Fortunately, I was able to work backwards using the trace and the module source to determine what must have happened, and I made and tested a fix and submitted it to the kernel mailing lists (my first kernel patch). It was accepted; however, maybe a day later someone else submitted a set of patches superseding mine (since I don't believe in coincidence, I think at least my post caused him to post changes he'd been sitting on). I urged the maintainer to accept the 5-patch set over mine since it fixed the issue and also cleaned up the code in other ways.

The original issue was that if a memory allocation for a network buffer (sk_buff, or "skb") failed, a flag set earlier and not cleared properly in this case would cause a later call to assume the buffer had been allocated, and pass it to skb_put, which would dereference it and cause the crash.

Wanting to use what would (eventually) become the mainline kernel code, we tested the 5-patch set, but it caused horrible throughput issues (down from tens of Mbit/s to tens of kbit/s). We then scaled back to my original patch, which seemed to work for a while, but then we found a machine that reliably crashed every few hours in the same part of the Asix driver. Investigation showed that this was a different issue related to how the Asix hardware passes data to the driver over USB. It includes a 32-bit header, which has some status bits and two 11-bit values specifying the size of the (network) packet, one bitwise inverted and used as a check. Due to other hardware/BIOS issues, we were getting a lot of bad data on that particular machine (tens of thousands of bad packets per hour), and with an 11-bit check, 1/2048 random 32-bit values will match by chance. This led to accepting invalid size values that don't match the actual packets being sent, and if the size in the header—used to allocate an skb—is smaller than the actual packet, adding the packet using skb_put causes an overflow that ends in skb_over_panic.
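The check reduces to comparing the two 11-bit fields. This sketch follows the layout the mainline asix driver reads (low 11 bits are the packet size, bits 16-26 the bitwise-inverted size); it is an illustration, not the driver code:

```python
# Validate a 32-bit AX88772 RX header: size field vs. inverted-size field.
def header_ok(header):
    """True if the header's size and check fields agree."""
    size = header & 0x7FF
    check = (header >> 16) & 0x7FF
    return size == (~check & 0x7FF)
```

Since only 11 bits are checked, about 1 in 2048 uniformly random 32-bit words passes, which is why heavy corruption produced accepted-but-bogus sizes.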

I wanted to take the 5-patch set and see if it fixed both crashes, so I examined the throughput problem and determined it was caused by patch 4/5. The "fixup" handler, asix_rx_fixup_internal, was intended to take one or more USB data packets and return a network packet; since it could be called multiple times for one network packet, it saved some state in the driver private data area, such as the remaining (network) packet size. If called with a continuation packet (i.e., remaining size > 0), this patch checks to see if the supposed continuation packet begins with what looks like a header (with the aforementioned 32-bit size/not-size), and if it did not, it would assume the continuation was actually the start of a new network packet (and the USB layer had gotten out of sync or lost the expected continuation). This is completely backwards: it would make some sense to assume it was a new packet if it did look like a header, but even then, with a 1/2048 chance of any random continuation "looking like a header", it's dubious. And it is easy to see why it affected throughput: any "continued" packet, which might well be most of them, would be lost—only small packets, or those 1/2048 whose continuation "looked like a header", would get through!

I ended up removing that part of the patch set, and adding another check for the earlier-mentioned size mismatch causing an overflow in skb_put; it had no throughput issues when we tested it and has been stable thus far. I have contacted the author of the 5-patch set but not received a reply; if he does not get back to me I will post to the appropriate kernel lists with my findings, suggesting not taking patch 4/5 and adding an additional check for skb overflow.

State of the hotel mobile key world

By David B. Robins Sunday, October 11, 2015 11:44 EST (link)

There are, I found recently, a lot of companies with hotel mobile key solutions (i.e., sign up via an app earlier, bypass the front desk on arrival, open your hotel room door with your mobile phone). This doesn't even include the residential solutions, but I wouldn't be surprised if some of those companies are planning to enter the hotel market too (at least one of them already did). When I did search, I didn't find anywhere that had gathered together the current slate of competitors, so I've done my best to list them all together here, with links to each one's solution:

(I should mention that I currently work for Tyco, although I've never even been in the same room with anyone involved in their work with Assa Abloy, and that I have worked and consulted for Hilton and Yikes in the past. I'm just gathering and providing information I've found online out of professional interest in the subject, and am attempting to provide an objective summation.)

Numbers-wise, the hotel chains might have it, although many hotels are franchised rather than company-owned, and franchise owners may be able to opt out of the system (although if costs are reasonable, large-scale adoption of the system should make signing on low-risk). Second come the lock companies, for smaller chains and independents; the startups get to fight over the scraps. It helps that lock companies have realized hotels are not usually going to want to upgrade all their locks, and have provided options supporting existing locks similar to the startups' in-door dongles, but possibly less disruptive if they fit inside the lock itself: in particular, Assa Abloy, makers of the popular VingCard Classic, Signature, and Essence RFID locks, offers an upgrade by "[adding] a small Bluetooth Low Energy board to the lock" and also offers an upgrade path for their older locks.

Value-adds that might make a hotel owner consider doing business with one over another would include things like having a full solution with a "cloud" web portal to set up accounts, PMS (property management system) integration, ability to brand ("skin") the mobile app with their hotel brand, full installation, and technical and system support (24/7, if possible). But it looks like most of them already do all that. And in case of the smaller companies failing or the larger companies discontinuing their mobile key project (i.e., if a clear "winner" emerges), having the plans and code in escrow against such a possibility might be a requirement too. Note I didn't mention "security" as a feature, because it's more of "if you don't have it you're dead" and it can be laughable to see claims about number of bits or "the lock provides half of the security key!" (paraphrase) or other buzzwords that really mean nothing at all except that the blurb writer didn't have anyone that understood security to check with.

It would not surprise me if hotel owners are very much taking a "wait and see" attitude toward mobile key—to see which system or systems become dominant: let's face it, they can live without it and it's very much a novelty at this point. Almost certainly any electronic lock system that they have will work with at least one system, so there's no need to pay for a costly replacement of all of their locks. A New York Times article agreed, noting "Investing in a technology that might become obsolete or never take off is one of the risks hotels face in building a mobile key program. This is one reason not every brand is climbing on the mobile key bandwagon."

As the landscape evens out, a mobile key solution is becoming commoditized, which means that rather than competing on features, the providers are competing on price in a race to the bottom—an unenviable position to be in especially if that's your primary line of business. Profits are lower, and it's harder to recoup investment. It's not such a big deal if you're Assa Abloy and a mobile key solution is just a value-add, a catchup, something you might roll into the price of the lock and maintenance agreement; but it's a little like Microsoft crushing your startup by accident when they add a feature.

There are also a number of residential solutions in varying states of function: of the above, Unikey/Kwikset's Kevo is shipping; the rest don't appear to have shipping hotel solutions yet; others have done better at listing these and they aren't the focus: August, Lockitron, Schlage, Goji, Yale.

It will be interesting to see how this settles out; and unfortunately for sellers, hotels may take the same position, with adoption (outside of brands doing their own) being slow to start and then picking up as a direction emerges. And please let me know if I missed any other solutions out there!

Over the air debugging

By David B. Robins tags: C++, Python, Tools, Debugging Saturday, April 11, 2015 15:57 EST (link)

I had the idea a few months ago to debug our embedded device application over a Bluetooth (Low Energy, BLE) link. Three weeks ago I finished over the air firmware updates, and this last week I got a chance to try my over the air debugging plan; and as it happened it worked fairly well.

We use SEGGER's J-Link GDB server to debug our application over a USB tag cable. I figured that I could write a GDB server that used Bluetooth to communicate instead. It would be able to leverage GDB for symbol lookup and for examining and writing memory, although not (at least at first) for run and step (stopping execution would terminate the Bluetooth connection). I looked up the GDB remote serial protocol, but as it happened an open source Python GDB server project already existed: pyOCD. I made some fixes to make it work with Python 3 (and to stop it obnoxiously logging to the console), and wrote BLE transport and target classes.

When the pyOCD GDB server is launched, it runs in its own thread. My implementations of the read and write memory functions (readBlock32, readMem, writeBlock32, and writeMem) leveraged my existing asyncio-based tools framework that communicates with our embedded device over BLE using Bluegiga USB dongles. The functions use loop.call_soon_threadsafe to invoke the BLE communication to read or write on the main thread, and wait on an event until it completes.
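The pattern is roughly the following sketch (illustrative names, not the actual pyOCD/BLE code): a worker thread asks the asyncio loop to run a coroutine and blocks on an event until the result arrives.

```python
import asyncio
import threading

def call_on_loop(loop, coro_func, *args):
    """Run coro_func(*args) on `loop` from another thread; block for result."""
    done = threading.Event()
    result = {}

    def schedule():
        # Runs in the loop's thread, where create_task is legal.
        task = loop.create_task(coro_func(*args))
        task.add_done_callback(
            lambda t: (result.update(value=t.result()), done.set()))

    loop.call_soon_threadsafe(schedule)
    done.wait()   # the calling (GDB server) thread blocks here
    return result["value"]
```

In the real code the coroutine performs the BLE read or write; any coroutine works the same way.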

This is of course a dangerous facility to leave enabled on shipping code, which is why it is (until we have better general security) enabled only under #ifdef DEBUGGER; our builds are automatically tagged with these feature defines so it will be clear when it is enabled.

Having this over-the-air memory debugging support is more convenient than having to attach a cable, and allows some useful debugging of state at remote locations too. More importantly, attaching the normal debugger usually disrupts the device state (losing any information that could have been obtained), or the session gets disrupted by static after attaching; this approach has neither problem. One of the current issues we're tracking has been very hard to reproduce, and we don't have enough physical debuggers to attach to every lab device. This new facility will be a great aid in tracking down such issues.

The compiler optimized out my destructor (and it was right)

By David B. Robins tags: C++, Bugs Tuesday, February 10, 2015 12:02 EST (link)

I do most of my testing with a debug build, which has fewer optimizations than release to make debugging easier (not "none", because that makes the binary image too big for the embedded device; -Og, actually). I was doing some pre-release testing with a release build, and noticed that the destructor for one of my objects didn't appear to be called in release builds (using -Os). This object was somewhat of a special case, since it was created with placement new and destroyed with an explicit destructor call; and I always had a faint feeling of not-quite-rightness with it, and now I know why.

The destructor set a state in the object to mark it as free, since it was one of several slots that could be activated when associated with a Bluetooth advertisement or connection. This is where you should be having uneasy feelings, because looking at destroyed objects is not kosher at all; in fact, it is undefined behavior (C++ standard N3690, 3.8 Object lifetime [basic.life], paragraph 5). Thus, the compiler reasons that if you are not allowed to look at the object's fields, then there's no point in wasting good CPU time setting them! So the state was never set to free, and (in this particular case) the advertisement never expired; although this defect was hidden by another issue for a long time.

Fixed by using an explicit Free function.
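In sketch form (illustrative names, not the actual codebase): the "free" flag is cleared by an explicit Free() called while the object is still alive, which is well-defined behavior the optimizer must respect, rather than by the destructor, whose stores to a dead object may be (and were) legally optimized away.

```cpp
#include <cassert>

class Slot {
public:
    void Activate()     { m_active = true; }
    void Free()         { m_active = false; }  // explicit, pre-destruction
    bool IsFree() const { return !m_active; }
private:
    bool m_active = false;
    // Note: no state-clearing in ~Slot(); callers invoke Free() first,
    // then (for placement-new slots) the explicit destructor call.
};
```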


Content on this site is licensed under a Creative Commons Attribution 3.0 License and is copyrighted by the authors.