I’ve been writing some code to drive the INSTEON PowerLinc Modem. It’s not the newest generation of INSTEON interfaces, but it’s the one I have in my system.
It’s been kind of a miserable experience. This thing could serve as a textbook example of how not to design a serial protocol.
Messages from the modem always begin with an ASCII STX (“Start of Text”) character. So far, so good. You’d think, then, that each message would end with an ETX (“End of Text”). But no. They just… end. There’s no end-of-message marker of any sort, and there’s no message-length field. The only way to know when a message has ended is to look up the message-type byte, which follows the STX, and deduce the expected number of bytes from that. Crappy. So now even the very lowest layer of my driver has to have detailed knowledge of the protocol.
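To make that concrete, here is a minimal Python sketch of what the lowest layer ends up needing. The length table is hypothetical: the type bytes and lengths are placeholders for illustration, not values checked against the modem’s documentation, and `expected_length` and `split_message` are made-up names.

```python
from typing import Optional, Tuple

STX = 0x02  # ASCII "Start of Text"

# Hypothetical table: message-type byte -> total message length in bytes,
# counting the STX and the type byte. Placeholder values for illustration;
# the real table has to come from the modem's documentation.
MESSAGE_LENGTHS = {
    0x50: 11,
    0x51: 25,
    0x52: 4,
}


def expected_length(buffer: bytes) -> Optional[int]:
    """Total length of the message starting at buffer[0], or None if the
    buffer doesn't start with an STX followed by a known type byte."""
    if len(buffer) < 2 or buffer[0] != STX:
        return None
    return MESSAGE_LENGTHS.get(buffer[1])


def split_message(buffer: bytes) -> Tuple[Optional[bytes], bytes]:
    """Return (message, rest) if buffer starts with a complete message,
    otherwise (None, buffer); the caller decides whether to wait or resync."""
    length = expected_length(buffer)
    if length is None or len(buffer) < length:
        return None, buffer
    return buffer[:length], buffer[length:]
```

The point is that even "where does this message end?" can't be answered without the per-type table.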
So, what if you’re out of sync with the message stream somehow? For example, your program just happened to start up in the middle of an unsolicited notification message. How do you get back in sync? In a well-designed protocol, you would just wait for the first STX. That marks the start of a message, right? But no. The protocol has no provision for escaping STX bytes that happen to appear by chance in the body of a message, so the first STX you see might not be real. So now you have to write a whole lot of ugly code to figure out whether an STX you see is followed by a message-type byte that makes sense. And you have to hope that the reasonable-looking message-type byte wasn’t fake too, because if it was, you might end up waiting a long time for enough data to fill out the fake message you’re expecting. So you need to implement a time-out too, to guard against that.
Whenever anything goes wrong (either an unreasonable-looking message-type byte, or a time-out while waiting for the end of the fake message), you have to discard just a single byte (the fake STX) and then try the whole thing again, because the very next byte after the fake STX might have been a real STX.
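The resync procedure described above can be sketched as a small framer class. This is an illustration under assumptions: the `Framer` class, the 0.5-second default timeout and the length table are all hypothetical, and the caller passes in the current time so the timeout logic can be exercised without a real clock.

```python
from typing import List, Optional

STX = 0x02
# Hypothetical length table: type byte -> total bytes, including STX and type.
MESSAGE_LENGTHS = {0x50: 11, 0x51: 25, 0x52: 4}


class Framer:
    """Splits a raw byte stream into messages, resynchronising on garbage."""

    def __init__(self, timeout: float = 0.5) -> None:
        self.timeout = timeout  # seconds to wait for a candidate message to fill out
        self.buffer = bytearray()
        self.candidate_since: Optional[float] = None  # when the current STX became a candidate

    def _drop_first_byte(self) -> None:
        # Discard exactly one byte: the next one might be a real STX.
        del self.buffer[0]
        self.candidate_since = None

    def feed(self, data: bytes, now: float) -> List[bytes]:
        """Append newly received bytes and return any complete messages."""
        self.buffer.extend(data)
        messages: List[bytes] = []
        while self.buffer:
            if self.buffer[0] != STX:
                self._drop_first_byte()  # not a message start; keep looking
                continue
            if self.candidate_since is None:
                self.candidate_since = now
            if len(self.buffer) < 2:
                break  # need the type byte before judging this STX
            length = MESSAGE_LENGTHS.get(self.buffer[1])
            if length is None:
                self._drop_first_byte()  # implausible type byte: fake STX
                continue
            if len(self.buffer) >= length:
                messages.append(bytes(self.buffer[:length]))
                del self.buffer[:length]
                self.candidate_since = None
                continue
            if now - self.candidate_since > self.timeout:
                self._drop_first_byte()  # waited too long: assume a fake STX
                continue
            break  # plausible but incomplete: wait for more bytes
        return messages
```

Because the timeout is only checked inside `feed`, the caller would also need to call `feed(b"", now)` periodically when no bytes arrive. Nothing here stops a plausible-looking run of garbage from being passed upward as a message; that's a property of the protocol, not of the framer.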
There’s no guarantee that this process won’t result in your lower layers sending your higher layers a potentially unbounded series of garbage messages that happen to have a reasonable-looking message-type, and enough data bytes, but are otherwise complete gibberish.
ACKs and NAKs
Commands that you send to the modem get echoed back to you, followed by an ACK or NAK. Usually they get echoed back verbatim, and you can just ignore the whole thing except for the ACK or NAK at the end. But for certain commands, the echoed response is slightly changed, or slightly longer.
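Here is a sketch of parsing the echo-plus-ACK/NAK reply, assuming the modem uses the ASCII ACK (0x06) and NAK (0x15) bytes. Because the echo isn’t always verbatim, it only compares the STX and command-type byte. `EXTRA_ECHO_BYTES` is a hypothetical per-command table for the commands whose echo is longer, left empty here to be filled in from the documentation.

```python
from typing import Dict, Optional

ACK = 0x06  # ASCII ACK; assumed to be what the modem sends
NAK = 0x15  # ASCII NAK; assumed likewise

# Hypothetical: extra bytes in the echo for commands whose echo is longer
# than the command itself, keyed by command-type byte (the byte after STX).
EXTRA_ECHO_BYTES: Dict[int, int] = {}


def parse_reply(sent: bytes, reply: bytes) -> Optional[bool]:
    """Interpret the modem's reply to `sent`.

    Returns True on ACK, False on NAK, or None if the reply is still incomplete.
    Raises ValueError if the reply doesn't look like an echo of `sent`.
    """
    echo_len = len(sent) + EXTRA_ECHO_BYTES.get(sent[1], 0)
    if len(reply) < echo_len + 1:
        return None  # keep reading: echo plus one status byte expected
    # The echo isn't always verbatim, so only check STX + command-type byte.
    if reply[:2] != sent[:2]:
        raise ValueError("reply does not echo the command that was sent")
    status = reply[echo_len]
    if status == ACK:
        return True
    if status == NAK:
        return False
    raise ValueError(f"expected ACK or NAK, got 0x{status:02x}")
```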
There’s nothing like a checksum defined in this protocol, so a NAK doesn’t mean “your command arrived garbled.” What it means is that the modem was too busy to handle your command, and you need to resend it a bit later. A primitive sort of flow control, but OK, we can work with that. Or can we?
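Taken at face value, handling the busy-NAK case is just a retry loop. In this sketch `send_once` is a hypothetical callable that writes the command and returns True on ACK and False on NAK; the attempt limit, the initial delay and the exponential backoff are arbitrary choices, not anything the protocol specifies.

```python
import time
from typing import Callable


def send_with_retry(
    send_once: Callable[[bytes], bool],
    command: bytes,
    max_attempts: int = 5,
    delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send `command`, resending after a pause each time the modem NAKs.

    Returns True once the command is ACKed, False if every attempt was NAKed.
    """
    for attempt in range(max_attempts):
        if send_once(command):
            return True  # ACK
        if attempt < max_attempts - 1:
            # NAK: the modem was busy, so wait a little longer each time.
            sleep(delay * (2 ** attempt))
    return False
```

The `sleep` parameter exists only so the timing can be replaced in tests.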
No, it’s not that simple. Actually, NAK doesn’t always mean “I’m busy. Ask again later.” In at least one case I’ve found so far, it means something else. For the “Get Next Link Record” command, NAK can mean “there are no more link records.” So now, if I send the “Get Next Link Record” command, and get a NAK response, what should I do? Should I resend the command, or should I assume there are no more records? Who the hell knows?
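One way to cope is to retry a bounded number of times and then guess. In this sketch, `get_next` is a hypothetical callable that sends “Get Next Link Record” and returns the record on ACK or None on NAK; after `max_naks` consecutive NAKs the loop assumes the list is exhausted. The threshold and delay are arbitrary.

```python
import time
from typing import Callable, List, Optional


def read_remaining_records(
    get_next: Callable[[], Optional[bytes]],
    max_naks: int = 3,
    delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Collect link records until `max_naks` NAKs arrive in a row."""
    records: List[bytes] = []
    naks = 0
    while True:
        record = get_next()
        if record is not None:
            records.append(record)
            naks = 0  # an ACK resets the count of consecutive NAKs
            continue
        naks += 1
        if naks >= max_naks:
            return records  # guess: NAK meant "no more records"
        sleep(delay)  # guess: NAK meant "busy", so try again
```

The trade-off can't be avoided: a modem that stays busy for `max_naks` attempts in a row will be misread as having no more records, and a genuine end of the list costs a few wasted retries.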
From what I’ve read of the INSTEON power-line protocol, it struck me as very cleverly designed. They seem to have thought of every weird corner case, and accounted for them all. How could those same people then proceed to poop out the haphazard, poorly-thought-out mishmash that is the PowerLinc serial protocol?