This may be the best talk out of 28C3 this year. I was actually more pumped about Cory Doctorow’s “The Coming War on General Computation” 28C3 talk from the previous day, which I shared enthusiastically on G+, but there is more to talk about in this one. It is mostly couched as language/computational theory, but the thesis is that one shouldn’t design protocols in which it is possible to construct a message that causes the recipient to perform arbitrary computation in the process of decoding the message. Which is awesome, and their argument for it is convincing. Furthermore, things with the message “Everyone needs to start thinking like language geeks and compiler writers” are bound to appeal to me. That said, I have a couple of problems with the talk.
The first problem is purely aesthetic, and mostly unimportant. In terms of presentation, it wasn’t that great a talk. The slides were bland and repetitive, and the speaker kept using problematic mannerisms. The swearing and such are right in place, but the coughed interjections were not good, and the flavoring particles were excessive. I’ve been guilty of most of the above, most of the times I’ve given talks, but the more I teach and speak, the more I become sensitized to presentation, and the internet has made me spoiled on talk quality, with things like fail0verflow’s Console Hacking 2010 at 27C3 last year, or any talk Lawrence Lessig has ever given. On a better note, the Occupy + rage comics visual conceit used throughout is pretty fun.
With that out of the way, on to the technically interesting stuff:
I think they introduce some fundamental problems in demanding context-insensitive protocols. I’m likely misunderstanding, but from working with simple serial protocols, I’m wary of anything that smells like control characters.
Two conceptual problems: indefinite message length, and unwanted control characters. Both arise from the same discussion of automata their thesis is rooted in. The first problem is simple to explain: it is easy to have unbounded input, and a message with no stop character will eventually break shit. In practical implementations, message lengths would necessarily be bounded, and part of the problem would go away, but it would still be extremely vulnerable to flooding. They used S-expressions as an example of a reasonable solution, which makes me think “while true; do echo '('; done”, and now you’re DoSed. This could probably be worked around, but it harms the elegance.
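To make the "bounded in practice" workaround concrete, here is a minimal sketch of a streaming recognizer for a single S-expression that gives up once a length or nesting bound is exceeded. The function name `accept_sexpr` and the limits `max_len` and `max_depth` are my own hypothetical choices, not anything from the talk, and the grammar is deliberately simplified (parentheses, whitespace, and atom characters only; no strings or quoting).

```python
from itertools import repeat
from typing import Iterable


def accept_sexpr(chars: Iterable[str], max_len: int = 4096, max_depth: int = 64) -> bool:
    """Read one parenthesized expression from a character stream.

    Returns True as soon as the top-level list closes, and False if the
    stream is malformed, ends early, or exceeds either bound.
    The bounds are assumptions for illustration, not recommended values.
    """
    depth = 0
    for count, ch in enumerate(chars, start=1):
        if count > max_len:
            return False              # message too long: stop reading
        if ch == "(":
            depth += 1
            if depth > max_depth:
                return False          # nested too deeply: stop reading
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False          # close without a matching open
            if depth == 0:
                return True           # one complete message recognized
        elif depth == 0 and not ch.isspace():
            return False              # bare atom outside any list
    return False                      # stream ended before the list closed


# The endless stream of "(" from the shell loop is rejected after
# max_depth + 1 characters instead of being buffered forever.
print(accept_sexpr(repeat("(")))
print(accept_sexpr("(a (b c))"))
```

The recognizer itself no longer falls over, but that is all the bound buys: a sender can still open connection after connection and make the receiver burn up to `max_len` characters on each, so the flooding concern stands.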
As for the second, I don’t see a similar way out. They correctly note that escaping is not a solution, and refer to the delightful field of SQL injection as proof by example. Then they neglect to suggest a different solution, because as far as I am aware, there isn’t one. Given arbitrary data to be transferred, there ARE no delimiters which cannot appear in the data. It’s one of those time-honored intractable problems in CS. The question asked late in the video about badly formed CSV files was poking at the same idea, and they did a great job explaining why field lengths are unsafe, but I’m still unconvinced that there isn’t a fundamental flaw in in-band start/stop characters that is similarly bad. This will require further reading.
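The delimiter collision is easy to show in a few lines. This is only a toy illustration with made-up helper names (`frame` and `unframe`) and a comma as the in-band separator; it is not a protocol from the talk.

```python
from typing import List


def frame(fields: List[str], delim: str = ",") -> str:
    """Join fields into one message using an in-band delimiter (no escaping)."""
    return delim.join(fields)


def unframe(message: str, delim: str = ",") -> List[str]:
    """Split a message back into fields on the same delimiter."""
    return message.split(delim)


honest = ["alice", "42"]
hostile = ["alice,admin", "42"]   # payload that happens to contain the delimiter

# The honest record survives the round trip unchanged.
print(unframe(frame(honest)))

# The hostile record comes back as three fields instead of two:
# the receiver cannot tell a data comma from a framing comma.
print(unframe(frame(hostile)))
```

Picking a rarer delimiter only makes the collision less likely, not impossible, as long as the payload is allowed to be arbitrary bytes.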
My other technical problem: The speakers kept using YACC/Bison as examples of good programming tools in a talk mostly about problems with “leaky” specifications and implementations of things which are fundamentally recognizers. YACC and its ilk are among the worst offenders in this regard. The biggest problem with YACC and imitators is that they require a separate lexer specification, and all kinds of bad things happen when the specifications inevitably don’t quite match. Also, the generated LALR parser can break when you embed actions (mid-rule actions can introduce new conflicts), so all your new safety from generating a monolithic parser from a proper language specification goes away. There are better recognizer tools, in terms of ease (and precision) of specification and quality of the generated parser. Personally, I drank the ANTLR Kool-Aid for that: single specification for the recognizer, no problem with embedding actions (LL(*) instead of LALR), AND it spits out parsers in far more languages than any YACC or Bison version I’ve seen.
As an aside, I had independently found and read through the speaker’s old livejournal/blog and some of their research work, without realizing that they were the same interesting person (last paragraph) until now. I also hadn’t associated the identity with her late husband, who was also an interesting person. The computing community is small and close, and it is equal parts amazing and discomfiting.
Now it’s almost 6:30AM local time, and I haven’t slept because I got interested in something in the middle of the night. What is wrong with me?
EDIT: I noticed that I originally titled this “28C3 Keynote.” It wasn’t. It was the middle of the night. Fixed now.