Some years ago the KAOS lab bought a number of Netgear WNR3500L routers to use as network glue. A pair of them have been in constant service since 2011 in the lab and machine room, and both of those died in the last two weeks. I’ve finally satisfactorily tracked down the culprit, and figure it’s worth a quick write up.
The WNR3500L is a bit of a false-promise machine; it was sold as an “open router” with excellent community firmware support, but it has some pretty extensive blobs that mean it really only runs versions of DD or Tomato with a small range of 2.6 kernels.
I initially just swapped the first one to die out for a spare (which happened to be flashed with Tomato instead of DD, it was an experiment when they were first set up), and reloaded most of the settings by hand (we failed to save or document many of the settings elsewhere. Our fault.) until I had time to investigate. When the second one died, it became urgent.
Conveniently, the WNR3500L has a 3.3V RS-232-like serial port (115200/8/n/1) under an easily removed panel, retained by one small torx screw/pin thing (Both are out in the picture while I was figuring out which side was which).
This leads to another victory for the Bus Pirate. Any 3.3V compatible UART adapter would work, but the bus pirate has much nicer cables for this sort of thing. In retrospect, those headers are correctly sized to just plug the female harness ends directly, but I used clips out of habit.
It turns out they were booting right up with only warnings, but the configuration was sufficiently garbled that neither the WLAN nor the Ethernet ports were coming up. I blew the first one’s settings away (note: at least on the old version these were running, reboot TWICE after clearing the NVRAM to get back to a consistent state: the first time it writes default values, the second time it boots cleanly). With the software-only nature of the problem established, I took more time to investigate the second one, at which point…
[1310 lines excluded]
size: 32782 bytes (-14 left)
Well, There’s Your Problem.
Turns out the tiny little log fields for monthly traffic statistics filled up the nvram and were causing configuration corruption, and thus failures.
There are some forum threads and bug reports on the matter, and it was fixed about two years ago with a test to prevent writing past the end of the file, a default setting to automatically delete traff logs after 12 months, and various other enhancements. Now that we’re up and running again it isn’t desperate, but it looks like there is a DD release from mid-2013 with the fixes applied that still supports these things, and they are probably due to be re-flashed to keep this from happening again.
- My home OpenWRT on a TP-Link 1043ND uses a connected USB key for logging and such. This is a much better behavior (like most things in OpenWRT: DD-WRT appears to be kind of an unmaintainble mess).
- Keep an eye on appliances like routers, they’re still just computers.
- Back up your damn configurations, or at least retain copies of the information required to regenerate them. Regenerating settings, especially when it involves manual tasks like MAC address hunting, sucks.
- These things’ bootloaders appear to always look at 192.168.1.2 through the non-WAN ports for a tftp payload named vmlinuz. There may be a way to disable that, but I think I’d prefer to leave it on, since it only enables local attacks and provides a rescue mechanism.
- The WNR3500L has a documented weakness about WAN-LAN throughput with various firmwares. Our uplink is actually good enough for it to matter, so perhaps one of the newer versions will improve that.