Source: adafruit industries blog
Nerd Ralph blogs about shrinking the Arduino core for ATmega168 and 328 MCUs at 4 to 16 MHz:
One of my gripes about the Arduino AVR core is that it is not an example of efficient embedded programming. One of the foundations of C++ is zero-overhead abstractions, yet the Arduino core has a very significant overhead. The Arduino basic blink example compiles to almost 1kB, with most of that space taken up by code that is never used. Rewriting the AVR core is a task I’m not ready to tackle, but after writing picoCore, I realized I could use many of the same optimization techniques in an Arduino library. The result is ArduinoShrink, a library that can dramatically reduce the compiled size of Arduino projects. In this post, I’ll explain some of the techniques I used to achieve the coding trifecta of faster, better, and smaller.
The Arduino core is actually a static library that is linked with the project code. As Eli explains in this post on static linking, libraries like libc usually have only one function per .o in order to avoid linking in unnecessary code. The Arduino core doesn’t use that kind of modular approach; however, by making use of gcc’s “-ffunction-sections” option, it does reduce the code bloat caused by the non-modular approach.
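To make the effect of “-ffunction-sections” concrete, here is a minimal sketch in C. The file name and both function names are hypothetical and are not taken from the Arduino core. With the option enabled, gcc places each function in its own section, so a linker invoked with “--gc-sections” can discard the functions that nothing references, even though they share a single .o file.

```c
/* sections_demo.c (hypothetical file name, for illustration only).
 *
 * Assumed build steps, using standard gcc/ld options:
 *   gcc -Os -ffunction-sections -c sections_demo.c
 *   gcc -Wl,--gc-sections main.o sections_demo.o -o demo
 */
#include <stdint.h>

/* Referenced by the program, so the linker keeps its section. */
uint8_t used_helper(uint8_t x) {
    return (uint8_t)(x + 1);
}

/* Never referenced. With -ffunction-sections this function is emitted
 * into its own section (.text.unused_helper), which --gc-sections is
 * free to drop. Without the option, both functions would share one
 * .text section and would be linked in together. */
uint8_t unused_helper(uint8_t x) {
    return (uint8_t)(x * 2);
}
```

This gets close to the one-function-per-.o layout that libc uses without having to split the source into many files.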
With ArduinoShrink, I wrote more modular, self-contained code. For example, the Arduino delay() function calls micros(), which relies on the 32-bit timer0 interrupt overflow counter. I simplified the delay function so that it only needs the 8-bit timer value. If the user code never calls micros() or millis(), the timer0 ISR code never gets linked in. By using a more efficient algorithm and writing the code in AVR assembler, I reduced the size of the delay function to 12 instructions taking 24 bytes of flash.
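The idea of a delay that needs only the 8-bit timer value can be sketched in portable C. This is not the ArduinoShrink implementation, which is written in AVR assembler; it only illustrates the general technique of measuring elapsed time with wrap-around subtraction on an 8-bit counter. The names read_timer8, delay_ticks, fake_tcnt0, and total_ticks are assumptions made for this example, and the timer register is simulated so that the code runs on a PC.

```c
#include <stdint.h>

/* Hypothetical stand-in for the AVR's 8-bit TCNT0 register.
 * Each read advances the counter by 3 ticks to mimic time passing
 * between polls; on real hardware the timer would tick on its own. */
static uint8_t fake_tcnt0 = 250;   /* start near the top to force a wrap */
static uint32_t total_ticks = 0;   /* bookkeeping for the simulation only */

static uint8_t read_timer8(void) {
    fake_tcnt0 = (uint8_t)(fake_tcnt0 + 3);
    total_ticks += 3;
    return fake_tcnt0;
}

/* Busy-wait for at least `ticks` timer ticks using only the 8-bit count.
 * No overflow interrupt or 32-bit overflow counter is involved.
 * Assumption: the loop polls the timer more often than once per
 * 256 ticks, otherwise a full wrap would go unnoticed. */
static void delay_ticks(uint32_t ticks) {
    uint8_t last = read_timer8();
    while (ticks > 0) {
        uint8_t now = read_timer8();
        /* Unsigned 8-bit subtraction is modulo 256, so the difference
         * stays correct when the counter wraps from 255 back to 0. */
        uint8_t elapsed = (uint8_t)(now - last);
        last = now;
        ticks = (elapsed >= ticks) ? 0 : ticks - elapsed;
    }
}
```

Because a delay written this way never touches the overflow counter, nothing in it pulls the timer0 ISR into the link, which is the dependency the paragraph above describes removing.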
See the post here and ArduinoShrink on GitHub.