Much better. But 17 seconds is still noticeable. Can we go even faster? The flash erase time is going to be the ultimate limit, and it looks like that is around 2.6 seconds, but how close can we get? Decent adapters like the FT232R can go up to 3000000 baud, and the PL2303 can go even higher. So why doesn’t it work? We don’t know. The ESP boot loader performs baud rate autodetection, and it just can’t recognise higher rates. I could never make it work with a rate higher than 921600, and even that is unreliable when actually flashing (it looks like it picks a divider that is not quite right). The code is in ROM, so even if we studied the ROM dump and got to the bottom of the bug, we would be unlikely to be able to fix it, much like with the erase issue.
We know the ESP UART is definitely capable of higher rates: it is clocked from a 26 MHz source and can be configured with an arbitrary divider. At low divider values the data output is no longer a square wave, so 52 Mbaud is unachievable in practice, but 3(-ish) or 4 Mbaud it can certainly do. So the only problem is configuring the correct divider, and then we can all but eliminate the transfer delays.
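To make the divider arithmetic concrete, here is a minimal sketch that picks the nearest integer divider for a target rate and reports the resulting error. The function name `nearest_divider` is hypothetical, and the UART clock is passed in as a parameter: the 52 MHz used in the usage lines is only an assumption inferred from the 52 Mbaud ceiling mentioned above, not a confirmed figure.

```python
def nearest_divider(clock_hz, target_baud):
    """Return (divider, actual_baud, error_percent) for an integer clock divider.

    clock_hz is the UART input clock; its real value on the chip is an
    assumption here and should be checked against the hardware.
    """
    # The divider must be a positive integer, so round to the nearest one.
    divider = max(1, round(clock_hz / target_baud))
    actual_baud = clock_hz / divider
    error_percent = (actual_baud - target_baud) / target_baud * 100.0
    return divider, actual_baud, error_percent


if __name__ == "__main__":
    ASSUMED_CLOCK_HZ = 52_000_000  # assumption, see the note above
    for baud in (921_600, 3_000_000, 4_000_000):
        div, actual, err = nearest_divider(ASSUMED_CLOCK_HZ, baud)
        print(f"{baud:>9} baud: divider {div}, actual {actual:.0f}, error {err:+.2f}%")
```

Under that assumed clock, 4 Mbaud divides exactly, while 3 Mbaud can only be approximated, which would be consistent with the "3(-ish)" above.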
But how do we get our code running on the device so we can control the UART? The answer is already contained in esptool, which uses a custom bit of code to read flash contents. Called SFLASH_STUB, it is a compiled chunk of (presumably) hand-written assembly code that reads the contents of a specified region of flash and sends them to the serial port. While the code itself is useless to us, from the implementation of the read_flash command we learn how to upload and execute custom code. Now the only problem is writing it.
The initial plan was to write just the bit that changes the UART baud rate and jump back into the stock flasher. But then we thought about other improvements to flashing that would also be nice to have. For example, checksums. The stock flasher uses a 1-byte XOR as a checksum, which we would not be inclined to trust, especially at the higher transfer rates we’re looking at. Also, if the change to the firmware is small, it would be nice to avoid rewriting all of it and only rewrite the changed parts. But reading out the contents first is a non-starter: it’d be a pure waste of time if it turns out that we need to rewrite most of it anyway. However, reading flash without sending the data out to the serial port is very fast, so we thought an rsync-like mechanism, where only a digest of the existing data is sent, would be ideal. This, however, means that we need to write a substantial amount of code.
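The rsync-like idea can be sketched on the host side as follows. This is only an illustration under stated assumptions, not the actual flasher code: the helper names `block_digests` and `changed_blocks` are hypothetical, the 4096-byte block size is an assumed value, and the device-side digests are simulated here by hashing an in-memory copy of the old image.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration only


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one MD5 digest per block of data (the last block may be short)."""
    return [
        hashlib.md5(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(new_image, device_digests, block_size=BLOCK_SIZE):
    """Return indices of blocks in new_image that differ from what is on the device.

    device_digests stands in for the digests the device would report;
    blocks beyond the end of that list are treated as changed.
    """
    changed = []
    for idx, digest in enumerate(block_digests(new_image, block_size)):
        if idx >= len(device_digests) or digest != device_digests[idx]:
            changed.append(idx)
    return changed


if __name__ == "__main__":
    old_image = bytes(4 * BLOCK_SIZE)          # pretend this is in flash
    new_image = bytearray(old_image)
    new_image[2 * BLOCK_SIZE + 5] = 0x01       # a one-byte change in block 2
    print(changed_blocks(bytes(new_image), block_digests(old_image)))
```

Only the digests cross the serial link, so the cost of the comparison is a few bytes per block rather than the block itself.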
Assembly is not a great language to be working in, however. C is much preferable to us. Can we develop SFLASH_STUB-like snippets of code in C? With some research and a custom linker script, it turns out, we can. You can find our “stub development environment” here.
To make an already rather long story short, we ended up writing a custom flasher stub, which can perform erase (minus the bug, so no more workarounds!), read, write and MD5 digest operations (CRC32 would probably be enough, but there is already an MD5 implementation in ROM, so we just use that). When it starts, it explicitly sets the speed to a value that is passed as a parameter, so no autodetection is required. It also continues to receive data while writing flash and does poor-man’s flow control (without the hardware pins) so as not to overflow the buffer.
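The text does not spell out how the flow control works, so the following is only a sketch of one common software-only approach, acknowledgement-based windowing, and not a description of the actual stub. The function name `simulate_transfer` and all sizes are hypothetical. The host never has more unacknowledged bytes outstanding than the device buffer can hold, and the device acknowledges each block once it has been written to flash.

```python
def simulate_transfer(total_len, block_size, buffer_size):
    """Simulate a windowed transfer and return the peak number of bytes in flight.

    Hypothetical model: the host sends blocks while they still fit in the
    device buffer; the device acknowledges one block per flash write.
    """
    if block_size <= 0 or block_size > buffer_size:
        raise ValueError("block_size must be positive and fit in the buffer")

    sent = acked = peak = 0
    while acked < total_len:
        # Host side: keep sending while the next block still fits in the buffer.
        while sent < total_len:
            next_len = min(block_size, total_len - sent)
            if (sent - acked) + next_len > buffer_size:
                break
            sent += next_len
            peak = max(peak, sent - acked)
        # Device side: one block is written to flash and acknowledged.
        acked += min(block_size, sent - acked)
    return peak


if __name__ == "__main__":
    peak = simulate_transfer(total_len=10_000, block_size=1024, buffer_size=4096)
    print("peak bytes in flight:", peak)
```

Because the host keeps sending while earlier blocks are still being written, the link stays busy during flash writes, yet the buffer bound is never exceeded.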
For more detail: