Modern microcontrollers are becoming more and more complex. Recent Cortex-M cores can be equipped with a data cache because of the increasing core frequencies. Unfortunately, cache coherency is not handled in hardware when multiple bus masters are present, so software must take care of it.
Examples:
There are several parameters to consider for cache memories:
The cache size is the amount of cache RAM; a bigger cache gives better performance. This parameter does not affect software cache handling.
The cache line size is the smallest amount of cache RAM that can be mapped over a physical address; it is always a power of two. On Cortex-M devices the cache line size is always 32 bytes. This information is important for software handling.
Caches have a number of “ways”, so there are 2-way caches, 4-way caches and so on. More ways mean that the cache can be associated more flexibly with physical memory; higher associativity makes for more efficient caches. This parameter does not affect software cache handling.
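There are two kinds of cache memories, write-through (WT) and write-back (WB), needing slightly different solutions.
Essentially there are two variants of the problem: a bus master, for example a DMA, writes into memory that the CPU then reads through the cache, or a bus master reads from memory that the CPU has previously written through the cache.
The problem is that, because of the cache, the CPU and other bus masters could “see” different data at the same address; accesses must be synchronized in a way that enforces coherency between bus masters.
Let's define two kinds of operations on cache: invalidating, which discards the cached copy of a memory area so that the next CPU access reads from RAM, and flushing, which writes modified cached data back to RAM.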
Note that, because cache memories are organized in lines, invalidating or flushing a memory area can also affect adjacent locations. For example, if the cache line size is 32 (0x20) bytes, then invalidating the cache between addresses 0x00001003 and 0x00001047 would cause invalidation of all addresses between 0x00001000 and 0x0000105F. Invalidating a buffer can therefore involuntarily invalidate adjacent variables, causing hard-to-debug software errors.
Because of this, buffers accessible by multiple bus masters must always be aligned to the cache line size: both the start address and the buffer size must be aligned.
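ChibiOS provides address alignment macros in the compiler abstraction module and cache handling functions in all Cortex-M ports. The ones used in the examples below are CC_ALIGN_DATA(), CACHE_LINE_SIZE, CACHE_SIZE_ALIGN(), cacheBufferInvalidate() and cacheBufferFlush().
This is an example of DMA buffer declarations.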
#include "hal.h"
#include "ccportab.h"

#define BUFFERS_SIZE 36

CC_ALIGN_DATA(CACHE_LINE_SIZE) static uint8_t txbuf[CACHE_SIZE_ALIGN(uint8_t, BUFFERS_SIZE)];
CC_ALIGN_DATA(CACHE_LINE_SIZE) static uint8_t rxbuf[CACHE_SIZE_ALIGN(uint8_t, BUFFERS_SIZE)];
The two declared buffers are guaranteed to start at a cache line boundary and to have a size of at least 36 bytes, rounded up to a whole number of cache lines. Note that if the device has no cache then the macros do nothing, so it is possible to write portable code.
The following code shows how to flush/invalidate buffers.
This is an example of cache invalidation before a DMA engine writes into a buffer.
/* Invalidating the buffer before letting the DMA write in it.*/
cacheBufferInvalidate(rxbuf, BUFFERS_SIZE);

/* Receiving data from SPI using DMA.*/
spiReceive(&SPID2, BUFFERS_SIZE, rxbuf);
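Note that there is a reason why the invalidation is performed before the DMA operation and not after. It would seem to make sense to invalidate the cache after the DMA operation, but consider a scenario like this: the cache still holds modified (dirty) lines covering the buffer, and one of them is evicted and written back to RAM while the DMA transfer is in progress, overwriting the data that the DMA has just stored.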
Invalidating the cache over the buffer before the operation ensures that the above scenario cannot happen.
Before letting a bus master start reading data from a buffer, we need to make sure that the data has actually reached the RAM. Flushing is only required for write-back (WB) caches.
/* Flushing cache before letting DMA read from the buffer.*/
cacheBufferFlush(txbuf, BUFFERS_SIZE);

/* Sending data to SPI using DMA.*/
spiSend(&SPID2, BUFFERS_SIZE, txbuf);