Cortex-M cache coherency using ChibiOS/HAL

Modern microcontrollers are becoming more and more complex. Because of increasing core frequencies, recent Cortex-M cores can be equipped with a data cache. Unfortunately, cache coherency is not handled in hardware when multiple bus masters are present, so software must take care of it.

Examples:

Cache Organization

There are several parameters to be considered for cache memories.

Cache Total Size

It is the amount of cache RAM; a bigger cache generally gives better performance. This parameter does not affect SW cache handling.

Cache Line Size

It is the smallest amount of cache RAM that can be mapped over a physical address, and it is always a power of two. On Cortex-M devices the cache line size is always 32 bytes. This information is important for software handling.

Cache Associativity

Caches have a number of “ways”, so we can have 2-way caches, 4-way caches and so on. More ways mean that the cache can be associated with physical memory more flexibly, so higher associativity makes for more efficient caches. This parameter does not affect SW cache handling.

Cache Type

There are two kinds of cache memories, write-through (WT) and write-back (WB), needing slightly different solutions.

The issue

Essentially there are two variants of the problem:

  1. Bus masters reading from a cached RAM.
  2. Bus masters writing to a cached RAM.

The problem is that, because of the cache, the CPU and other bus masters could “see” different data at the same address. Accesses must be synchronized in order to enforce coherency between bus masters.

Possible solutions

  1. Place DMA-accessible buffers in a non-cached RAM. On the STM32F7, for example, the TCM memory is DMA-accessible and not cached. It can be conveniently used for DMA buffers without constraints.
  2. Make some of the RAM non-cached by using the MPU. The Cortex-M MPU allows memory attributes, including cache handling, to be enforced for defined regions.
  3. Handle cache coherency in software. This is what we will discuss in this article.

Coherency Operations

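Let's define two kinds of operations on the cache: invalidation, which discards cached lines so that the next read fetches the data from RAM, and flushing, which writes modified cached lines back to RAM.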

Alignment Issues

Note that, because cache memories are organized in lines, invalidating or flushing a memory area can also affect adjacent locations. For example, if the cache line size is 32 (0x20) then invalidating the cache between addresses 0x00001003 and 0x00001047 would cause invalidation of all addresses between 0x00001000 and 0x0000105F. Invalidating a buffer can therefore involuntarily invalidate adjacent variables, causing hard-to-debug software errors.

Because of this, buffers accessible by multiple bus masters must always be aligned to the cache line size: both the start address and the buffer size must be aligned.
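
The range expansion described above is plain address arithmetic. The following is a minimal host-side sketch of it, not ChibiOS code: LINE_SIZE, line_floor(), line_last() and is_line_aligned() are hypothetical names used only for illustration, and the 32-byte line size is the value assumed in the example above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size in bytes; it must be a power of two. */
#define LINE_SIZE 32u

/* Address of the first byte of the cache line containing addr. */
static uint32_t line_floor(uint32_t addr) {
  assert((LINE_SIZE & (LINE_SIZE - 1u)) == 0u);
  return addr & ~(LINE_SIZE - 1u);
}

/* Address of the last byte of the cache line containing addr. */
static uint32_t line_last(uint32_t addr) {
  return line_floor(addr) | (LINE_SIZE - 1u);
}

/* Non-zero if both the start address and the size are multiples of the
   line size, so that a cache operation on the buffer cannot touch
   neighbouring data. */
static int is_line_aligned(uint32_t start, uint32_t size) {
  return ((start | size) & (LINE_SIZE - 1u)) == 0u;
}
```

With the addresses of the example, line_floor(0x00001003) gives 0x00001000 and line_last(0x00001047) gives 0x0000105F, which is the range actually affected. A buffer starting at 0x00001000 with a size of 0x60 bytes passes the is_line_aligned() check, while the original range does not.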

ChibiOS provides address alignment macros in the compilers abstraction module and cache handling functions in all Cortex-M ports.

Declaring DMA Buffers

This is an example of DMA buffer declarations.

#include "hal.h"
#include "ccportab.h"
 
#define BUFFERS_SIZE 36
 
CC_ALIGN_DATA(CACHE_LINE_SIZE) static uint8_t txbuf[CACHE_SIZE_ALIGN(uint8_t, BUFFERS_SIZE)];
CC_ALIGN_DATA(CACHE_LINE_SIZE) static uint8_t rxbuf[CACHE_SIZE_ALIGN(uint8_t, BUFFERS_SIZE)];

The two declared buffers are guaranteed to start at a cache-line-aligned address and to have a size of at least 36 bytes, rounded up to a multiple of the cache line size (64 bytes with 32-byte lines). Note that if the device has no cache then the macros do nothing, so it is possible to write portable code.
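
The rounding can be checked on a host machine. The sketch below does not reproduce the ChibiOS macro definitions: DEMO_LINE_SIZE, DEMO_SIZE_ALIGN() and demo_buf are hypothetical stand-ins, C11 _Alignas takes the place of CC_ALIGN_DATA(), and a 32-byte cache line is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size in bytes (hypothetical stand-in). */
#define DEMO_LINE_SIZE 32u

/* Rounds a byte count up to the next multiple of the line size. */
#define DEMO_SIZE_ALIGN(n) \
  ((((n) + DEMO_LINE_SIZE - 1u) / DEMO_LINE_SIZE) * DEMO_LINE_SIZE)

/* A 36-byte request becomes a line-aligned, 64-byte buffer. */
static _Alignas(DEMO_LINE_SIZE) uint8_t demo_buf[DEMO_SIZE_ALIGN(36u)];

/* Compile-time check of the rounded size. */
static_assert(sizeof demo_buf == 64u, "size must be rounded up to two lines");

/* Run-time check of the start address alignment. */
static int demo_buf_is_aligned(void) {
  return ((uintptr_t)demo_buf % DEMO_LINE_SIZE) == 0u;
}
```

Because the buffer occupies whole cache lines, no other variable can share a line with it, so flushing or invalidating the buffer cannot affect unrelated data.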

Operations on Buffers

The following code shows how to flush/invalidate buffers.

Buffer Invalidation

This is an example of cache invalidation before a DMA engine writes into a buffer.

  /* Invalidating the buffer before letting the DMA write in it.*/
  cacheBufferInvalidate(rxbuf, BUFFERS_SIZE);
 
  /* Receiving data from SPI using DMA.*/
  spiReceive(&SPID2, BUFFERS_SIZE, rxbuf);

Note that the invalidation is deliberately performed before the DMA operation and not after. It might seem to make sense to invalidate the cache after the DMA operation, but consider this scenario:

Invalidating the cache over the buffer before the operation ensures that the above scenario cannot happen.
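
The ordering issue can be illustrated with a toy model. The sketch below is not ChibiOS code and does not touch real hardware. It assumes a write-back cache reduced to a single line, and every name in it (toy_t, cpu_write(), dma_write(), evict() and so on) is hypothetical. It shows one way a late invalidation can lose data: a dirty line left over the buffer is evicted after the DMA has written to RAM and overwrites the received data.

```c
#include <stdint.h>
#include <string.h>

#define TOY_LINE 32u

/* Toy model: one line of RAM and one write-back cache line covering it. */
typedef struct {
  uint8_t ram[TOY_LINE];   /* physical memory, what the DMA sees */
  uint8_t data[TOY_LINE];  /* cached copy, what the CPU sees     */
  int     valid;
  int     dirty;
} toy_t;

/* CPU write: goes to the cache only and marks the line dirty. */
static void cpu_write(toy_t *t, uint8_t v) {
  memset(t->data, v, TOY_LINE);
  t->valid = 1;
  t->dirty = 1;
}

/* CPU read of the first byte: a miss refills the line from RAM. */
static uint8_t cpu_read(toy_t *t) {
  if (!t->valid) {
    memcpy(t->data, t->ram, TOY_LINE);
    t->valid = 1;
    t->dirty = 0;
  }
  return t->data[0];
}

/* DMA write: goes straight to RAM, the cache is not aware of it. */
static void dma_write(toy_t *t, uint8_t v) {
  memset(t->ram, v, TOY_LINE);
}

/* Eviction: a dirty line is written back to RAM, then dropped. */
static void evict(toy_t *t) {
  if (t->valid && t->dirty) {
    memcpy(t->ram, t->data, TOY_LINE);
  }
  t->valid = 0;
  t->dirty = 0;
}

/* Invalidation: the line is dropped without being written back. */
static void invalidate(toy_t *t) {
  t->valid = 0;
  t->dirty = 0;
}

/* Invalidation after the transfer: the eviction destroys the DMA data. */
static uint8_t receive_invalidate_after(void) {
  toy_t t;
  memset(&t, 0, sizeof t);
  cpu_write(&t, 0xAA);  /* old CPU data, the line is now dirty     */
  dma_write(&t, 0x55);  /* DMA stores the received data into RAM   */
  evict(&t);            /* dirty line written back over the DMA data */
  invalidate(&t);       /* too late, RAM already holds 0xAA        */
  return cpu_read(&t);
}

/* Invalidation before the transfer: nothing is left to write back. */
static uint8_t receive_invalidate_before(void) {
  toy_t t;
  memset(&t, 0, sizeof t);
  cpu_write(&t, 0xAA);
  invalidate(&t);       /* dirty line dropped before the DMA starts */
  dma_write(&t, 0x55);
  evict(&t);            /* no valid line, RAM is left untouched     */
  return cpu_read(&t);
}
```

In this model receive_invalidate_after() returns the stale value 0xAA, while receive_invalidate_before() returns 0x55, the data written by the DMA.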

Buffer Flushing

Before letting a bus master read data from a buffer, we need to make sure that the data has actually reached the RAM before the DMA operation starts. Flushing is only required for write-back (WB) caches.

  /* Flushing cache before letting DMA read from the buffer.*/
  cacheBufferFlush(txbuf, BUFFERS_SIZE);
 
  /* Sending data to SPI using DMA.*/
  spiSend(&SPID2, BUFFERS_SIZE, txbuf);
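
The need for flushing can be shown with a similar toy model. As before, this is a host-side sketch under stated assumptions, not ChibiOS code: a write-back cache is reduced to one line, and wb_line_t, wb_cpu_fill(), wb_flush() and wb_dma_read() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define WB_LINE 32u

/* Toy model: one line of RAM and the write-back cache line over it. */
typedef struct {
  uint8_t ram[WB_LINE];   /* physical memory, what the DMA reads */
  uint8_t data[WB_LINE];  /* cached copy, what the CPU writes    */
  int     dirty;
} wb_line_t;

/* CPU prepares the buffer: data lands in the cache, not in RAM. */
static void wb_cpu_fill(wb_line_t *l, uint8_t v) {
  memset(l->data, v, WB_LINE);
  l->dirty = 1;
}

/* Flush: a dirty line is written back to RAM and becomes clean. */
static void wb_flush(wb_line_t *l) {
  if (l->dirty) {
    memcpy(l->ram, l->data, WB_LINE);
    l->dirty = 0;
  }
}

/* DMA read of the first byte: always served from RAM. */
static uint8_t wb_dma_read(const wb_line_t *l) {
  return l->ram[0];
}

/* Without a flush the DMA transmits the old RAM content. */
static uint8_t send_without_flush(void) {
  wb_line_t l;
  memset(&l, 0, sizeof l);
  wb_cpu_fill(&l, 0x42);
  return wb_dma_read(&l);
}

/* With a flush the DMA transmits what the CPU wrote. */
static uint8_t send_with_flush(void) {
  wb_line_t l;
  memset(&l, 0, sizeof l);
  wb_cpu_fill(&l, 0x42);
  wb_flush(&l);
  return wb_dma_read(&l);
}
```

In this model send_without_flush() returns 0x00, the stale RAM content, while send_with_flush() returns 0x42. With a write-through cache the RAM would already be up to date, which is why flushing is only needed for write-back caches.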