ChibiOS Debugging Guide

One of the most important features of an operating system is its support for debugging application code. ChibiOS/RT offers several mechanisms that help during the debug phase of the development cycle.

Problems with Debugging Embedded Code

Debugging embedded code is, in general, difficult because of the inherent nature of the task:

  • Often the system dies together with the application under debug.
  • Problems are usually intermittent and hard to catch.
  • A problem is not necessarily caused by the obvious suspect; more often there is a hidden reason.

Kinds of Malfunctions

There are several ways for a system to fail. First, let's define some categories of system anomalies.

System Crashed

A crash is defined as the system going into an exception vector where it is usually stopped.

ChibiOS is static, which makes it inherently robust. If you are experiencing a crash then the cause is probably external to the system. In a static system the most common causes of a crash are:

  • Memory Violations. This is the most obvious cause: if you access invalid addresses or use an invalid memory alignment then the CPU goes directly into an exception.
  • Stack Overflows. The system has a stack for each thread and a stack for exceptions, and all stacks must be sized correctly. An overflowing stack can corrupt critical data structures or other stacks, which can make the CPU jump to unmapped addresses and get trapped in an exception.
  • Invalid Calls. Calling an OS function from the wrong context can lead to a crash. Not all OS functions can be invoked from any context; for example, wait-like functions cannot be called from an ISR. Each OS has rules to follow. Calling functions from an invalid context is a very common cause of errors, which are often random and hard to catch.
  • Function Pointers. If you have function pointers in the code, for example callbacks, they can be a way for the system to go out of control: a pointer could contain NULL or an invalid address.
  • Undefined ISR. If an interrupt is triggered for which there is no ISR then the system considers it an exception and goes into the unhandled exceptions code (a closed loop).

System Stuck

In this condition the system executes the same code over and over without ever exiting a closed loop. For example, you could see the system traversing the threads ready list without exiting the loop because of a list corruption.

If you observe such a condition within the system code then one of the following causes should be considered:

  • Invalid Calls. Calling OS functions in an improper way can lead to corruption of the internal data structures. The effect is a deadlock in the system code but the cause is external.
  • Infinite Interrupts. When writing an ISR it is important to reset the IRQ source, otherwise the IRQ will re-trigger the ISR as soon as it exits, blocking the system.

System Halted

Halting is a voluntary action of the OS or the application. When an anomalous condition is detected the system calls a special handler that enters the halt state. This is done by calling chSysHalt() in ChibiOS/RT.

Halts are a good thing because:

  • It means that an error has been detected and has not gone unnoticed.
  • From the halt handler it is possible to examine the stack trace and the various other system structures and get a good idea about the nature of the problem.

All ChibiOS debug mechanisms trigger halts in order to signal the developer that a problem has been detected.

System Misbehaving

This happens when the system does not die but behaves in an unexpected way. This could be either a good or a bad thing. Good because it is possible to use the debugger and try to understand the issue. Bad because the system itself has not found a problem.

Debug First Actions

During development you should configure the OS with the debug options enabled and compile the code with optimizations disabled (-O0 for GCC users).

Debug Configuration Settings

Now let's see all the available debug options and how they are supposed to help. All options are located in the file chconf.h, usually found in the root of your project source tree.

CH_DBG_SYSTEM_STATE_CHECK=TRUE

This is probably the most important debug option: it makes sure that all the RTOS functions are called from the proper context. If an invalid function call is detected then the system is stopped and the global variable ch.dbg.panic_msg points to an error string. The possible error codes are:

  • SV#1. The function chSysDisable() has been called from an ISR or from within a critical zone.
  • SV#2. The function chSysSuspend() has been called from an ISR or from within a critical zone.
  • SV#3. The function chSysEnable() has been called from an ISR or from within a critical zone.
  • SV#4. The function chSysLock() has been called from an ISR or from within a critical zone. This can happen, for example, if you call chSysLock() twice.
  • SV#5. The function chSysUnlock() has been called from an ISR or from outside a critical zone. This can happen, for example, if you call chSysUnlock() twice or without calling chSysLock() beforehand.
  • SV#6. The function chSysLockFromISR() has not been called from an ISR or has been called from within a critical zone. This can happen, for example, if you call chSysLockFromISR() twice.
  • SV#7. The function chSysUnlockFromISR() has not been called from an ISR or has been called from outside a critical zone. This can happen, for example, if you call chSysUnlockFromISR() twice or without calling chSysLockFromISR() beforehand.
  • SV#8. The macro CH_IRQ_PROLOGUE() has not been placed at the very beginning of an ISR or it has been placed within a critical zone.
  • SV#9. The macro CH_IRQ_EPILOGUE() has not been placed at the very end of an ISR, or it has been placed within a critical zone, or CH_IRQ_PROLOGUE() is missing from the ISR.
  • SV#10. An I-class function has been called from outside a critical zone.
  • SV#11. An S-class function has been called from outside a critical zone or has been called from an ISR.

You can see that this option is extremely useful because it catches very common usage errors early in the development process. Note that “SV#” stands for “State Violation”. System states are checked very strictly in ChibiOS/RT.
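
The rules behind several of these codes can be pictured with two counters, one tracking ISR nesting and one tracking the critical zone. The following is a simplified, self-contained model of that idea and not the ChibiOS/RT implementation: every name in it (model_state_t, model_lock() and so on) is hypothetical, and where the real checker halts the system the model simply returns the SV number (0 meaning "call allowed").

```c
/* Simplified model of a state checker. All names are hypothetical. */
typedef struct {
  int isr_cnt;   /* greater than 0 while inside an ISR          */
  int lock_cnt;  /* greater than 0 while inside a critical zone */
} model_state_t;

/* Models chSysLock(): thread context only, and not already locked. */
int model_lock(model_state_t *s) {
  if (s->isr_cnt != 0 || s->lock_cnt != 0) return 4;   /* SV#4 */
  s->lock_cnt = 1;
  return 0;
}

/* Models chSysUnlock(): thread context only, and currently locked. */
int model_unlock(model_state_t *s) {
  if (s->isr_cnt != 0 || s->lock_cnt <= 0) return 5;   /* SV#5 */
  s->lock_cnt = 0;
  return 0;
}

/* Models CH_IRQ_PROLOGUE(): not allowed inside a critical zone. */
int model_isr_enter(model_state_t *s) {
  if (s->lock_cnt != 0) return 8;                      /* SV#8 */
  s->isr_cnt++;
  return 0;
}

/* Models CH_IRQ_EPILOGUE(): needs a matching prologue, no critical zone. */
int model_isr_leave(model_state_t *s) {
  if (s->isr_cnt <= 0 || s->lock_cnt != 0) return 9;   /* SV#9 */
  s->isr_cnt--;
  return 0;
}

/* Models the check made by an I-class function: critical zone required. */
int model_check_class_i(const model_state_t *s) {
  return (s->lock_cnt <= 0) ? 10 : 0;                  /* SV#10 */
}

/* Models the check made by an S-class function: critical zone, no ISR. */
int model_check_class_s(const model_state_t *s) {
  return (s->isr_cnt != 0 || s->lock_cnt <= 0) ? 11 : 0; /* SV#11 */
}
```

In this model, calling model_lock() twice in a row returns 4 on the second call, which mirrors the double chSysLock() case described for SV#4.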

Also see the article RT State Checker.

CH_DBG_ENABLE_CHECKS=TRUE

This option enables checking of API function parameters. In case of error the system is halted and ch.dbg.panic_msg points to a message.

CH_DBG_ENABLE_ASSERTS=TRUE

This option enables system assertions. Assertions are checks placed in critical places that are able to detect anomalous conditions. In case of error the system is halted and ch.dbg.panic_msg points to a message.

CH_DBG_ENABLE_TRACE=TRUE

The trace buffer stores the last N context switch operations. It can be used to determine the sequence of operations that led to a halt condition. The trace buffer is a memory structure, but the ChibiOS/RT Eclipse Debug plugin is able to show its content in a table.
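
To illustrate what "the last N operations" means in practice, here is a minimal sketch of a circular buffer that overwrites its oldest entry once full. It is an assumption-laden illustration, not the ChibiOS/RT trace structure: the type names, the fields and the depth TRACE_SIZE are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_SIZE 4U  /* N: hypothetical trace depth */

typedef struct {
  uint32_t time;       /* time stamp of the context switch          */
  int      thread_id;  /* hypothetical id of the switched-in thread */
} trace_event_t;

typedef struct {
  trace_event_t events[TRACE_SIZE];
  size_t        next;   /* slot the next event will be written to   */
  size_t        count;  /* valid events, saturates at TRACE_SIZE    */
} trace_buffer_t;

/* Stores one event, overwriting the oldest one when the buffer is full. */
void trace_record(trace_buffer_t *tb, uint32_t time, int thread_id) {
  tb->events[tb->next].time = time;
  tb->events[tb->next].thread_id = thread_id;
  tb->next = (tb->next + 1U) % TRACE_SIZE;
  if (tb->count < TRACE_SIZE) {
    tb->count++;
  }
}

/* Returns the i-th most recent event (0 is the newest), or NULL if absent. */
const trace_event_t *trace_recent(const trace_buffer_t *tb, size_t i) {
  if (i >= tb->count) {
    return NULL;
  }
  return &tb->events[(tb->next + TRACE_SIZE - 1U - i) % TRACE_SIZE];
}
```

After a halt, walking trace_recent() from index 0 upwards would replay the recorded events from newest to oldest.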

CH_DBG_ENABLE_STACK_CHECK=TRUE

This option enables port-defined stack checking. Note that the check is usually performed at context switch time and does not necessarily catch all overflow conditions.

CH_DBG_FILL_THREADS=TRUE

Stacks are filled with a fixed pattern (0x55555555) before running threads. The pattern makes it possible to determine how much of a stack area has actually been used.
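
As a rough sketch of how the fill pattern can be exploited, the helper below counts how many bytes of a stack area still hold the pattern. The function name and the byte-wise scan are assumptions made for illustration; it also assumes a descending stack, so that the untouched portion starts at the lowest address of the area.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_BYTE 0x55U  /* each byte of the 0x55555555 pattern */

/*
 * Counts the bytes, starting from the lowest address of the stack area,
 * that still contain the fill pattern. Hypothetical helper: it assumes a
 * descending stack, so the never-used part begins at stack_base.
 */
size_t stack_unused_bytes(const uint8_t *stack_base, size_t size) {
  size_t n = 0;
  while (n < size && stack_base[n] == STACK_FILL_BYTE) {
    n++;
  }
  return n;
}
```

The result is an estimate: a location that the thread wrote with a value equal to the pattern is indistinguishable from an untouched one.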

CH_CFG_OPTIMIZE_SPEED=FALSE

Setting this option to FALSE disables some internal code inlining, making the OS code easier to debug. It is not a debug option in itself but it can make the debug experience better.

CH_CFG_ST_TIMEDELTA=0

This option disables the tickless mode. This can be useful if your problem is caused by virtual timers being used beyond the system's ability to serve them all.
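
Taken together, the settings described above might look like the following fragment of chconf.h. This is a simplified sketch: the layout of a real chconf.h may differ (for instance, each option may be wrapped in its own preprocessor guard), and the TRUE/FALSE definitions are included here only so that the fragment stands alone.

```c
/* Debug-oriented settings as they might appear in chconf.h (sketch). */

/* TRUE and FALSE normally come from the ChibiOS headers; they are defined
   here only so that this fragment compiles in isolation. */
#if !defined(TRUE)
#define TRUE  1
#endif
#if !defined(FALSE)
#define FALSE 0
#endif

#define CH_DBG_SYSTEM_STATE_CHECK   TRUE   /* context/state violations (SV#n) */
#define CH_DBG_ENABLE_CHECKS        TRUE   /* API parameter checks            */
#define CH_DBG_ENABLE_ASSERTS       TRUE   /* internal assertions             */
#define CH_DBG_ENABLE_TRACE         TRUE   /* context switch trace buffer     */
#define CH_DBG_ENABLE_STACK_CHECK   TRUE   /* port-defined stack checks       */
#define CH_DBG_FILL_THREADS         TRUE   /* fill stacks with 0x55555555     */

#define CH_CFG_OPTIMIZE_SPEED       FALSE  /* less inlining, easier stepping  */
#define CH_CFG_ST_TIMEDELTA         0      /* disable tickless mode           */
```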

Eclipse Debug Plugin

ChibiStudio includes a Debug Plugin for Eclipse that adds RTOS awareness to the Eclipse debugger. The plugin allows inspecting the kernel data structures at runtime and is a formidable aid. The accessible information includes:

  • Active threads list (registry): the threads are shown in a table with their names and most important parameters.
  • Active virtual timers list.
  • Trace Buffer: the last N context switch events are shown together with a time stamp and other relevant information.
  • Debug variables.
  • Global variables.
  • Statistics.

General Suggestions

For less experienced users, there are several things you can do to minimize the need for debugging:

  • Read the documentation carefully first.
  • Enable the various debug options while developing your application.
  • Try to find code examples for the things you are going to do. Good sources are:
    • The documentation.
    • The kernel test code, under “./test” you will find examples for almost any API in the ChibiOS/RT kernel and most common RTOS related tasks.
    • The HAL test code, under “./testhal” there are examples regarding the various device drivers.
    • The demo applications.
  • Start your application from an existing demo, add things one at a time and test often. If you add too many things at once then finding a small problem can become a debugging nightmare. Follow the cycle: think, implement, test, repeat.
  • If you have been stuck for too long then consider asking for advice.
  • Report bugs and problems. Bugs can be fixed, and problems can become new articles in the documentation (this and other documentation articles originated from questions in the forum or in the tracker).
  • Never give up :-)
