One of the most important features of an operating system is the support it offers developers for debugging application code. ChibiOS RT and NIL offer several mechanisms that help during the debug phase of the development cycle.
Problems with debugging Embedded Code
Debugging embedded code is, in general, difficult because of the inherent nature of the task.
There are several ways for a system to fail. First, let's define some categories of system anomalies.
A crash is defined as the system going into an exception vector where it is usually stopped.
RT and NIL are static kernels, which makes them inherently robust: if you are experiencing a crash, the cause is probably external to the OS. In a static system the most common causes of a crash include dereferencing a pointer that is NULL or holds an invalid address.
A lockup occurs when the system (application and/or OS) keeps executing the same code over and over without ever exiting. For example, you could see the system traversing the threads ready list without ever leaving the loop because the list has been corrupted.
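The ready-list example can be illustrated with a host-side sketch. The list below is a hypothetical circular list with a header node, only similar in spirit to a kernel ready list: a traversal stops when it gets back to the header, so if a corrupted next pointer forms a loop that excludes the header, an unbounded walk spins forever. The node_t type, list_count() and its max_steps bound are assumptions added so the sketch can run and report the corruption instead of hanging.

```c
#include <stddef.h>

/* Hypothetical circular singly linked list with a header node. */
typedef struct node {
  struct node *next;
} node_t;

/* Counts the elements after the header. Without max_steps this loop would
   never end on a corrupted list, which is exactly a lockup; with the bound
   it returns -1 instead. */
static int list_count(const node_t *header, int max_steps) {
  int n = 0;
  for (const node_t *p = header->next; p != header; p = p->next) {
    if (++n > max_steps) {
      return -1;   /* loop that never returns to the header */
    }
  }
  return n;
}
```

With a healthy list of three nodes the function returns 3; after pointing the last node back at the second one, the walk never reaches the header and the bound reports -1.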
If you observe such a condition within the system code, consider the following possible causes.
Halting is a voluntary action of the OS or the application: when an anomalous condition is detected, the system calls a special handler that enters the halt state. In both RT and NIL this is done by calling chSysHalt().
Halts are a good thing because:
All ChibiOS debug mechanisms trigger halts in order to signal the developer that a problem has been detected.
Sometimes the system does not die but behaves in an unexpected way. This can be either a good or a bad thing: good because the system is still alive and you can use the debugger to try to understand the issue, bad because the system itself has not detected any problem.
During development you should configure the OS with the debug options enabled and compile the code with optimizations disabled (-O0 for GCC users).
Now let's look at the available debug options and how they help. All options are located in the file chconf.h, usually found in the ./cfg directory of your project source tree.
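As a starting point, a development build might enable the debug options discussed in this article as in the sketch below. The option names follow recent ChibiOS RT releases and are an assumption for your version: names and defaults have changed between releases, the trace options exist only in RT, and NIL has its own set, so compare with the chconf.h shipped with your kernel.

```c
/* Debug options section of an RT chconf.h (development settings). */

/* Check that RTOS functions are called from the proper context. */
#define CH_DBG_SYSTEM_STATE_CHECK           TRUE

/* Check API function parameters. */
#define CH_DBG_ENABLE_CHECKS                TRUE

/* Enable kernel assertions. */
#define CH_DBG_ENABLE_ASSERTS               TRUE

/* Trace everything except IRQ-related events (RT only). */
#define CH_DBG_TRACE_MASK                   CH_DBG_TRACE_MASK_SLOW

/* Number of entries kept in the trace buffer (RT only). */
#define CH_DBG_TRACE_BUFFER_SIZE            128

/* Enable port-defined stack overflow checks. */
#define CH_DBG_ENABLE_STACK_CHECK           TRUE

/* Fill thread stacks with a known pattern before use. */
#define CH_DBG_FILL_THREADS                 TRUE
```

For release builds these are usually set back to FALSE (and the trace mask to CH_DBG_TRACE_MASK_DISABLED) to recover code size and speed.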
CH_DBG_SYSTEM_STATE_CHECK is probably the most important debug option: it makes sure that all the RTOS functions are called from the proper context. If an invalid function call is detected, the system is halted and the global variable ch.dbg.panic_msg points to an error string. The possible error codes are:
- SV#1: chSysDisable() has been called from an ISR or from within a critical section.
- SV#2: chSysSuspend() has been called from an ISR or from within a critical section.
- SV#3: chSysEnable() has been called from an ISR or from within a critical section.
- SV#4: chSysLock() has been called from an ISR or from within a critical section. This can happen, for example, if you call chSysLock() twice.
- SV#5: chSysUnlock() has been called from an ISR or outside a critical section. This can happen, for example, if you call chSysUnlock() twice or without calling chSysLock() beforehand.
- SV#6: chSysLockFromISR() has not been called from an ISR or has been called from within a critical section. This can happen, for example, if you call chSysLockFromISR() twice.
- SV#7: chSysUnlockFromISR() has not been called from an ISR or has been called outside a critical section. This can happen, for example, if you call chSysUnlockFromISR() twice or without calling chSysLockFromISR() beforehand.
- SV#8: CH_IRQ_PROLOGUE() has not been placed at the very beginning of an ISR or has been placed within a critical section.
- SV#9: CH_IRQ_EPILOGUE() has not been placed at the very end of an ISR, has been placed within a critical section, or CH_IRQ_PROLOGUE() is missing from the ISR.
This option is extremely useful because it catches very common usage errors early in the development process. Note that “SV#” stands for “State Violation”: system states are very strictly checked in ChibiOS RT and NIL.
Also see the article RT State Checker.
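The state checks can be pictured as two counters: one for ISR nesting and one for critical sections. The following is a simplified host-side model of the rules for chSysLock() through CH_IRQ_EPILOGUE(), so it can be run on a PC; dbg_state_t and the check_*() functions are hypothetical names. Each check returns 0 for a legal call or n for a violation reported as “SV#n”; this numbering is our reading of the ChibiOS RT sources (chdebug.c), so verify it for your version. Unlike the kernel, the model reports instead of calling chSysHalt().

```c
/* Host-side model of the state checker, not kernel code. */
typedef struct {
  int isr_cnt;   /* ISR nesting level */
  int lock_cnt;  /* critical section nesting level */
} dbg_state_t;

/* chSysLock(): not allowed in an ISR or inside a critical section. */
static int check_lock(dbg_state_t *s) {
  if (s->isr_cnt != 0 || s->lock_cnt != 0) return 4;
  s->lock_cnt++;
  return 0;
}

/* chSysUnlock(): not allowed in an ISR or outside a critical section. */
static int check_unlock(dbg_state_t *s) {
  if (s->isr_cnt != 0 || s->lock_cnt <= 0) return 5;
  s->lock_cnt--;
  return 0;
}

/* chSysLockFromISR(): only inside an ISR, and not already locked. */
static int check_lock_from_isr(dbg_state_t *s) {
  if (s->isr_cnt <= 0 || s->lock_cnt != 0) return 6;
  s->lock_cnt++;
  return 0;
}

/* chSysUnlockFromISR(): only inside an ISR and inside a critical section. */
static int check_unlock_from_isr(dbg_state_t *s) {
  if (s->isr_cnt <= 0 || s->lock_cnt <= 0) return 7;
  s->lock_cnt--;
  return 0;
}

/* CH_IRQ_PROLOGUE(): must not be placed inside a critical section. */
static int check_enter_isr(dbg_state_t *s) {
  if (s->isr_cnt < 0 || s->lock_cnt != 0) return 8;
  s->isr_cnt++;
  return 0;
}

/* CH_IRQ_EPILOGUE(): needs a matching prologue and no open critical section. */
static int check_leave_isr(dbg_state_t *s) {
  if (s->isr_cnt <= 0 || s->lock_cnt != 0) return 9;
  s->isr_cnt--;
  return 0;
}
```

For example, calling check_lock() twice from thread context succeeds the first time and returns 4 the second time, mirroring the “chSysLock() called twice” case.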
CH_DBG_ENABLE_CHECKS enables checking of API function parameters. In case of error the system is halted and ch.dbg.panic_msg points to an error message.
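Application code can use the same style of parameter check. The sketch below is a host-side model: dbg_check() is modeled on chDbgCheck(), which in ChibiOS RT 3.x and later, as far as we know, halts with the enclosing function's name as the panic message (check chdebug.h for your version). sys_halt(), panic_msg and buffer_average() are hypothetical stand-ins; in particular, sys_halt() returns so the example can run, whereas the real chSysHalt() never returns.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for chSysHalt(): records the first reason and returns. */
static const char *panic_msg = NULL;

static void sys_halt(const char *reason) {
  if (panic_msg == NULL) {
    panic_msg = reason;
  }
}

/* Modeled on chDbgCheck(): halt with the function name if c is false. */
#define DBG_ENABLE_CHECKS 1
#define dbg_check(c) do {                   \
    if (DBG_ENABLE_CHECKS && !(c)) {        \
      sys_halt(__func__);                   \
    }                                       \
  } while (0)

/* Hypothetical application function that validates its parameters. */
static int buffer_average(const uint8_t *buf, size_t n) {
  dbg_check((buf != NULL) && (n > 0U));
  if (panic_msg != NULL) {
    return -1;   /* a real halt would never get here */
  }
  unsigned sum = 0U;
  for (size_t i = 0U; i < n; i++) {
    sum += buf[i];
  }
  return (int)(sum / n);
}
```

Calling buffer_average(NULL, 4) leaves panic_msg pointing to "buffer_average", which is the kind of message you would then inspect in ch.dbg.panic_msg with the debugger.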
CH_DBG_ENABLE_ASSERTS enables system assertions. Assertions are checks placed at critical points in the code that detect anomalous conditions. In case of error the system is halted and ch.dbg.panic_msg points to an error message.
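Assertions differ from parameter checks in that they test internal invariants rather than caller input. The host-side sketch below uses a hypothetical ring buffer whose element count must always match the distance between its head and tail indexes; dbg_assert(c, r) is modeled on chDbgAssert(), taking a condition and a remark string (not used at runtime in this model), and sys_halt()/panic_msg are the same kind of returning stand-in for chSysHalt() as before. All names here are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for chSysHalt(): records the first reason and returns. */
static const char *panic_msg = NULL;

static void sys_halt(const char *reason) {
  if (panic_msg == NULL) {
    panic_msg = reason;
  }
}

/* Modeled on chDbgAssert(c, r): halt with the function name if c is false;
   the remark r only documents the condition. */
#define DBG_ENABLE_ASSERTS 1
#define dbg_assert(c, r) do {               \
    if (DBG_ENABLE_ASSERTS && !(c)) {       \
      sys_halt(__func__);                   \
    }                                       \
  } while (0)

#define RB_SIZE 4U

/* Hypothetical ring buffer. */
typedef struct {
  int data[RB_SIZE];
  size_t head;   /* next slot to write */
  size_t tail;   /* next slot to read */
  size_t count;  /* number of stored elements */
} ringbuf_t;

static bool rb_put(ringbuf_t *rb, int v) {
  if (rb->count == RB_SIZE) {
    return false;   /* full */
  }
  rb->data[rb->head] = v;
  rb->head = (rb->head + 1U) % RB_SIZE;
  rb->count++;
  /* Invariant: tail + count lands on head (modulo the buffer size). */
  dbg_assert((rb->tail + rb->count) % RB_SIZE == rb->head,
             "inconsistent ring buffer");
  return true;
}
```

If something overwrites count (for example a stray write from another thread), the next rb_put() trips the assertion instead of silently corrupting data further.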
CH_DBG_TRACE_MASK is an “or” of option flags, each selecting an event class to be traced:
- CH_DBG_TRACE_MASK_NONE. No events are traced by default (but tracing can be activated at runtime for any event).
- CH_DBG_TRACE_MASK_SWITCH. Context switches are traced by default.
- CH_DBG_TRACE_MASK_ISR. ISR enter and leave events are traced by default.
- CH_DBG_TRACE_MASK_HALT. The halt event is traced; of course, it is the last event recorded.
- CH_DBG_TRACE_MASK_USER. User events are recorded. Application code can trace events using the chDbgWriteTrace() API.
- CH_DBG_TRACE_MASK_SLOW. All events are enabled except IRQ-related ones.
- CH_DBG_TRACE_MASK_ALL. All events are enabled.
- CH_DBG_TRACE_MASK_DISABLED. The trace subsystem is removed entirely from the OS image.
The trace buffer stores the last N traced events, where N is set by CH_DBG_TRACE_BUFFER_SIZE. It can be used to determine the sequence of operations that led to a halt condition. The trace buffer is a plain memory structure, but the ChibiOS/RT Eclipse Debug plugin is able to show its content in a table. Note that this option is only present in RT, not in NIL.
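The “last N events” behavior is that of a circular buffer filtered by a mask. The host-side sketch below shows only that idea: the TRACE_* bit values, trace_event_t, trace_buffer_t, trace_write() and trace_get() are hypothetical and do not reproduce the kernel's actual trace record layout or mask values.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative event classes (not the kernel's CH_DBG_TRACE_MASK_* values). */
#define TRACE_SWITCH  (1U << 0)
#define TRACE_ISR     (1U << 1)
#define TRACE_HALT    (1U << 2)
#define TRACE_USER    (1U << 3)

#define TRACE_SIZE 4U   /* plays the role of the buffer size option */

typedef struct {
  uint32_t type;   /* one of the TRACE_* classes */
  uint32_t ts;     /* timestamp */
  uint32_t arg;    /* event-specific datum, e.g. a thread id */
} trace_event_t;

typedef struct {
  uint32_t mask;                   /* enabled event classes */
  size_t next;                     /* slot for the next event */
  size_t stored;                   /* valid entries, at most TRACE_SIZE */
  trace_event_t buf[TRACE_SIZE];
} trace_buffer_t;

/* Records an event if its class is enabled; the oldest entry is
   overwritten, so only the last TRACE_SIZE events are kept. */
static void trace_write(trace_buffer_t *tb, uint32_t type, uint32_t ts,
                        uint32_t arg) {
  if ((tb->mask & type) == 0U) {
    return;
  }
  tb->buf[tb->next] = (trace_event_t){type, ts, arg};
  tb->next = (tb->next + 1U) % TRACE_SIZE;
  if (tb->stored < TRACE_SIZE) {
    tb->stored++;
  }
}

/* Returns the i-th oldest stored event (0 = oldest), or NULL. */
static const trace_event_t *trace_get(const trace_buffer_t *tb, size_t i) {
  if (i >= tb->stored) {
    return NULL;
  }
  size_t first = (tb->next + TRACE_SIZE - tb->stored) % TRACE_SIZE;
  return &tb->buf[(first + i) % TRACE_SIZE];
}
```

With the mask set to TRACE_SWITCH | TRACE_HALT, ISR events are dropped (similar in spirit to the “slow” mask), and after a halt the newest entry is the halt event, preceded by the switches that led to it.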
CH_DBG_ENABLE_STACK_CHECK enables port-defined stack checking. Note that the check is usually performed at context switch time and does not necessarily catch all overflow conditions.
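Why a switch-time check can miss overflows is easy to see in a small host-side model. It assumes a downward-growing stack and a check that only verifies, when a thread is switched out, that saving its context stays above the lowest address of its stack area; this is roughly the kind of test a port may perform, but stack_overflow_at_switch() and replay() are hypothetical and the real check is port-specific.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if saving ctx_size bytes below sp would go under stack_base
   (downward-growing stack assumed). */
static bool stack_overflow_at_switch(uintptr_t stack_base, uintptr_t sp,
                                     size_t ctx_size) {
  return sp < stack_base + ctx_size;
}

/* Replays stack pointer samples and applies the check only at the samples
   marked as context switches. */
static bool replay(uintptr_t stack_base, const uintptr_t *sp,
                   const bool *is_switch, size_t n, size_t ctx_size) {
  for (size_t i = 0U; i < n; i++) {
    if (is_switch[i] &&
        stack_overflow_at_switch(stack_base, sp[i], ctx_size)) {
      return true;   /* overflow reported */
    }
  }
  return false;      /* nothing detected, even if the stack overflowed */
}
```

If a deep call chain pushes the stack pointer below the stack area and returns before the next context switch, the check never sees it, so a clean run does not prove the stack is large enough.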
CH_DBG_FILL_THREADS fills thread stacks with a fixed pattern (0x55555555) before the threads start running. The pattern makes it possible to determine how much of each stack area has actually been used.
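Measuring stack usage from the fill pattern amounts to scanning the untouched end of the stack area. The host-side sketch below assumes a downward-growing stack (so the lowest addresses are used last) and uses the byte 0x55, i.e. the 0x55555555 pattern one byte at a time; stack_fill() and stack_used() are hypothetical helpers, not ChibiOS functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0x55U   /* byte form of the 0x55555555 pattern */

/* Fills a stack area with the pattern, as done before a thread starts. */
static void stack_fill(uint8_t *area, size_t size) {
  memset(area, STACK_FILL, size);
}

/* Counts pattern bytes from the low end upward; everything above the
   first overwritten byte is considered used. */
static size_t stack_used(const uint8_t *area, size_t size) {
  size_t unused = 0U;
  while (unused < size && area[unused] == STACK_FILL) {
    unused++;
  }
  return size - unused;
}
```

The result is an estimate: if the deepest byte the thread wrote happens to equal 0x55, usage is slightly underestimated, so leave some margin when sizing working areas.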
Setting CH_CFG_ST_TIMEDELTA to zero disables the tick-less mode. This can be useful when your problem is caused by virtual timers exceeding the system's ability to serve all of them.
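In a recent RT chconf.h this setting lives in the system timers section, next to the tick frequency. A minimal sketch follows; the frequency value is only an example, and we assume your port and HAL support the classic periodic tick.

```c
/* System timers settings in chconf.h. */

/* System tick frequency in Hz (example value). */
#define CH_CFG_ST_FREQUENCY                 1000

/* Zero selects the classic periodic tick instead of tick-less mode. */
#define CH_CFG_ST_TIMEDELTA                 0
```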
For less experienced users, there are several things you can do to minimize the need for debugging:
- In “./test” you will find examples for almost any API in the ChibiOS RT and NIL kernels and for the most common RTOS-related tasks.
- In “./testhal” there are examples for the various device drivers.