Measuring the embOS Context Switch Time with Cortex-M and the DWT Cycle Counter
A common way to measure the execution time of code on microcontrollers is to toggle a GPIO and read the pin's output with an oscilloscope or logic analyzer, as described in the embOS manual. However, when measuring short periods such as the context switch time, which can be shorter than a microsecond, the output signal may be hard to read on an oscilloscope, and the hardware itself can add inaccuracy, e.g. when the GPIOs are driven at only a fraction of the processor's actual clock frequency.
Some Cortex-M devices implement the optional Cycle Counter of the Data Watchpoint and Trace unit (DWT). When implemented and enabled, this counter increments on each cycle of the processor clock. It can therefore be used to obtain an accurate measurement of the embOS context switch time, avoiding the imprecision introduced by the GPIO hardware and by reading the signal on an oscilloscope.
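Because the cycle counter is a free-running 32-bit register, it wraps around to zero after 2^32 cycles. The following minimal sketch shows why a plain unsigned subtraction of two snapshots still yields the correct elapsed cycle count across a single wrap-around. The helper name ElapsedCycles() is hypothetical and not part of embOS; the snapshots are passed in as plain values so the arithmetic can be tried on any host.

```c
#include <stdint.h>

//
// Hypothetical helper (not an embOS API): elapsed cycles between two
// snapshots of a free-running 32-bit counter such as DWT_CYCCNT.
// Unsigned subtraction is performed modulo 2^32, so the result stays
// correct even if the counter wrapped once between Start and End.
//
static uint32_t ElapsedCycles(uint32_t Start, uint32_t End) {
  return (uint32_t)(End - Start);
}
```

For example, ElapsedCycles(0xFFFFFFF0u, 0x00000010u) yields 0x20, i.e. 32 cycles, although End is numerically smaller than Start. The result is only meaningful if fewer than 2^32 cycles passed between the two snapshots.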
Requirements
The following application, which measures the embOS context switch time using the DWT Cycle Counter, requires an Armv7[E]-M or Armv8-M Mainline device with an implemented DWT Cycle Counter and a debug probe to read the results from the device's memory, e.g. via the watch view of an IDE's debug session. Furthermore, it is assumed that the Cortex-M SysTick is used as the hardware timer for the embOS system tick. If another hardware timer is used, the code should be modified to disable that timer instead. Otherwise, the timer interrupt will affect the maximum and average execution time of the context switch.
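Whether a device meets the cycle counter requirement can be determined from the DWT_CTRL register (address 0xE0001000): bit 25 (NOCYCCNT) reads as 1 if no cycle counter is implemented, and bit 0 (CYCCNTENA) enables the counter. The sketch below decodes these two bits from a register value that is passed in as a parameter, so the logic can be checked without target hardware. The function and macro names are hypothetical. As a further assumption to verify for your device: the DWT is typically only usable once the TRCENA bit (bit 24) in the Debug Exception and Monitor Control Register (DEMCR, 0xE000EDFC) is set, which a debug probe usually does when it connects.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTRL_CYCCNTENA_BIT  (1u << 0)   // DWT_CTRL.CYCCNTENA: cycle counter enabled
#define CTRL_NOCYCCNT_BIT   (1u << 25)  // DWT_CTRL.NOCYCCNT: set if no cycle counter exists

//
// Hypothetical helpers (not embOS APIs). CtrlValue is a value previously
// read from DWT_CTRL on the target.
//
static bool HasCycleCounter(uint32_t CtrlValue) {
  return (CtrlValue & CTRL_NOCYCCNT_BIT) == 0u;
}

static bool IsCycleCounterEnabled(uint32_t CtrlValue) {
  return (CtrlValue & CTRL_CYCCNTENA_BIT) != 0u;
}
```

On the target, the value would be obtained with a volatile read of DWT_CTRL, as done in _Initialize() of the application.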
The Application
Simply let the application run with an active debug session on your device. If the DWT Cycle Counter is not implemented, the debug session will halt at line 74. The application repeats the measurement NUM_SAMPLES times (16384 by default) and records the minimum, maximum and average execution time of the context switch. Although the code executed for the context switch is always the same, the minimum and maximum values can differ. The more complex the processor, the greater the margin. A Cortex-M7 with caches, branch prediction, a long pipeline and, typically, a processor clock faster than the maximum frequency at which memory can be accessed will show a greater margin between those two values than a Cortex-M4. Thus, the average execution time is also recorded to show whether the typical value lies closer to the minimum or to the maximum.
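The bookkeeping described above can be sketched as a small accumulator that tracks the minimum, maximum and running sum of the samples. This is an illustrative variation, not the application's code: the type CYCLE_STATS and the Stats_*() functions are hypothetical names, and the sum is kept in 64 bits here so that it cannot overflow regardless of the sample count.

```c
#include <stdint.h>

typedef struct {
  uint32_t Min;    // Smallest sample seen so far
  uint32_t Max;    // Largest sample seen so far
  uint64_t Sum;    // Sum of all samples (64-bit to avoid overflow)
  uint32_t Count;  // Number of samples added
} CYCLE_STATS;

static void Stats_Init(CYCLE_STATS* pStats) {
  pStats->Min   = UINT32_MAX;  // Any real sample will be smaller or equal
  pStats->Max   = 0u;
  pStats->Sum   = 0u;
  pStats->Count = 0u;
}

static void Stats_Add(CYCLE_STATS* pStats, uint32_t Cycles) {
  if (Cycles < pStats->Min) {
    pStats->Min = Cycles;
  }
  if (Cycles > pStats->Max) {
    pStats->Max = Cycles;
  }
  pStats->Sum += Cycles;
  pStats->Count++;
}

// Integer average, rounded down; returns 0 if no sample was added.
static uint32_t Stats_Average(const CYCLE_STATS* pStats) {
  return (pStats->Count != 0u) ? (uint32_t)(pStats->Sum / pStats->Count) : 0u;
}
```

With the made-up samples 100, 120, 110 and 400 cycles, this yields a minimum of 100, a maximum of 400 and an average of 182. An average close to the minimum indicates that the maximum is a rare outlier, e.g. caused by a cache miss.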
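After measuring the context switch time, the debug session will halt at line 143. Now, the results can be read from the device's memory by inspecting the variables Min, Max, Average and Nanoseconds. Min, Max and Average are given in processor cycles; Nanoseconds holds the minimum context switch time converted to nanoseconds.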
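The relation between a cycle count and a time in nanoseconds is ns = cycles * 10^9 / f_cpu. The application uses the embOS function OS_TIME_ConvertCycles2ns() for this; the sketch below is not its implementation but only illustrates the arithmetic, under the assumption that the cycle counter runs at the processor clock frequency. The function name CyclesToNs() and its CpuHz parameter are hypothetical.

```c
#include <stdint.h>

//
// Hypothetical helper (not the embOS implementation): converts a cycle
// count to nanoseconds for a given processor clock in Hz.
// The multiplication is done in 64 bits, so it cannot overflow for any
// 32-bit cycle count. CpuHz must be non-zero. The result is rounded down.
//
static uint64_t CyclesToNs(uint32_t Cycles, uint32_t CpuHz) {
  return ((uint64_t)Cycles * 1000000000u) / CpuHz;
}
```

For example, 200 cycles at a processor clock of 200 MHz correspond to CyclesToNs(200u, 200000000u) = 1000 ns, i.e. one microsecond. These numbers are purely illustrative and not measured values.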
/*********************************************************************
* (c) SEGGER Microcontroller GmbH *
* The Embedded Experts *
* www.segger.com *
**********************************************************************
-------------------------- END-OF-HEADER -----------------------------
Purpose : embOS sample program that measures the embOS context
switch time and stores the maximal, minimal, and average
context switch time (in cycles) in memory. It also saves
the minimal context switch time (in nanoseconds) in memory.
*/
#include "RTOS.h"
/*********************************************************************
*
* Defines
*
**********************************************************************
*/
#define NUM_SAMPLES (1024 * 16)
#define DWT_CTRL (*(volatile OS_U32*)(0xE0001000u))
#define DWT_CTRL_CYCCNTENA (1u)
#define DWT_CTRL_NOYCYCCNT (1u << 25)
#define DWT_CYCCNT (*(volatile OS_U32*)(0xE0001004u))
#define SYST_CSR (*(volatile OS_U32*)(0xE000E010u))
#define BREAK() __asm volatile ("bkpt #0")
/*********************************************************************
*
* Static data
*
**********************************************************************
*/
static OS_STACKPTR int StackHP[128];
static OS_STACKPTR int StackLP[128];
static OS_TASK TCBHP;
static OS_TASK TCBLP;
static OS_U32 Time;
//
// Data to inspect in a watch view of an IDE
//
static volatile OS_U64 Nanoseconds;
static volatile OS_U32 Average = (OS_U32) 0;
static volatile OS_U32 Max = (OS_U32) 0;
static volatile OS_U32 Min = (OS_U32)-1;
/*********************************************************************
*
* Local functions
*
**********************************************************************
*/
/*********************************************************************
*
* _Initialize()
*/
inline static void _Initialize(void) {
  OS_U32 Ctrl;
  Ctrl = DWT_CTRL;
  //
  // Check if device has the DWT Cycle Counter implemented
  //
  if ((Ctrl & DWT_CTRL_NOYCYCCNT) != 0) {
    BREAK();  // Device has no DWT Cycle Counter implemented
  }
  //
  // Enable the DWT Cycle Counter if it is disabled
  //
  if ((Ctrl & DWT_CTRL_CYCCNTENA) == 0) {
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;
  }
  //
  // Disable the SysTick, as it isn't required and could interfere
  // with the measurement of the context switch time
  //
  SYST_CSR = 0;
}
/*********************************************************************
*
* _GetCycles()
*/
inline static OS_U32 _GetCycles(void) {
  return DWT_CYCCNT;
}
/*********************************************************************
*
* HPTask()
*/
static void HPTask(void) {
  while (1) {
    OS_TASK_Suspend(NULL);       // Suspend high priority task
    Time = _GetCycles() - Time;  // Stop measurement
  }
}
/*********************************************************************
*
* LPTask()
*/
static void LPTask(void) {
  OS_U32 MeasureOverhead;
  OS_U32 SampleCount;
  _Initialize();
  SampleCount = 0;
  while (1) {
    //
    // Measure overhead for time measurement so we can take this into account by subtracting it
    // This is done inside the while()-loop to mitigate possible effects of an instruction cache
    //
    MeasureOverhead = _GetCycles();
    MeasureOverhead = _GetCycles() - MeasureOverhead;
    //
    // Perform actual measurements
    //
    Time = _GetCycles();     // Start measurement
    OS_TASK_Resume(&TCBHP);  // Resume high priority task to force task switch
    Time = Time - MeasureOverhead;
    //
    // Evaluate
    //
    if (Time < Min) Min = Time;
    if (Time > Max) Max = Time;
    SampleCount += 1;
    Average += Time;
    if (SampleCount >= NUM_SAMPLES) {
      Average = Average / NUM_SAMPLES;
      Nanoseconds = OS_TIME_ConvertCycles2ns(Min);
      while (1) {
        BREAK();  // Break automatically
      }
    }
  }
}
/*********************************************************************
*
* Global functions
*
**********************************************************************
*/
/*********************************************************************
*
* main()
*/
int main(void) {
  OS_Init();    // Initialize embOS
  OS_InitHW();  // Initialize required hardware
  OS_TASK_CREATE(&TCBHP, "HP Task", 100, HPTask, StackHP);
  OS_TASK_CREATE(&TCBLP, "LP Task",  50, LPTask, StackLP);
  OS_Start();   // Start embOS
  return 0;
}
/*************************** End of file ****************************/