Measuring the embOS Context Switch Time with Cortex-M and the DWT Cycle Counter

From SEGGER Knowledge Base

A common way to measure the execution time of code on microcontrollers is to toggle a GPIO and capture the pin's output with an oscilloscope or logic analyzer, as described in the embOS manual. However, for short periods such as the context switch time, which can be shorter than a microsecond, the output signal can be hard to read on an oscilloscope, and the hardware itself can add inaccuracy, e.g. when the GPIOs are driven at only a fraction of the processor's actual clock frequency.

Some Cortex-M devices implement the optional Cycle Counter of the Data Watchpoint and Trace unit (DWT). When implemented and enabled, this counter increments on each cycle of the processor clock. Reading it gives an accurate measurement of the embOS context switch time and avoids the imprecision introduced by the GPIO hardware or by reading the signal on an oscilloscope.
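Because the cycle counter is a free-running 32-bit register, a measurement is simply the difference of two readings. The following minimal sketch shows why that difference stays correct even if the counter wraps once between the readings. ElapsedCycles() is a hypothetical helper used for illustration only; it is not part of embOS.

```c
#include <stdint.h>

// Hypothetical helper (not part of embOS): number of cycles elapsed between
// two readings of a free-running 32-bit counter such as DWT_CYCCNT.
// Unsigned subtraction is performed modulo 2^32, so the result is still
// correct if the counter wrapped around once between the two readings.
static inline uint32_t ElapsedCycles(uint32_t Start, uint32_t End) {
  return End - Start;
}
```

For example, a start value of 0xFFFFFFF0 and an end value of 0x00000010 yield 0x20 (32) cycles. The result is only meaningful if fewer than 2^32 cycles elapsed between the readings.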

Requirements

The following application measures the embOS context switch time using the DWT Cycle Counter. It requires an Armv7[E]-M or Armv8-M Mainline device with an implemented DWT Cycle Counter and a debug probe to read the results from the device's memory, e.g. via the watch view of any IDE's debug session. Furthermore, it is assumed that the Cortex-M SysTick is used as the hardware timer for the embOS system tick. If another hardware timer is used, the code should be modified to disable that timer instead. Otherwise, its interrupt will affect the maximum and average execution time of the context switch.
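The check for an implemented cycle counter can be sketched as follows. This is not the code of the application below: EnableCycleCounter() is a hypothetical helper, and it takes the register addresses as parameters so that the logic can also be exercised on a host. On the target, the arguments would be the architecturally defined addresses of DEMCR (0xE000EDFC), DWT_CTRL (0xE0001000) and DWT_CYCCNT (0xE0001004). Setting DEMCR.TRCENA is an addition assumed here for stand-alone use; with a debugger attached, that bit is typically already set by the debugger.

```c
#include <stdint.h>

#define DEMCR_TRCENA        (1u << 24)  // Global enable for the DWT and ITM units
#define DWT_CTRL_NOCYCCNT   (1u << 25)  // Reads as 1 if no cycle counter is implemented
#define DWT_CTRL_CYCCNTENA  (1u <<  0)  // Enables the cycle counter

// Hypothetical helper (not part of embOS).
// Returns 0 if the cycle counter was enabled, -1 if it is not implemented.
static int EnableCycleCounter(volatile uint32_t* pDEMCR,
                              volatile uint32_t* pCtrl,
                              volatile uint32_t* pCycCnt) {
  *pDEMCR |= DEMCR_TRCENA;                  // Assumption: enable trace before using the DWT
  if ((*pCtrl & DWT_CTRL_NOCYCCNT) != 0u) {
    return -1;                              // Device has no DWT Cycle Counter
  }
  *pCycCnt = 0u;                            // Start counting from zero
  *pCtrl  |= DWT_CTRL_CYCCNTENA;            // Enable the cycle counter
  return 0;
}
```

Some devices may require further vendor- or core-specific steps before the DWT registers can be written; consult the device's reference manual if the counter does not start.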

The Application

Simply let the application run with an active debug session on your device. If the DWT Cycle Counter is not implemented, the debug session will halt at line 74 (the BREAK() in _Initialize()). The application repeats the measurement NUM_SAMPLES times and records the minimal, maximal and average execution time of the context switch. Although the executed code for the context switch is always the same, the minimal and maximal values can differ. The more complex the processor, the greater the margin. A Cortex-M7 with caches, branch prediction, a long pipeline and, in many cases, a processor clock faster than the maximum frequency at which memory can be accessed will show a greater margin between those two values than a Cortex-M4. Thus, the average execution time is also recorded to show whether measurements tend towards the minimal or the maximal value.
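The bookkeeping described above can be illustrated in isolation. The following is a minimal sketch, not the application's code: the CYCLE_STATS type and the Stats_*() functions are hypothetical names, and the 64-bit sum is a design choice made here so that the accumulated value cannot overflow regardless of the number of samples.

```c
#include <stdint.h>

// Hypothetical bookkeeping type (not part of embOS).
typedef struct {
  uint32_t Min;    // Smallest sample seen so far
  uint32_t Max;    // Largest sample seen so far
  uint64_t Sum;    // Sum of all samples, 64-bit to rule out overflow
  uint32_t Count;  // Number of samples
} CYCLE_STATS;

static void Stats_Init(CYCLE_STATS* p) {
  p->Min   = UINT32_MAX;  // Any first sample is smaller than or equal to this
  p->Max   = 0u;
  p->Sum   = 0u;
  p->Count = 0u;
}

static void Stats_Add(CYCLE_STATS* p, uint32_t Cycles) {
  if (Cycles < p->Min) p->Min = Cycles;
  if (Cycles > p->Max) p->Max = Cycles;
  p->Sum   += Cycles;
  p->Count += 1u;
}

// Integer average, rounded down. Returns 0 if no sample was added.
static uint32_t Stats_Average(const CYCLE_STATS* p) {
  return (p->Count != 0u) ? (uint32_t)(p->Sum / p->Count) : 0u;
}
```

An average close to the minimum indicates that the fast case dominates and the maximum is an outlier, e.g. the first pass with cold caches.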

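After measuring the context switch time, the debug session will halt at line 143 (the BREAK() at the end of LPTask()). Now, the results can be read from the device's memory by inspecting the variables Min, Max, Average and Nanoseconds. Min, Max and Average are given in cycles; Nanoseconds holds the minimal context switch time converted to nanoseconds.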
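To cross-check the Nanoseconds value by hand, or to convert Max and Average as well, the cycle counts can be scaled by the processor clock frequency. The helper below is a hypothetical sketch of that arithmetic and not an embOS function; it assumes that the caller knows the core clock in Hz and that the frequency is non-zero.

```c
#include <stdint.h>

// Hypothetical helper (not part of embOS): converts processor cycles to
// nanoseconds for a given core clock. The multiplication is done in 64 bits
// so that it cannot overflow for any 32-bit cycle count.
// Precondition: CpuFreqHz != 0. The result is rounded down.
static uint64_t CyclesToNanoseconds(uint32_t Cycles, uint32_t CpuFreqHz) {
  return ((uint64_t)Cycles * 1000000000u) / CpuFreqHz;
}
```

For example, 100 cycles at a core clock of 200 MHz correspond to 500 ns.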


/*********************************************************************
*                   (c) SEGGER Microcontroller GmbH                  *
*                        The Embedded Experts                        *
*                           www.segger.com                           *
**********************************************************************

-------------------------- END-OF-HEADER -----------------------------
Purpose : embOS sample program that measures the embOS context
          switch time and stores the maximal, minimal, and average
          context switch time (in cycles) in memory. It also saves
          the minimal context switch time (in nanoseconds) in memory.
*/

#include "RTOS.h"

/*********************************************************************
*
*       Defines
*
**********************************************************************
*/

#define NUM_SAMPLES         (1024 * 16)

#define DWT_CTRL            (*(volatile OS_U32*)(0xE0001000u))
#define DWT_CTRL_CYCCNTENA  (1u)
#define DWT_CTRL_NOYCYCCNT  (1u << 25)
#define DWT_CYCCNT          (*(volatile OS_U32*)(0xE0001004u))

#define SYST_CSR            (*(volatile OS_U32*)(0xE000E010u))

#define BREAK()             __asm volatile ("bkpt #0")

/*********************************************************************
*
*       Static data
*
**********************************************************************
*/

static OS_STACKPTR int StackHP[128];
static OS_STACKPTR int StackLP[128];
static OS_TASK         TCBHP;
static OS_TASK         TCBLP;
static OS_U32          Time;

//
// Data to inspect in a watch view of an IDE
//
static volatile OS_U64 Nanoseconds;
static volatile OS_U32 Average = (OS_U32) 0;
static volatile OS_U32 Max     = (OS_U32) 0;
static volatile OS_U32 Min     = (OS_U32)-1;

/*********************************************************************
*
*       Local functions
*
**********************************************************************
*/

/*********************************************************************
*
*       _Initialize()
*/
inline static void _Initialize(void) {
  OS_U32 Ctrl;

  Ctrl = DWT_CTRL;
  //
  // Check if device has the DWT Cycle Counter implemented
  //
  if ((Ctrl & DWT_CTRL_NOYCYCCNT) != 0) {
    BREAK();  // Device has no DWT Cycle Counter implemented
  }
  //
  // Enable the DWT Cycle Counter if it is disabled
  //
  if ((Ctrl & DWT_CTRL_CYCCNTENA) == 0) {
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;
  }
  //
  // Disable the SysTick, as it isn't required and could interfere
  // with the measurement of the context switch time
  //
  SYST_CSR = 0;
}

/*********************************************************************
*
*       _GetCycles()
*/
inline static OS_U32 _GetCycles(void) {
  return DWT_CYCCNT;
}

/*********************************************************************
*
*       HPTask()
*/
static void HPTask(void) {
  while (1) {
    OS_TASK_Suspend(NULL);       // Suspend high priority task
    Time = _GetCycles() - Time;  // Stop measurement
  }
}

/*********************************************************************
*
*       LPTask()
*/
static void LPTask(void) {
  OS_U32 MeasureOverhead;
  OS_U32 SampleCount;

  _Initialize();

  SampleCount = 0;
  while (1) {
    //
    // Measure overhead for time measurement so we can take this into account by subtracting it
    // This is done inside the while()-loop to mitigate possible effects of an instruction cache
    //
    MeasureOverhead = _GetCycles();
    MeasureOverhead = _GetCycles() - MeasureOverhead;
    //
    // Perform actual measurements
    //
    Time = _GetCycles();     // Start measurement
    OS_TASK_Resume(&TCBHP);  // Resume high priority task to force task switch
    Time = Time - MeasureOverhead;
    //
    // Evaluate
    //
    if (Time < Min) Min = Time;
    if (Time > Max) Max = Time;
    SampleCount += 1;
    Average     += Time;
    if (SampleCount >= NUM_SAMPLES) {
      Average     = Average / NUM_SAMPLES;
      Nanoseconds = OS_TIME_ConvertCycles2ns(Min);
      while (1) {
        BREAK();  // Break automatically
      }
    }
  }
}

/*********************************************************************
*
*       Global functions
*
**********************************************************************
*/

/*********************************************************************
*
*       main()
*/
int main(void) {
  OS_Init();    // Initialize embOS
  OS_InitHW();  // Initialize required hardware
  OS_TASK_CREATE(&TCBHP, "HP Task", 100, HPTask, StackHP);
  OS_TASK_CREATE(&TCBLP, "LP Task",  50, LPTask, StackLP);
  OS_Start();   // Start embOS
  return 0;
}

/*************************** End of file ****************************/