Beware of the watchdog!


Introduction

A watchdog timer (WDT) is a timer that resets the system when it expires, which makes it a way to recover a hung system. A reset is not exactly what you would call recovery, but sometimes it is the only option. Software must refresh (or feed) the timer periodically to avoid the reset: if the software fails to do so, it has probably hung, and resetting the system will unblock the situation.
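The feed-or-reset behavior described above can be modeled in a few lines of portable C. This is only a sketch for illustration: the `wdt_t` type and the `wdt_*` functions are made-up names, time is a plain millisecond tick count passed in by the caller, and "expired" stands in for the reset a real watchdog would perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal model of a watchdog timer driven by a millisecond tick count.
   All names are illustrative; no real hardware is touched. */
typedef struct {
    uint32_t timeout_ms;   /* time allowed between two refreshes */
    uint32_t last_feed_ms; /* tick count of the most recent refresh */
} wdt_t;

/* Arm the watchdog at time now_ms. */
void wdt_start(wdt_t *w, uint32_t timeout_ms, uint32_t now_ms)
{
    assert(w != NULL && timeout_ms > 0);
    w->timeout_ms = timeout_ms;
    w->last_feed_ms = now_ms;
}

/* Refresh ("feed") the watchdog: the countdown restarts from now_ms. */
void wdt_feed(wdt_t *w, uint32_t now_ms)
{
    w->last_feed_ms = now_ms;
}

/* True when the timeout elapsed without a refresh; a real watchdog would
   reset the system at this point. Unsigned subtraction keeps the result
   correct when the tick counter wraps around. */
bool wdt_expired(const wdt_t *w, uint32_t now_ms)
{
    return (uint32_t)(now_ms - w->last_feed_ms) >= w->timeout_ms;
}
```

With a 5000 ms timeout started at tick 0, a feed at tick 4000 moves the deadline to tick 9000; if the software hangs and stops feeding, `wdt_expired` becomes true from that tick on.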

Most embedded processors and microcontrollers have an integrated WDT, while some Super I/O chips implement a watchdog for processors that do not integrate one.

Watchdog support in Windows Embedded CE

The Windows CE kernel supports software watchdog timers by default: a thread can create a WDT by calling the CreateWatchDogTimer function. The function takes several parameters, but the most significant are:

  • The timeout value
  • The action the kernel will take when the timeout expires (kill the process that owns the WDT, reset the system, or take no action at all)

 

Note that the kernel will use IOCTL_HAL_REBOOT to reset the system so the OAL must implement it for the watchdog to actually trigger a reset.
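To make the OAL side of this concrete, here is a rough, off-target sketch of a handler that services the reboot request. It is not taken from any BSP: `OalIoControlSketch` only mimics the shape of the OAL's OEMIoControl entry point, `BoardReset` is a hypothetical board hook, the typedefs are stand-ins for the Windows CE headers, and the numeric value given to IOCTL_HAL_REBOOT is a placeholder rather than the real code. Many BSPs dispatch IOCTLs through a table instead of a switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch builds off-target. On a device these come from
   the Windows CE headers; the numeric value below is a placeholder, not
   the real IOCTL code. */
typedef uint32_t DWORD;
typedef int BOOL;
#define TRUE 1
#define FALSE 0
#define IOCTL_HAL_REBOOT 0x1001u /* placeholder value */

/* Hypothetical board hook: real code would hit the SoC reset logic and
   never return. Here it only records that a reset was requested. */
bool g_resetRequested = false;
void BoardReset(void) { g_resetRequested = true; }

/* Shaped like the OAL's OEMIoControl entry point; returns TRUE when the
   request was handled. */
BOOL OalIoControlSketch(DWORD code, void *inBuf, DWORD inSize,
                        void *outBuf, DWORD outSize, DWORD *bytesReturned)
{
    /* Debug-build sanity check on the caller's input buffer. */
    assert(inSize == 0 || inBuf != NULL);
    (void)outBuf; (void)outSize;
    if (bytesReturned != NULL) *bytesReturned = 0;

    switch (code) {
    case IOCTL_HAL_REBOOT:
        BoardReset();   /* a WDOG_RESET_DEVICE expiry depends on this */
        return TRUE;
    default:
        return FALSE;   /* unsupported request */
    }
}
```

If the reboot case is missing, the request falls through to the default branch and nothing resets the device, which is why a watchdog configured to reset the system would have no visible effect.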

CreateWatchDogTimer returns a handle that you can pass to the other watchdog functions, such as StartWatchDogTimer, StopWatchDogTimer and RefreshWatchDogTimer.

 

You can take a look at the kernel watchdog code if you have installed the shared source code (%_WINCEROOT%\PRIVATE\WINCEOS\COREOS\NK\KERNEL\watchdog.c).

If the device has a hardware WDT, the OEM can support it through the following OAL global variables:

  • pfnRefreshWatchDog: a pointer to the function that the kernel calls to refresh the hardware watchdog. The default value is NULL, which indicates that no watchdog timer exists.
  • dwWatchDogPeriod: the watchdog period, in milliseconds, within which the hardware watchdog must be refreshed to avoid a system reset. The default value is 0, which indicates that no watchdog timer exists.
  • dwWatchDogThreadPriority: the priority of the kernel watchdog thread. The default value is DEFAULT_WATCHDOG_PRIORITY, which is equal to 100.

 

The kernel watchdog thread calls pfnRefreshWatchDog every dwWatchDogPeriod milliseconds, refreshing the timer and thus avoiding the system reset. With a hardware watchdog you can recover the system in two critical situations: a critical thread hangs (the software WDT triggers), or the kernel itself hangs (the hardware WDT triggers).
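A rough sketch of how an OAL might hook a hardware watchdog into these variables follows. It is written to build off-target, so the three variables are declared locally as stand-ins with the defaults given above; on a device the kernel provides them, and how they are exposed depends on the CE version. The `void (*)(void)` signature, the fake service register, the reload key, and the `Board*` and `SimulateKernelRefresh` names are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DWORD;

/* Stand-ins for the kernel-owned variables named in the text, with their
   documented defaults. The void(void) signature is an assumption. */
void (*pfnRefreshWatchDog)(void) = NULL;
DWORD dwWatchDogPeriod = 0;
DWORD dwWatchDogThreadPriority = 100;

/* Hypothetical hardware: a fake "service" register and reload key. */
#define WDT_RELOAD_KEY 0xA5A5u
volatile uint32_t g_fakeWdtServiceReg = 0;
unsigned g_refreshCount = 0;

/* Called by the kernel watchdog thread once per period. */
void BoardRefreshWatchdog(void)
{
    g_fakeWdtServiceReg = WDT_RELOAD_KEY; /* restart the hardware countdown */
    g_refreshCount++;
}

/* Would be called from OAL start-up code. hwTimeoutMs is the timeout
   programmed into the hardware; refreshing at half of it leaves margin
   for scheduling jitter (a design choice, not a requirement). */
void BoardInitWatchdog(DWORD hwTimeoutMs)
{
    assert(hwTimeoutMs >= 2);
    pfnRefreshWatchDog = BoardRefreshWatchdog;
    dwWatchDogPeriod = hwTimeoutMs / 2;
}

/* Model of one pass of the kernel watchdog thread's loop: nothing is
   refreshed while the variables still hold their defaults. */
void SimulateKernelRefresh(void)
{
    if (pfnRefreshWatchDog != NULL && dwWatchDogPeriod != 0)
        pfnRefreshWatchDog();
}
```

Calling `BoardInitWatchdog(4000)` sets a 2000 ms refresh period in this model. If the kernel stops scheduling its watchdog thread, the refresh function is no longer called and the hardware timer runs out.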

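Take into account that breaking into the debugger while the hardware WDT is enabled will probably hurt you: while the system is halted nothing refreshes the timer, so the device is likely to reset in the middle of your debugging session.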

Below is a sample application that creates a WDT, refreshes it for some time, and then stops refreshing it, which resets the system. Thanks to my colleague Lorenzo Bertolissi, who coded it.

#include <windows.h>
#include <pkfuncs.h>
#include <stdio.h>

#define WATCHDOG_NAME L"wd_critproc"
#define WATCHDOG_PERIOD 5000 // milliseconds
#define WATCHDOG_WAIT_TIME 2000 // milliseconds
// Possible actions: WDOG_NO_DFLT_ACTION, WDOG_KILL_PROCESS, WDOG_RESET_DEVICE
#define WATCHDOG_DEFAULT_ACTION WDOG_RESET_DEVICE
#define MAX_COUNT 10

int _tmain(int argc, TCHAR *argv[], TCHAR *envp[])
{
    HANDLE hWatchDogTimer = NULL;
    LPCWSTR pszWatchDogName = WATCHDOG_NAME;
    DWORD dwPeriod = WATCHDOG_PERIOD;
    DWORD dwWait = WATCHDOG_WAIT_TIME;
    DWORD dwDefaultAction = WATCHDOG_DEFAULT_ACTION;
    DWORD dwCount = 0;
    BOOL bRet = FALSE;

    wprintf(TEXT("[critproc] Critical process start\r\n"));
    wprintf(TEXT("[critproc] Calling CreateWatchDogTimer...\r\n"));

    // Create the software watchdog; the last two parameters are left at 0.
    hWatchDogTimer = CreateWatchDogTimer(pszWatchDogName, dwPeriod, dwWait,
                                         dwDefaultAction, 0, 0);
    if (!hWatchDogTimer)
    {
        wprintf(TEXT("[critproc] Invalid NULL handle, leaving app\r\n"));
        return 1;
    }

    // A valid handle with ERROR_ALREADY_EXISTS means the name is already in use.
    if (GetLastError() == ERROR_ALREADY_EXISTS)
    {
        wprintf(TEXT("[critproc] WatchDog with this name already exists, ")
                TEXT("leaving app\r\n"));
        CloseHandle(hWatchDogTimer);
        return 1;
    }
    wprintf(TEXT("[critproc] Valid handle returned [0x%08lx]\r\n"),
            (DWORD)hWatchDogTimer);

    wprintf(TEXT("[critproc] Starting watchdog timer...\r\n"));
    bRet = StartWatchDogTimer(hWatchDogTimer, 0);
    if (!bRet)
    {
        wprintf(TEXT("[critproc] StartWatchDogTimer failed, ")
                TEXT("GetLastError returned 0x%lx\r\n"), GetLastError());
        CloseHandle(hWatchDogTimer);
        return 1;
    }
    wprintf(TEXT("[critproc] Watchdog timer started successfully\r\n"));

    // Phase 1: refresh once per second, well within the 5 second period.
    dwCount = 0;
    while ((dwCount++) < MAX_COUNT)
    {
        wprintf(TEXT("[critproc] Refreshing watchdog timer... [%lu]\r\n"),
                dwCount);
        bRet = RefreshWatchDogTimer(hWatchDogTimer, 0);
        if (!bRet)
        {
            wprintf(TEXT("[critproc] Failed to refresh watchdog timer, ")
                    TEXT("GetLastError returned 0x%lx\r\n"), GetLastError());
            CloseHandle(hWatchDogTimer);
            return 1;
        }

        Sleep(1000);
    }

    // Phase 2: stop refreshing and wait for the watchdog to fire.
    wprintf(TEXT("[critproc] Stopping watchdog timer refresh\r\n"));
    dwCount = 0;
    while (++dwCount)
    {
        wprintf(TEXT("[critproc] The watchdog should timeout in ")
                TEXT("a few seconds... [%lu]\r\n"), dwCount);
        Sleep(1000);
    }

    wprintf(TEXT("[critproc] Leaving app (should never be here)\r\n"));
    CloseHandle(hWatchDogTimer);
    return 0;
}