
I have an AT91SAM9G20 processor running a 2.6 kernel. The watchdog is enabled at the bootstrap level and configured for 16 seconds. The watchdog mode register can be written only once. When code hangs in the bootstrap, the bootloader, or the kernel, the board reboots. But once the kernel comes up, even though none of the applications refreshes the watchdog, the board is reset not after 16 seconds but after about 15 minutes.

Who is refreshing the watchdog?

In our case, the watchdog should be driven by our applications, so that the board resets if our application hangs.

These are the running processes:

1 root     init
2 root     [kthreadd]
3 root     [ksoftirqd/0]
4 root     [watchdog/0]
5 root     [events/0]
6 root     [khelper]
63 root     [kblockd/0]
72 root     [ksuspend_usbd]
78 root     [khubd]
85 root     [kmmcd]
107 root     [pdflush]
108 root     [pdflush]
109 root     [kswapd0]
110 root     [aio/0]
740 root     [mtdblockd]
828 root     [rpciod/0]
982 root     [jffs2_gcd_mtd10]
1003 root     /sbin/udevd -d
1145 daemon   portmap
1158 dbus     dbus-daemon --system
1178 root     /usr/sbin/ifplugd -i eth0 -fwI -u0 -d5 -l -q
1190 root     /usr/sbin/ifplugd -i eth1 -fwI -u0 -d5 -l -q
1221 default  avahi-daemon: running [SP14.local]
1226 root     /usr/sbin/dropbear
1246 root     /root/bin/host_app
1254 root     /root/bin/mini_httpd -c *.cgi -d /root/bin -u root -E /root/bin/
1256 root     -sh
1257 root     /sbin/syslogd -n -m 0
1258 root     /sbin/klogd -n
1259 root     /usr/bin/tail -f /var/log/messages
1265 root     ps -e

We are using the soft-lockup watchdog available in kernel-2.6.25-ts.at91sam9g20/kernel/softlockup.c.


5 Answers

18
votes

If you enabled the watchdog driver in your kernel, the driver sets up a kernel timer that is in charge of resetting the watchdog. The corresponding code is here. So it works like this:

If no application opens the /dev/watchdog file, the kernel takes care of resetting the watchdog. Since this is a timer, it does not appear as a dedicated kernel thread; it is handled by the soft IRQ thread. If an application opens this file, it becomes responsible for the watchdog and can reset it by writing to the file, as described in the documentation linked in Richard's post.
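A minimal sketch of the user-space side described above, assuming the standard Linux watchdog device API: the application (for example host_app) opens /dev/watchdog, which takes responsibility away from the kernel timer, and then writes to it periodically. Any write counts as a keepalive. Writing the magic character 'V' before closing asks drivers that support "magic close" to stop the watchdog; whether it really stops depends on the driver, on CONFIG_WATCHDOG_NOWAYOUT, and on the hardware (a write-once mode register may make it impossible). The function names `watchdog_kick`, `watchdog_magic_close`, and `watchdog_run` are hypothetical helpers, not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: refresh the watchdog by writing one byte.
 * Any data written to /dev/watchdog counts as a keepalive. */
int watchdog_kick(int fd)
{
    ssize_t n;
    do {
        n = write(fd, "\0", 1);
    } while (n < 0 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Hypothetical helper: write the magic character 'V', then close.
 * Without 'V', closing the device leaves the watchdog armed. */
int watchdog_magic_close(int fd)
{
    ssize_t n;
    do {
        n = write(fd, "V", 1);
    } while (n < 0 && errno == EINTR);
    int rc = (n == 1) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical driver loop: open the device (arming it and taking over
 * from the kernel timer), kick it `kicks` times every `interval_s`
 * seconds, then close it with the magic character. */
int watchdog_run(const char *path, unsigned interval_s, unsigned kicks)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    for (unsigned i = 0; i < kicks; i++) {
        if (watchdog_kick(fd) != 0) {
            close(fd);
            return -1;
        }
        sleep(interval_s);
    }
    return watchdog_magic_close(fd);
}
```

In a real application the kick would sit in the main loop rather than run a fixed number of times, with an interval well inside the 16-second period, so that if the loop hangs the kicks stop and the hardware watchdog resets the board.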

Is the watchdog driver configured in your kernel? If not, configure it and check whether the reset still happens. If it does, the reset most likely comes from somewhere else.

If your kernel is too old to have a proper watchdog driver (it is not present in 2.6.25), you should backport it from 2.6.28. Alternatively, you can try disabling the watchdog in your bootloader and see whether the reset still occurs.

7
votes

In July 2016, a commit to watchdog_dev.c in the 4.7 kernel enabled the behavior described in shodanex's answer for all watchdog timer drivers. This doesn't seem to be documented anywhere other than this thread and the source code:

static inline bool watchdog_need_worker(struct watchdog_device *wdd)
{
	/* All variables in milli-seconds */
	unsigned int hm = wdd->max_hw_heartbeat_ms;
	unsigned int t = wdd->timeout * 1000;

	/*
	 * A worker to generate heartbeat requests is needed if all of the
	 * following conditions are true.
	 * - Userspace activated the watchdog.
	 * - The driver provided a value for the maximum hardware timeout, and
	 *   thus is aware that the framework supports generating heartbeat
	 *   requests.
	 * - Userspace requests a longer timeout than the hardware can handle.
	 *
	 * Alternatively, if userspace has not opened the watchdog
	 * device, we take care of feeding the watchdog if it is
	 * running.
	 */
	return (hm && watchdog_active(wdd) && t > hm) ||
		(t && !watchdog_active(wdd) && watchdog_hw_running(wdd));
}

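To make the two branches of that return statement concrete, here is a user-space model of the same boolean logic. The struct `wdd_model` and its fields are hypothetical stand-ins for the kernel's struct watchdog_device, watchdog_active() and watchdog_hw_running(); all times are in milliseconds. The second branch is the one that matches the question: the hardware watchdog is already running (started by the bootstrap) but user space has not opened the device, so the kernel feeds it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields of struct watchdog_device
 * that the kernel check reads. All times are in milliseconds. */
struct wdd_model {
    unsigned int max_hw_heartbeat_ms; /* 0 if the driver did not set it */
    unsigned int timeout_ms;          /* timeout known to the framework */
    bool active;                      /* user space opened and started it */
    bool hw_running;                  /* hardware watchdog already ticking */
};

/* Same condition as the kernel's return statement. */
bool need_worker(const struct wdd_model *w)
{
    unsigned int hm = w->max_hw_heartbeat_ms;
    unsigned int t = w->timeout_ms;

    /* Branch 1: user space asked for a longer timeout than the hardware
     * supports, so the kernel bridges the gap with extra pings. */
    bool extend = hm && w->active && t > hm;

    /* Branch 2: nobody opened the device, but the hardware watchdog is
     * running, so the kernel keeps feeding it. */
    bool feed_unopened = t && !w->active && w->hw_running;

    return extend || feed_unopened;
}
```

With a 16-second hardware limit, a running but unopened watchdog yields `true` (the kernel feeds it), while an opened watchdog with a 16-second timeout yields `false` (user space must feed it).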
6
votes

This may give you a hint: http://www.mjmwired.net/kernel/Documentation/watchdog/watchdog-api.txt

It makes sense to have a user-space daemon handle the watchdog. It probably defaults to a 15-minute timeout.
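A sketch of how such a daemon could request and read back the timeout through the ioctls described in that document, assuming `<linux/watchdog.h>` is available. WDIOC_SETTIMEOUT may be rounded or rejected (a driver cannot change a timeout held in a write-once mode register), so the sketch reads back the value in force with WDIOC_GETTIMEOUT instead of trusting the request. Because opening the device arms the watchdog, the descriptor is returned even when the timeout ioctls fail. The helper names `watchdog_open_timeout` and `watchdog_ping` are hypothetical.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: open the watchdog, ask for want_s seconds, and
 * store the timeout the driver reports in *actual_s (-1 if unknown).
 * Returns the fd, or -1 if the device cannot be opened. */
int watchdog_open_timeout(const char *path, int want_s, int *actual_s)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int t = want_s;
    /* May fail or be rounded by the driver; the result is ignored here
     * and the effective value is read back below. */
    (void)ioctl(fd, WDIOC_SETTIMEOUT, &t);

    if (ioctl(fd, WDIOC_GETTIMEOUT, &t) == 0)
        *actual_s = t;
    else
        *actual_s = -1;
    return fd;
}

/* Hypothetical helper: keepalive via ioctl instead of write(). */
int watchdog_ping(int fd)
{
    int dummy = 0;
    return ioctl(fd, WDIOC_KEEPALIVE, &dummy) == 0 ? 0 : -1;
}
```

Logging the value stored in `*actual_s` at startup is one way to confirm which timeout is really in effect, rather than guessing at a default.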

2
votes

We had a similar problem with the WDT on an AT91SAM9263. The problem was bit 29, WDIDLEHLT, of the WDT_MR register (address 0xFFFFFD44). This bit was set to 1, but it had to be 0 for our application.

Bit description from the datasheet:

• WDIDLEHLT: Watchdog Idle Halt

  0: The Watchdog runs when the system is in idle mode.
  1: The Watchdog stops when the system is in idle state.

This means that the WDT counter does not run while the kernel is idle, hence the delay of 15 minutes or more before the reset happens.

You can test this with "dd if=/dev/zero of=/dev/null", which keeps the CPU busy and prevents the kernel from entering the idle state; you should then get a reset after 16 seconds (or whatever period you set in the WDT_MR register).

So the solution is to update the U-Boot code, or whatever other code sets the WDT_MR register, so that WDIDLEHLT is cleared. Remember that this register is write-once.
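A small sketch of the register arithmetic behind this fix. Only bit 29 (WDIDLEHLT) and the address 0xFFFFFD44 come from the answer above; the helper names are hypothetical. Because WDT_MR is write-once, the corrected value has to be computed before the single write in the bootstrap or U-Boot; later writes have no effect.

```c
#include <stdbool.h>
#include <stdint.h>

#define AT91_WDT_MR_ADDR      0xFFFFFD44u /* WDT_MR address, from the answer */
#define AT91_WDT_MR_WDIDLEHLT (1u << 29)  /* Watchdog Idle Halt bit */

/* True if this WDT_MR value stops the watchdog while the system is idle,
 * which is what stretches the reset delay to 15 minutes or more. */
static inline bool wdt_mr_halts_in_idle(uint32_t mr)
{
    return (mr & AT91_WDT_MR_WDIDLEHLT) != 0;
}

/* Return the same mode value with WDIDLEHLT cleared, so the watchdog keeps
 * running in idle. All other fields are preserved. */
static inline uint32_t wdt_mr_run_in_idle(uint32_t mr)
{
    return mr & ~AT91_WDT_MR_WDIDLEHLT;
}

/* Hypothetical bootstrap-side write, meaningful only on the target where
 * the address is a memory-mapped register. It must be the first and only
 * write to WDT_MR after reset. */
static inline void wdt_mr_write_once(uint32_t mr)
{
    *(volatile uint32_t *)(uintptr_t)AT91_WDT_MR_ADDR = wdt_mr_run_in_idle(mr);
}
```

For example, a mode value with both bit 29 and bit 28 set, such as 0x3FFF2FFF, becomes 0x1FFF2FFF: only the idle-halt bit changes.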

0
votes

Wouldn't the kernel be refreshing the watchdog timer? The watchdog is designed to reset the board if the whole system hangs, not just a single application.