
I am facing an issue in our C-based application where one VxWorks task (say Task1) crashed for unknown reasons. The crashed task had locked a mutual-exclusion semaphore (say semA). The next task, Task2, is now waiting for semA to be unlocked. Since semA is held by the crashed task, Task2 waits forever to take semA, which breaks the application's functionality.

We cannot give Task2 a timeout when taking semA, because semA protects a send routine that sends data over sockets. Timing out would make the message communication fail.

After googling, I found robust mutexes on Linux for exactly this problem, but our platform is VxWorks (version 5.5.1). Can somebody tell me how to handle this problem in VxWorks?
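For comparison, this is roughly what the Linux robust mutex mentioned above does. It is a POSIX (pthreads) sketch, not something available in VxWorks 5.5.1: a thread locks the mutex and exits without unlocking it, and the next `pthread_mutex_lock()` returns `EOWNERDEAD` instead of blocking forever. The names `sem_a`, `owner_dies` and `demo_robust_recovery` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t sem_a;

/* Locks sem_a and exits without unlocking it, simulating a crashed owner. */
static void *owner_dies(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&sem_a);
    return NULL;
}

/* Returns the result of the second lock attempt (EOWNERDEAD expected),
 * or -1 if the demo thread could not be created. */
int demo_robust_recovery(void)
{
    pthread_mutexattr_t attr;
    pthread_t owner;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&sem_a, &attr);
    pthread_mutexattr_destroy(&attr);

    if (pthread_create(&owner, NULL, owner_dies, NULL) != 0)
        return -1;
    pthread_join(owner, NULL);

    /* Does not block forever: the dead owner is reported via EOWNERDEAD. */
    rc = pthread_mutex_lock(&sem_a);
    if (rc == EOWNERDEAD) {
        /* Repair the protected state here, then mark the mutex usable again. */
        pthread_mutex_consistent(&sem_a);
    }
    if (rc == 0 || rc == EOWNERDEAD)
        pthread_mutex_unlock(&sem_a);
    pthread_mutex_destroy(&sem_a);
    return rc;
}
```

The `pthread_mutex_consistent()` call is what tells the library the protected data has been repaired; if the new owner unlocks without it, the mutex becomes permanently unusable and later lock attempts return `ENOTRECOVERABLE`.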

I have tried the solution below, but I am not sure how safe it is:

1) Task2 waits on semA with a timeout.
2) If the wait times out, Task2 checks the state of the task that had locked semA (Task1).
3) If Task1 is SUSPENDED, Task2 calls semDelete on semA and then recreates it.
4) If Task1 is not SUSPENDED, Task2 keeps waiting to take semA.
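The decision in steps 1) to 4) can be sketched as a small pure function, which keeps the policy testable on a host. This is a hedged sketch under assumptions: rather than relying on VxWorks internals to find the owner of semA, it assumes Task1 records its own ID (for example via `taskIdSelf()`) after each successful `semTake(semA, ...)`, and that Task2 fills in the probe using `semTake()`, `taskIdVerify()` and `taskIsSuspended()`. `SemProbe`, `SemAction` and `choose_action()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what Task2 knows after one semTake() attempt. */
typedef struct {
    bool took_sem;        /* semTake(semA, timeout) returned OK */
    bool owner_recorded;  /* the application saved a task ID when semA was taken */
    bool owner_exists;    /* taskIdVerify(ownerTid) == OK */
    bool owner_suspended; /* taskIsSuspended(ownerTid) returned TRUE */
} SemProbe;

typedef enum {
    ACT_SEND,     /* Task2 holds semA: send, then semGive(semA) */
    ACT_WAIT,     /* owner not known to be suspended: call semTake() again (step 4) */
    ACT_RECREATE  /* owner is suspended: semDelete() + semMCreate() (step 3) */
} SemAction;

SemAction choose_action(const SemProbe *p)
{
    if (p->took_sem)
        return ACT_SEND;
    /* Only a recorded, still-existing, suspended owner triggers recreation;
     * anything less certain falls back to waiting, as in step 4. */
    if (p->owner_recorded && p->owner_exists && p->owner_suspended)
        return ACT_RECREATE;
    return ACT_WAIT;
}
```

The recreate path needs its own care: `semDelete()` unblocks any other tasks pended on semA with `ERROR`, every task that cached the old `SEM_ID` has to pick up the new one, and whatever the crashed task was sending may have been left half-written.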

I have tested this as a prototype and it works. I am not sure how good it is to implement a solution that recreates the semaphore, or what the possible risks are.

Please let me know your input.

Thanks


3 Answers


I don't think your prototyped solution is any riskier than having code (Task1) that crashes for unknown reasons.

If I were working on your problem, I would first try really hard to find out why Task1 is crashing. Only if I could not find the root cause would I implement your proposed solution: query the state of Task1 after a certain amount of time, and then recreate the semaphore.


I must say that even if you implement your workaround of recreating the semaphore, you still have a crashed task that consumes resources. If this problem keeps recurring, the whole system will eventually stop working. In the end, the correct and only way to fix this problem is to fix the crash in Task1. You should be able to get a stack trace of where it crashed and fix it.


I second the previous answers: finding out why Task1 crashes is better than implementing a workaround.

Can you post the messages VxWorks printed when Task1 crashed?

One of the first things I try when a task crashes for no good reason is to increase its stack size (say, double it). If the task then runs fine, its stack was too small. Also try increasing the stack size of the task(s) you have modified recently!

If it is a stack problem, Task1 is not necessarily the one to blame: another task overflowing its stack can corrupt memory that Task1 uses...
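Instead of only doubling the stack, you can measure how close each task came to its limit. VxWorks normally fills a new task's stack with the byte 0xEE (unless the task is spawned with `VX_NO_STACK_FILL`), and the shell routine `checkStack()` reports usage based on how much of that fill has been overwritten. The sketch below shows the same idea on a plain buffer, assuming the stack grows toward lower addresses; `stack_high_water()` is a hypothetical helper, not a VxWorks API.

```c
#include <stddef.h>

/* Fill byte assumed to be written into a fresh task stack (VxWorks uses
 * 0xEE unless the task is spawned with VX_NO_STACK_FILL). */
#define STACK_FILL_BYTE 0xEE

/* Return how many bytes of a downward-growing stack have ever been used.
 * 'base' is the lowest address of the stack region and 'size' its length.
 * Scanning up from the low end, the first byte that no longer holds the
 * fill pattern marks the deepest point the stack reached. */
size_t stack_high_water(const unsigned char *base, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && base[untouched] == STACK_FILL_BYTE)
        untouched++;
    return size - untouched;
}
```

A used byte that happens to equal 0xEE makes the estimate slightly low, so a small remaining margin is better treated as a warning sign than as an exact figure.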