We are facing C++ application crash issue due to segmentation fault on RED hat Linux. We are using embedded python in C++.
Please find below my limitation
- Don’t I have access to production machine where application crashes. Client send us core dump files when application crashes.
- Problem is not reproducible on our test machine which has exactly same configuration as production machine.
- Sometime application crashes after 1 hour, 4 hour ….1 day or 1 week. We haven’t get time frame or any specific pattern in which application crashes.
- Application is complex and embedded python code is used from lot of places from within application. We have done extensive code reviews but couldn’t find the fix by doing code review.
- As per stack trace in core dump, it is crashing around multiplication operation, reviewed code for such operation in code we haven’t get any code where such operation is performed. Might be such operations are called through python scripts executed from embedded python on which we don’t have control or we can’t review it.
- We can’t use any profiling tool on production environment like Valgrind.
- We are using gdb on our local machine to analyze core dump. We can’t run gdb on production machine.
Please find below the efforts we have putted in.
- We have analyzed logs and continuously fired request that coming towards our application on our test environment to reproduce the problem.
- We are not getting crash point in logs. Every time we get different logs. I think this is due to; Memory is smashed somewhere else and application crashes after sometime.
- We have checked load at any point on our application and it is never exceeded our application limit.
- Memory utilization of our application is also normal.
- We have profiled our application with help of Valgrind in our test machine and removed valgrind errors but application is still crashing.
I appreciate any help to guide us to proceed further to solve the problem.
Below is the version details
Red hat linux server 5.6 (Tikanga) Python 2.6.2 GCC 4.1
Following is the stack trace I am getting from the core dump files they have shared (on my machine). FYI, We don’t have access to production machine to run gdb on core dump files.
0 0x00000033c6678630 in ?? ()
1 0x00002b59d0e9501e in PyString_FromFormatV (format=0x2b59d0f2ab00 "can't multiply sequence by non-int of type '%.200s'", vargs=0x46421f20) at Objects/stringobject.c:291
2 0x00002b59d0ef1620 in PyErr_Format (exception=0x2b59d1170bc0, format=<value optimized out>) at Python/errors.c:548
3 0x00002b59d0e4bf1c in PyNumber_Multiply (v=0x2aaaac080600, w=0x2b59d116a550) at Objects/abstract.c:1192
4 0x00002b59d0ede326 in PyEval_EvalFrameEx (f=0x732b670, throwflag=<value optimized out>) at Python/ceval.c:1119
5 0x00002b59d0ee2493 in call_function (f=0x7269330, throwflag=<value optimized out>) at Python/ceval.c:3794
6 PyEval_EvalFrameEx (f=0x7269330, throwflag=<value optimized out>) at Python/ceval.c:2389
7 0x00002b59d0ee2493 in call_function (f=0x70983f0, throwflag=<value optimized out>) at Python/ceval.c:3794
8 PyEval_EvalFrameEx (f=0x70983f0, throwflag=<value optimized out>) at Python/ceval.c:2389
9 0x00002b59d0ee2493 in call_function (f=0x6f1b500, throwflag=<value optimized out>) at Python/ceval.c:3794
10 PyEval_EvalFrameEx (f=0x6f1b500, throwflag=<value optimized out>) at Python/ceval.c:2389
11 0x00002b59d0ee2493 in call_function (f=0x2aaab09d52e0, throwflag=<value optimized out>) at Python/ceval.c:3794
12 PyEval_EvalFrameEx (f=0x2aaab09d52e0, throwflag=<value optimized out>) at Python/ceval.c:2389
13 0x00002b59d0ee2d9f in ?? () at Python/ceval.c:2968 from /usr/local/lib/libpython2.6.so.1.0
14 0x0000000000000007 in ?? ()
15 0x00002b59d0e83042 in lookdict_string (mp=<value optimized out>, key=0x46424dc0, hash=40722104) at Objects/dictobject.c:412
16 0x00002aaab09d5458 in ?? ()
17 0x00002aaab09d5458 in ?? ()
18 0x00002aaab02a91f0 in ?? ()
19 0x00002aaab0b2c3a0 in ?? ()
20 0x0000000000000004 in ?? ()
21 0x00000000026d5eb8 in ?? ()
22 0x00002aaab0b2c3a0 in ?? ()
23 0x00002aaab071e080 in ?? ()
24 0x0000000046422bf0 in ?? ()
25 0x0000000046424dc0 in ?? ()
26 0x00000000026d5eb8 in ?? ()
27 0x00002aaab0987710 in ?? ()
28 0x00002b59d0ee2de2 in PyEval_EvalFrame (f=0x0) at Python/ceval.c:538
29 0x0000000000000000 in ?? ()
(1, 2, 3) * 2.5
. Isn't that in itself a bug, which should be easy to trace down as that's an unusual kind of expression? – Potatoswatterg++
are you using? – Alex Chamberlainstdc++
library; GCC has progressed a lot since. And you could also compile with a recent Clang (from LLVM 3.2) in addition of using a recent GCC (4.7), to get more warnings etc.... – Basile Starynkevitch