
In a single EXE process, are TCL regular expressions shared by each interpreter instance returned by Tcl_CreateInterp? How could threads with 4 different interpreter instances (0x94fbcd8,0x94dff20,0x94c4170,0x94a8760) all be making a call like TclReFree (re=0x86b0444) at ./../generic/regfree.c:52?

This comment in the TCL manual hints that objects may be shared...

Tcl objects are allocated on the heap and are shared as much as possible to reduce storage requirements. Reference counting is used to determine when an object is no longer needed and can safely be freed.

Source: https://www.tcl.tk/man/tcl8.4/TclLib/Object.htm

We're encountering crashes in our 32-bit server application. We've isolated the root cause to a TCL regular expression shared between threads concurrently running in separate TCL interpreter instances.

The interpreters are failing on this line of TCL:

regsub "\\*" $s "\\*" s

The application concurrently runs TCL 8.4.11 interpreter instances. Each interpreter is executing "user TCL scripts" in separate threads. The app creates threads that "own" 1 interpreter instance created using Tcl_CreateInterp. Each thread then tells the interpreter instance to run a "user TCL script" with Tcl_EvalObjv. The crash happens when each interpreter is configured to run the same "user TCL script" on the line containing the regsub shown above.

This app has been running in dozens of different production environments for over 15 years. In the current environment, the app is running on Red Hat Linux 6.5 64-bit.

The core dump looks like...

Program terminated with signal 11, Segmentation fault.
#0  0x0811c020 in miss ()
(gdb) bt
#0  0x0811c020 in miss ()
#1  0x0811b7ed in shortest ()
#2  0x0811a4fa in find ()
#3  0x0811a429 in TclReExec ()
#4  0x080fc83f in RegExpExecUniChar ()
#5  0x080fc970 in Tcl_RegExpExecObj ()
#6  0x080bb9f1 in Tcl_RegsubObjCmd ()
#7  0x080b027a in TclEvalObjvInternal ()
#8  0x080d2726 in TclExecuteByteCode ()
#9  0x080d1bd1 in TclCompEvalObj ()
#10 0x080fbd6c in TclObjInterpProc ()
#11 0x080b027a in TclEvalObjvInternal ()
#12 0x080d2726 in TclExecuteByteCode ()
#13 0x080d1bd1 in TclCompEvalObj ()
#14 0x080fbd6c in TclObjInterpProc ()
#15 0x080b027a in TclEvalObjvInternal ()
#16 0x080b0527 in Tcl_EvalObjv ()

After rebuilding TCL with the configure flag --enable-symbols=mem and linking the app with D.U.M.A. (Detect Unintended Memory Access, http://duma.sourceforge.net/, a fork of Electric Fence that helps catch buffer overruns), I get a core dump like this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xea8eeb70 (LWP 31004)]
0x08151496 in TclReFree (re=0x86b0444) at ./../generic/regfree.c:52
52      (*((struct fns *)re->re_fns)->free)(re);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.2.i686
(gdb) list
47  regfree(re)
48  regex_t *re;
49  {
50      if (re == NULL)
51          return;
52      (*((struct fns *)re->re_fns)->free)(re);
53  }
(gdb) bt
#0  0x08151496 in TclReFree (re=0x86b0444) at ./../generic/regfree.c:52
#1  0x08124360 in FreeRegexp (regexpPtr=0x86b0440) at ./../generic/tclRegexp.c:989
#2  0x08123ec2 in FreeRegexpInternalRep (objPtr=0xf64041b8) at ./../generic/tclRegexp.c:746
#3  0x08128cab in SetStringFromAny (interp=0x0, objPtr=0xf64041b8) at ./../generic/tclStringObj.c:1762
#4  0x08127894 in Tcl_GetUnicodeFromObj (objPtr=0xf64041b8, lengthPtr=0xea8ecee8) at ./../generic/tclStringObj.c:567
#5  0x080c3e9a in Tcl_RegsubObjCmd (dummy=0x0, interp=0x94fbcd8, objc=4, objv=0x94fbf28) at ./../generic/tclCmdMZ.c:718
#6  0x080b1386 in TclEvalObjvInternal (interp=0x94fbcd8, objc=5, objv=0x94fbf24, command=0x0, length=0, flags=0) at ./../generic/tclBasic.c:3088
#7  0x080e5a88 in TclExecuteByteCode (interp=0x94fbcd8, codePtr=0x95174e0) at ./../generic/tclExecute.c:1417
#8  0x080e4959 in TclCompEvalObj (interp=0x94fbcd8, objPtr=0x95097f0) at ./../generic/tclExecute.c:981
#9  0x08122a35 in TclObjInterpProc (clientData=0x9514520, interp=0x94fbcd8, objc=2, objv=0x94fbf1c) at ./../generic/tclProc.c:1100
#10 0x080b1386 in TclEvalObjvInternal (interp=0x94fbcd8, objc=2, objv=0x94fbf1c, command=0x0, length=0, flags=0) at ./../generic/tclBasic.c:3088
#11 0x080e5a88 in TclExecuteByteCode (interp=0x94fbcd8, codePtr=0xf64011f8) at ./../generic/tclExecute.c:1417
#12 0x080e4959 in TclCompEvalObj (interp=0x94fbcd8, objPtr=0x9513f68) at ./../generic/tclExecute.c:981
#13 0x08122a35 in TclObjInterpProc (clientData=0x9514d10, interp=0x94fbcd8, objc=2, objv=0xea8ee34c) at ./../generic/tclProc.c:1100
#14 0x080b1386 in TclEvalObjvInternal (interp=0x94fbcd8, objc=2, objv=0xea8ee34c, command=0x81a4ffe "", length=0, flags=0) at ./../generic/tclBasic.c:3088
#15 0x080b15e4 in Tcl_EvalObjv (interp=0x94fbcd8, objc=2, objv=0xea8ee34c, flags=0) at ./../generic/tclBasic.c:3204
#16 0x0808a812 in run_tcl_proc (pDevice=0x82405e0, pInterp=0x830d340, iNumArgs=2, objv=0xea8ee34c, bIsCommand=0 '\000', pCommand=0x0)
#17 0x08093492 in Tcl_begin_next_state (pDevice=0x82405e0, iNextState=RunPoll, pCommand=0x0)
#18 0x08093579 in Tcl_port_thread (dummy=0x8232c00)
#19 0x0014fb39 in start_thread () from /lib/libpthread.so.0
#20 0x00967d7e in clone () from /lib/libc.so.6
(gdb) 

This gdb session also clearly shows concurrent threads executing regfree on the same regular expression, even though each thread's TCL interpreter instance is supposed to be completely thread-bound. There should be zero sharing between threads. The only thing they have in common is that they are executing a "user TCL script" file with the same filename. The files were all loaded with Tcl_EvalFile into per-thread interpreter instances.

(gdb) info threads
  45 Thread 0xe30e2b70 (LWP 31017)  0x00110430 in __kernel_vsyscall ()
--snip--
  34 Thread 0xe9eedb70 (LWP 31005)  0x00110430 in __kernel_vsyscall ()
* 33 Thread 0xea8eeb70 (LWP 31004)  0x08151496 in TclReFree (re=0x86b0444) at ./../generic/regfree.c:52
  32 Thread 0xeb2efb70 (LWP 31003)  0x08151496 in TclReFree (re=0x86b0444) at ./../generic/regfree.c:52
  31 Thread 0xebcf0b70 (LWP 31002)  0x08151496 in TclReFree (re=0x86b0444) at ./../generic/regfree.c:52
  30 Thread 0xec6f1b70 (LWP 31001)  0x08151496 in TclReFree (re=0x86b0444) at ./../generic/regfree.c:52
  29 Thread 0xed0f2b70 (LWP 31000)  0x00110430 in __kernel_vsyscall ()
--snip--
  1 Thread 0xf7fec8d0 (LWP 30970)  0x00110430 in __kernel_vsyscall ()
(gdb) 

Note that this question concerns a completely separate crash from my previous question, "alloc: invalid block - Are Tcl_IncrRefCount and Tcl_DecrRefCount thread safe for threaded Tcl / 1 interp per thread?".

After digging through the app's code, I found a case where an interpreter is created in thread A and asked to run a proc there, but is then used in thread B to run many other procs. I'm guessing this may be the root cause of this crash. Strangely, the app doesn't crash on Windows but crashes immediately (most of the time) on Linux. The app creates threads:

  • On Windows, using the Win32 API.
  • On Linux, using POSIX Threads / pthreads.
Either the Tcl implementation you are using is seriously unsafe -- e.g. it relies on statically-allocated global memory -- or your interpreters are in some other way not as independent as you suppose. Different interpreters could, in principle, share resources. I see that you are not using the distro's Tcl (which is version 8.5.7); is it possible to test your application against that Tcl? - John Bollinger
The TCL implementation used is sourceforge.net/projects/tcl/files/Tcl/8.4.11 - buzz3791
If the interpreters were really independent, then it is incredibly unlikely that all four would be at exactly the same point in the script at exactly the same time. Is that a side effect of your debugger use? Otherwise, it's almost surely another symptom of the problem. - John Bollinger
Is it possible that you built and are using a non-thread-aware version of Tcl in this environment? I understand that is an option, but it's probably not one you want to use for your particular application. Otherwise, your problem seems to contradict this answer. - John Bollinger
@John It's a thread-aware version of TCL. The configure flags "--enable-threads --enable-symbols=mem --disable-shared" were used to build the libtcl8.4g.a linked into the app. This was a debug build of the app used to create the gdb session showing 4 threads trying to free the same regular expression. - buzz3791

1 Answer


To answer your immediate question, REs are shared by two mechanisms. Firstly, they're bound to the internal representation of the Tcl_Obj values generated from the values in your script (e.g., the literals and the results of operations). Secondly, they're also stored in a size-bounded per-thread LRU cache.

Both of these mechanisms are strictly thread-bound. REs are not shared between threads; Tcl shares extremely little between threads.
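
Because both mechanisms are thread-bound, an interpreter has to be evaluated only on the thread that created it. One way to catch a violation early in a debug build is to record the creating thread and check it before every evaluation. What follows is only a minimal sketch under stated assumptions: OwnedInterp and the owned_interp_* helpers are hypothetical names, not part of Tcl; the interpreter is held as a void* so the sketch compiles without tcl.h (in the real app it would be the Tcl_Interp* returned by Tcl_CreateInterp); and it uses pthreads, as the Linux build does. In a build that links Tcl, Tcl_GetCurrentThread would probably be the more portable way to identify the calling thread, since it also covers Windows.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper (not a Tcl type): pairs an interpreter handle with
 * the thread that created it. `interp` is a void* so this compiles without
 * tcl.h; in the real app it would be a Tcl_Interp*. */
typedef struct {
    void     *interp;
    pthread_t owner;
} OwnedInterp;

/* Call on the thread that creates the interpreter. */
static void owned_interp_init(OwnedInterp *oi, void *interp)
{
    oi->interp = interp;
    oi->owner  = pthread_self();
}

/* Nonzero only when the caller is the thread that ran owned_interp_init. */
static int owned_interp_on_owner_thread(const OwnedInterp *oi)
{
    return pthread_equal(oi->owner, pthread_self()) != 0;
}

/* Debug guard: fetch the handle just before evaluating a script.
 * Aborts (unless NDEBUG is defined) when called from a foreign thread. */
static void *owned_interp_get(const OwnedInterp *oi)
{
    assert(owned_interp_on_owner_thread(oi) &&
           "interpreter used outside its owner thread");
    return oi->interp;
}
```

Routing every Tcl_EvalObjv call through a guard like owned_interp_get would make a "created in thread A, used in thread B" case abort at the call site instead of surfacing later as a crash inside the regexp code.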

However, there are a number of larger issues in your question.

If you're sending messages (err, scripts) between threads for execution, you're strongly recommended to use the Thread extension for this, as this takes care to copy things that need to be copied. The Thread extension ships with a full distribution of Tcl 8.6 (it's now a contributed package, along with [incr Tcl], SQLite and TDBC) but it should be available separately for older versions of Tcl.

Also, you're using a doubly-unsupported version of Tcl. The most recent version of 8.4 is 8.4.20 (which should be a drop-in replacement) and even that has been out of security/build support for several years now. You really are recommended to upgrade. 8.5.17 is the current long-term support release, and 8.6.3 is the current production release. (They're also quite a bit faster on a lot of code.)