Friends: in PostgreSQL plpython, am trying to do an iterative search/replace in a text block 'data'.

I'm using re.sub with a match pattern, which calls a function 'replace' to do the work. The objective is to have 'replace' called repeatedly, since some replacements generate further 'rule' matches that require further replacements.

All works well through many, many replacements, and I'm managing to trigger the 2nd pass of the repeat loop. Then something causes the regex call to return an integer(?), apparently at the point where it finds no matches. I've tried testing for 'None' and '0', with no luck. Ideas?

data = (a_huge_block of_text)

# ======================  THE FUNCTION  ==============
def replace(matchobj):
 tag = matchobj.group(1)
 plpy.info("-------- matchobj.group(1), tag: ", tag)
 if matchobj.group(1) != '':
  (do all the replacement work in here)
# ======================  END FUNCTION  ==============

passnumber = 0
# If _any_ pattern match is found, process all of data for _all_ matches:
while re.search('(rule:[A-Za-z#]+)', data) != '':
 # BEGIN repeat loop:
 passnumber = passnumber + 1
 plpy.info(' ================================  BEGIN PASS: ',  passnumber)

 data = re.sub('(rule:[A-Za-z#]+)', replace, data)
 plpy.info(' =================================== END PASS: ',  passnumber)

Above code seems to be running OK, into a second iteration... then:

ERROR:  TypeError: sequence item 21: expected string, int found
CONTEXT:  Traceback (most recent call last):
  PL/Python function "myfunction", line 201, in <module>
    data = re.sub('(rule:[A-Za-z#]+)', replace, data)
  PL/Python function "myfunction", line 150, in sub
PL/Python function "myfunction"

Have also tried re.search(...) != '' and re.search(...) != 'None', with the same result. I do realize I must find the syntax to represent the match object in some readable form...

There's a trick to figuring this kind of thing out. It's called the print function. In the body of the while loop, add enough print functions to display repr(replace) and repr(data). Don't guess about what's going on. print stuff. Include the output in your question so we can all see what's actually happening. Proof is better than speculation. - S.Lott
Better yet, use pdb and inspect the stack at the point of the error to see what it's really stuck on. - Ross Patterson
@S.Lott: Don't think I have any print output capability from plpython; I do have the plpy.info call, which I make extensive use of. - DrLou
@Ross Patterson: Am I able to use pdb from within a plpython call? (I thought we weren't). Will research. - DrLou
Well I don't know plpython, so maybe not. But if you can invoke the python process yourself, you can use python -m pdb /file/to/run.py, or if the Python process can control std(in|out|err) then you can use pdb.set_trace(). - Ross Patterson

1 Answer

The answer to this turned out to be quite simple, of course, once you know Python! (I don't!)

To initiate the repeat loop, I had been doing this test:

while re.search('(rule:[A-Za-z#]+)', data) != '':

Had also tried this one, which will also not work:

while re.search('(rule:[A-Za-z#]+)', data) != 'None':

re.search returns None when there is no match, and that can be tested for, but without the quotes: None is an object, not the string 'None'. It's as simple as that:

while re.search('(rule:[A-Za-z#]+)', data) is not None:
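
To see why the first two tests never end the loop: re.search returns either a match object or None, and neither of those is ever equal to a string. Here is a minimal sketch in plain Python (run outside plpython, with made-up sample text). It uses `is not None`, which is the usual spelling of the None comparison:

```python
import re

PATTERN = '(rule:[A-Za-z#]+)'

# Hypothetical sample strings, only for illustration.
hit = re.search(PATTERN, 'some text rule:abc more text')   # a match object
miss = re.search(PATTERN, 'no tags left in here')          # None

# Comparing against a string is always "not equal", whether or not
# anything matched, so a while loop built on these tests cannot stop.
always_true = [hit != '', miss != '', hit != 'None', miss != 'None']

# Comparing against the None object tells the two cases apart.
hit_found = hit is not None
miss_found = miss is not None
```

Since a match object is truthy and None is falsy, `while re.search(...):` on its own would behave the same way.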

It's all so simple, once you know!
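
For completeness, here is a sketch of the whole repeat loop with the corrected test. The rule table, the sample text and the pass limit are hypothetical stand-ins for the real replacement work, not the code from the question. One related point may be worth checking: the function handed to re.sub has to return a string for every match. If it returns a number, re.sub raises a TypeError much like the one in the traceback above, so wrapping the return value in str() is a cheap safeguard.

```python
import re

PATTERN = re.compile(r'(rule:[A-Za-z#]+)')

# Hypothetical rule table; one replacement deliberately produces another tag.
RULES = {
    'rule:outer': 'start rule:inner end',
    'rule:inner': 'DONE',
}

def replace(matchobj):
    tag = matchobj.group(1)
    # Unknown tags lose their "rule:" prefix so they cannot match again.
    # str() guarantees that re.sub always gets a string back.
    return str(RULES.get(tag, tag.replace('rule:', '', 1)))

def expand(data, max_passes=20):
    """Apply the rules repeatedly until no tag is left in data."""
    passnumber = 0
    while PATTERN.search(data) is not None:
        passnumber += 1
        # Assumed safety net: stop if rules keep generating new matches.
        if passnumber > max_passes:
            raise RuntimeError('rules still matching after %d passes' % max_passes)
        data = PATTERN.sub(replace, data)
    return data, passnumber

result, passes = expand('x rule:outer y')
```

With this sample input the loop runs two passes, because the first pass introduces a new 'rule:inner' tag that only the second pass resolves. Inside plpython the same loop should work with plpy.info calls in place of any print output.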