0
votes

Since it's not easy to use regex in SQL, I need some advice on how to solve this problem.

I have a column with the following type of data:

Lorem ipsum dolor sit $%foo##amet%$, consectetur adipiscing elit. Nullam odio risus, mollis a interdum vitae, rutrum id leo. Pellentesque dapibus lobortis mattis. Praesent at nisi a orci commodo scelerisque $%bar##%$ eget id dui. Morbi est arcu, ultricies et consequat ac, pretium sed mi. Quisque iaculis pretium congue. Etiam ullamcorper sapien eu mauris tristique at venenatis mauris ultricies. Proin eu vehicula enim. Vestibulum aliquam, mauris ac tempus vulputate, odio mauris rhoncus purus, id suscipit velit erat quis magna.

I need to match the $%...%$ tokens (shown in bold in the original post) and replace each one with the text found in its second part, after the ##.

Meaning:

  • $%foo##amet%$ becomes amet
  • $%bar##%$ becomes an empty string.

The pattern as a regex would be something like \$%[^#]+?##([^%]*?)%\$

I can't really use that, though, since regex is not really supported in T-SQL...

Any advice?

3
I suggest doing that in a programming language and not in SQL. SQL is not optimized for such things. It's a DB query language, not a language optimized for text processing. - m0skit0
@m0skit0 The problem with that would be performance; it's a table that's deleted and refilled each night with over 900,000 rows - red-X
In fact, the performance hit would be greater using SQL, IMHO. A test would clear this up. - m0skit0
@red-X: if you're worried about performance - then I'd seriously give SQL-CLR a try! Since the .NET code would be executed within SQL Server, you don't have any of the network traffic and thus better performance overall ... - marc_s

3 Answers

1
votes

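The best option you have is a nested REPLACE with fixed match strings. REPLACE compares its search string literally, so the % characters need no [%] escaping (that is only required in LIKE and PATINDEX patterns):

SELECT REPLACE(
           REPLACE(YourColumn, '$%foo##amet%$', 'amet'), '$%bar##%$', '')
FROM YourTable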

Of course, that doesn't have the flexibility of regexes....

Or you could design a SQL-CLR regex library (or find one of the pre-existing ones) and include those into your SQL Server - from 2005 on, SQL Server can execute .NET code, and something like a regex library is a perfect example for extending T-SQL with .NET capabilities.

0
votes

You can do it with the following code:

SELECT REPLACE(REPLACE(YourColumn, '$%foo##amet%$', 'amet'), '$%bar##%$', '')

The % character represents "any string of zero or more characters" only in LIKE and PATINDEX patterns. REPLACE treats it as a literal character, so the tokens can be matched exactly as they appear in the data, without escaping or surrounding spaces.
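
To see how the % character is treated by each function, here is a small check that can be run as-is. It is only a sketch on a made-up literal string, not on the real column: REPLACE compares its search string literally, while LIKE interprets % as a wildcard unless it is wrapped in [ ].

```sql
-- REPLACE: the search string is literal, so '%' is just a character.
SELECT REPLACE('a $%bar##%$ b', '$%bar##%$', '') AS LiteralSearch;
-- returns 'a  b' (the token is removed)

-- REPLACE with LIKE-style escaping: the text '$[%]bar##[%]$'
-- does not occur in the input, so nothing is replaced.
SELECT REPLACE('a $%bar##%$ b', '$[%]bar##[%]$', '') AS EscapedSearch;
-- returns 'a $%bar##%$ b' (unchanged)

-- LIKE: here '%' is a wildcard and [%] matches a literal percent sign.
SELECT CASE WHEN 'a $%bar##%$ b' LIKE '%$[%]bar##[%]$%'
            THEN 1 ELSE 0 END AS LikeMatches;
-- returns 1
```

So the [%] escaping belongs in LIKE and PATINDEX patterns only; inside REPLACE it prevents the match.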

0
votes

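I fixed it with this function, which uses PATINDEX in a loop to find each $%...##...%$ token and replaces it with the text after its ##: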

CREATE FUNCTION [dbo].[ReplaceWithDefault]
(
   @InputString VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN
    DECLARE @Pattern VARCHAR(100) SET @Pattern = '$[%]_%##%[%]$'
    -- working copy of the string
    DECLARE @Result VARCHAR(4000) SET @Result = @InputString
    -- current match of the pattern
    DECLARE @CurMatch VARCHAR(500) SET @CurMatch = ''
    -- string to replace the current match
    DECLARE @Replace VARCHAR(500) SET @Replace = ''
    -- start + end of the current match
    DECLARE @Start INT
    DECLARE @End INT
    -- length of current match
    DECLARE @CurLen INT
    -- Length of the total string -- 8001 if @InputString is NULL
    DECLARE @Len INT SET @Len = COALESCE(LEN(@InputString), 8001)

    WHILE (PATINDEX('%' + @Pattern + '%', @Result) != 0) 
    BEGIN
        SET @Replace = ''

        SET @Start = PATINDEX('%' + @Pattern + '%', @Result)
        SET @CurMatch = SUBSTRING(@Result, @Start, @Len)

        SET @End = PATINDEX('%[%]$%', @CurMatch) + 2
        SET @CurMatch = SUBSTRING(@CurMatch, 0, @End)

        SET @CurLen = LEN(@CurMatch)

        SET @Replace = REPLACE(RIGHT(@CurMatch, @CurLen - (PATINDEX('%##%', @CurMatch)+1)), '%$', '')

        SET @Result = REPLACE(@Result, @CurMatch, @Replace)
    END
    RETURN(@Result)
END
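
A possible way to call the function, as a rough sketch: dbo.YourTable and YourColumn are placeholder names (not from the real schema), and the first statement just checks the function on a shortened version of the sample text.

```sql
-- Quick check against a literal string.
SELECT dbo.ReplaceWithDefault('Lorem $%foo##amet%$, x $%bar##%$ eget') AS Cleaned;
-- returns 'Lorem amet, x  eget'

-- Applying it to the nightly-loaded table (placeholder names).
-- The LIKE filter uses the same pattern as the function, so rows
-- without any token are skipped and the scalar function is not
-- called for them.
UPDATE dbo.YourTable
SET    YourColumn = dbo.ReplaceWithDefault(YourColumn)
WHERE  YourColumn LIKE '%$[%]_%##%[%]$%';
```

Keep in mind the limits declared in the function itself: the input and result are VARCHAR(4000), and a single token is held in a VARCHAR(500) variable, so longer values would be truncated.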