What I'm trying to do is update the URLs in HTML background attributes in a document. The caveat is that only local URLs should be updated, so anything starting with http should be ignored.

The RegEx I'm after needs to replace the path before the filename and extension. For example:

background="image.gif"
background="/image.gif"
background="images/image.gif"
background="images/directory/image.gif"

should all output as:

background="/mydirectory/image.gif"

As always, either single or double quotes may have been used in the input file.

I already have an existing RegEx that is doing a very similar job for the CSS image references. The RegEx is:

url\((?:\'|\"")?(?!(?:http|ftp))(?<path>.+)\/(?<filename>.*?)\1?\)

I thought I would simply be able to replace the url() match with background= but so far I've not been successful.

Any help greatly appreciated.

1 Answer

See demo here. The regex is:

background=['"](?!\s*(http|ftp):\/\/)(?:[^'"]*\/)*(?<filename>[^'"]+)['"]

In detail:

  • background= matches 'background=' literally
  • ['"] matches either quote
  • (?!\s*(http|ftp):\/\/) ensures that what follows the quote is not http:// or ftp:// (even with spaces before it); as written, https:// is not excluded unless http is changed to https?
  • (?:[^'"]*\/)* matches the path up to the filename while still inside the quotes: [^'"]*\/ sits in a non-capturing group (?:) that * repeats 0 or more times (it could be replaced by ?, since the greedy [^'"]* already takes the whole path up to the last slash, so the group matches all or nothing)
  • (?<filename>[^'"]+) captures anything that is not a quote (and is non-empty, because of the + quantifier) in the capture group filename
  • ['"] matches the closing quote so that it is not captured
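
The pattern can also be tried outside the demo with a short script. Below is a minimal sketch in Python, which is an assumption, since the question does not name a host language. Python writes named groups as (?P<filename>...) rather than (?<filename>...) and does not need the slashes escaped. The names BACKGROUND_RE and extract_filename are made up for illustration. One deliberate departure from the pattern as posted: the lookahead uses https? so that https:// URLs are skipped as well.

```python
import re
from typing import Optional

# Hypothetical names; the regex mirrors the one above, piece by piece.
BACKGROUND_RE = re.compile(
    r"""
    background=                 # the attribute name, literally
    ['"]                        # opening quote, single or double
    (?!\s*(?:https?|ftp)://)    # skip absolute URLs (https? is an added assumption)
    (?:[^'"]*/)*                # optional path, up to and including the last slash
    (?P<filename>[^'"]+)        # non-empty filename
    ['"]                        # closing quote
    """,
    re.VERBOSE,
)


def extract_filename(attr: str) -> Optional[str]:
    """Return the bare filename of a local background URL, or None."""
    match = BACKGROUND_RE.search(attr)
    return match.group("filename") if match else None


for sample in (
    'background="image.gif"',
    "background='/image.gif'",
    'background="images/directory/image.gif"',
    'background="http://example.com/image.gif"',
):
    print(sample, "->", extract_filename(sample))
```

The first three samples yield image.gif; the last one yields None because the lookahead rejects it.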

Old answer (in case it interests anyone): see demo here.

The regex is: background=['"](?!\s*(http|ftp):\/\/)\/?(?<filename>[^'"]*)['"]

In detail:

  • background= matches 'background=' literally
  • ['"] matches either quote
  • (?!\s*(http|ftp):\/\/) ensures that what follows the quote is not http:// or ftp:// (even with spaces before it)
  • \/? matches the leading /, if present, so that it is not captured
  • (?<filename>[^'"]*) captures anything that is not a quote in the capture group filename
  • ['"] matches the closing quote so that it is not captured
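
To see where this older pattern falls short of the question's requirement, here is a small Python sketch (Python is an assumption, and OLD_RE and old_capture are illustrative names). The pattern is the old one, transcribed to Python's (?P<name>...) group syntax. It only strips a single leading slash, so any directory part stays inside the filename group.

```python
import re

# The old pattern, transcribed to Python's named-group syntax.
OLD_RE = re.compile(
    r"""background=['"](?!\s*(http|ftp)://)/?(?P<filename>[^'"]*)['"]"""
)


def old_capture(attr):
    """Return what the old pattern captures as 'filename', or None."""
    match = OLD_RE.search(attr)
    return match.group("filename") if match else None


for sample in (
    'background="image.gif"',
    'background="/image.gif"',
    'background="images/directory/image.gif"',
):
    print(sample, "->", old_capture(sample))
```

For the last sample the captured value is images/directory/image.gif rather than image.gif, so prefixing it with a new directory would keep the old path.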

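See the demo for the replacement; the idea is to replace each match with /mydir/${filename}. Because the match also covers background= and both quotes, the replacement string has to put those back, e.g. background="/mydir/${filename}".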
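
As a rough sketch of the replacement step, the following Python version rewrites every local background URL in a string. Python is an assumption here, and REWRITE_RE, rewrite_backgrounds and the new_dir parameter are illustrative names. Two details go beyond the pattern as posted: the opening quote is captured in a quote group so the same quote character can be written back, and the lookahead uses https? so that https:// URLs are left alone too.

```python
import re

# Variant of the pattern above that remembers which quote was used.
REWRITE_RE = re.compile(
    r"""background=(?P<quote>['"])(?!\s*(?:https?|ftp)://)(?:[^'"]*/)*(?P<filename>[^'"]+)(?P=quote)"""
)


def rewrite_backgrounds(html, new_dir="/mydir"):
    """Point every local background URL at new_dir, keeping only the filename."""
    prefix = new_dir.rstrip("/")

    def repl(match):
        quote = match.group("quote")
        # Rebuild the whole attribute, since the match includes it.
        return "background=" + quote + prefix + "/" + match.group("filename") + quote

    return REWRITE_RE.sub(repl, html)


page = (
    '<body background="images/directory/image.gif">'
    "<td background='/image.gif'>"
    '<td background="http://example.com/bg.gif">'
)
print(rewrite_backgrounds(page))
```

A replacement function is used instead of a template string so that the captured quote and the directory can be combined without worrying about escape sequences in new_dir.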