
Short Question

Is it possible to use regular expressions to match a string which doesn't appear within some other longer string? (I'm using MATLAB)

Example

shortString = {'hello', 'there'};
longString = {'why hello there', 'hello, my friend', 'hello and goodbye'};
wholeString = ['"why hello there", said one guy. The other guy over ' ...
'there said hello, my friend. hello and goodbye, said the ' ...
'third guy. So let''s all say hello!'];

In this example, I would like to match the elements of shortString in wholeString, as long as they do not appear in wholeString as a substring of an element of longString.

In the above example, I would only like to match the 'hello' from 'So let''s all say hello!', and the 'there' from 'The other guy over there'.
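
One possible way to approach the short question, sketched here under assumptions rather than taken from the answer below: build a single alternation that lists the long strings before the short ones, let the regex engine consume the long strings wherever they occur, and then keep only the matches that are short strings. The variable names `alternatives`, `pattern` and `keep` are illustrative.

```matlab
shortString = {'hello', 'there'};
longString  = {'why hello there', 'hello, my friend', 'hello and goodbye'};
wholeString = ['"why hello there", said one guy. The other guy over ' ...
    'there said hello, my friend. hello and goodbye, said the ' ...
    'third guy. So let''s all say hello!'];

% Escape regex metacharacters in every candidate string. The long strings
% are listed first so they are tried before the short ones at each position.
alternatives = cellfun(@(s) regexptranslate('escape', s), ...
    [longString, shortString], 'UniformOutput', false);
pattern = strjoin(alternatives, '|');

% Find all matches (long and short) together with their start positions.
[matches, starts] = regexp(wholeString, pattern, 'match', 'start');

% Keep only the matches that are short strings.
keep    = ismember(matches, shortString);
matches = matches(keep);
starts  = starts(keep);
disp(matches)
```

For the example data this should leave only the 'there' after 'over' and the final 'hello'. The sketch does no whole-word checking, so a short string inside a longer word (for example 'there' in 'therefore') would also be reported; wrapping the short strings in \< and \> is one way to tighten that.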

Specific Problem

I am writing a function that will mark up the help comments at the top of a function to get them ready to be published using MATLAB's publish function (see Publishing Markup). In addition to marking up the syntax of the function, I want to mark up any inputs/outputs of the function that appear outside of the syntax. Assume that I've extracted the help comments from the function in some way. Here are some example comments.

% SYNTAX
%    x = somefunction(y)
%    [out1, out2] = somefunction(in1, in2)
%
% DESCRIPTION
%    x = somefunction(y) does something to y and returns x. I also dont wan't
%       your answer to match the 'y' in 'your' or ''y''.
%
%    [out1, out2] = somefunction(in1, in2) does something else with in1 and 
%       in2. Then it returns out1 and out2.

After marking up these comments with my current method, I have the following:

%% Syntax
%        x = somefunction(y)
%        [out1, out2] = somefunction(in1, in2)
%
%% Description
% |x = somefunction(y)| does something to y and returns x. I also dont wan't
% your answer to match the 'y' in your or ''y''.
%
% |[out1, out2] = somefunction(in1, in2)| does something else with in1 and 
% in2. Then it returns out1 and out2.

I would like to make each input and output argument appear mono-spaced in the Description section. I'd like the final text to be:

%% Syntax
%        x = somefunction(y)
%        [out1, out2] = somefunction(in1, in2)
%
%% Description
% |x = somefunction(y)| does something to |y| and returns |x|. I also dont wan't
% your answer to match the 'y' in 'your', or ''y''.
%
% |[out1, out2] = somefunction(in1, in2)| does something else with |in1| and 
% |in2|. Then it returns |out1| and |out2|.

The problem is that I can't just match the input arguments, or I would also match them in the syntax part that's already mono-spaced.

At my disposal are a cell array containing each syntax definition, a cell array containing each input argument, and a cell array containing each output argument.

syntax = {'x = somefunction(y)', '[out1, out2] = somefunction(in1, in2)'};
inputs = {'y', 'in1', 'in2'};
outputs = {'x', 'out1', 'out2'};
To answer the problem of syntax matching, you can capture only the help text with txt = help('yourfunction'). Does this solve your issue completely? - Oleg

@OlegKomarov Thanks, but no. The problem is not getting the help text, but marking it up. I want to mark it up so that I can publish it to HTML using the publish function. - hoogamaphone

1 Answer


It's a bit ad hoc, and I think it could be improved, but suppose your input is:

txt = ['% SYNTAX' char(13)...
'%    x = somefunction(y)' char(13)...
'%    [out1, out2] = somefunction(in1, in2)' char(13)...
'%' char(13)...
'% DESCRIPTION' char(13)...
'%    x = somefunction(y) does something to y and returns x. I also dont wan''t' char(13)...
'%       your answer to match the ''y'' in ''your'' or ''''y''''.' char(13)...
'%' char(13)...
'%    [out1, out2] = somefunction(in1, in2) does something else with in1 and ' char(13)...
'%       in2. Then it returns out1 and out2.'];

inout  = {'y', 'in1', 'in2', 'x', 'out1', 'out2'}';

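You can use regular expressions with look-behind and look-ahead operators. An input or output name is matched only if it is preceded by a word character (\w) plus one or more separators (whitespace or any of , . ! ? ; : %), or sits at the very beginning of the text (probably useless here), AND is followed by one or more separators (whitespace or any of , . ! ? ; :) plus a word character, or by an optional period at the very end of the text: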

expr = strcat('(?<=\w[\s,\.\!\?;:%]+|^)', inout,'(?=[\s,\.\!\?;:]+\w|\.?$)');
regexprep(txt,expr,'|$&|')

The result is:

ans =
% SYNTAX
%    x = somefunction(y)
%    [out1, out2] = somefunction(in1, in2)
%
% DESCRIPTION
%    x = somefunction(y) does something to |y| and returns |x|. I also dont wan't
%       your answer to match the 'y' in 'your' or ''y''.
%
%    [out1, out2] = somefunction(in1, in2) does something else with |in1| and 
%       |in2|. Then it returns |out1| and |out2|.
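
To see the look-around conditions in isolation, here is a minimal sketch on a single, made-up sentence (the sentence and the variable names `name`, `str` and `out` are assumptions for illustration). It uses $0 for the whole match in the replacement:

```matlab
name = 'y';
% Same look-behind / look-ahead conditions as above, for one argument name.
expr = ['(?<=\w[\s,\.\!\?;:%]+|^)' name '(?=[\s,\.\!\?;:]+\w|\.?$)'];

str = 'does something to y and returns x(y).';
out = regexprep(str, expr, '|$0|');
disp(out)
```

The free-standing y is wrapped because it sits between "to " and " and". The y in x(y) is left alone because a parenthesis is not one of the accepted separators.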

Alternatively

In a few steps, you can find the start and end positions of each syntax definition in the text, then find the positions of the inputs/outputs, and exclude from the replace operation those that fall between the start and end of a syntax definition.
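
The steps above could look roughly like the following. This is only a sketch on a shortened, hypothetical help text, and the names `synPat`, `argPat` and `inside` are made up for illustration:

```matlab
% Shortened, hypothetical help text; the syntax line recurs in the description.
txt = ['% SYNTAX' char(10) ...
       '%    x = somefunction(y)' char(10) ...
       '%' char(10) ...
       '% DESCRIPTION' char(10) ...
       '%    x = somefunction(y) does something to y and returns x.'];
syntax = {'x = somefunction(y)'};
inout  = {'y', 'x'};

% 1) Start and end positions of every occurrence of a syntax definition.
synPat = strjoin(cellfun(@(s) regexptranslate('escape', s), syntax, ...
    'UniformOutput', false), '|');
[synStart, synEnd] = regexp(txt, synPat, 'start', 'end');

% 2) Start and end positions of every whole-word occurrence of an argument.
argPat = ['\<(' strjoin(inout, '|') ')\>'];
[argStart, argEnd] = regexp(txt, argPat, 'start', 'end');

% 3) Flag the arguments that lie inside one of the syntax spans.
inside = false(size(argStart));
for k = 1:numel(synStart)
    inside = inside | (argStart >= synStart(k) & argEnd <= synEnd(k));
end
argStart = argStart(~inside);
argEnd   = argEnd(~inside);

% 4) Wrap the remaining arguments, working backwards so that the earlier
%    positions stay valid while the text grows.
for k = numel(argStart):-1:1
    txt = [txt(1:argStart(k)-1) '|' txt(argStart(k):argEnd(k)) '|' ...
           txt(argEnd(k)+1:end)];
end
disp(txt)
```

Note that the plain word-boundary pattern in step 2 would also wrap a quoted 'y', which the question wants left alone, so that pattern would need tightening (for instance with the look-around conditions from the first approach) before using it on the full help text.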