Novice Racket coder here.

I have an xexpr from which I'd like to filter out newline strings when they sit next to certain tags. The idea is to avoid newlines before and after certain elements, like the 'eq tag or 'figure tag.

Suppose I have the following:

(define tx1
  '(div (div "this is" "\n\n" (eq "y=x") "\n\n" "my equation" (eq "z=y"))
        "to fire" (eq "h=u") "up" "\n\n\n" (eq "tm=re") "\n" "this test" "\n"))

I'd like to remove the newline strings ("\n", "\n\n", "\n\n\n") around the 'eq tags to produce the following:

(define result
  '(div (div "this is" (eq "y=x") "my equation" (eq "z=y"))
        "to fire" (eq "h=u") "up" (eq "tm=re") "this test" "\n"))

My first step is to recognize the newlines. I've found that pattern matching is a possible solution.

(match tx1
  [(list a ... (regexp #rx"^\n+") b ...) `(,@a ,@b)]
  [(list a ...) `(,@a)]) 

However, I have to run this match function once for every occurrence of a newline. In this case, I have to run it 3 times. I could test the result every time to see if it has changed, but this seems sub-optimal.

The second step, which I have not reached yet, is to check whether the items just before and after the newline (the end of a and the start of b) are lists with an 'eq tag. The third step is to find sublists and recurse over those.

I have two questions. First, is there a better approach to this sort of problem? I suspect there is. Second, if matching is the best approach, is there a way to do a multiple replace? I've played with cons matching, and that works to some extent, but it discards the first item.

Thank you.

If you read this from an .xml file, you can use eliminate-whitespace on the element object returned:

(define service:xml (document-element (read-xml (open-input-file "test.xml"))))
(define service:xml:no-ws ((eliminate-whitespace '(div eq)) service:xml))
(define service:xe (xml->xexpr service:xml:no-ws))

- Ondrej

This looks like a problem that would be simpler to solve by parsing than by relying on regular expressions alone. - ben rudgers

1 Answer

I believe I've found an alternative solution. Pattern matching is an interesting idea: it's like regexes for lists and expressions. However, from a novice's point of view, it could use some features that bring it closer to conventional regexes, such as search-and-replace functions.

The solution I ended up with converts the input xexpr into a vector and processes each element based on the preceding and following elements.

; txexpr provides txexpr-elements? and get-tag; merge-newlines is assumed
; to be the one from pollen/decode (the original post does not define it)
(require txexpr pollen/decode)

(define (clean-newlines elems [tags '(eq figure)])
  ; collapse adjacent newline strings into a single element first
  (define elements (merge-newlines elems))
  (define elems-vec (list->vector elements))
  (define last-idx (sub1 (vector-length elems-vec)))
  (for/list ([(elem idx) (in-indexed elems-vec)])
    (cond
      ; process recursively if elem is a nested txexpr, passing tags along
      [(txexpr-elements? elem) (clean-newlines elem tags)]
      ; leave the first (the tag) and last elements alone
      [(or (= idx 0) (= idx last-idx)) elem]
      ; see if the element consists only of newlines
      [(and (string? elem) (regexp-match? #rx"^\n+$" elem))
       ; get the previous and next elements
       (let ([prev (vector-ref elems-vec (sub1 idx))]
             [next (vector-ref elems-vec (add1 idx))])
         ; if the previous or next element has a listed tag,
         ; collapse the run to a single "\n"
         (cond
           [(and (txexpr-elements? next) (member (get-tag next) tags)) "\n"]
           [(and (txexpr-elements? prev) (member (get-tag prev) tags)) "\n"]
           [else elem]))]
      [else elem])))
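
Note that the function above replaces a newline run next to a listed tag with a single "\n" rather than removing it, because for/list always emits one value per element. For comparison, here is a minimal sketch of a single-pass variant that drops those strings entirely, which is what the question's expected result asks for. It uses only plain Racket and match; the names strip-newlines, newline-string? and tagged? are hypothetical helpers made up for this illustration, and it looks only at the immediate neighbors in the original list.

```racket
#lang racket

;; Hypothetical helper: a string made up only of newlines, e.g. "\n" or "\n\n\n".
(define (newline-string? x)
  (and (string? x) (regexp-match? #px"^\n+$" x)))

;; Hypothetical helper: an element whose tag is listed in `tags`, e.g. '(eq "y=x").
(define (tagged? x tags)
  (and (pair? x) (symbol? (car x)) (memq (car x) tags) #t))

;; Walk the list once, remembering the previous item of the ORIGINAL list.
(define (strip-newlines xexpr [tags '(eq figure)])
  (let loop ([prev #f] [items xexpr])
    (match items
      ['() '()]
      ;; newline string directly after or before a listed tag: drop it
      [(cons (? newline-string? nl) rest)
       #:when (or (tagged? prev tags)
                  (and (pair? rest) (tagged? (car rest) tags)))
       (loop nl rest)]
      ;; nested list (a child element): clean it recursively
      [(cons (? pair? child) rest)
       (cons (strip-newlines child tags) (loop child rest))]
      ;; anything else (tag symbol, ordinary string): keep as is
      [(cons item rest) (cons item (loop item rest))])))

(define tx1
  '(div (div "this is" "\n\n" (eq "y=x") "\n\n" "my equation" (eq "z=y"))
        "to fire" (eq "h=u") "up" "\n\n\n" (eq "tm=re") "\n" "this test" "\n"))

(strip-newlines tx1)
;; => '(div (div "this is" (eq "y=x") "my equation" (eq "z=y"))
;;          "to fire" (eq "h=u") "up" (eq "tm=re") "this test" "\n")
```

Because the loop carries the previous item along as it recurses down the list, every newline is handled in one traversal, so there is no need to re-run match until the result stops changing. The trailing "\n" is kept because neither of its neighbors is a listed tag.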