I want to write a major mode for Emacs that does syntax highlighting for MML (Music Macro Language) keywords. I followed this tutorial: http://ergoemacs.org/emacs/elisp_syntax_coloring.html

Here is my current code (x-events still contains placeholders, and x-functions is taken over unchanged from the tutorial):

;; 
;; to install this mode, put the following lines
;;     (add-to-list 'load-path "~/.emacs.d/lisp/")
;;     (load "mml-mode.el")
;; into your init.el file and activate it with
;; ALT+X mml-mode RET
;; 

;; create the list for font-lock.
;; each category of keyword is given a particular face
(setq mml-font-lock-keywords
      (let* (
            ;; define several categories of keywords
            (x-keywords '("#author" "#title" "#game" "#comment"))
            (x-types '("&" "?" "/" "=" "[" "]" "^" "<" ">"))
            (x-constants '("w" "t" "o" "@" "v" "y" "h" "q" "p" "n" "*" "!"))
            (x-events '("@" "@@" "ooo" "oooo"))
            (x-functions '("llAbs" "llAcos" "llAddToLandBanList" 
"llAddToLandPassList"))

            ;; generate regex string for each category of keywords
            (x-keywords-regexp (regexp-opt x-keywords 'words))
            (x-types-regexp (regexp-opt x-types 'words))
            (x-constants-regexp (regexp-opt x-constants 'words))
            (x-events-regexp (regexp-opt x-events 'words))
            (x-functions-regexp (regexp-opt x-functions 'words)))

        `(
          (,x-types-regexp . font-lock-type-face)
          (,x-constants-regexp . font-lock-constant-face)
          (,x-events-regexp . font-lock-builtin-face)
          (,x-functions-regexp . font-lock-function-name-face)
          (,x-keywords-regexp . font-lock-keyword-face)
          )))

;;;###autoload
(define-derived-mode mml-mode text-mode "mml mode"
  "Major mode for editing mml (Music Macro Language)"

  ;; code for syntax highlighting
  (setq font-lock-defaults '((mml-font-lock-keywords))))

;; add the mode to the `features' list
(provide 'mml-mode)

Now there are two problems. First, I have several keywords that start with a # (e.g. #author), and these are not highlighted. If I leave the # out, highlighting works:

(x-keywords '("#author")) does not work.

(x-keywords '("author")) works, but the # is not colored. The same problem also occurs with the @. Possibly also with others, but I'll try to get them working one by one.

Second, a keyword seems to need at least two letters.

(x-keywords '("o")) does not work.

(x-keywords '("oo")) works.

But I have several "keywords" that consist of a single letter followed by two (arbitrary) hex digits (0-F), e.g. o7D. How can I specify that these one-letter keywords are found? (Preferably together with the number, but that is not a must.)

1 Answer

Both problems have the same cause, namely the way you construct the regular expressions:

(regexp-opt x-blabla 'words)

The problem is the 'words parameter. It encloses the generated regular expression in a \< ... \> pair. According to the Emacs manual, these backslash constructs are defined as follows:

\<    
    matches the empty string, but only at the beginning of a word. 
    ‘\<’ matches at the beginning of the buffer only if a word-constituent
    character follows.

\>
    matches the empty string, but only at the end of a word. 
    ‘\>’ matches at the end of the buffer only if the contents end with a
    word-constituent character.

Now, what does "beginning of a word" mean to Emacs? That is mode-dependent. In fact, every major mode defines its own syntax-table which is a mapping of characters to syntax codes. There are a number of pre-defined classes, and one of them is "w" which defines a character as a word constituent. Normally, a text-based mode would define the letters a...z and A...Z to have the syntax code "w", but perhaps also other characters (e.g. a hyphen -).

Okay, back to the problem at hand. For, say x-keywords, the resulting x-keywords-regexp according to your definition is:

"\\<\\(#\\(?:author\\|comment\\|\\(?:gam\\|titl\\)e\\)\\)\\>"

(Note that inside strings, the backslash is a special character used to escape other special characters, e.g., \n or \t. So in order to encode a simple backslash itself, you have to quote it with another backslash.)

As discussed above, we see \< and \> (or, in string parlance: "\\<" and "\\>") at the beginning and the end of the regexp respectively. But, as we've just learned, in order for this regexp to match, both the first and the last character of the potential match need to have word-constituent syntax.
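The effect is easy to reproduce outside font-lock. The following is a minimal sketch, assuming a temporary buffer with the standard syntax table (where # is punctuation and digits are word constituents). The helper name mml-demo-matches-p is made up for illustration:

```elisp
;; Hypothetical helper: does REGEXP match anywhere in TEXT, using the
;; standard syntax table that a fresh temporary buffer starts with?
(defun mml-demo-matches-p (regexp text)
  "Return t if REGEXP matches somewhere in TEXT, else nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Bind case folding off so the search is case-sensitive.
    (let ((case-fold-search nil))
      (and (re-search-forward regexp nil t) t))))

(mml-demo-matches-p "\\<#author\\>" "#author Bach")  ; nil: no word starts at "#"
(mml-demo-matches-p "\\<author\\>" "#author Bach")   ; t: "author" is a whole word
(mml-demo-matches-p "\\<o\\>" "o7D")                 ; nil: "7" continues the word
(mml-demo-matches-p "\\<oo\\>" "oo o7D")             ; t: "oo" stands alone
```

These four calls mirror the four cases from the question: #author and a bare o fail, author and oo succeed.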

The letters are unproblematic, but let's check the syntax code for # by typing C-h s (describe-syntax):

The parent syntax table is:
C-@ .. C-h      .       which means: punctuation
TAB .. C-j              which means: whitespace
C-k             .       which means: punctuation
C-l .. RET              which means: whitespace
C-n .. C-_      .       which means: punctuation
SPC                     which means: whitespace
!               .       which means: punctuation
"               "       which means: string
#               .       which means: punctuation
...

(Obviously truncated.)

And there you have it! The # character does not have word-constituent syntax; it is considered punctuation.
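The same lookup can be done from Lisp with char-syntax, which returns the syntax class character according to the current buffer's syntax table. A small sketch, again assuming the standard syntax table of a temporary buffer (the function name mml-demo-syntax-classes is invented here):

```elisp
(defun mml-demo-syntax-classes ()
  "Return the syntax class designators of #, a and 7 as strings."
  (with-temp-buffer
    (list (string (char-syntax ?#))    ; "." punctuation
          (string (char-syntax ?a))    ; "w" word constituent
          (string (char-syntax ?7))))) ; "w" digits are word constituents too

(mml-demo-syntax-classes)
```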

But we can change that by putting the following line into the definition of your major-mode:

(modify-syntax-entry ?# "w" mml-mode-syntax-table)

?# is how character literals are written in Emacs Lisp (think '#' in C).
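For context, here is one way that line could sit inside the mode definition. This is only a sketch: mml-demo-mode and mml-demo-font-lock-keywords are stand-in names so that the snippet can be evaluated without clashing with your real mode, and the keyword list is cut down to the # directives.

```elisp
(defvar mml-demo-font-lock-keywords
  `((,(regexp-opt '("#author" "#title" "#game" "#comment") 'words)
     . font-lock-keyword-face))
  "Cut-down font-lock keywords for the demo mode.")

(define-derived-mode mml-demo-mode text-mode "mml demo"
  "Stand-in major mode showing where the syntax entry goes."
  ;; `define-derived-mode' creates `mml-demo-mode-syntax-table' for us.
  ;; Giving # word syntax lets the word-start anchor match right before it.
  (modify-syntax-entry ?# "w" mml-demo-mode-syntax-table)
  (setq font-lock-defaults '((mml-demo-font-lock-keywords))))
```

One side effect worth keeping in mind: with # as a word constituent, word-based motion commands such as M-f treat #author as a single word in this mode.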

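Regarding the second part of your question: a token like o7D counts as a single word for \< and \> as long as the digits have word-constituent syntax. That is also why a bare "o" never matches there: the digit that follows continues the word, so \> cannot match right after the o. Digits normally have word syntax already in the standard syntax table, so the following line is usually redundant, but it makes the assumption explicit for your mode: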

(modify-syntax-entry '(?0 . ?9) "w" mml-mode-syntax-table)

We also need an appropriate regular expression to match such keywords. The regexp itself is not difficult:

"o[0-9A-F]\\{2\\}"

But where to put it? Since it is already a regexp, we cannot simply add it to x-keywords, because that is a list of literal strings which regexp-opt would quote.

We can, however, concatenate it to x-keywords-regexp by changing the respective line in your code above to read like this:

(x-keywords-regexp (concat (regexp-opt x-keywords 'words)
                           "\\|\\<[o][0-9A-F]\\{2\\}\\>"))

Note the "\\|" at the beginning of the string parameter, which is the regexp syntax for alternation ("or").
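To check the combined pattern before wiring it into font-lock, something like the following can be evaluated. It is a sketch under a few assumptions: the names mml-demo-keywords-regexp and mml-demo-find are invented here, the scratch syntax table only adds word syntax for #, and case-fold-search is bound to nil to mimic the case-sensitive matching you get from the font-lock-defaults setting in the question.

```elisp
(defvar mml-demo-keywords-regexp
  (concat (regexp-opt '("#author" "#title" "#game" "#comment") 'words)
          "\\|\\<o[0-9A-F]\\{2\\}\\>")
  "Directive keywords, or the letter o followed by exactly two hex digits.")

(defun mml-demo-find (text)
  "Return the first match of `mml-demo-keywords-regexp' in TEXT, or nil."
  (let ((table (make-syntax-table)))   ; inherits from the standard table
    (modify-syntax-entry ?# "w" table)
    (with-temp-buffer
      (set-syntax-table table)
      (insert text)
      (goto-char (point-min))
      (let ((case-fold-search nil))
        (when (re-search-forward mml-demo-keywords-regexp nil t)
          (match-string 0))))))

(mml-demo-find "t120 o7D c d e")   ; "o7D"
(mml-demo-find "#game Tetris")     ; "#game"
(mml-demo-find "o7 oZZ o7D5")      ; nil: too short, not hex, too long
```

The last call shows why the \< ... \> pair is kept around the new alternative: without \> the pattern would also pick up the first three characters of o7D5.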