20
votes

I'm looking for the maximum character length allowed for an internet Message-ID field, for validation purposes within an application. I've reviewed sources such as RFC 2822 and the Wikipedia "Message-ID" article, as well as this SO question, among various other places. The closest answer I can find is "998 characters", because that is the maximum length the specification allows for each line in an internet message (from RFC 2822), and the Message-ID field cannot span multiple lines.

Is 998 characters the definitive answer? Is there no such limit?


2 Answers

17
votes

If there's one thing I've learned about email, it's that it is a massively distributed system for fuzzing email software. That is, no matter what the RFCs say, you will find emails that violate them, with some email software coping and some failing. I think most software will limp along with the robustness principle in mind.

With that out of the way, I think the maximum RFC-compliant Message-ID length is 995 characters.

The maximum line length per the RFC you cite is 998 characters, not counting the terminating CRLF. That limit would include the "Message-ID:" field name, but you can fold the line between the field name and the field body. The line containing the actual Message-ID then consists of a single space (the folding whitespace), "<", the Message-ID itself, and ">". Semantically, the angle brackets are not part of the Message-ID, so the maximum is 998 - 3 = 995 characters.
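The arithmetic above can be sketched as a small length check. This is only an illustration under the assumptions of this answer (a fold directly after the field name, and a single space as the folding whitespace); the names `MAX_MESSAGE_ID_LENGTH`, `is_within_length_limit` and `folded_header` are made up for the example and do not come from any library. It checks length only, not the syntax of the Message-ID.

```python
# RFC 2822 limit per line, not counting the terminating CRLF.
MAX_LINE_LENGTH = 998

# Characters on the folded line that are not part of the Message-ID itself:
# one space of folding whitespace, "<" and ">".
OVERHEAD = len(" ") + len("<") + len(">")

MAX_MESSAGE_ID_LENGTH = MAX_LINE_LENGTH - OVERHEAD  # 998 - 3 = 995


def is_within_length_limit(message_id: str) -> bool:
    """Return True if the bare Message-ID (without angle brackets) fits on one line."""
    return len(message_id) <= MAX_MESSAGE_ID_LENGTH


def folded_header(message_id: str) -> str:
    """Build the header field with a fold between the field name and the field body."""
    return "Message-ID:\r\n <" + message_id + ">\r\n"
```

For validation you would call `is_within_length_limit` on the value with the angle brackets already stripped. Given the robustness principle mentioned above, it may be wiser to log or flag longer values than to reject them outright.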

3
votes

Actually, there's no limit.

RFC 2822 defines these productions:

message-id      =       "Message-ID:" msg-id CRLF

msg-id          =       [CFWS] "<" id-left "@" id-right ">" [CFWS]

id-left         =       dot-atom-text / no-fold-quote / obs-id-left

obs-id-left     =       local-part

local-part      =       dot-atom / quoted-string / obs-local-part

quoted-string   =       [CFWS]
                        DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                        [CFWS]

CFWS            =       *([FWS] comment) (([FWS] comment) / FWS)

FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
                        obs-FWS

So id-left can be a local-part (via obs-id-left), which in turn can be a quoted-string. A quoted-string may contain FWS between its qcontent elements, so you can fold it as many times as needed to fit a payload of arbitrary length and still comply with the line-length restriction in the RFC.
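As a rough illustration of that reading of the grammar, the sketch below builds a Message-ID whose id-left is a quoted-string folded across several lines. The function names, the chunk sizes and the `example.com` id-right are assumptions made up for the example, and each chunk is assumed to be valid qcontent (no unescaped quotes or backslashes). Note that unfolding only removes the CRLF, so the space at each fold remains part of the quoted-string.

```python
MAX_LINE_LENGTH = 998  # RFC 2822 limit per line, not counting CRLF


def fold_quoted_id_left(chunks, id_right):
    """Place the chunks inside a quoted-string, with a fold (CRLF + space) between them."""
    folded = "\r\n ".join(chunks)
    return 'Message-ID: <"' + folded + '"@' + id_right + ">\r\n"


def unfold(header):
    """Unfold a header field by dropping each CRLF that is followed by whitespace."""
    return header.replace("\r\n ", " ").replace("\r\n\t", "\t")


# Five chunks of 900 characters each: far more than 998 characters in total.
chunks = ["x" * 900 for _ in range(5)]
header = fold_quoted_id_left(chunks, "example.com")
lines = header.split("\r\n")

# Every physical line stays within the limit, although the unfolded field does not.
print(max(len(line) for line in lines) <= MAX_LINE_LENGTH)
print(len(unfold(header)) > MAX_LINE_LENGTH)
```

This path goes through obs-id-left, which belongs to the obsolete syntax, so it mainly matters for software that has to accept such messages, not for software that generates them.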