
Usual URL-shortening techniques use only a few characters of the usual URL charset, because they do not need more. A typical short URL is http://domain/code, where code is an integer. Suppose that I can use any base (base10, base16, base36, base62, etc.) to represent the number.

QR Code has many encoding modes, and we can optimize the QR Code (choosing the minimal version to obtain the lowest density), so we can test baseX-modeY pairs...

What is the best base-mode pair?


NOTES

A guess...

Two modes fit the "URL shortening profile":

  • 0010 - Alphanumeric encoding (11 bits per 2 characters)
  • 0100 - Byte encoding (8 bits per character)

My choice was "upper-case base36" with Alphanumeric (which also encodes "/", ":", etc.), but I have not seen any demonstration that it is always (for any URL length) the best. Is there a good guide or mathematical demonstration for this kind of optimization?
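To make the comparison concrete, here is a minimal sketch that counts only the data bits of a segment in each mode, using the per-character costs listed above (plus the standard leftover rules: 6 bits for a single trailing alphanumeric character, 7 or 4 bits for two or one trailing digits). The name dataBits is a hypothetical helper, and the mode and character-count indicators are ignored here.

```javascript
// Data bits of one QR segment, excluding mode and count indicators.
// dataBits is a hypothetical helper name, not part of any library.
function dataBits(mode, length) {
    switch (mode) {
        case 'numeric': {
            // 10 bits per 3 digits; 7 bits for 2 leftover digits, 4 bits for 1
            var rest = length % 3;
            return Math.floor(length / 3) * 10 + (rest === 2 ? 7 : rest === 1 ? 4 : 0);
        }
        case 'alpha':
            // 11 bits per 2 characters; 6 bits for a leftover character
            return Math.floor(length / 2) * 11 + (length % 2) * 6;
        case 'byte':
            return length * 8;
        default:
            throw new Error('unknown mode: ' + mode);
    }
}

// "HTTP://BIT.LY/" + 2 base36 digits vs "http://bit.ly/" + 2 base62 digits
console.log(dataBits('alpha', 16)); // 88
console.log(dataBits('byte', 16));  // 128
```

For a 16-character URL the alphanumeric segment needs 88 data bits against 128 in byte mode, so a larger base only helps if it saves enough digits to offset the 8-bit cost of every character, including the fixed "http://domain/" part.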

The ideal (perhaps impracticable)

There is another variation, "encoding modes can be mixed as needed within a QR symbol" (Wikipedia)... So we can also use

  • HTTP://DOMAIN/ with Alphanumeric + change_mode + Numeric encoding (10 bits per 3 digits)

For long URLs (long integers), of course, this is the best solution (!), because it uses the whole charset of the mode with no loss... Is it?

The problem is that this kind of optimization (mixed mode) is not available in the usual QR Code image generators... Is it practicable? Is there a generator that does it correctly?
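Whether the mode switch pays off can be estimated by counting bits, since every extra segment costs its own mode indicator (4 bits) and character-count indicator. The sketch below is only an estimate under stated assumptions: it assumes QR versions 1 to 9 (count indicator of 10 bits for numeric, 9 bits for alphanumeric), and segmentBits and compareSingleVsMixed are hypothetical helper names.

```javascript
// Assumption: QR versions 1 to 9, where the character-count indicator is
// 10 bits for numeric mode and 9 bits for alphanumeric mode.
var COUNT_BITS = { numeric: 10, alpha: 9 };
var MODE_BITS = 4; // mode indicator, e.g. 0010 for alphanumeric

// Total bits of one segment: mode indicator + count indicator + data.
function segmentBits(mode, n) {
    var data;
    if (mode === 'numeric') {
        var rest = n % 3; // 10 bits per 3 digits, 7 for 2 left over, 4 for 1
        data = Math.floor(n / 3) * 10 + (rest === 2 ? 7 : rest === 1 ? 4 : 0);
    } else {
        data = Math.floor(n / 2) * 11 + (n % 2) * 6; // alphanumeric
    }
    return MODE_BITS + COUNT_BITS[mode] + data;
}

// Hypothetical helper: compare one alphanumeric segment (base36 code)
// with an alphanumeric prefix followed by a numeric segment (base10 code).
// maxValue must be a BigInt so that very long codes are counted exactly.
function compareSingleVsMixed(urlBaseLen, maxValue) {
    var digits36 = maxValue.toString(36).length;
    var digits10 = maxValue.toString(10).length;
    return {
        single: segmentBits('alpha', urlBaseLen + digits36),
        mixed: segmentBits('alpha', urlBaseLen) + segmentBits('numeric', digits10)
    };
}

console.log(compareSingleVsMixed(14, 894n));            // bit.ly, codes up to 894
console.log(compareSingleVsMixed(23, 2n ** 320n - 1n)); // 15-char domain, 320-bit code
```

Under these assumptions the short code gives 101 bits for the single segment against 114 bits mixed, while the 320-bit code gives 481 against 478, so the mode switch seems to pay off only for very long integers, and even then only marginally.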

An alternative answer format

The (practicable) question is about the best combination of base and mode, so we can express it as a function (e.g. in JavaScript),

 function bestBaseMode(domain,number_range) {
    var dom_len = domain.length;
    var urlBase_len = dom_len+8; // 8 = "http://".length + "/".length;
    var num_min = number_range[0];
    var num_max = number_range[1];
    // ... check optimal base and mode
    return [base,mode];
 }

Example-1: the domain is "bit.ly" and the code is an ISO 3166-1 numeric country code, ranging from 4 to 894. So urlBase_len=14, num_min=4 and num_max=894.

Example-2: the domain is "postcode-resolver.org" and the number_range parameter is the range of the integer representations of the most frequent postal codes, for instance a statistically inferred range from ~999 to ~9999999. So urlBase_len=29, num_min=999 and num_max=9999999.

Example-3: the domain is "my-example3.net" and number_range is a double SHA-1 code, so a fixed-length code of 40 bytes (two concatenated hexadecimal numbers, each 40 digits long). So num_max=num_min=Math.pow(16,80).

Please comment on your close vote or downvote. This question is about optimization and does not duplicate this other question (which is not a base-mode question, and is confusing). – Peter Krauss

1 Answer


Nobody wanted my bounty... I lost it, and now I also need to do the work myself ;-)


about the ideal

The goQR.me support replied to the specific question about mixed encoding, pointing out that, unfortunately, it can't be used:

sorry, our api does not support mixed qr code encoding. Even the standard may defined it. Real world QR code scanner apps on mobile phone have tons of bugs, we would not recommend to rely on this feature.

functional answer

This function shows the answers in the console... It is a simplified, "brute force" solution.

 /**
  * Find the best base-mode pair for a short URL template as QR-Code.
  * @param msg label for debug or report (currently unused).
  * @param domain the string of the internet domain
  * @param digits10 the max. number of digits in a decimal representation
  * @return array of objects with equivalent valid answers.
  */
 function bestBaseMode(msg, domain, digits10) {
    var commonBases = [2,8,10,16,36,60,62,64,124,248];  // your config
    var dom_len = domain.length;
    var urlBase_len = dom_len+8; // 8 = "http://".length + "/".length
    // largest value with digits10 decimal digits (approximated as a float)
    var numb = parseFloat( "9".repeat(digits10) );
    var scores = [];
    var best = Infinity;
    // index loop with a declared variable (for-in leaked a global "i")
    for (var i = 0; i < commonBases.length; i++) {
        var b = commonBases[i];
        // formula at http://math.stackexchange.com/a/335063
        var digits = Math.floor(Math.log(numb) / Math.log(b)) + 1;
        var mode = 'alpha';  // bases up to 36 fit the Alphanumeric charset
        var len = dom_len + digits;
        var lost = 0;
        if (b>36) {
            mode = 'byte';
            lost = Math.floor(urlBase_len*0.25); // only 6 of 8 bits used at URL
        }
        var score = len+lost; // penalty
        scores.push({BASE:b,MODE:mode,digits:digits,score:score});
        if (score<best) best = score;
    }
    // keep every base-mode pair that ties for the lowest score
    var r = [];
    for (var j = 0; j < scores.length; j++) {
        if (scores[j].score==best) r.push(scores[j]);
    }
    return r;
}
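
The logarithm formula above works in floating point, so for very long numbers, or for values that are an exact power of the base, the quotient can land just below an integer and the digit count may come out one too low. A minimal sketch of an exact alternative using BigInt follows; digitsInBase is a hypothetical helper, not part of the function above.

```javascript
// Exact number of digits of a non-negative integer in the given base (>= 2).
// Accepts a Number or a BigInt; digitsInBase is a hypothetical helper name.
function digitsInBase(value, base) {
    var v = BigInt(value);
    var b = BigInt(base);
    var n = 1;
    while (v >= b) {   // strip one digit per iteration
        v = v / b;     // BigInt division truncates
        n++;
    }
    return n;
}

console.log(digitsInBase(999, 36));               // 2
console.log(digitsInBase(9999999, 36));           // 5
console.log(digitsInBase(2n ** 320n - 1n, 248));  // 41
```

These counts agree with the digits reported for the three examples, and the loop also handles bases above 36, which BigInt's own toString(radix) does not accept.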

Running the question examples:

var x = bestBaseMode("Example-1",   "bit.ly",3);
console.log(JSON.stringify(x))   // "BASE":36,"MODE":"alpha","digits":2,"score":8

var x = bestBaseMode("Example-2",   "postcode-resolver.org",7);
console.log(JSON.stringify(x))  // "BASE":36,"MODE":"alpha","digits":5,"score":26

var x = bestBaseMode("Example-3",  "my-example3.net",97);
console.log(JSON.stringify(x))  // "BASE":248,"MODE":"byte","digits":41,"score":61