2
votes

So hexadecimal is represented like this: 0x[0..F]+ And decimal integers are represented like this: [0..9]+

So for the decimal number 15, the hex version is 0xF which is one character longer. Obviously this is only because you have to add the 0x, but that is a necessary part of writing hex literals.

However, at large values, hex uses fewer characters than decimal as it is base 16, rather than base 10.

E.g.

0xFFFFFFFFFFFFFFF

is shorter than

1152921504606846975

At what point does hex become shorter than decimal? Is there a nice little algorthm that calculates this number?

I have tagged this as an interview question even though it isn't. I think it would make a good one.

3
That's a simple linear relationship that you ought to be able to solve in two or three lines. The slope is the ratio of the logarithm of the bases.Kerrek SB
Hexadecmal numbers do not start with a 0x, thats just so your complier/interpreter can identify them as Hex, in Delphi its $, in CSS its #Dampsquid
Yes, I realise that. However it is the normal convention to put 0x before hex numbers. For my purposes, it's required.Nick Brunt

3 Answers

8
votes

The number of digits in some base B necessary to represent a number n is given by

⌈logB(n + 1)⌉

This means that hexadecimal numbers are more efficient than decimal numbers whenever

⌈log16(n + 1)⌉ < ⌈log10(n + 1)⌉

If you have some number n to test, you can just plug it into this formula to see whether hexadecimal or decimal will be more efficient.

But let's see if we can plot out the ranges where you need some number of digits to represent something. We get this table here:

Num Digits         Decimal Cutoff         Hex Cutoff
----------------------------------------------------
    1                   0                    0
    2                  10                   16
    3                 100                  256
    4                1000                 4096
    5               10000                65536
    6              100000              1048576
    7             1000000             16777216

Notice that when we hit six decimal digits, it's never more efficient to write the number in decimal, since a decimal number with six digits is at most 999999 and a hexadecimal number with six digits is at most 16777215. So from 100000 onward, you are better off writing the number in hexadecimal than in decimal.

EDIT: Because you are counting the 0x characters as part of the total number of digits required, you're going to be looking for the first number where

⌈log16(n + 1)⌉ + 2 < ⌈log10(n + 1)⌉

In this case, the table looks like this:

Num Digits         Decimal Cutoff         Hex Cutoff
----------------------------------------------------
    0                 100                    1
    1                1000                   16
    2               10000                  256
    3              100000                 4096
    4             1000000                65536
    5            10000000              1048576
    6           100000000             16777216
    7          1000000000            268435456
    8         10000000000           4294967296
    9        100000000000          68719476736
   10       1000000000000        1099511627776
   11      10000000000000       17592186044416

In this case, the point at which the two representations meet is 1099511627776, which requires only 11 hex digits but 13 decimal digits. From this point forward, you are always at least as well off using hexadecimal. Once you hit 10000000000000, you are strictly better off using hex.

Hope this helps!

5
votes

The answer bounces back and forth for quite a while because the powers of 10 and 16 aren't aligned. So from a power of 10 to the next power of 16 hex number strings will temporarily be the same length or shorter depending on the magnitude. Here's a table that lists where hex is shorter:

Hex is as efficient from 1000000 to 1048575
Hex is as efficient from 10000000 to 16777215
Hex is as efficient from 100000000 to 268435455
Hex is as efficient from 1000000000 to 4294967295
Hex is as efficient from 10000000000 to 68719476735
Hex is as efficient from 100000000000 to 999999999999
Hex is more efficient from 1000000000000 to 1099511627775
Hex is as efficient from 1099511627776 to 9999999999999
Hex is more efficient from 10000000000000 to 17592186044415
Hex is as efficient from 17592186044416 to 99999999999999
Hex is more efficient from 100000000000000 to 281474976710655
Hex is as efficient from 281474976710656 to 999999999999999
Hex is more efficient from 1000000000000000 to 4503599627370495
Hex is as efficient from 4503599627370496 to 9999999999999999
Hex is more efficient from 10000000000000000 to 72057594037927935
Hex is as efficient from 72057594037927936 to 99999999999999999
Hex is more efficient from 100000000000000000 to Infinity

The Python script that produced the above is below. It works by building a list of ranges at each of the powers which are the only values for which the lengths can change. Max limits the search to 10^max. Also note that translate is included to strip off an L that Python includes in hex strings above a certain number. Note that integers only have that appended in their repr form. This Pythonic quirk could be avoided by comparing ceilings of logs instead.

import math
max=25
hmax=int(math.log(10,16)*max)
f={-1:'less',0:'as',1:'more'}
r=[]
for i in sorted(map(lambda x:10**x,range(1,max))+map(lambda x:16**x,range(1,hmax))):
    r.append([cmp(len(str(i)),len(hex(i).translate(None,'L'))),i])
i=0
while i<len(r): 
    j=1
    while (i+j)<len(r) and r[i+j][0]==r[i][0]:j+=1
    end=r[i+j][1]-1 if len(r)>(i+j) else 'Infinity'
    if r[i][0]!=-1: print 'Hex is {0} efficient from {1} to {2}'.format(
        f[r[i][0]],r[i][1],end)
    i+=j
0
votes

A similar answer as to those given, only slightly more redundantly written for readability with different outputs, tested under Python 3.6 :

import math

def cmp(a, b):
    return (a > b) - (a < b)


max=25
hmax=int(math.log(10,16)*max)+1
r=[]

# create mixed list of powers to 10/16 base

cblist1 = list(map(lambda x:10**x,range(1,max)))
cblist2 = list(map(lambda x:16**x,range(1,hmax)))
cblist3 = sorted(cblist1+cblist2) 

# create vector of len-comparison results

for i in cblist3:
    #print(str(i-1) +' ' +hex(i-1)[2:])

    r.append(cmp(len(str(i-1)),2+len(hex(i-1)[2:]))) 

    # one can delete the "2+" above for 0x-less calculation

# filter indices for relevant efficiencies for Hex-notation

same_eff = [i for i, e in enumerate(r) if e == 0]
less_eff = [i for i, e in enumerate(r) if e == -1]
more_eff = [i for i, e in enumerate(r) if e == 1]

# convert indices to relevant ranges in DEC and HEX

same_eff_rd = [[cblist3[x-1],cblist3[x]-1] for x in list(same_eff)]
same_eff_rh =[[hex(cblist3[x-1]),hex(cblist3[x]-1)] for x in list(same_eff)]

more_eff_rd = [[cblist3[y-1],cblist3[y]-1] for y in more_eff]
more_eff_rh =[[hex(cblist3[y-1]),hex(cblist3[y]-1)] for y in more_eff]

Now we can print the summary results:

print('\n')
print('When adding the 0x prefix, Hex-notation becomes first as efficient for the DEC-range of [{0},{1}]'.format(*same_eff_rd[0]))
print('This translates into the HEX-range of [{0},{1}]'.format(*same_eff_rh[0]))
print('\n')
print('When adding the 0x prefix, Hex-notation becomes first more or at least as efficient, as of the DEC-range of [{0},{1}] and onwards'.format(*more_eff_rd[0]))
print('This translates into the HEX-range of [{0},{1}]'.format(*more_eff_rh[0]))

Or all the ranges:

print('\n')
print('The "more efficient" ranges are in HEX: ')
print(list(more_eff_rh))
print('\n')
print('\n')
print('The "as efficient" ranges are in HEX: ')
print(list(same_eff_rh))