4
votes

I'm having trouble sorting Norwegian text column in Postgres. My environment:

db=# select version();
PostgreSQL 9.2.14 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit 

the database is hosted on RedHat Openshift.

when running >locale command I get:

  • locale: Cannot set LC_CTYPE to default locale: No such file or directory
  • locale: Cannot set LC_ALL to default locale: No such file or directory
  • LANG=en_US.UTF-8
  • LC_CTYPE=UTF-8
  • LC_NUMERIC="en_US.UTF-8"
  • LC_TIME="en_US.UTF-8"
  • LC_COLLATE="en_US.UTF-8"
  • LC_MONETARY="en_US.UTF-8"
  • LC_MESSAGES="en_US.UTF-8"
  • LC_PAPER="en_US.UTF-8"
  • LC_NAME="en_US.UTF-8"
  • LC_ADDRESS="en_US.UTF-8"
  • LC_TELEPHONE="en_US.UTF-8"
  • LC_MEASUREMENT="en_US.UTF-8"
  • LC_IDENTIFICATION="en_US.UTF-8"
  • LC_ALL=

**edit

db=#\l
          Name           |    Owner     | Encoding |   Collate   |    Ctype    |   Access privileges   
-------------------------+--------------+----------+-------------+-------------+-----------------------
 db                      | myadminUser  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 

this is what I tried, this sql shows the default sort is incorrect:

db=# select * from unnest(ARRAY['a','b','c','d','A','B','C','å','ø','z','Z','Ø']) as t1 order by t1;

result: a A å b B c C d ø Ø z Z

(I think this sort order is even wrong for English, Capital 'A' should come before 'a' isn't it ?)

then I tried :

db=# CREATE COLLATION nor (LOCALE = 'nn_NO.utf8');

and the same statement from before:

db=# select * from unnest(ARRAY['a','b','c','d','A','B','C','å','ø','z','Z','Ø']) as t1 order by t1 collate nor;

Now the result is: A a B b C c d Z z Ø ø å

This looked really good, I thought I'm done.. BUT then I tried:

db=# select * from unnest(ARRAY['aaaa','bbbb','cccc','dddd','AAAA','BBBB','CCCC','åååå','øøøø','zzzz','ZZZZ','ØØØØ']) as t1 order by t1 collate nor;

result: BBBB bbbb CCCC cccc dddd ZZZZ zzzz ØØØØ øøøø AAAA aaaa åååå

what am I doing wrong here ?

1

1 Answers

2
votes

The order is correct. In Norwegian "aa" is the spelling of "å" and it should go at the end.

Source: https://en.wikipedia.org/wiki/%C3%85

Correct alphabetization in Danish and Norwegian places Å as the last letter in the alphabet, the sequence being Æ, Ø, Å. This is also true for the alternative spelling "Aa". Unless manually corrected, sorting algorithms of programs localised for Danish or Norwegian will place e.g., Aaron after Zorro.