Author Topic: Listing in alphabetical order the human way  (Read 270 times)

Offline vuott

Listing in alphabetical order the human way
« on: 23 October 2013, 11:10:46 »
I am reposting this discussion, which appeared on the official mailing list:


" I've never been happy with the standard sorting algorithm when dealing with
lists of names. The human eye expects the names to be listed
alphabetically, overlooking spaces, hyphens, accented characters, ...

Assume the following names:
     - Benoizy
     - Benoît
     - Benï Lewis
     - Benoix
     - Ben Underwood

Sorting them using the default sort method results in the following list:
     - Ben Underwood
     - Benoix
     - Benoizy
     - Benoît
     - Benï Lewis

       Ben Underwood comes first because the space has a lower UTF-8 value
than any letter.
       Benoît comes last in the Benoi series of names because î has a higher
UTF-8 value than any unaccented letter.
       Benï Lewis comes last in the list because ï likewise has a higher
UTF-8 value than o.

When the function Alfabet is used to sort the list, the result, shown with
the original strings, is:
     - Benï Lewis
     - Benoît
     - Benoix
     - Benoizy
     - Ben Underwood

       which is the order a human normally expects to see, ignoring
spaces, accents, umlauts, ...

Attached (see attachment) is a function I have written that strips all
non-letter characters from a string and returns a pure ASCII string with all
characters in the range "a"-"z". I have also included a small snapshot of an
actual list sorted alphabetically in one of my programs. As you will notice,
the names are truly listed 'by Alphabet'.

Feel free to use the function; perhaps the concept could be incorporated in a future build of Gambas.

Alain J. Baudrez
"
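
As a rough illustration of the idea described above (this is not Alain's attached Gambas function, which is not reproduced here), the approach can be sketched in Python. The name alfabet_key is hypothetical, and the use of Unicode NFD decomposition to strip accents is my own assumption about one way to obtain the "a"-"z" key:

```python
import unicodedata

def alfabet_key(name: str) -> str:
    """Hypothetical stand-in for Alfabet: reduce a name to the letters a-z."""
    # NFD splits an accented letter into base letter + combining mark,
    # e.g. "ï" becomes "i" followed by U+0308.
    decomposed = unicodedata.normalize("NFD", name.lower())
    # Keep only plain ASCII letters; spaces, hyphens and combining marks go.
    return "".join(ch for ch in decomposed if "a" <= ch <= "z")

names = ["Benoizy", "Benoît", "Benï Lewis", "Benoix", "Ben Underwood"]

# Default sort: compares code points, so space < letters < accented letters.
default_order = sorted(names)

# "Human" sort: compare the stripped keys, but keep the original strings.
human_order = sorted(names, key=alfabet_key)

print(default_order)
print(human_order)
```

With the five names from the message, the first list reproduces the default order and the second the Alfabet order quoted above. Letters that have no decomposition (for example "ø" or "ß") are simply dropped by this sketch, so a real implementation would probably need an explicit replacement table for them.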


" Thanks for that interesting approach. I did a similar thing some years
ago to sort lists of (mainly German) names. But it is not all that easy
in every country.

You have to know that our umlauts are sorted like vowel + "e", i.e. "ä"
= "ae" (which is its historical representation). And in office files (I
mean the paper ones) or telephone registers, we use tabs with "St" and
"Sch".

But beware: not everyone does it that way. The Swedes, for
instance, treat the umlauts as separate letters, i.e. they appear at
the end of the list. So in a Swedish dictionary, you will find a word
starting with an "ä" after the words starting with "z". (And I would expect
"Hägar" to appear after "Hazufel".)

My own algorithm sorts strictly the German way, although it does not
group "St" and "Sch" separately.

Rolf
"
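
To make the difference Rolf describes concrete, here is a minimal Python sketch of the two conventions. It is not Rolf's algorithm (which is not shown in the thread): german_key, swedish_key and the two tables are hypothetical names of mine, and the Swedish key simply assumes the alphabet a-z followed by å, ä, ö. The "St"/"Sch" grouping is not attempted.

```python
import unicodedata
from typing import List

# German convention from the message: umlaut = vowel + "e".
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue"}

def german_key(name: str) -> str:
    # NFC first, so a decomposed "a" + U+0308 is matched as the single "ä".
    lowered = unicodedata.normalize("NFC", name).lower()
    expanded = "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in lowered)
    # Fold any remaining accents and keep only a-z.
    decomposed = unicodedata.normalize("NFD", expanded)
    return "".join(ch for ch in decomposed if "a" <= ch <= "z")

# Swedish convention (assumed alphabet): å, ä, ö are letters after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(name: str) -> List[int]:
    lowered = unicodedata.normalize("NFC", name).lower()
    # Characters outside the assumed alphabet are ignored in this sketch.
    return [SWEDISH_RANK[ch] for ch in lowered if ch in SWEDISH_RANK]

names = ["Hägar", "Hazufel"]
print(sorted(names, key=german_key))   # German way: Hägar sorts as "haegar"
print(sorted(names, key=swedish_key))  # Swedish way: "ä" ranks after "z"
```

With the German key "Hägar" comes before "Hazufel"; with the Swedish key it comes after, as Rolf expects. The point of the sketch is only that the sort key has to depend on the country, so a single Alfabet-style function cannot be right for every list.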
« Anyone who, absent a state of necessity, uses the Shell or Exec instructions in their own Gambas project shall be punished with a fine of between 20.00 and 60.00 euros. »