A note on tools: I am using JS Bin. If you’re unfamiliar with it, check out http://jsbin.com/help/what-is-jsbin. The purpose of this article is both to answer the question and to show one way of finding the answer.
All snarkiness aside, programming languages are just like board games. Maybe you don’t like the rules of Pandemic, and that’s completely fine. However, your rules of the game won’t be the “standard” of this particular game unless there is lots of adoption and acceptance by other people. There is a vast graveyard of board games that never became popular in the same way that there is a graveyard of programming languages that never took off.
So the question that I got from a student was the following:
Small question on comparing strings. Ordering of strings is based on Unicode, got it. If the first letter is equal, are the strings equal? Or does it go letter by letter until it finds inequality? For example, is “made” < “maid”, or is “made” > “maid”?
The Answer (or, better said, an answer)
Zakas on String Comparisons
The following excerpt comes from Chapter 3.
So the gist here is that:
- Don’t rely on the human intuition that capital letters are “bigger” than lowercase letters; their character codes are actually lower
- All strings boil down to character codes
- Comparison starts with the first letter of each string and goes from there (so there’s no cumulative addition of strings on each side)
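The points above can be checked with a few lines in the console. This is only a small sketch; the variable names are mine and not from the book.

```javascript
// Uppercase letters have lower character codes than lowercase ones.
const upperZ = "Z".charCodeAt(0); // 90
const lowerA = "a".charCodeAt(0); // 97

// So an uppercase "Z" sorts before a lowercase "a".
const zBeforeA = "Z" < "a"; // true

// Only the first differing position matters: "B" (66) vs "a" (97).
const bananaFirst = "Banana" < "apple"; // true

console.log(upperZ, lowerA, zBeforeA, bananaFirst);
```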
How do we figure out a string's character code?
Looking for 'character code' in Zakas's book yields the following.
What exactly does charCodeAt do?
Interesting – we're dealing with UTF-16 code units, which map one-to-one to characters only in the basic range of Unicode (beyond that, things get a bit more complicated). Recall that the student (in his question) had assumed Unicode.
Let's go to JS Bin
JS Bin – 1
If we evaluate according to Zakas, then the comparison starts at 'd' and 'i' since 'm' and 'a' are the same on both the left and right side of the expression.
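To locate where the two words first differ, a small helper can walk both strings in step. `firstDifference` is a hypothetical name I made up for illustration; it is not a built-in.

```javascript
// Hypothetical helper: index of the first position where two strings differ,
// or -1 if they match over the length of the shorter string.
function firstDifference(left, right) {
  const shared = Math.min(left.length, right.length);
  for (let i = 0; i < shared; i++) {
    if (left[i] !== right[i]) return i;
  }
  return -1;
}

const diffIndex = firstDifference("made", "maid"); // 2
const leftChar = "made"[diffIndex];  // "d"
const rightChar = "maid"[diffIndex]; // "i"

console.log(diffIndex, leftChar, rightChar);
```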
JS Bin – 2
charCodeAt returns the UTF-16 code unit at a given index of a string, and the index defaults to 0. Giving it a single character, with or without an index, therefore results in the same thing.
Now in the next steps we could put in the full string (i.e. "made") and then pick a specific index, but I'd rather keep it simple and keep a laser focus on what we're trying to answer.
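As a quick check of both variants, here is a sketch with variable names of my own choosing. The last line shows the full-string form for comparison.

```javascript
const noIndex = "d".charCodeAt();      // 100, the index defaults to 0
const withIndex = "d".charCodeAt(0);   // 100
const fromWord = "made".charCodeAt(2); // 100, the "d" inside "made"

console.log(noIndex, withIndex, fromWord);
```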
JS Bin – 3
Comparing the differing letters of each word, we can see that for the first pair ("d" and "i") 100 is less than 105, so at this point the answer to 'is "made" less than "maid"?' is true.
For the last letters ("e" and "d") the answer stays the same even though "e" has a higher value than "d". What's going on here? The comparison already stopped at the previous pair of letters, so this pair has no effect.
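The numbers can be reproduced as follows. This is a minimal sketch: the names are mine, and the "madz"/"maia" pair is an extra example I added to show that letters after the deciding position do not matter.

```javascript
const codeD = "d".charCodeAt(0); // 100
const codeI = "i".charCodeAt(0); // 105
const codeE = "e".charCodeAt(0); // 101

const thirdLetters = codeD < codeI;  // true: this pair decides the comparison
const fourthLetters = codeE < codeD; // false: this pair is never consulted

const madeLessThanMaid = "made" < "maid"; // true

// Changing what follows the deciding position changes nothing.
const stillTrue = "madz" < "maia"; // true

console.log(thirdLetters, fourthLetters, madeLessThanMaid, stillTrue);
```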
JS Bin – 4
What about "mad" versus "made"? Is "mad" less than "made"?
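In JavaScript, when one string is a prefix of the other, the shorter string counts as the lesser one. A short sketch to confirm this (the variable names are mine):

```javascript
const madLessThanMade = "mad" < "made"; // true: "mad" is a prefix of "made"
const madeLessThanMad = "made" < "mad"; // false

// There is no fourth character in "mad" to compare against "e".
const pastTheEnd = "mad".charCodeAt(3); // NaN

console.log(madLessThanMade, madeLessThanMad, pastTheEnd);
```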
You can find the JS Bin at http://jsbin.com/gopacuv/edit?js,console
We could do much more at this point. For example, we could create a function that takes two strings, iterates through them comparing the character codes letter by letter, and returns the result of the less-than comparison. We could see if String's localeCompare would do a better job in terms of character-by-character comparison. We could do lots of things, but the goal of this article was to explain string comparison, and this has been done.
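For the curious, the function idea might look roughly like this. It is a sketch under my own assumptions: `isLessThanByCodeUnits` is a hypothetical name, and the test words are just the ones used in this article.

```javascript
// Hypothetical helper: returns true when `left` sorts before `right`,
// comparing UTF-16 code units one position at a time.
function isLessThanByCodeUnits(left, right) {
  const shared = Math.min(left.length, right.length);
  for (let i = 0; i < shared; i++) {
    const a = left.charCodeAt(i);
    const b = right.charCodeAt(i);
    if (a !== b) return a < b; // the first difference decides
  }
  // No difference within the shared length: the shorter string is less.
  return left.length < right.length;
}

const pairs = [["made", "maid"], ["mad", "made"], ["Z", "a"], ["same", "same"]];

// Check that the helper agrees with the built-in < operator for each pair.
const agrees = pairs.every(
  ([l, r]) => isLessThanByCodeUnits(l, r) === (l < r)
);

console.log(agrees);
```

Note that localeCompare is a different tool: it uses locale-aware collation rather than raw code units, so its ordering (for example, of uppercase versus lowercase letters) can differ from what the < operator gives.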