Un-edited Live session – http://www.youtube.com/watch?v=owZOPO53dn4
Tony Bemus, Mat Enders, and Mary Tomich
Sound bites by Mike Tanner
Kernel News: Mat
mainline: 3.19-rc3 2015-01-06
stable: 3.18.2 2015-01-08
stable: 3.17.8 [EOL] 2015-01-08
longterm: 3.14.28 2015-01-08
longterm: 3.12.35 2014-12-10
longterm: 3.10.64 2015-01-08
longterm: 3.4.105 2014-12-01
longterm: 3.2.66 2015-01-01
longterm: 2.6.32.65 2014-12-13
linux-next: next-20150109 2015-01-09
Distro Talk: Tony
- 12-26 – Raspbian 2014-12-24
- 12-26 – KaOS 2014.12
- 12-28 – OpenELEC 5.0
- 12-31 – NixOS 14.12
- 12-31 – Deepin 2014.2
- 1-1 – ZevenOS 6.0
- 1-1 – Android-x86 4.4-r2
- 1-4 – SparkyLinux 3.6 “GameOver”
- 1-4 – ExTiX 15.1
- 1-6 – Openwall GNU/*/Linux 3.1
- 1-8 – Linux Mint 17.1 “KDE”
- 1-9 – Bio-Linux 8.0.5
Distro of the Week: Tony
- Android-x86 – 1184
- openSUSE – 1279
- Debian – 1484
- Ubuntu – 1565
- Mint – 2833
sort The Linux Command
The sort command is part of GNU coreutils. You can find out about the other coreutils by entering “info coreutils” at a command prompt.
You can find the sample data files here:
sort does just what the name suggests: it sorts the contents of a text file, line by line. If multiple files are given, it sorts, merges, or compares all the lines from the given files. When no files are given as arguments, it reads from stdin. By default, sort writes the results to stdout.
sort has three different modes of operation:
– sort = The default mode. It is simple but extremely useful: it rearranges the lines in a text file so that they are sorted. The default rules for sorting are:
    – lines that start with a number will sort before lines starting with a letter;
    – lines starting with a letter that appears earlier in the alphabet will appear before lines starting with a letter that appears later in the alphabet;
    – lines starting with a lowercase letter will appear before lines starting with the same letter in uppercase (this depends on your locale; in the ‘C’ locale, uppercase letters sort before lowercase ones).
– check = Checks whether the given file is already sorted: if it is not completely sorted, sort prints a diagnostic containing the first out-of-order line and exits with a status of 1. Otherwise, it exits successfully. At most one input file can be given. This is the ‘-c’ option; the ‘-C’ option performs the same check silently, without printing the first out-of-order line.
– merge = Merges the given files by sorting them as a group (the ‘-m’ option). Each input file must already be individually sorted. It always works to sort instead of merge; merging is provided because it is faster, in the case where it works.
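The three modes can be tried on a couple of throwaway files. This is a minimal sketch that assumes GNU sort; the file names and contents are made up for illustration, and LC_ALL=C is set only so that the ordering is the same on every system.

```shell
# Hypothetical sample data in a temporary directory.
export LC_ALL=C
tmp=$(mktemp -d)
printf '%s\n' banana apple cherry > "$tmp/a.txt"
printf '%s\n' avocado blueberry > "$tmp/b.txt"   # already in sorted order

# Mode 1: sort (the default).
sorted=$(sort "$tmp/a.txt")
printf '%s\n' "$sorted" > "$tmp/a.sorted"

# Mode 2: check. -C only sets the exit status; -c would also print
# the first out-of-order line.
unsorted_status=0
sort -C "$tmp/a.txt" || unsorted_status=$?
sorted_status=0
sort -C "$tmp/a.sorted" || sorted_status=$?

# Mode 3: merge two files that are each already sorted.
merged=$(sort -m "$tmp/a.sorted" "$tmp/b.txt")

printf '%s\n' "$merged"
echo "check status: unsorted=$unsorted_status sorted=$sorted_status"
rm -rf "$tmp"
```

The check mode is handy in scripts: test the exit status first, and only pay for a full sort when the file actually needs one.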
The rules for sorting can be changed according to the options you provide to the sort command; these are listed below. You can specify them globally or as part of a specific key field. If no key fields are specified, any globally set options apply to the entire line; when key fields are specified, the global options are inherited by the key fields that have no options of their own. In pre-POSIX versions of ‘sort’, global options affect only the key fields given after them, so to make shell scripts portable you should specify global options first.
‘-b’ (‘--ignore-leading-blanks’): Tells sort to ignore leading blanks when locating the sort keys in each line. By default a blank is a space or a tab, but whatever is set in your ‘LC_CTYPE’ locale can change this. Blanks may be ignored by your locale’s collating rules, but without this option they will be significant for character positions specified in keys with the ‘-k’ option.
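A small sketch of what ignoring leading blanks changes, assuming GNU sort and using made-up two-column data in which one line has extra spaces before its second field. LC_ALL=C is set so that blanks are compared as ordinary bytes.

```shell
export LC_ALL=C
# Hypothetical data: the first line has three blanks before its second field.
data='x   b
y a'

# Without -b the leading blanks belong to the key, and a blank sorts before 'a'.
without_b=$(printf '%s\n' "$data" | sort -k 2,2)

# With -b the key starts at the first nonblank character, so 'a' sorts before 'b'.
with_b=$(printf '%s\n' "$data" | sort -b -k 2,2)

printf 'without -b:\n%s\nwith -b:\n%s\n' "$without_b" "$with_b"
```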
‘-k POS1[,POS2]’ (‘--key=POS1[,POS2]’): Specifies a sort field that consists of the part of the line between POS1 and POS2 (or the end of the line, if POS2 is omitted), inclusive.
Each POS has the form ‘F[.C][OPTS]’, where F is the number of the field to use, and C is the number of the first character from the beginning of the field. Unlike the zero-based numbering common elsewhere in computing, fields and character positions are numbered starting with 1; in POS2 a .0 indicates the field’s last character. If ‘.C’ is omitted from POS1, it defaults to 1 (the beginning of the field); if omitted from POS2, it defaults to 0 (the end of the field). OPTS are ordering options, allowing individual keys to be sorted according to different rules; see below for details. Keys can span multiple fields.
Example: To sort on the second field, use ‘--key=2,2’ (‘-k 2,2’). See below for more notes on keys and more examples. See also the ‘--debug’ option to help determine the part of the line being used in the sort.
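To make the difference between a key that ends at its own field and one that runs to the end of the line concrete, here is a minimal sketch assuming GNU sort. The three lines of data are invented, and LC_ALL=C keeps the comparison byte-wise.

```shell
export LC_ALL=C
# Hypothetical whitespace-separated data: name, number, word.
data='carol 30 zebra
alice 25 yak
bob 25 ant'

# Key is field 2 only. alice and bob tie on " 25", so sort falls back to
# comparing the whole line and alice comes first.
field_two_only=$(printf '%s\n' "$data" | sort -k 2,2)

# Key starts at field 2 and runs to the end of the line, so " 25 ant"
# sorts before " 25 yak" and bob comes first.
field_two_to_end=$(printf '%s\n' "$data" | sort -k 2)

printf '%s\n' "-k 2,2:" "$field_two_only" "-k 2:" "$field_two_to_end"
```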
‘-n’ (‘--numeric-sort’): Sorts numerically. Because GNU sort follows the POSIX standard, the ‘-n’ option no longer implies ‘-b’; you must add ‘-b’ explicitly to get that behavior. The number begins each line (or key) and consists of optional blanks, an optional ‘-’ sign, and zero or more digits possibly separated by thousands separators, optionally followed by a decimal-point character and zero or more digits. A non-number is treated as ‘0’. Whatever you have set in your ‘LC_NUMERIC’ locale specifies the decimal-point character and thousands separator. By default a blank is a space or a tab, but what is set in your ‘LC_CTYPE’ locale can change this.
Comparison is exact; there is no rounding error.
Neither a leading ‘+’ nor exponential notation is recognized. To compare such strings numerically, use the ‘--general-numeric-sort’ (‘-g’) option.
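The following sketch, assuming GNU sort, contrasts the default comparison with ‘-n’, and ‘-n’ with ‘-g’ on values that use a leading ‘+’ or an exponent. All of the numbers are made up for illustration.

```shell
export LC_ALL=C
nums='10
9
100
-5'

# Default comparison is character by character, so "100" lands before "9".
lexical=$(printf '%s\n' "$nums" | sort)
# -n compares the values as numbers.
numeric=$(printf '%s\n' "$nums" | sort -n)

sci='1e3
200
+5'
# -n reads "+5" as 0 (not a number) and "1e3" as 1 (stops at the "e").
plain_n=$(printf '%s\n' "$sci" | sort -n)
# -g understands both the leading "+" and the exponent.
general=$(printf '%s\n' "$sci" | sort -g)

printf '%s\n' "sort:" "$lexical" "sort -n:" "$numeric"
printf '%s\n' "sort -n:" "$plain_n" "sort -g:" "$general"
```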
‘-r’ (‘--reverse’): Reverses the result of comparison, so that lines with greater key values appear earlier in the output instead of later.
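Reversing combines with the other ordering options. A minimal sketch with invented values, assuming GNU sort:

```shell
export LC_ALL=C
# Largest number first: numeric comparison, reversed.
descending=$(printf '%s\n' 10 9 100 | sort -nr)
# Reverse alphabetical order.
z_to_a=$(printf '%s\n' b a c | sort -r)

printf '%s\n' "$descending" "$z_to_a"
```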
‘-t SEPARATOR’ (‘--field-separator=SEPARATOR’): Uses the character SEPARATOR, instead of the default, as the field separator when finding the sort keys in each line. By default, fields are separated by the empty string between a non-blank character and a blank character. By default a blank is a space or a tab, but the ‘LC_CTYPE’ locale can change this.
That is, given the input line ‘ foo bar’, ‘sort’ breaks it into fields ‘ foo’ and ‘ bar’. The field separator is not considered to be part of either the field preceding or the field following, so with ‘sort -t " "’ the same input line has three fields: an empty field, ‘foo’, and ‘bar’. However, fields that extend to the end of the line, as ‘-k 2’, or fields consisting of a range, as ‘-k 2,3’, retain the field separators present between the endpoints of the range.
To specify ASCII NUL as the field separator, use the two-character string ‘\0’, e.g., ‘sort -t '\0'’.
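As a quick illustration of choosing a field separator, here is a sketch assuming GNU sort and a few made-up colon-separated records in the style of a password file.

```shell
export LC_ALL=C
# Hypothetical records: name:password:uid:comment
records='alice:x:1000:Alice
bob:x:99:Bob
carol:x:250:Carol'

# Third colon-separated field, compared as a number.
by_uid=$(printf '%s\n' "$records" | sort -t : -k 3,3n)

# Same field compared as text: "1000" sorts before "250", which sorts before "99".
by_uid_text=$(printf '%s\n' "$records" | sort -t : -k 3,3)

printf '%s\n' "numeric:" "$by_uid" "text:" "$by_uid_text"
```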
Here are some more examples:
* Sort in descending (reverse) numeric order. The key starts at the first character of the first field and extends to the end of each line.
sort -nr etc_passwd_1.txt
NOTE: For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect. To sort on a single numeric field, such as the third, restrict the key to that field with ‘sort -nrk3,3’.
* Sort alphabetically, omitting the first through the fourth fields and the blanks at the start of the fifth field. This uses a single key that starts at the first nonblank character in field five and extends to the end of each line.
sort -k5b etc_passwd_1.txt
NOTE: Whenever you span multiple fields with any sort you may not get the results you expect.
* Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five. Use ‘:’ as the field delimiter.
sort -t : -k 2,2n -k 5.3,5.4 etc_passwd_0.txt
Note that if you had written ‘-k 2n’ instead of ‘-k 2,2n’ ‘sort’ would have used all characters beginning in the second field and extending to the end of the line as the primary _numeric_ key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect.
Also note that the ‘n’ modifier was applied to the field-end specifier for the first key. It would have been equivalent to specify ‘-k 2n,2’ or ‘-k 2n,2n’. All modifiers except ‘b’ apply to the associated _field_, regardless of whether the modifier character is attached to the field-start and/or the field-end part of the key specifier.
* Sort the password file on the fifth field and ignore any leading blanks. Sort lines with equal values in field five on the numeric user ID in field three. Fields are separated by ‘:’.
sort -t : -k5b,5 -k3,3n etc_passwd_0.txt
sort -t : -n -k5b,5 -k3,3 etc_passwd_0.txt
sort -t : -b -k5,5 -k3,3n etc_passwd_0.txt
These three commands have equivalent effect. The first specifies that the first key’s start position ignores leading blanks and the second key is sorted numerically. The other two commands rely on global options being inherited by sort keys that lack modifiers. The inheritance works in this case because ‘-k5b,5b’ and ‘-k5b,5’ are equivalent, as the location of a field-end lacking a ‘.C’ character position is not affected by whether initial blanks are skipped.
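The inheritance of global options can be checked on a tiny input. This sketch assumes GNU sort; the three password-style records are invented, with stray blanks at the start of the fifth field in two of them.

```shell
export LC_ALL=C
# Hypothetical records; field 5 is the name, field 3 the numeric user ID.
records='u1:x:1002:100:  Zed:/home/u1
u2:x:1001:100:Amy:/home/u2
u3:x:1000:100: Amy:/home/u3'

# Modifiers attached to the keys themselves.
per_key=$(printf '%s\n' "$records" | sort -t : -k5b,5 -k3,3n)
# Global -n, inherited only by the key without modifiers (field 3).
global_n=$(printf '%s\n' "$records" | sort -t : -n -k5b,5 -k3,3)
# Global -b, inherited only by the key without modifiers (field 5).
global_b=$(printf '%s\n' "$records" | sort -t : -b -k5,5 -k3,3n)
# For contrast: no -b at all, so the leading blanks decide the order.
no_b=$(printf '%s\n' "$records" | sort -t : -k5,5 -k3,3n)

printf '%s\n' "$per_key"
```

With these records the first three commands all list u3, u2, u1: the two ‘Amy’ entries tie on the name and are ordered by user ID. The last command lists u1 first because its key begins with two blanks.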
* Sort a log file, first by IPv4 address and second by time stamp. This assumes the log file is a standard Apache access log like this:
126.96.36.199 - - [10/Jan/2015:23:37:55 -0500] "GET /tmUnblock.cgi HTTP/1.1" 400 226 "-" "-"
188.8.131.52 - - [04/Jan/2015:01:33:41 -0500] "GET /tmUnblock.cgi HTTP/1.1" 400 226 "-" "-"
184.108.40.206 - - [01/Jan/2015:23:09:04 -0500] "GET /tmUnblock.cgi HTTP/1.1" 400 226 "-" "-"
220.127.116.11 - - [02/Jan/2015:17:23:26 -0500] "GET /tmUnblock.cgi HTTP/1.1" 400 226 "-" "-"
Fields are separated by exactly one space. Sort IPv4 addresses lexicographically, e.g., 212.61.52.2 sorts before 212.129.233.201 because 61 is less than 129.
sort -s -t ' ' -k 4.9n -k 4.5M -k 4.2n -k 4.14,4.21 access_log.txt | sort -s -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n
This example cannot be done with a single ‘sort’ invocation, since IPv4 address components are separated by ‘.’ while dates come just after a space. So it is broken down into two invocations of ‘sort’: the first sorts by time stamp and the second by IPv4 address. The time stamp is sorted by year, then month, then day, and finally by hour-minute-second field, using ‘-k’ to isolate each field. Except for hour-minute-second there’s no need to specify the end of each key field, since the ‘n’ and ‘M’ modifiers sort based on leading prefixes that cannot cross field boundaries. The IPv4 addresses are sorted lexicographically. The second sort uses ‘-s’ so that ties in the primary key are broken by the secondary key; the first sort uses ‘-s’ so that the combination of the two sorts is stable.
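The two-pass approach can be tried on a few invented log lines. This is a sketch assuming GNU sort; the addresses come from the ranges reserved for documentation and the request paths /a through /d are placeholders, not real log data.

```shell
export LC_ALL=C
log=$(mktemp)
# Hypothetical access log: fields separated by exactly one space.
cat > "$log" <<'EOF'
203.0.113.9 - - [02/Dec/2014:17:23:26 -0500] "GET /a HTTP/1.1" 200 1
192.0.2.10 - - [04/Jan/2015:01:33:41 -0500] "GET /b HTTP/1.1" 200 1
192.0.2.10 - - [01/Jan/2015:23:09:04 -0500] "GET /c HTTP/1.1" 200 1
192.0.2.9 - - [10/Jan/2015:23:37:55 -0500] "GET /d HTTP/1.1" 200 1
EOF

# Pass 1 orders by year, month, day, then hh:mm:ss (all inside field 4).
# Pass 2 orders by the four address components; -s keeps the time order
# from pass 1 for lines that share an address.
result=$(sort -s -t ' ' -k 4.9n -k 4.5M -k 4.2n -k 4.14,4.21 "$log" |
         sort -s -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n)

printf '%s\n' "$result"
rm -f "$log"
```

Here 192.0.2.9 comes out ahead of 192.0.2.10 because the last component is compared as a number, and the two 192.0.2.10 lines stay in time-stamp order.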
show (at) smlr.us or 734-258-7009
This content is published under the Attribution-Noncommercial-Share Alike 3.0 Unported license.