Linux 101
Processing Text Streams
Regular expressions
language for expressing patterns in text
special strings that define search patterns
[diagram] text + regex -> REGEX ENGINE -> all patterns in the text matching the regex
example: find all email addresses in a document
regex matches string = the string has the structure defined by the regexp
REGEX = normal characters + metacharacters (metacharacters represent patterns)
the escape character \ makes a metacharacter be interpreted as a normal one
Regular expressions
metacharacters
. any character
\ escape character
repetitions
* zero or more times
? zero or one time
+ one or more times
{n,m} minimum n and maximum m times
| or
groups and ranges
[aeiou] character set matches any vowel
[^aeiou] negated set matches any character that is not a vowel
[a-z] character range matches entire lowercase alphabet
() grouping
anchors
^ start of line
$ end of line
example: ^[0-9]{3}$ matches 000...999 on a single line
\b word boundaries
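Example: a quick sketch of anchors and repetition, tested with grep -E (covered below) on sample input supplied by echo:
$ echo "room 101" | grep -E "^room [0-9]{3}$"
room 101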
Using grep
grep Global Regular Expression Print. Print lines matching a pattern
-E, --extended-regexp (same as egrep)
-c, --count count matching lines
-f <file>, --file=<file> take patterns from file
-i, --ignore-case
-r, --recursive search directories recursively (same as rgrep)
grep [options] regexp [files]
TIP: quote the regexp to avoid shell expansion: "regexp"
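Example: counting lines that start with "error", case-insensitively (sample input supplied inline with printf, not a real log file):
$ printf "Error: disk full\nok\nerror: timeout\n" | grep -i -c "^error"
2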
Using sed
sed stream editor
-n, --quiet, --silent don’t print lines automatically
-e <script>, --expression=<script> add script to the commands
-f <script_file>, --file=<script_file> read commands from <script_file>
sed [options] script [file]
line restriction
3 apply command to line 3
2,15 all lines between 2 and 15
/pattern/ all lines matching pattern
/pattern1/,/pattern2/ from a line matching pattern1 to the next line matching pattern2
! negate restriction
command
{ } group commands
s/pattern/replacement/flags substitute
p print line
d delete line
w file write to file
q quit
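Example: restricting a substitution to lines 2-3 and printing only those lines (sample input supplied with printf):
$ printf "one\ntwo\nthree\n" | sed -n '2,3s/t/T/p'
Two
Three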
Using Filters
[overview: filters grouped by what they process]
CHARACTER processing: tr, expand, unexpand
COLUMN / FIELD processing: cut, paste, join
LINE processing: head, tail, nl, cat, tac, sort, uniq, split, sed
FILE statistics: wc
PRINT formatting: od, pr, fmt
general synopsis: command [opts] [file] …
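Filters are typically chained with pipes. A sketch: count how many accounts use each login shell (output depends on the system's /etc/passwd):
$ cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn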
Using Filters
head output the beginning (default 10 lines) of the file
-c <num>, --bytes=<num>
-n <num>, --lines=<num>
tail output the end (default 10 lines) of the file
-f, --follow
--pid=<pid> terminate following when <pid> terminates
sort order lines lexicographically (or by a field)
-f, --ignore-case
-n, --numeric-sort sort numerically
-r, --reverse
-k <field>, --key=<field> field to sort by (default first)
uniq discard adjacent duplicate lines
uniq [opts] [in [out]]
-u show only unique lines
-d show only duplicate lines
-c count occurrences
nl number lines in the output
-b <style>, -h <style>, -f <style> numbering style for body, header, footer
styles: a – all lines, t – non-blank lines only, n – no numbering
-n <format>, --number-format=<format>
formats: ln – left justified, rn – right justified, rz – right justified with leading zeros
-i <num> line increment
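Example: sort sample input and count the occurrences of each line (the exact column spacing of uniq -c may vary):
$ printf "b\na\nb\na\n" | sort | uniq -c
      2 a
      2 b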
Using Filters
cut extract sections (columns) from each line
-b <list>, --bytes=<list>
-c <list>, --characters=<list>
-f <list>, --fields=<list>
-d <char>, --delimiter=<char> (default tab)
-s, --only-delimited
paste merge files line by line
-d <list>, --delimiters=<list>
-s, --serial put each file on a single line
default delimiter is TAB
join combines two files by matching fields
-t <char> field separator (default delimiter is space)
-i ignore case
-1 <n>, -2 <n> specify the join field number
join [opts] file1 file2
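Example: extract the first and third colon-separated fields (cut keeps the input delimiter in the output; sample input from printf):
$ printf "alice:x:1001\nbob:x:1002\n" | cut -d: -f1,3
alice:1001
bob:1002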
Using Filters
expand convert tabs to spaces
-t <num>, --tabs=<num> set tab spacing (default 8)
unexpand convert spaces to tabs
tr translate characters
-t, --truncate-set1 truncate set1 to the length of set2
-d delete characters from set1
sets may contain ranges: A-C = ABC, 1-9 = 123456789
tr [opts] set1 [set2]
$ echo "lower to upper case" | tr "a-z" "A-Z"
LOWER TO UPPER CASE
wc word count – counts lines, words and bytes
-l, --lines -w, --words
-c, --bytes -m, --chars
-L, --max-line-length
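Example: counting lines and words of sample input supplied with printf:
$ printf "one two\nthree\n" | wc -l
2
$ printf "one two\nthree\n" | wc -w
3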
Using Filters
cat concatenate files to the output
-E, --show-ends put a $ at the end of each line
-n, --number add line numbers
-b, --number-nonblank numbers only nonblank lines
-s, --squeeze-blank squeeze multiple blank lines into a single one
-T, --show-tabs display tab chars as ^I
-v, --show-nonprinting display control chars using ^ notation (e.g. ^M)
tac concatenate and reverse order of lines in each file
split break a single file into multiple parts
-b <size>, --bytes=<size>
-C <size>, --line-bytes=<size>
-l <lines>, --lines=<lines>
-d, --numeric-suffixes
default prefix: x, default suffixes: aa, ab, ac …
split [opts] [file [prefix]]
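Example: reverse the line order of sample input with tac:
$ printf "first\nsecond\n" | tac
second
first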
Using Filters
pr prepare a file for printing
-l <lines>, --length=<lines> set page length
-h <text>, --header=<text> set header text
-o <chars>, --indent=<chars> set left margin
-w <chars>, --width=<chars> set page width
fmt format paragraphs
-<width>, -w <width>, --width=<width> (default 75)
-t, --tagged-paragraph first line indented differently from the rest
od (octal dump) display files in octal or other formats
-t <type>, --format=<type>
-w <width>, --width=<width> output <width> bytes per line
TYPE
d2 – decimal shorts, d4 – decimal longs
x2 – hexadecimal shorts, x4 – hexadecimal longs
o2 – octal shorts (default), o4 – octal longs
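Example: dump the bytes of sample input as hexadecimal; the first column is the byte offset in octal (exact spacing may differ slightly between od versions):
$ echo "AB" | od -t x1
0000000 41 42 0a
0000003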
Vi editor
Operation modes
Command mode (default), Ex mode (colon commands), Insert mode
Command mode
h, j, k, l = left, down, up, right
w, b = forward, backward one word
^, $ = start, end of line
precede a command with a number to repeat it
d delete
dw delete word
dd delete line
y, yw, yy yank (copy)
c, cw, cc change
p paste after cursor
P paste before cursor
Commands that enter insert mode
i insert before the cursor
I insert at line start
a append after the cursor
A append at the end of line
o open a line below the current one
O open a line above the current one
r replace a single character
R replace (overwrite) characters until Esc
Ex mode and search
:w save
:q quit
:wq, ZZ save & quit
/ forward search
? reverse search
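A short worked session using only the commands above (notes.txt is a hypothetical file name; the text after each keystroke sequence is a description, not part of the input):
$ vi notes.txt open the file (command mode)
5j move down five lines
dd delete the current line
/error search forward for "error"
:wq save changes and quit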