Experiment 05

Aim

To perform advanced text processing, data filtering, file comparisons, and input/output redirection in Linux

Theory

  • cmp (Compare)

    • Compares two files byte by byte and reports the position (byte offset and line number) of the first difference; it prints nothing if the files are identical
    • Syntax: cmp [options] file1 file2
  • paste (Paste)

    • Merges the corresponding lines of two or more files side by side, so each input file becomes one column of the output
    • Syntax: paste [options] file1 file2 ...
    • Options: -d LIST uses the characters in LIST as column delimiters instead of the default TAB
  • grep (Global Regular Expression Print)

    • Searches for a specific string or pattern within a file's contents
    • Syntax: grep "[pattern]" [filename]
  • sort (Sort) & uniq (Unique)

    • sort arranges lines alphabetically by default, or numerically with -n; uniq filters out adjacent duplicate lines, so its input is usually sorted first
    • Syntax: sort [options] [filename] / uniq [options] [filename]
  • sed (Stream Editor) & awk (AWK)

    • sed is a stream editor commonly used for finding and replacing text (s/old/new/g); awk is a pattern-scanning and processing language commonly used for column extraction
  • Redirection (>, >>) & Piping (|)

    • > routes output to a file (overwriting), >> appends to it. The pipe | takes the output of the first command and uses it as the input for the second command
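
The theory above describes awk and the redirection operators, but the Commands section does not exercise awk or >>. The following is a minimal sketch of how they might be combined; the file names (people.txt, names.txt, ages.txt) and the sample data are hypothetical and are not part of the experiment. printf is used in place of echo -e only because it behaves the same across shells.

```shell
# Hypothetical sample data; file names are illustrative only.
printf 'alice 30\nbob 25\n' > people.txt     # > creates or overwrites the file
printf 'carol 41\n' >> people.txt            # >> appends a line to the end

# awk splits each line on whitespace; $1 is the first column.
awk '{ print $1 }' people.txt > names.txt

# The pipe feeds awk's output (the second column) into sort -n,
# which orders the values numerically before they are written out.
awk '{ print $2 }' people.txt | sort -n > ages.txt
```

Here names.txt should hold the three names in their original order, and ages.txt the three ages in ascending numeric order.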

Commands

$ echo -e "apple\nbanana\norange" > list1.txt
$ echo -e "apple\ngrape\norange" > list2.txt

$ cmp list1.txt list2.txt
list1.txt list2.txt differ: byte 7, line 2

$ paste -d "," list1.txt list2.txt > combined.csv
$ cat combined.csv
apple,apple
banana,grape
orange,orange

$ echo -e "dog\ncat\ndog\nbird" > animals.txt
$ sort animals.txt | uniq > unique_animals.txt
$ cat unique_animals.txt
bird
cat
dog

$ grep "cat" unique_animals.txt
cat

$ sed 's/dog/wolf/g' animals.txt
wolf
cat
wolf
bird
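
Because uniq only removes adjacent duplicates, the sort step in the pipeline above matters. The sketch below, which recreates animals.txt so that it is self-contained, illustrates the difference; the output file names are hypothetical.

```shell
# Recreate the input used in the experiment.
printf 'dog\ncat\ndog\nbird\n' > animals.txt

# Without sorting, the two "dog" lines are not adjacent, so uniq keeps both.
uniq animals.txt > unsorted_uniq.txt

# sort -u sorts and drops duplicates in one step
# (for this input, the same result as sort | uniq).
sort -u animals.txt > sorted_uniq.txt

# uniq -c prefixes each distinct line with its number of occurrences.
sort animals.txt | uniq -c > counts.txt
```

unsorted_uniq.txt should still contain all four lines, while sorted_uniq.txt contains bird, cat, and dog once each.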

Conclusion

Command-line text parsing and manipulation were carried out successfully. Files were compared (cmp), merged side by side (paste), sorted and de-duplicated (sort, uniq), searched (grep), and modified in-stream (sed), with redirection and pipes connecting the commands into data pipelines in the Linux shell.