
regex Part 1 - Data Manipulation

 

LPI E - Data Manipulation

3.2 Searching and Extracting Data from Files

Part 1 of 2
Part 2:  Regular Expressions (regex)



In this blog post, we will explore some of the most common commands used to search and extract data from files in Linux. Specifically, we will cover:

  • grep
  • less
  • cat
  • head
  • tail
  • sort
  • cut
  • wc
The post finishes by showing how to combine these tools using I/O redirection. Regular expressions (regex) for analyzing text in the Linux terminal are covered in Part 2.

grep Command

The grep command is used to search for patterns in a file or set of files. The syntax for using grep is as follows:

$ grep pattern filename

For example, if we want to search for the word "apple" in a file called "fruits.txt", we can use the following command:

$ grep apple fruits.txt

This will display all the lines in the file that contain the string "apple", including lines where it is part of a longer word such as "pineapple".

The grep command can also use regular expressions to search for patterns. For example, if we want to find all lines that start with the letter "a", we can use the following command:

$ grep '^a' fruits.txt

Here, the "^" character anchors the match to the beginning of a line, and "a" is the character that must appear there. A word starting with "a" in the middle of a line does not match.
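To make the difference between these searches concrete, here is a minimal sketch. The contents of fruits.txt are made up for illustration, and the "-w" (whole word) option is an addition not covered above.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample data: one entry per line.
printf '%s\n' apple banana avocado pineapple 'green apple' > fruits.txt

# Substring match: every line containing "apple" anywhere.
substring_matches=$(grep apple fruits.txt)
echo "$substring_matches"    # apple, pineapple, green apple

# Anchored match: only lines whose first character is "a".
line_start_matches=$(grep '^a' fruits.txt)
echo "$line_start_matches"   # apple, avocado

# Whole-word match: -w skips "pineapple" because "apple" is not a separate word there.
word_matches=$(grep -w apple fruits.txt)
echo "$word_matches"         # apple, green apple
```

Note that "green apple" is not matched by '^a': the line starts with "g", even though one of its words starts with "a".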

Exam Note:
The grep command can be used to search for text in files and is essential in shell scripting.

less Command

The less command is used to view the contents of a file. It is especially useful when the file is too large to view with the cat command. The syntax for using less is as follows:

$ less filename

For example, if we want to view the contents of a file called "largefile.txt", we can use the following command:

$ less largefile.txt

This will open the file in a pager that lets us move through the contents one screen at a time: the arrow keys or Space scroll, "/" searches for text, and "q" quits.

Exam Note:
The less command is used to view the contents of a file.

cat Command

The cat command is used to display the contents of a file. The syntax for using cat is as follows:

$ cat filename

For example, if we want to display the contents of a file called "fruits.txt", we can use the following command:

$ cat fruits.txt

This will display the contents of the file on the terminal.
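The name cat is short for "concatenate", and the command accepts more than one file. The sketch below uses two made-up files to show that behaviour; the file names and contents are assumptions for illustration only.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample files.
printf '%s\n' apple banana > fruits.txt
printf '%s\n' carrot leek > vegetables.txt

# One file: its contents are printed as they are.
single=$(cat fruits.txt)
echo "$single"      # apple, banana

# Several files: contents are printed one after another, in argument order.
combined=$(cat fruits.txt vegetables.txt)
echo "$combined"    # apple, banana, carrot, leek
```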

Exam Note:
The cat command is used to display the contents of a file.

head Command

The head command is used to display the first few lines of a file. The syntax for using head is as follows:

$ head filename

By default, head displays the first 10 lines of a file. However, we can specify the number of lines we want to display using the "-n" option. For example, if we want to display the first 5 lines of a file called "fruits.txt", we can use the following command:

$ head -n 5 fruits.txt

This will display the first 5 lines of the file on the terminal.
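A quick way to try this out is with a generated file. The sketch below assumes the seq utility is available (it is part of GNU coreutils) and uses a made-up file name.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample file: the numbers 1 to 12, one per line.
seq 1 12 > numbers.txt

# No option: the first 10 lines (1 through 10).
default_head=$(head numbers.txt)
echo "$default_head"

# -n 3: only the first 3 lines.
first_three=$(head -n 3 numbers.txt)
echo "$first_three"    # 1, 2, 3
```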

Exam Note:
The head command is used to display the first few lines of a file.

tail Command

The tail command is used to display the last few lines of a file. The syntax for using tail is as follows:

$ tail filename

By default, tail displays the last 10 lines of a file. However, we can specify the number of lines we want to display using the "-n" option. For example, if we want to display the last 5 lines of a file called "fruits.txt", we can use the following command:

$ tail -n 5 fruits.txt

This will display the last 5 lines of the file on the terminal.
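The same kind of experiment works for tail. This sketch again assumes seq is available and uses a made-up file; the "-n +N" form, which starts output at line N instead of counting from the end, is an addition not covered above.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample file: the numbers 1 to 12, one per line.
seq 1 12 > numbers.txt

# -n 3: the last 3 lines.
last_three=$(tail -n 3 numbers.txt)
echo "$last_three"      # 10, 11, 12

# -n +11: everything from line 11 to the end of the file.
from_line_11=$(tail -n +11 numbers.txt)
echo "$from_line_11"    # 11, 12
```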

Exam Note:
The tail command is used to display the last few lines of a file.

sort Command

The sort command is used to sort the contents of a file. The syntax for using sort is as follows:

$ sort filename

By default, sort compares lines as text and arranges them in ascending order, so "10" sorts before "9". The "-n" option compares lines as numbers instead, and the "-r" option reverses the order. For example, if we want to sort the contents of a file called "numbers.txt" in descending numeric order, we can use the following command:

$ sort -nr numbers.txt

This will display the contents of the file on the terminal, sorted from the largest number to the smallest. The file itself is not modified.
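Because sort compares lines as text unless told otherwise, numeric data can come out in a surprising order. The sketch below uses a small made-up numbers.txt to contrast text and numeric sorting; the values are chosen only for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample data.
printf '%s\n' 10 9 100 2 > numbers.txt

# Text comparison: lines are compared character by character.
text_order=$(sort numbers.txt)
echo "$text_order"          # 10, 100, 2, 9

# -n: compare the lines as numbers.
numeric_order=$(sort -n numbers.txt)
echo "$numeric_order"       # 2, 9, 10, 100

# -n and -r together: numeric order, largest first.
numeric_descending=$(sort -nr numbers.txt)
echo "$numeric_descending"  # 100, 10, 9, 2
```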

Exam Note:
The sort command is used to sort the contents of a file.

cut Command

The cut command is used to extract specific columns from a file. The syntax for using cut is as follows:

$ cut -d delimiter -f field filename

Here, "delimiter" is the character that separates the columns in the file, "field" is the column we want to extract, and "filename" is the name of the file we want to extract the column from.

For example, suppose we have a file called "grades.csv" that contains the following data:

Name,Maths,Science,English
John,90,80,70
Jane,80,85,90
Bob,70,75,80

If we want to extract the "Science" column from the file, we can use the following command:

$ cut -d ',' -f 3 grades.csv

This will display the third field of every line on the terminal, header included: "Science" followed by 80, 85, and 75.
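The example can be reproduced end to end with the sketch below, which recreates grades.csv from the data shown above. Selecting several fields at once with a comma-separated list ("-f 1,3") is an addition not covered above.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# Recreate the sample file from the text.
printf '%s\n' \
  'Name,Maths,Science,English' \
  'John,90,80,70' \
  'Jane,80,85,90' \
  'Bob,70,75,80' > grades.csv

# Field 3 only: the Science column, header included.
science=$(cut -d ',' -f 3 grades.csv)
echo "$science"             # Science, 80, 85, 75

# Fields 1 and 3: names alongside their Science scores.
name_and_science=$(cut -d ',' -f 1,3 grades.csv)
echo "$name_and_science"    # Name,Science / John,80 / Jane,85 / Bob,75
```

When more than one field is selected, cut keeps the delimiter between them in its output.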

Exam Note:
The cut command is used to extract specific columns from a file.

wc Command

The wc command is used to count the number of lines, words, and bytes in a file. The syntax for using wc is as follows:

$ wc filename

By default, wc displays the number of lines, words, and bytes in the file, followed by the file name. We can use the "-l", "-w", and "-c" options to display only the number of lines, words, or bytes respectively. To count characters rather than bytes, use the "-m" option; the two counts differ only when the file contains multibyte characters.

For example, if we want to count the number of lines in a file called "fruits.txt", we can use the following command:

$ wc -l fruits.txt

This will display the number of lines in the file on the terminal.
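The sketch below runs wc with each option on a small made-up file, so the counts can be checked by hand. The file contents are an assumption for illustration; the exact spacing of wc output varies between implementations, so only the numbers matter here.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample data: 3 lines, 5 words, 30 bytes (newlines included).
printf '%s\n' 'apple pie' 'banana split' 'cherry' > fruits.txt

# No option: lines, words, and bytes, followed by the file name.
all_counts=$(wc fruits.txt)
echo "$all_counts"     # 3 5 30 fruits.txt

# One count at a time.
line_count=$(wc -l fruits.txt)
word_count=$(wc -w fruits.txt)
byte_count=$(wc -c fruits.txt)
echo "$line_count"     # 3 fruits.txt
echo "$word_count"     # 5 fruits.txt
echo "$byte_count"     # 30 fruits.txt
```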

Exam Note:
The wc command is used to count the number of lines, words, and bytes in a file. Use "-m" to count characters.

I/O Redirection

I/O redirection is a powerful technique in Linux that allows us to redirect the input and output of a command from and to a file or another command. This technique allows us to manipulate the flow of data in Linux and perform more complex operations.

Here are some examples of I/O redirection:

    Redirecting the output of a command to a file:

We can redirect the output of a command to a file using the ">" operator. For example, if we want to redirect the output of the "ls" command to a file called "files.txt", we can use the following command:

$ ls > files.txt

This will redirect the output of the "ls" command to the file "files.txt". If the file already exists, its contents will be overwritten.
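The overwriting behaviour is easy to see with a small experiment. The sketch below uses a made-up file name, and also shows the ">>" operator, which appends to the file instead of replacing it; ">>" is an addition not covered above.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# ">" creates log.txt and writes one line to it.
echo first > log.txt

# ">" again: the previous contents are replaced.
echo second > log.txt
after_overwrite=$(cat log.txt)
echo "$after_overwrite"    # second

# ">>" appends to the existing contents instead.
echo third >> log.txt
after_append=$(cat log.txt)
echo "$after_append"       # second, third
```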

    Redirecting the input of a command from a file:

We can redirect the input of a command from a file using the "<" operator. For example, if we want to search for the word "apple" in a file called "search.txt", we can use the following command:

$ grep apple < search.txt

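This will redirect the input of the "grep" command from the file "search.txt": the shell opens the file, and grep reads its contents from standard input instead of opening the file itself.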
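For grep, passing the file as an argument and redirecting it with "<" find the same lines. The difference is easier to see with wc, which prints a file name only when it was given one. The sketch below uses a made-up search.txt for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample data.
printf '%s\n' apple banana 'green apple' > search.txt

# File as an argument: grep opens search.txt itself.
from_argument=$(grep apple search.txt)

# File through "<": the shell opens search.txt and grep reads standard input.
from_stdin=$(grep apple < search.txt)
echo "$from_stdin"       # apple, green apple (same lines either way)

# wc knows the file name only in the first form.
named_count=$(wc -l search.txt)
plain_count=$(wc -l < search.txt)
echo "$named_count"      # 3 search.txt
echo "$plain_count"      # 3
```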

    Using pipes to redirect the output of one command to the input of another command:

We can use pipes to redirect the output of one command to the input of another command. Pipes are represented by the "|" operator. For example, if we want to list the contents of the "/usr/bin" directory and then count the number of lines in the output, we can use the following command:

$ ls /usr/bin | wc -l

This will print the number of entries in the "/usr/bin" directory: ls writes one entry per line when its output goes to a pipe, and wc counts those lines.
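Pipes also let us chain several of the commands from this post. As a sketch, the pipeline below finds the highest Science score in the grades.csv data shown earlier; the choice of question and the use of "tail -n +2" to skip the header line are illustrative assumptions.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# Recreate the sample file from the cut section.
printf '%s\n' \
  'Name,Maths,Science,English' \
  'John,90,80,70' \
  'Jane,80,85,90' \
  'Bob,70,75,80' > grades.csv

# 1. cut extracts the third field (Science) from each line.
# 2. tail -n +2 drops the header by starting output at line 2.
# 3. sort -nr orders the scores numerically, highest first.
# 4. head -n 1 keeps only the top score.
top_science=$(cut -d ',' -f 3 grades.csv | tail -n +2 | sort -nr | head -n 1)
echo "$top_science"    # 85
```

Each command reads the previous command's output as its input, so no intermediate files are needed.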

Regex continued here:
https://www.certificationmethods.com/2024/06/regex-searching.html