LPI E - Data Manipulation
3.2 Searching and Extracting Data from Files
Part 1 of 2
Part 2 covers Regular Expressions (regex); a link to it is at the end of this post.
In this blog post, we will explore some of the most common commands used to search and extract data from files in Linux, and then look at I/O redirection. Specifically, we will discuss the following commands:
- grep
- less
- cat
- head
- tail
- sort
- cut
- wc
grep Command
The grep command is used to search for patterns in a file or set of files. The syntax for using grep is as follows:
$ grep pattern filename
For example, if we want to search for the word "apple" in a file called "fruits.txt", we can use the following command:
$ grep apple fruits.txt
This will display all the lines in the file that contain the string "apple", including lines where it appears inside a longer word such as "pineapple".
The grep command is also capable of using regular expressions to search for patterns. For example, if we want to search for all lines that start with the letter "a", we can use the following command:
$ grep '^a' fruits.txt
Here, the "^" character denotes the beginning of a line and "a" is the character we want to search for.
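To make the difference between the two searches concrete, here is a small sketch. The contents of "fruits.txt" are made up for illustration, and the file is created in a temporary directory so nothing on your system is touched.

```shell
# Hypothetical sample data; the fruit names are an assumption for illustration.
workdir=$(mktemp -d)
printf 'apple\nbanana\navocado\npineapple\n' > "$workdir/fruits.txt"

# Lines that contain "apple" anywhere: apple, pineapple
contains=$(grep apple "$workdir/fruits.txt")

# Lines that begin with "a": apple, avocado
starts_with_a=$(grep '^a' "$workdir/fruits.txt")

printf 'contains apple:\n%s\n' "$contains"
printf 'starts with a:\n%s\n' "$starts_with_a"

rm -rf "$workdir"
```

Note that "banana" contains an "a" but does not start with one, so the second search skips it.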
Exam Note:
The grep command can be used to search for text in files and is essential in shell scripting.
less Command
The less command is used to view the contents of a file. It is especially useful when the file is too large to view with the cat command. The syntax for using less is as follows:
$ less filename
For example, if we want to view the contents of a file called "largefile.txt", we can use the following command:
$ less largefile.txt
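This will open the file in a viewer that allows us to navigate through the contents of the file one screen at a time: Space pages down, the arrow keys scroll line by line, "/" starts a search, and "q" quits.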
Exam Note:
The less command is used to view the contents of a file.
cat Command
The cat command is used to display the contents of a file. The syntax for using cat is as follows:
$ cat filename
For example, if we want to display the contents of a file called "fruits.txt", we can use the following command:
$ cat fruits.txt
This will display the contents of the file on the terminal.
Exam Note:
The cat command is used to display the contents of a file.
head Command
The head command is used to display the first few lines of a file. The syntax for using head is as follows:
$ head filename
By default, head displays the first 10 lines of a file. However, we can specify the number of lines we want to display using the "-n" option. For example, if we want to display the first 5 lines of a file called "fruits.txt", we can use the following command:
$ head -n 5 fruits.txt
This will display the first 5 lines of the file on the terminal.
Exam Note:
The head command is used to display the first few lines of a file.
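tail Command
The tail command is used to display the last few lines of a file. The syntax for using tail is as follows: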
$ tail filename
By default, tail displays the last 10 lines of a file. However, we can specify the number of lines we want to display using the "-n" option. For example, if we want to display the last 5 lines of a file called "fruits.txt", we can use the following command:
$ tail -n 5 fruits.txt
This will display the last 5 lines of the file on the terminal.
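As a quick side-by-side of head and tail, the sketch below runs both against the same file. The file name and its twelve numbered lines are assumptions chosen so that the default of 10 lines is visible.

```shell
# Hypothetical sample file with 12 lines, numbered 1 to 12.
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$workdir/lines.txt"

# Default: the first 10 lines (1 through 10)
head_default=$(head "$workdir/lines.txt")

# First 3 lines: 1, 2, 3
first_three=$(head -n 3 "$workdir/lines.txt")

# Last 2 lines: 11, 12
last_two=$(tail -n 2 "$workdir/lines.txt")

printf 'first three:\n%s\n' "$first_three"
printf 'last two:\n%s\n' "$last_two"

rm -rf "$workdir"
```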
Exam Note:
The tail command is used to display the last few lines of a file.
sort Command
The sort command is used to sort the contents of a file. The syntax for using sort is as follows:
$ sort filename
By default, sort sorts the lines of the file in ascending order, comparing them as text (character by character) rather than as numbers, so "100" comes before "9". We can use the "-r" option to reverse the order and the "-n" option to compare the lines numerically. For example, if we want to sort the contents of a file called "numbers.txt" in descending numerical order, we can use the following command:
$ sort -nr numbers.txt
This will display the contents of the file sorted from the largest number to the smallest on the terminal. The file itself is not modified.
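One detail that is easy to miss is that sort compares lines as text unless told otherwise. The sketch below shows the effect on a small file of numbers. The file contents are an assumption, and LC_ALL=C is set only to make the text ordering predictable across systems.

```shell
# Hypothetical sample data: four numbers, one per line.
workdir=$(mktemp -d)
printf '10\n9\n100\n2\n' > "$workdir/numbers.txt"

# Text order (character by character): 10, 100, 2, 9
text_sorted=$(LC_ALL=C sort "$workdir/numbers.txt")

# Reversed text order: 9, 2, 100, 10
text_reversed=$(LC_ALL=C sort -r "$workdir/numbers.txt")

# Numeric order with -n: 2, 9, 10, 100
numeric_sorted=$(sort -n "$workdir/numbers.txt")

# Numeric order, reversed: 100, 10, 9, 2
numeric_reversed=$(sort -nr "$workdir/numbers.txt")

printf 'sort -r:\n%s\n' "$text_reversed"
printf 'sort -nr:\n%s\n' "$numeric_reversed"

rm -rf "$workdir"
```

If the file holds numbers, adding "-n" is usually what we want.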
Exam Note:
The sort command is used to sort the contents of a file.
cut Command
The cut command is used to extract specific columns from a file. The syntax for using cut is as follows:
$ cut -d delimiter -f field filename
Here, "delimiter" is the character that separates the columns in the file, "field" is the column we want to extract, and "filename" is the name of the file we want to extract the column from.
For example, suppose we have a file called "grades.csv" that contains the following data:
Name,Maths,Science,English
John,90,80,70
Jane,80,85,90
Bob,70,75,80
To extract the "Science" column from the file, we can use the following command:
$ cut -d ',' -f 3 grades.csv
This will display the third field of every line on the terminal: the "Science" header followed by 80, 85, and 75.
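The same example can be tried end to end with the sketch below, which recreates "grades.csv" in a temporary directory. It also shows that "-f" accepts a comma-separated list of fields, which is a standard cut feature although the text above only uses a single field.

```shell
# Recreate the grades.csv data from the example above.
workdir=$(mktemp -d)
printf 'Name,Maths,Science,English\nJohn,90,80,70\nJane,80,85,90\nBob,70,75,80\n' \
  > "$workdir/grades.csv"

# Third field only: Science, 80, 85, 75
science=$(cut -d ',' -f 3 "$workdir/grades.csv")

# First and third fields together: Name,Science / John,80 / Jane,85 / Bob,75
name_and_science=$(cut -d ',' -f 1,3 "$workdir/grades.csv")

printf '%s\n' "$science"
printf '%s\n' "$name_and_science"

rm -rf "$workdir"
```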
Exam Note:
The cut command is used to extract specific columns from a file.
wc Command
The wc command is used to count the number of lines, words, and bytes in a file. The syntax for using wc is as follows:
$ wc filename
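By default, wc displays the number of lines, words, and bytes in the file. However, we can use the "-l", "-w", and "-c" options to display only the number of lines, words, or bytes respectively. To count characters instead of bytes, which gives a different result for multibyte encodings such as UTF-8, we can use the "-m" option.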
For example, if we want to count the number of lines in a file called "fruits.txt", we can use the following command:
$ wc -l fruits.txt
This will display the number of lines in the file on the terminal.
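To see what each count looks like, the sketch below runs wc on a tiny file. The file contents are an assumption for illustration: two lines, four words, and 21 bytes including the two newline characters.

```shell
# Hypothetical sample file: 2 lines, 4 words, 21 bytes.
workdir=$(mktemp -d)
printf 'red apple\ngreen pear\n' > "$workdir/fruits.txt"

# All three counts, followed by the file name
all_counts=$(wc "$workdir/fruits.txt")

# Each count on its own, followed by the file name
line_count=$(wc -l "$workdir/fruits.txt")
word_count=$(wc -w "$workdir/fruits.txt")
byte_count=$(wc -c "$workdir/fruits.txt")

printf '%s\n' "$all_counts" "$line_count" "$word_count" "$byte_count"

rm -rf "$workdir"
```

Some systems pad the numbers with leading spaces, so the exact spacing of the output may differ from one machine to another.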
Exam Note:
The wc command is used to count the number of lines, words, and bytes in a file.
I/O Redirection
I/O redirection is a powerful technique in Linux that allows us to redirect the input and output of a command from and to a file or another command. This technique allows us to manipulate the flow of data in Linux and perform more complex operations.
Here are some examples of I/O redirection:
Redirecting the output of a command to a file:
We can redirect the output of a command to a file using the ">" operator. For example, if we want to redirect the output of the "ls" command to a file called "files.txt", we can use the following command:
$ ls > files.txt
This will redirect the output of the "ls" command to the file "files.txt". If the file already exists, its contents will be overwritten.
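The overwriting behaviour is worth seeing once. The sketch below writes to the same file twice with ">" and then once with ">>", the append operator, which is not covered in the text above but is the usual way to avoid losing existing contents. The file name "out.txt" is an assumption.

```shell
# Work in a temporary directory so no real files are overwritten.
workdir=$(mktemp -d)

echo "first run" > "$workdir/out.txt"    # creates the file
echo "second run" > "$workdir/out.txt"   # overwrites: "first run" is gone
echo "third run" >> "$workdir/out.txt"   # appends to the end of the file

# The file now holds two lines: "second run" and "third run"
result=$(cat "$workdir/out.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```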
Redirecting the input of a command from a file:
We can redirect the input of a command from a file using the "<" operator. For example, if we want to search for the word "apple" in a file called "search.txt", we can use the following command:
$ grep apple < search.txt
This will redirect the input of the "grep" command from the file "search.txt".
Using pipes to redirect the output of one command to the input of another command:
We can use pipes to redirect the output of one command to the input of another command. Pipes are represented by the "|" operator. For example, if we want to list the contents of the "/usr/bin" directory and then count the number of lines in the output, we can use the following command:
$ ls /usr/bin | wc -l
Here, the list produced by "ls" is not shown on the terminal. It is passed directly to "wc -l", which prints the number of lines it receives, that is, the number of entries in "/usr/bin".
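Pipes can also be chained, so several of the commands from this post can work together. The sketch below is a minimal example using a temporary directory with three made-up files instead of "/usr/bin", so the result is the same on every system.

```shell
# Hypothetical directory with three files: two .txt and one .log.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.log"

# ls writes one name per line into the pipe; wc -l counts them: 3
count=$(ls "$workdir" | wc -l)

# A three-stage pipeline: keep only names ending in "txt", then count: 2
txt_count=$(ls "$workdir" | grep 'txt$' | wc -l)

echo "entries: $count"
echo "txt files: $txt_count"

rm -rf "$workdir"
```

Each command in the chain reads what the previous one wrote, so no intermediate files are needed.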
Regex continued here:
https://www.certificationmethods.com/2024/06/regex-searching.html