Seriously, have you tried using tail -f to watch a log file in real time? It feels like you’ve suddenly got superpowers, doesn’t it?
In Pagers and More in Linux | Filtering Content Part 1, we learned how to tame the flood of information in the terminal. We took that chaotic rush of text and learned to view it calmly with less, peek at the start with head, and check the latest updates with tail. We essentially learned how to read the data without going crazy.
But a hacker’s job isn’t just to read; it’s to find the needle in the haystack. It’s about spotting the one weak entry in a list of thousands, the one anomalous username in a log file. Just viewing the data isn’t enough. We need to dissect it, rearrange it, and carve out exactly what we need.
That’s what this article is all about. We’re moving from being passive observers to active surgeons of data. Today, we’re diving deeper into the art of filtering content in Linux by learning how to manipulate the output itself. We’ll cover five incredibly powerful commands: sort, cut, tr, wc, and column.
Think of it like this: in Part 1, we got the book and learned how to turn the pages. Now, we’re going to get out our highlighters, scissors, and a calculator to really analyze the contents.
So, top up your coffee (or chai, whatever fuels your hacking sessions ☕️), and let’s get our hands dirty.
Bringing Order to Chaos with sort
Let’s start with a very common problem. You have a list of things—usernames, files, IP addresses—and they’re all jumbled up. Trying to find anything specific is a pain. The sort command is our elegant solution.
By default, sort takes text input and reorders it alphabetically.
Let’s create a small file to play with. You can use a text editor like nano or just use this quick echo command:
echo -e "kali\nparrot\narch\nfedora\nubuntu\ndebian" > distros.txt
Now, if we cat this file, it will show the distros in the order we entered them. But what if we pipe it to sort?
cat distros.txt | sort
Look at that. A perfectly ordered, alphabetical list:
arch
debian
fedora
kali
parrot
ubuntu
Simple, clean, and incredibly effective.
But sort has a few more tricks up its sleeve.
- Reverse Order (-r): What if you want it in descending order? Just add the -r flag for reverse.
cat distros.txt | sort -r
- Numeric Sorting (-n): This is a super important one. By default, sort treats everything as text. So, if you sort the numbers 1, 10, and 2, it will order them as 1, 10, 2 (because “10” comes before “2” alphabetically). That’s usually not what we want. The -n flag tells sort to interpret the values as numbers.
# Create a file with numbers
echo -e "10\n2\n1\n100\n5" > numbers.txt
# Incorrect alphabetical sort
sort numbers.txt
# Correct numeric sort
sort -n numbers.txt
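Running both makes the problem obvious: the alphabetical sort prints 1, 10, 100, 2, 5, while the numeric sort prints 1, 2, 5, 10, 100.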
See the difference? For any kind of data analysis, -n is essential.
- Unique Lines (-u): Often, your data will have duplicate entries. The -u flag is a lifesaver; it tells sort to only show one instance of each line.
echo -e "kali\nparrot\nparrot\nkali\nkali" > duplicates.txt
sort -u duplicates.txt
This command first sorts the list and then removes any duplicates it finds. It’s a quick and dirty way to get a unique list of items.
Imagine you’ve extracted a list of usernames from a log file. Using sort -u instantly gives you a clean list of every unique user who has accessed the system. Powerful stuff.
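As a quick sketch, assuming a hypothetical users.txt with one extracted username per line:
# users.txt is a hypothetical file of extracted usernames, one per line
sort -u users.txt > unique_users.txt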
One more powerful use case is sorting by a specific column with the -k flag. Think about the output of ls -l. If you want to find the largest files, you need to sort by the 5th column (file size).
# -k5 specifies the 5th column
# -n means sort numerically
# -r reverses the result to show biggest first
ls -l /etc/ | tr -s ' ' | sort -k5 -n -r | head -n 5
This command chain is a perfect example of what we’re learning: it lists the contents of /etc/, squeezes the spaces, sorts the result by the 5th column (size) in reverse numerical order, and then shows you just the top 5 largest files. This is a daily-driver command for any sysadmin.
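One subtlety worth knowing: -k5 actually sorts from the 5th field to the end of each line. That works fine here because the numeric comparison only reads the leading number, but if you ever need to sort strictly on the 5th field alone, write -k5,5.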
Making the cut: Extracting Specific Data
Okay, so our data is sorted. But what if it’s in columns? Think about the output of ls -l. You have permissions, owner, group, file size, date, and filename all in one line. Often, you only want one of those pieces. That’s the job of cut: it slices each line into fields based on a delimiter (set with -d) and hands you only the fields you ask for (set with -f).
The classic example is the /etc/passwd file, where every field is separated by a colon and the first field is the username:
# -d':' sets the delimiter to a colon
# -f1 selects the first field
cut -d':' -f1 /etc/passwd
Boom! You’ve just extracted a clean list of all users on the system. How cool is that?
Let’s take a real-world security example. Imagine you’re looking at a web server’s access log. A typical line might look like this:
192.168.1.10 - - [05/Sep/2025:10:30:01 +0000] "GET /login.php HTTP/1.1" 200 1482
What if you want a list of all IP addresses that have accessed your server? The IP address is the first field, and the delimiter is a space.
# Assuming your log is in access.log
# -d' ' specifies a space delimiter
# -f1 gets the first field (the IP)
cut -d' ' -f1 access.log | sort -u
In one line, you’ve just extracted every unique IP address from potentially thousands of log entries. You can now use this list to check for malicious IPs or see who your most frequent visitors are.
You can select multiple fields too:
- -f1,3 will get the 1st and 3rd fields (username and user ID).
- -f1-3 will get a range: the 1st, 2nd, and 3rd fields.
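For example, on /etc/passwd, where field 1 is the username and field 3 is the numeric user ID:
# Grab the username and UID from each line
cut -d':' -f1,3 /etc/passwd
Note that cut keeps the input delimiter between the fields it prints, so you’ll see output like root:0.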
What about commands that use spaces as delimiters, like ls -l? Well, it gets a bit tricky because ls -l can have a variable number of spaces. This is where other tools, which we’ll see later, can be better. But for consistently delimited files (like CSVs or the /etc/passwd file), cut is your go-to tool. It’s fast, simple, and does one job perfectly.
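For instance, pulling the second column out of a hypothetical users.csv is a one-liner:
# users.csv is a hypothetical comma-separated file
cut -d',' -f2 users.csv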
Transforming Characters with tr
Next up is tr, which stands for “translate.” This command is a bit different. It doesn’t really care about lines or fields; it cares about individual characters. You can use it to replace or delete specific characters in a stream of text.
The basic syntax is tr [characters-to-find] [characters-to-replace-with].
- Changing Case: A classic use is to convert text from lowercase to uppercase or vice versa.
echo "Hello World" | tr 'a-z' 'A-Z'
# Output: HELLO WORLD
- Replacing Characters: Maybe you have some data separated by hyphens, but you need it to be separated by spaces. Easy.
echo "2025-09-05" | tr '-' ' '
# Output: 2025 09 05
This can be really useful for reformatting data to be fed into another command that expects a different delimiter.
- Deleting Characters (-d): This is where it gets interesting for hacking. Sometimes you have weird characters in your text that are messing things up. You can use tr with the -d flag to delete them.
# Remove all the letter 'o's
echo "Hello World" | tr -d 'o'
# Output: Hell Wrld
- Squeezing Characters (-s): The -s flag is for “squeezing” repeating characters. If you have multiple spaces in a row, this will condense them down to a single space. This is a fantastic way to clean up messy command output before using cut.
echo "this  has   too    many spaces" | tr -s ' '
# Output: this has too many spaces
Remember how ls -l had a variable number of spaces? If you pipe its output through tr -s ' ' first, you get clean, single-space-delimited columns, perfect for cut!
ls -l | tr -s ' ' | cut -d' ' -f9
# This will now reliably give you the filenames
tr is like a find-and-replace for your terminal, but on a character-by-character level.
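One gotcha worth knowing: tr maps character sets, not strings. tr 'ab' 'xy' doesn’t replace the substring “ab”; it turns every a into x and every b into y:
echo "banana" | tr 'ab' 'xy'
# Output: yxnxnx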
For a pentester, tr is fantastic for cleaning up wordlists. Many password-cracking tools work best with clean lists. You can strip out all punctuation and numbers from a file like this:
# -d deletes characters
# [:punct:] and [:digit:] are character classes
cat some_messy_wordlist.txt | tr -d '[:punct:][:digit:]' > clean_wordlist.txt
This takes a messy list and outputs a clean one containing only letters, which can be much more effective for certain attacks.
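You can also chain this with what we learned earlier. Here’s a sketch, using the same hypothetical wordlist, that lowercases everything, strips the junk, and drops duplicates in one pass:
# Lowercase, strip punctuation and digits, then dedupe
cat some_messy_wordlist.txt | tr 'A-Z' 'a-z' | tr -d '[:punct:][:digit:]' | sort -u > clean_wordlist.txt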
Counting Things with wc
Ever needed a quick headcount? wc (word count) is the command for the job. It can count lines, words, and characters in a file.
Run it on our distros.txt file from earlier:
wc distros.txt
The output will be something like: 6 6 38 distros.txt. This means: 6 lines, 6 words, and 38 characters.
Most of the time, you only care about one of those things. So, you’ll use a flag:
- -l: Count lines. This is probably the one you’ll use the most.
- -w: Count words.
- -c: Count bytes (which matches the character count for plain ASCII text).
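For instance, to count just the lines in our distros.txt:
wc -l distros.txt
# Output: 6 distros.txt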
Let’s find out how many users are on our system. We already know how to get the list of usernames with cut. Now we just pipe that list into wc -l.
cut -d':' -f1 /etc/passwd | wc -l
This command chain gives you a single number: the total number of users registered in the file.
This is useful, but in security, you’ll often use wc as a counter after a search. For instance, how many times has a specific IP address, say 182.74.201.2, tried to access your server? We can use grep (which we’ll cover later) to find the lines and wc to count them.
# Search for the IP in the log file, then count the matching lines
# -F treats the pattern as a literal string, so the dots aren't regex wildcards
grep -F '182.74.201.2' access.log | wc -l
This instantly tells you the frequency of an event, which is fundamental to log analysis and incident response. If you see that number suddenly spike, you might be under attack.
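The same handful of tools combine to answer related questions. For example, to count how many distinct IP addresses appear in the log at all:
# Extract the IPs, dedupe them, then count the survivors
cut -d' ' -f1 access.log | sort -u | wc -l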
This is the essence of the Linux command line philosophy: small tools, each doing one job well, chained together to achieve a complex result.
Making Output Pretty with column
One last command for today, and it’s a handy one for readability. Presentation matters, right? Sometimes, you’ll extract data and it’s all jumbled and misaligned, making it hard to read. The column command is your personal data stylist; it takes messy, space-separated text and formats it into beautiful, clean, newspaper-style columns.
The most common way to use it is with the -t flag, which tells it to create a table.
Let’s revisit our /etc/passwd example. We know it’s separated by colons. If we use tr to replace the colons with spaces, the output is still a bit messy.
# Replacing colons with spaces gives us the right data, but poor alignment
cat /etc/passwd | tr ':' ' '
Now, let’s pipe that same output into column -t.
cat /etc/passwd | tr ':' ' ' | column -t
Look at that difference! The output is now a perfectly formatted table. Everything is aligned, making it incredibly easy to read and understand.
Handling Different Delimiters
But what if you don’t want to use tr or sed first? The column command is smart enough to handle different delimiters on its own using the -s flag. You can tell it what the separator character is, and it will build the table accordingly.
Let’s try our /etc/passwd example again, but this time, we’ll do it in a single, more efficient step.
# -s specifies the separator, -t creates the table
column -s':' -t /etc/passwd
This gives you the same beautiful output but with a cleaner, more direct command. This is fantastic for quickly viewing CSV (Comma-Separated Values) files or any other consistently delimited file.
For example, if you had a file data.csv that looked like this:
Name,Role,ID
Alice,Admin,101
Bob,User,102
You could view it as a proper table with:
column -s',' -t data.csv
Another great use case is for cleaning up the output of system commands. For example, the mount command shows you all the mounted filesystems, but its default output is a bit jumbled. Pipe it to column -t and it’s a different story.
mount | column -t
Suddenly, the information is perfectly aligned and easy to scan, allowing you to quickly check mount points and options. It’s a simple trick that makes your life on the command line much more pleasant.
This is fantastic for making sense of complex, delimited data on the fly without needing to open a spreadsheet application. It’s the final touch that makes your terminal output look professional.
What’s on the Horizon?
And there you have it! We’ve just added five more surgical instruments to our command-line toolkit. You can now sort data, cut out the pieces you need, transform characters, count the results, and format it all beautifully.
Practice combining these with the commands from Part 1. For example:
ls -l /usr/bin | sort -k5 -n -r | head -n 10
Can you figure out what that command does? (It lists the 10 largest files in /usr/bin.)
We’re getting really close to being able to perform some seriously advanced data-fu. In the next and final part of our “Filtering Content” mini-series, we’re bringing out the heavy artillery: sed and awk. These are the ultimate text-processing powerhouses, allowing you to edit streams of text and perform complex actions based on patterns.
Stay tuned, keep practicing, and as always, happy hacking!
FAQs
1. When should I use cut instead of other tools like awk?
cut is best for simple, fixed-column data where the delimiter is consistent (like a colon or a comma). It’s extremely fast and straightforward. For more complex tasks, like when columns are separated by a variable number of spaces or you need to perform actions on the data (not just extract it), awk is the more powerful choice.
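As a quick illustration, awk splits on runs of whitespace by default, which is exactly what trips up cut with ls -l:
# awk handles ls -l's variable spacing without any tr preprocessing
ls -l | awk '{print $9}'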
2. How can I sort a file based on a specific column?
You can use the -k flag with sort. For example, sort -k3 -n filename will sort the file numerically (-n) based on the contents of the third column (-k3). This is incredibly useful for sorting the output of commands like ls -l by file size.
3. What’s a practical cybersecurity use case for tr?
A common use case is data sanitization. Imagine you’re analyzing a file that has a mix of uppercase and lowercase letters, but you need everything to be consistent for searching. You can pipe the file through tr 'A-Z' 'a-z' to make everything lowercase before further processing. It’s also used to remove non-printable or “bad” characters from data that might otherwise break other scripts or tools.
4. How can wc -l be used in scripting?
In shell scripting, wc -l is frequently used to check if a command produced any output. For example, you might run a grep command to search for an error in a log file, pipe the result to wc -l, and if the count is greater than zero, you know an error was found and can trigger an alert.
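A minimal sketch of that pattern, assuming a hypothetical app.log:
# app.log and the 'ERROR' pattern are hypothetical stand-ins
count=$(grep 'ERROR' app.log | wc -l)
if [ "$count" -gt 0 ]; then
  echo "Alert: found $count error lines in app.log"
fi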
5. Is the column command available on all Linux systems?
The column command is part of the util-linux package and is available on the vast majority of modern Linux distributions. However, you might encounter very old or minimalistic systems (like some embedded devices) where it might not be installed by default.
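If you’re not sure whether your system has it, a quick check:
# Prints the path to column if it's installed; prints nothing otherwise
command -v column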