Learn Now : Filtering Content in Linux with Awk and Sed

This is Part 17 of 17 in the series Linux Basics For Hackers.

Hey everyone, and welcome back to our series, “Linux Basics For Hackers”! Grab your coffee (or chai, if that’s more your style ☕), get comfy, and chaliye shuru karte hain (let’s get started)! 😂

In our last session, “Filtering Content Part 2,” we got our hands dirty with some seriously useful commands. We learned how to neatly arrange data with column, bring order to chaos with sort, make precise incisions with cut, transform characters with tr, and count just about anything with wc. Those tools are the bedrock of text manipulation, and I hope you’ve been playing around with them. They’re like the essential spices in your kitchen; you can cook without them, but why would you want to?

But today… oh, today we’re bringing out the big guns. We’re moving from simple spices to the master chef’s knives. We’re going to explore two of the most powerful and, let’s be honest, slightly intimidating commands in the Linux world: awk and sed.

If commands like cut and grep are like scalpels, allowing for precise but simple cuts, then awk and sed are like the Swiss Army knives of text processing. They can slice, dice, rearrange, and completely transform text in ways that will make your jaw drop. For anyone serious about cybersecurity, mastering these tools is non-negotiable. Why? Because logs, configuration files, and command outputs are the lifeblood of a security analyst. Being able to dissect this data quickly and efficiently is what separates the script kiddies from the real hackers.

So, buckle up! Chalo (come on), this is where the real fun begins. And make sure you pay close attention, because everything we learn today is going to be crucial for the challenge I’m throwing at you in our next article!




Awk: The Grandmaster of Columns

Let’s start with awk. The name sounds a bit weird, right? It’s actually named after its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. It’s not just a command; it’s a full-fledged programming language designed for one thing: processing text, especially column-based or field-separated data.

Think about the /etc/passwd file. It’s a classic example. Each line has multiple pieces of information (username, user ID, home directory, etc.), all separated by a colon (:). How would you extract just the username (the first field) and the default shell (the last field) for every user? You could try to wrestle with cut, but awk makes it ridiculously easy.

The basic syntax for awk looks like this: awk 'pattern { action }' filename

Let’s break it down:

  • pattern: This is a condition. awk scans the file line by line. If a line matches the pattern, awk performs the action on it. If you omit the pattern, awk performs the action on every single line.
  • { action }: This is what you want awk to do when it finds a matching line. The action is enclosed in curly braces {}. The most common action is print.

Fields and Delimiters

The real magic of awk is how it sees each line. By default, it automatically splits a line into fields based on spaces or tabs. You can refer to these fields using a dollar sign $ followed by the field number.

  • $1 is the first field.
  • $2 is the second field.
  • $0 represents the entire line.

Let’s see a simple example. Suppose we have a file named hackers.txt with the following content:

neo the_one trinity matrix
morpheus captain nebuchadnezzar
agent_smith antagonist matrix

If we want to print just the first and second fields (the hacker’s name and their title), we can do this:

awk '{print $1, $2}' hackers.txt

Output:

neo the_one
morpheus captain
agent_smith antagonist

See how it just grabs the columns we asked for? Kitna aasan hai (how easy is that)! But what about files that don’t use spaces as separators, like our /etc/passwd file? That’s where the -F flag comes in. It lets us specify a custom field separator (or delimiter).

Let’s tackle that challenge from earlier: printing the username ($1) and the shell ($7) from /etc/passwd.

awk -F':' '{print "Username:", $1, " | Shell:", $7}' /etc/passwd

Partial Output:

Username: root  | Shell: /bin/bash
Username: daemon  | Shell: /usr/sbin/nologin
Username: bin  | Shell: /usr/sbin/nologin
Username: sys  | Shell: /usr/sbin/nologin
...

Look at that! Bas, we used -F':' to tell awk that the fields are separated by colons. In the print action, we even added our own text strings (“Username:”, ” | Shell:”) to make the output more readable. This is something cut just can’t do.
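The same idea works with any delimiter. Here’s a tiny self-contained practice run (inventory.csv and its contents are invented for this demo):

```shell
# Build a small sample CSV (made-up data, just for practice)
cat > inventory.csv <<'EOF'
nmap,scanner,7.94
wireshark,sniffer,4.2
hydra,cracker,9.5
EOF

# -F',' makes the comma the field separator, and we can mix in
# our own labels, exactly like we did with /etc/passwd
awk -F',' '{print "Tool:", $1, "| Type:", $2}' inventory.csv
# -> Tool: nmap | Type: scanner
#    Tool: wireshark | Type: sniffer
#    Tool: hydra | Type: cracker
```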

Using Patterns to Filter

Now, let’s bring patterns into the mix. A pattern is usually a regular expression enclosed in forward slashes /. awk will only perform the action on lines that match the regex.

Let’s say we only want to see the users on our system who have /bin/bash as their shell.

awk -F':' '/\/bin\/bash/ {print $1}' /etc/passwd

Here, the pattern is /\/bin\/bash/. We’re telling awk: “Hey, only look at lines that contain the string /bin/bash, and for those lines, just print the first field (the username).” We have to escape each forward slash with a backslash (\/) inside the regex so awk doesn’t mistake it for the end of the pattern.

Built-in Variables: NF and NR

awk also has some handy built-in variables. Two of the most common are:

  • NF: Number of Fields. This variable holds the total number of fields in the current line. This is great for when you want to print the last field but don’t know how many fields there are. You can just use $NF.
  • NR: Number of Records. This is the line number of the current line being processed.

Let’s try getting the username and the last field (the shell) from /etc/passwd again, but this time using $NF.

awk -F':' '{print $1, $NF}' /etc/passwd

This is often more robust than hardcoding $7, just in case the file format ever changes.
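And here’s NR in action, both for numbering lines and for a quick line count in an END block (callsigns.txt is just a throwaway practice file):

```shell
# A throwaway practice file
printf 'alpha\nbravo\ncharlie\n' > callsigns.txt

# NR is the current record (line) number, handy for numbering output
awk '{print NR": "$0}' callsigns.txt
# -> 1: alpha
#    2: bravo
#    3: charlie

# In an END block, NR holds the total number of lines read
awk 'END {print "Total lines:", NR}' callsigns.txt
# -> Total lines: 3
```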

More Than Just Lines: BEGIN and END Blocks

This is where awk starts to feel less like a command and more like a proper programming tool. awk has two special patterns, BEGIN and END.

  • BEGIN { ... }: The action inside this block is executed before awk starts reading any lines from the input file. It’s perfect for printing headers, initializing variables, or setting things up.
  • END { ... }: The action inside this block is executed after awk has finished reading all the lines. It’s ideal for printing summaries, totals, or final calculations.

Let’s combine everything we’ve learned to create a mini-report. We’ll count the number of users with a /bin/bash shell and print a nice summary.

awk -F':' 'BEGIN { 
    print "--- Bash User Report ---"; 
    count=0 
} 
/\/bin\/bash/ { 
    count++ 
} 
END { 
    print "Total bash users found:", count 
}' /etc/passwd

Output:

--- Bash User Report ---
Total bash users found: 3

(Your number might be different, of course!) See what we did there? We set up a header and a count variable in the BEGIN block. Then, for every line that matched our pattern, we incremented the counter. Finally, the END block printed the final tally. Yeh cheez! (That’s the stuff!)
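The END block is just as handy for totals as for counts. Here’s a small sketch that sums a numeric column (bytes.log and its numbers are made up for practice):

```shell
# Hypothetical traffic log: filename and size in bytes
cat > bytes.log <<'EOF'
index.html 512
login.php 2048
shell.jpg 4096
EOF

# Accumulate field 2 on every line, then report the total in END
awk 'BEGIN {print "--- Traffic Report ---"} {total += $2} END {print "Total bytes:", total}' bytes.log
# -> --- Traffic Report ---
#    Total bytes: 6656
```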

Sed: The Stream Editor

Alright, let’s switch gears to sed, the Stream Editor. If awk is the master of extracting and formatting data, sed is the master of modifying it. It’s designed to find and replace text, delete lines, and perform other edits on a stream of text—either from a file or from a pipe—without opening a text editor. It’s automation heaven.

The most common use for sed is substitution. The syntax is a classic: sed 's/find/replace/flags' filename

  • s: This tells sed we want to perform a substitution.
  • find: The text or regular expression you want to find.
  • replace: The text you want to replace it with.
  • flags: These modify the command’s behaviour. The most important one is g for global, which means it will replace all occurrences on a line, not just the first one.
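A quick demo makes the g flag concrete:

```shell
# Without g, only the FIRST match on each line is replaced
echo "red fish red fish" | sed 's/red/blue/'
# -> blue fish red fish

# With g, EVERY match on the line is replaced
echo "red fish red fish" | sed 's/red/blue/g'
# -> blue fish blue fish
```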

Find and Replace on Steroids

Let’s go back to our /etc/passwd file. Suppose we want to create a report, but for security reasons, we want to replace all instances of /bin/bash with /bin/nologin_shell.

sed 's/\/bin\/bash/\/bin\/nologin_shell/g' /etc/passwd

Just like with awk, we have to escape the / characters. This can get messy. A cool sed trick is that you can use a different character as a delimiter. The first character after the s becomes the new delimiter. Let’s rewrite that command using a pipe | as the delimiter, which is much cleaner:

sed 's|/bin/bash|/bin/nologin_shell|g' /etc/passwd

Much better, hai na (right)? The output will be printed to your terminal. sed does not modify the original file by default. This is a critical safety feature. It just prints the modified stream to standard output.
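You can prove that safety behaviour to yourself with a throwaway file (demo.conf is invented for this demo):

```shell
# A scratch file to play with
printf 'shell: /bin/bash\n' > demo.conf

# sed prints the edited stream to the terminal...
sed 's|/bin/bash|/bin/nologin_shell|g' demo.conf
# -> shell: /bin/nologin_shell

# ...but the file on disk is untouched
cat demo.conf
# -> shell: /bin/bash
```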

In-Place Editing (Handle with Care!)

What if you do want to modify the file? You can use the -i flag for in-place editing. WARNING: This is a destructive operation. Once you run it, the original file is overwritten. There is no undo. I repeat, THERE IS NO UNDO. My advice? Always, always, always create a backup first.

A safer way to use the -i flag is to provide a backup extension.

sed -i.bak 's|/bin/bash|/bin/nologin_shell|g' /etc/passwd

This command will still modify /etc/passwd, but it will also create a backup of the original file named /etc/passwd.bak. If you mess up, you can just restore from the backup. Phew!
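Before trying this on the real /etc/passwd, rehearse the backup workflow on a scratch file (practice.txt is invented):

```shell
# A scratch file standing in for a real config
printf 'neo:/bin/bash\n' > practice.txt

# -i.bak edits the file in place AND saves the original as practice.txt.bak
sed -i.bak 's|/bin/bash|/bin/nologin_shell|g' practice.txt

cat practice.txt
# -> neo:/bin/nologin_shell
cat practice.txt.bak
# -> neo:/bin/bash
```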

Deleting Lines (d)

sed isn’t just for substitution. It can also delete lines. The d command is used for this. You give it an address (a line number or a pattern to match), and it will delete any lines that match.

Want to delete the line containing the user “sys”?

sed '/sys/d' /etc/passwd

This will print the entire file except for any line containing “sys”. Careful, though: the pattern matches the string sys anywhere on the line, so a user like syslog would be deleted too. You can also specify a range of lines to delete, like sed '5,10d' /etc/passwd, which removes lines 5 through 10.
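Here’s a self-contained run of both delete styles (lines.txt is a throwaway practice file):

```shell
# A throwaway practice file
printf 'keep me\ndrop this line\nkeep me too\n' > lines.txt

# Delete by pattern: any line containing "drop" is removed
sed '/drop/d' lines.txt
# -> keep me
#    keep me too

# Delete by address range: remove lines 1 through 2
sed '1,2d' lines.txt
# -> keep me too
```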

More sed Tricks: Referencing and Inserting

Here are two more tricks to make your sed game even stronger.

  1. Referencing the match (&): In the replace part of your substitute command, you can use an ampersand & to represent the entire string that was matched by the find part. This is amazing for wrapping text. For example, let’s find every occurrence of root and wrap it in asterisks:

sed 's/root/**&**/g' /etc/passwd

    This would change root:x:0:0:root:/root:/bin/bash to **root**:x:0:0:**root**:/**root**:/bin/bash. (Note that the root inside /root gets wrapped too, since the g flag replaces every match on the line.)
  2. Inserting Lines (i and a): You can insert (i) or append (a) entire lines of text. This is useful for adding comments or configuration directives to files automatically.

# Insert a line BEFORE line 3
sed '3i # This is a new comment' /etc/passwd

# Append a line AFTER the line containing "root"
sed '/root/a # Root user settings above' /etc/passwd

The Hacker’s Edge: Combining the Tools

Okay, we’ve learned about awk and sed. Par tension mat lo (don’t stress), the real power comes when you start combining them with each other and with the tools we learned about last time. This is what we call building a “pipeline.”

Let’s imagine a scenario. You’re analyzing an auth.log file on a server to look for failed password attempts. A typical failed login line might look something like this: Sep 29 07:15:33 server sshd[12345]: Failed password for invalid user bob from 192.168.1.101 port 22 ssh2

Your goal is to get a unique, sorted list of all the IP addresses that have tried to log in and failed. How would you do it? Let’s build a pipeline!

  1. First, find the right lines: We can use grep (or awk!) to find only the lines with “Failed password”.
  2. Next, extract the IP address: This looks like a job for awk! Let’s count… 1, 2, 3… for our sample line, the IP lands in the 13th field. (Watch out: when the user is valid, the log omits the words “invalid user”, and the IP shifts to the 11th field.)
  3. Then, get a unique, sorted list: We learned this last time! We can pipe the output to sort and then to uniq.

Let’s put it all together:

grep "Failed password" /var/log/auth.log | awk '{print $13}' | sort | uniq

Boom! Ho gaya (done)! 🤯 With a single line, we’ve processed a potentially huge log file and distilled it down to exactly the information we need. This is the essence of working on the command line. This is the power you need as a hacker.
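One subtlety to watch: the IP’s field number shifts depending on whether the line says “invalid user”, so counting backwards from the end of the line with $(NF-3) (past ssh2, 22, and port) is more robust. A self-contained sketch (sample_auth.log is a made-up practice file):

```shell
# Made-up auth.log snippet covering both line formats
cat > sample_auth.log <<'EOF'
Sep 29 07:15:33 server sshd[12345]: Failed password for invalid user bob from 192.168.1.101 port 22 ssh2
Sep 29 07:15:40 server sshd[12346]: Failed password for root from 10.0.0.5 port 22 ssh2
Sep 29 07:15:41 server sshd[12347]: Failed password for invalid user bob from 192.168.1.101 port 22 ssh2
Sep 29 07:16:02 server sshd[12348]: Accepted password for alice from 10.0.0.9 port 22 ssh2
EOF

# awk can do the grep step itself; $(NF-3) is always the IP,
# regardless of whether "invalid user" appears on the line
awk '/Failed password/ {print $(NF-3)}' sample_auth.log | sort -u
# -> 10.0.0.5
#    192.168.1.101
```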

awk and sed are deep, deep rabbit holes. We’ve only scratched the surface today. They have variables, loops, and conditional logic. They are entire programming languages. But with the fundamentals we’ve covered, you can already accomplish about 80% of the text-processing tasks you’ll ever need to do.

Now it’s your turn. Chalo bhai (come on), open up a terminal and start playing. Use awk to parse /etc/passwd. Use sed to replace words in a text file. Break things (just make sure you have backups!). The only way to get comfortable is to build that muscle memory.

And get ready, because in the next article, we’re having a “Filtering Content Challenge.” I’ll give you a messy data file and a goal, and you’ll have to use everything we’ve learned so far—grep, sort, cut, uniq, awk, and sed—to solve it.

FAQs

1. What is the main difference between awk and sed? The simplest way to think about it is that sed is primarily for modifying text (substituting, deleting, inserting), while awk is for parsing and extracting data from text. awk is field-aware, making it perfect for structured, column-based data. sed operates on the entire line as a stream of characters. While their capabilities can overlap, you’d typically grab sed for a find-and-replace job and awk to pull out the 3rd and 5th columns of a file.

2. Can awk modify files in-place like sed -i? Yes, but it’s a bit different. Starting with GNU Awk version 4.1.0, an “in-place” editing extension was introduced. The syntax looks like this: gawk -i inplace '{...}' filename. However, it’s not as universally available as sed -i, and the traditional, safer approach is to redirect the output to a temporary file and then move it back, like this: awk '{...}' oldfile > newfile && mv newfile oldfile.

3. Why are these old command-line tools still relevant for modern cybersecurity? Because the data formats haven’t changed! Log files, CSVs, configuration files, and command-line tool outputs are still text-based. These tools are lightning-fast, universally available on almost every Linux system (and macOS, and even Windows with WSL), and use very few resources. In a security incident, you need to analyze data quickly on a potentially compromised or resource-constrained system. You can’t always rely on a fancy graphical interface. Mastering the command line is a timeless and essential skill.

4. Are there any modern alternatives to awk and sed? Yes, scripting languages like Python and Perl are incredibly powerful for text processing and are often used for more complex tasks. Python, with its extensive libraries like Pandas (for structured data), is a fantastic tool to have in your arsenal. However, for quick, on-the-fly analysis directly in the terminal, nothing beats the speed and convenience of whipping up a quick awk or sed one-liner. They solve different scales of the same problem.
