Hey everyone, and welcome back to our series, “Linux Basics For Hackers”! Grab your coffee (or chai, if that’s more your style ☕), get comfy, and chaliye shuru karte hain (let’s get started)! 😂
In our last session, “Filtering Content Part 2,” we got our hands dirty with some seriously useful commands. We learned how to neatly arrange data with column, bring order to chaos with sort, make precise incisions with cut, transform characters with tr, and count just about anything with wc. Those tools are the bedrock of text manipulation, and I hope you’ve been playing around with them. They’re like the essential spices in your kitchen; you can cook without them, but why would you want to?
But today… oh, today we’re bringing out the big guns. We’re moving from simple spices to the master chef’s knives. We’re going to explore two of the most powerful and, let’s be honest, slightly intimidating commands in the Linux world: awk and sed.
If commands like cut and grep are like scalpels, allowing for precise but simple cuts, then awk and sed are like the Swiss Army knives of text processing. They can slice, dice, rearrange, and completely transform text in ways that will make your jaw drop. For anyone serious about cybersecurity, mastering these tools is non-negotiable. Why? Because logs, configuration files, and command outputs are the lifeblood of a security analyst. Being able to dissect this data quickly and efficiently is what separates the script kiddies from the real hackers.
So, buckle up! Chalo (let’s go), this is where the real fun begins. And make sure you pay close attention, because everything we learn today is going to be crucial for the challenge I’m throwing at you in our next article!
Awk: The Grandmaster of Columns
Let’s start with awk. The name sounds a bit weird, right? It’s actually named after its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. It’s not just a command; it’s a full-fledged programming language designed for one thing: processing text, especially column-based or field-separated data.
Think about the /etc/passwd file. It’s a classic example. Each line has multiple pieces of information (username, user ID, home directory, etc.), all separated by a colon :. How would you extract just the username (the first field) and the default shell (the last field) for every user? You could try to wrestle with cut, but awk makes it ridiculously easy.
The basic syntax for awk looks like this: awk 'pattern { action }' filename
Let’s break it down:
- pattern: This is a condition. awk scans the file line by line. If a line matches the pattern, awk performs the action on it. If you omit the pattern, awk performs the action on every single line.
- { action }: This is what you want awk to do when it finds a matching line. The action is enclosed in curly braces {}. The most common action is print.
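To make that syntax concrete before we dive into fields, here’s a minimal sketch (logfile.txt is just a hypothetical file name): the pattern /error/ selects matching lines, and the action prints them.
awk '/error/ {print}' logfile.txt
Fun fact: print (which prints the whole line) is the default action, so awk '/error/' logfile.txt would do exactly the same thing.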
Fields and Delimiters
The real magic of awk is how it sees each line. By default, it automatically splits a line into fields based on spaces or tabs. You can refer to these fields using a dollar sign $ followed by the field number.
- $1 is the first field.
- $2 is the second field.
- $0 represents the entire line.
Let’s see a simple example. Suppose we have a file named hackers.txt with the following content:
neo the_one trinity matrix
morpheus captain nebuchadnezzar
agent_smith antagonist matrix
If we want to print just the first and second fields (the hacker’s name and their title), we can do this:
awk '{print $1, $2}' hackers.txt
Output:
neo the_one
morpheus captain
agent_smith antagonist
See how it just grabs the columns we asked for? Kitna aasan hai! (So easy, right?) But what about files that don’t use spaces as separators, like our /etc/passwd file? That’s where the -F flag comes in. It lets us specify a custom Field separator (or delimiter).
Let’s tackle that challenge from earlier: printing the username ($1) and the shell ($7) from /etc/passwd.
awk -F':' '{print "Username:", $1, " | Shell:", $7}' /etc/passwd
Partial Output:
Username: root  | Shell: /bin/bash
Username: daemon  | Shell: /usr/sbin/nologin
Username: bin  | Shell: /usr/sbin/nologin
Username: sys  | Shell: /usr/sbin/nologin
...
Look at that! Bas (that’s it), we used -F':' to tell awk that the fields are separated by colons. In the print action, we even added our own text strings (“Username:”, “ | Shell:”) to make the output more readable. This is something cut just can’t do.
Using Patterns to Filter
Now, let’s bring patterns into the mix. A pattern is usually a regular expression enclosed in forward slashes /. awk will only perform the action on lines that match the regex.
Let’s say we only want to see the users on our system who have /bin/bash as their shell.
awk -F':' '/\/bin\/bash/ {print $1}' /etc/passwd
Here, the pattern is /\/bin\/bash/. We’re telling awk: “Hey, only look at lines that contain the string /bin/bash, and for those lines, just print the first field (the username).” We have to escape the forward slashes with a backslash (\) inside the regex so awk doesn’t get confused.
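As a quick aside, since we’re already splitting on colons, another way to sketch this (and skip the escaping entirely) is to compare the seventh field directly:
awk -F':' '$7 == "/bin/bash" {print $1}' /etc/passwd
This version is also stricter: the regex would match any line containing /bin/bash anywhere, while the comparison only fires when the shell field is exactly /bin/bash.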
Built-in Variables: NF and NR
awk also has some handy built-in variables. Two of the most common are:
- NF: Number of Fields. This variable holds the total number of fields in the current line. This is great for when you want to print the last field but don’t know how many fields there are. You can just use $NF.
- NR: Number of Records. This is the line number of the current line being processed.
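Here’s a quick sketch of NR in action, numbering each user as awk walks through the file:
awk -F':' '{print NR, $1}' /etc/passwd
Partial Output:
1 root
2 daemon
3 bin
...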
Let’s try getting the username and the last field (the shell) from /etc/passwd again, but this time using $NF.
awk -F':' '{print $1, $NF}' /etc/passwd
This is often more robust than hardcoding $7, just in case the file format ever changes.
More Than Just Lines: BEGIN and END Blocks
This is where awk starts to feel less like a command and more like a proper programming tool. awk has two special patterns, BEGIN and END.
- BEGIN { ... }: The action inside this block is executed before awk starts reading any lines from the input file. It’s perfect for printing headers, initializing variables, or setting things up.
- END { ... }: The action inside this block is executed after awk has finished reading all the lines. It’s ideal for printing summaries, totals, or final calculations.
Let’s combine everything we’ve learned to create a mini-report. We’ll count the number of users with a /bin/bash shell and print a nice summary.
awk -F':' 'BEGIN { 
    print "--- Bash User Report ---"; 
    count=0 
} 
/\/bin\/bash/ { 
    count++ 
} 
END { 
    print "Total bash users found:", count 
}' /etc/passwd
Output:
--- Bash User Report ---
Total bash users found: 3
(Your number might be different, of course!) See what we did there? We set up a header and a count variable in the BEGIN block. Then, for every line that matched our pattern, we incremented the counter. Finally, the END block printed the final tally. Yeh cheez! (That’s the stuff!)
Sed: The Stream Editor
Alright, let’s switch gears to sed, the Stream Editor. If awk is the master of extracting and formatting data, sed is the master of modifying it. It’s designed to find and replace text, delete lines, and perform other edits on a stream of text—either from a file or from a pipe—without opening a text editor. It’s automation heaven.
The most common use for sed is substitution. The syntax is a classic: sed 's/find/replace/flags' filename
- s: This tells- sedwe want to perform a substitution.
- find: The text or regular expression you want to find.
- replace: The text you want to replace it with.
- flags: These modify the command’s behaviour. The most important one is- gfor global, which means it will replace all occurrences on a line, not just the first one.
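To see what g actually changes, here’s a tiny sketch that feeds sed a line with two matches via echo:
echo "bash is great, bash is fast" | sed 's/bash/zsh/'
# Output: zsh is great, bash is fast
echo "bash is great, bash is fast" | sed 's/bash/zsh/g'
# Output: zsh is great, zsh is fast
Without g, only the first bash on the line gets replaced; with g, both do.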
Find and Replace on Steroids
Let’s go back to our /etc/passwd file. Suppose we want to create a report, but for security reasons, we want to replace all instances of /bin/bash with /bin/nologin_shell.
sed 's/\/bin\/bash/\/bin\/nologin_shell/g' /etc/passwd
Just like with awk, we have to escape the / characters. This can get messy. A cool sed trick is that you can use a different character as a delimiter. The first character after the s becomes the new delimiter. Let’s rewrite that command using a pipe | as the delimiter, which is much cleaner:
sed 's|/bin/bash|/bin/nologin_shell|g' /etc/passwd
Much better, hai na (right)? The output will be printed to your terminal. sed does not modify the original file by default. This is a critical safety feature. It just prints the modified stream to standard output.
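If you want to keep the modified version, a simple approach is to redirect the stream to a new file (passwd_report.txt is just an example name):
sed 's|/bin/bash|/bin/nologin_shell|g' /etc/passwd > passwd_report.txt
The original /etc/passwd stays untouched; the edited copy lands in passwd_report.txt.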
In-Place Editing (Handle with Care!)
What if you do want to modify the file? You can use the -i flag for in-place editing. WARNING: This is a destructive operation. Once you run it, the original file is overwritten. There is no undo. I repeat, THERE IS NO UNDO. My advice? Always, always, always create a backup first.
A safer way to use the -i flag is to provide a backup extension.
sed -i.bak 's|/bin/bash|/bin/nologin_shell|g' /etc/passwd
This command will still modify /etc/passwd, but it will also create a backup of the original file named /etc/passwd.bak. If you mess up, you can just restore from the backup. Phew!
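Restoring is then a one-liner (assuming the .bak file exists, and note that you’ll need root privileges for a system file like this):
cp /etc/passwd.bak /etc/passwd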
Deleting Lines (d)
sed isn’t just for substitution. It can also delete lines. The d command is used for this. You give it an address (a line number or a pattern to match), and it will delete any lines that match.
Want to delete the line containing the user “sys”?
sed '/sys/d' /etc/passwd
This will print the entire file except for the line containing “sys”. You can also specify a range of lines to delete, like sed '5,10d' /etc/passwd.
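Two more deletion sketches you’ll reach for constantly: stripping comment lines and blank lines from a config file (config.txt is just a stand-in name).
# Delete every line that starts with #
sed '/^#/d' config.txt
# Delete every empty line
sed '/^$/d' config.txt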
More sed Tricks: Referencing and Inserting
Here are two more tricks to make your sed game even stronger.
- Referencing the match (&): In the replace part of your substitute command, you can use an ampersand & to represent the entire string that was matched by the find part. This is amazing for wrapping text. For example, let’s find every occurrence of root and wrap it in asterisks:
sed 's/root/**&**/g' /etc/passwd
This would change root:x:0:0:root:/root:/bin/bash to **root**:x:0:0:**root**:/**root**:/bin/bash. (Notice that /root gets hit too, because it also contains the string root.)
- Inserting Lines (i and a): You can insert (i) or append (a) entire lines of text. This is useful for adding comments or configuration directives to files automatically.
# Insert a line BEFORE line 3
sed '3i # This is a new comment' /etc/passwd
# Append a line AFTER the line containing "root"
sed '/root/a # Root user settings above' /etc/passwd
The Hacker’s Edge: Combining the Tools
Okay, we’ve learned about awk and sed. Par tension mat lo (don’t stress), the real power comes when you start combining them with each other and with the tools we learned about last time. This is what we call building a “pipeline.”
Let’s imagine a scenario. You’re analyzing an auth.log file on a server to look for failed password attempts. A typical failed login line might look something like this: Sep 29 07:15:33 server sshd[12345]: Failed password for invalid user bob from 192.168.1.101 port 22 ssh2
Your goal is to get a unique, sorted list of all the IP addresses that have tried to log in and failed. How would you do it? Let’s build a pipeline!
- First, find the right lines: We can use grep (or awk!) to find only the lines with “Failed password”.
- Next, extract the IP address: This looks like a job for awk! Careful with the counting, though: in our example line the IP is the 13th field, but failed logins for valid users don’t include the words “invalid user”, which shifts the IP to the 11th field. Since these lines always end with “... port 22 ssh2”, the IP is reliably the fourth field from the end, and awk lets us say exactly that with $(NF-3).
- Then, get a unique, sorted list: We learned this last time! We can pipe the output to sort and then to uniq.
Let’s put it all together:
grep "Failed password" /var/log/auth.log | awk '{print $11}' | sort | uniq
Boom! Ho gaya! (Done!) 🤯 With a single line, we’ve processed a potentially huge log file and distilled it down to exactly the information we need. This is the essence of working on the command line. This is the power you need as a hacker.
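And if you want to know how many times each IP failed, not just which ones appeared, one small extension of the same pipeline (using uniq -c to count each address and sort -nr to sort numerically, highest first) does the trick:
grep "Failed password" /var/log/auth.log | awk '{print $(NF-3)}' | sort | uniq -c | sort -nr
The noisiest attacker floats right to the top.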
awk and sed are deep, deep rabbit holes. We’ve only scratched the surface today. They have variables, loops, and conditional logic. They are entire programming languages. But with the fundamentals we’ve covered, you can already accomplish about 80% of the text-processing tasks you’ll ever need to do.
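Just to give you a taste of that programming-language side, here’s one sketch of an awk comparison: on most Linux distributions, regular (human) accounts start at UID 1000, so we can filter numerically on the third field of /etc/passwd.
awk -F':' '$3 >= 1000 {print $1, "has UID", $3}' /etc/passwd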
Now it’s your turn. Chalo bhai (come on, friend), open up a terminal and start playing. Use awk to parse /etc/passwd. Use sed to replace words in a text file. Break things (just make sure you have backups!). The only way to get comfortable is to build that muscle memory.
And get ready, because in the next article, we’re having a “Filtering Content Challenge.” I’ll give you a messy data file and a goal, and you’ll have to use everything we’ve learned so far—grep, sort, cut, uniq, awk, and sed—to solve it.
FAQs
1. What is the main difference between awk and sed? The simplest way to think about it is that sed is primarily for modifying text (substituting, deleting, inserting), while awk is for parsing and extracting data from text. awk is field-aware, making it perfect for structured, column-based data. sed operates on the entire line as a stream of characters. While their capabilities can overlap, you’d typically grab sed for a find-and-replace job and awk to pull out the 3rd and 5th columns of a file.
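A tiny side-by-side sketch of that division of labour, using echo as input:
echo "alice:x:1001" | sed 's/alice/bob/'        # modify -> bob:x:1001
echo "alice:x:1001" | awk -F':' '{print $1}'    # extract -> alice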
2. Can awk modify files in-place like sed -i? Yes, but it’s a bit different. Starting with GNU Awk version 4.1.0, an “in-place” editing extension was introduced. The syntax looks like this: gawk -i inplace '{...}' filename. However, it’s not as universally available as sed -i, and the traditional, safer approach is to redirect the output to a temporary file and then move it back, like this: awk '{...}' oldfile > newfile && mv newfile oldfile.
3. Why are these old command-line tools still relevant for modern cybersecurity? Because the data formats haven’t changed! Log files, CSVs, configuration files, and command-line tool outputs are still text-based. These tools are lightning-fast, universally available on almost every Linux system (and macOS, and even Windows with WSL), and use very few resources. In a security incident, you need to analyze data quickly on a potentially compromised or resource-constrained system. You can’t always rely on a fancy graphical interface. Mastering the command line is a timeless and essential skill.
4. Are there any modern alternatives to awk and sed? Yes, scripting languages like Python and Perl are incredibly powerful for text processing and are often used for more complex tasks. Python, with its extensive libraries like Pandas (for structured data), is a fantastic tool to have in your arsenal. However, for quick, on-the-fly analysis directly in the terminal, nothing beats the speed and convenience of whipping up a quick awk or sed one-liner. They solve different scales of the same problem.