Grawk - String Search

Update 26-04-2023: This post is still valid, but grawk is discontinued because I now use Python in another application I’ve written, called pygrep.

What is the reason for this?

After forgetting sed commands countless times, I decided to write something a little easier for me to remember, using awk/gawk.

Alternatives

Well, sed and grep are the two contenders, but what I’m trying to conquer with this awk programme is having their flexibility without having to remember their complexity. I will be writing this in Python as well, but for now this seems to work.

Grep has this command

  
grep -o 'search[^)]*)' file

This matches from the keyword up to and including the first closing bracket, and displays only the matched text. If the pattern occurs more than once on the same line, every instance is displayed (not necessarily a bad thing though). Also, I’m not 100% sure how grep alone could extend this out to the second bracket or a second keyword without the help of a grep/sed combo script.
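For comparison, here is a minimal Python sketch of what that grep -o pattern does. It assumes the same expression carries over to Python's re module with the closing bracket escaped (a bare ")" closes a group there). The function name grep_o and the sample line are made up for illustration.

```python
import re

# Same idea as the grep example: the word "search", then any run of
# characters that are not ")", then the closing ")" itself.
PATTERN = re.compile(r"search[^)]*\)")

def grep_o(line):
    """Return every non-overlapping match on the line, like grep -o."""
    return PATTERN.findall(line)

# Hypothetical sample line with two matches on it.
sample = "x = search(a, b) + search(c) # done"
for match in grep_o(sample):
    print(match)
```

Both matches come back, which is the "all instances are displayed" behaviour described above.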

I know sed can do something like this, probably using loops and the hold space, and I’ve probably read about doing this in sed a few dozen times. But because sed syntax gets unreadable to me (after not using it for a while, and especially complex sed), I forget it… so I’m not going to go there in this blog post. Don’t get me wrong, I love sed, and if I used it often enough, I’d be all over it (obviously).

grawk

So I’ve tried to give this some flexibility, and on the command line it reads a little like this (grawk being the awk programme I’ve written, stored in my $PATH; otherwise call it like a regular script, ./grawk…):

grawk buzzword 1 ")" 2 file

This will search for the 1st buzzword found on a line and print up to the 2nd closing bracket, reading from file (or piped input).
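The "n-th keyword up to the m-th end character" idea can be sketched in Python roughly like this. It is only an illustration of the behaviour described above, not how grawk itself is written: the helper names find_nth and extract are hypothetical, and the keywords are treated as plain strings rather than patterns.

```python
def find_nth(text, needle, n, begin=0):
    """Index of the n-th (1-based) occurrence of needle at or after begin, or -1."""
    pos = begin - 1
    for _ in range(n):
        pos = text.find(needle, pos + 1)
        if pos == -1:
            return -1
    return pos

def extract(line, start, start_n, end, end_n):
    """Slice from the start_n-th `start` up to and including the end_n-th `end` after it."""
    s = find_nth(line, start, start_n)
    if s == -1:
        return None
    e = find_nth(line, end, end_n, s)
    if e == -1:
        return None
    return line[s:e + len(end)]

# Hypothetical log line, loosely modelled on a syslog entry.
sample = "Aug 17 host CRON[1]: (root) CMD (run job) tail"
print(extract(sample, "CRON", 1, ")", 2))
print(extract(sample, "CRON", 1, ")", 1))
```

Lines that do not contain both parts give None, which a caller would simply skip.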

Adding a $ to the command line…

grawk buzzword 1 bash $ file

So it becomes a two keyword search, with the output starting from the buzzword and running to the end of the line. There’s also a weird hacky bonus (which I did not add, so it must break something?) of adding a period to the buzzword…

grawk .buzzword 1 bash $ file

This prints from the beginning of the line for a two keyword search. In this example the $ prints to the end of the line, but if the $ were a 2, the output would stop at the second instance of bash (or whatever character/word was given there). For example:

  
grawk .cron 1 ")" 2 /var/log/syslog | head -n1

Aug 17 21:30:01 jp-vivo CRON[16281]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi)

The above prints from the start of the line (using the period) to the second closing bracket. I used head -n1 to limit the output to the first line, as there are a lot of cron jobs in syslog.
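A rough Python sketch of the two keyword search described above follows. To keep it short it only covers the $ (end of line) case, and the from_line_start flag is a hypothetical stand-in for the leading period; it does not reproduce how grawk arrives at that behaviour.

```python
def two_keyword(line, start, also, from_line_start=False):
    """Keep lines containing both keywords; return from `start` to end of line.

    from_line_start is a hypothetical flag standing in for the leading period.
    """
    if start not in line or also not in line:
        return None
    begin = 0 if from_line_start else line.index(start)
    return line[begin:]

# Hypothetical passwd-style line.
sample = "jonny:x:1000:1000:jonny,,,:/home/jonny:/bin/bash"
print(two_keyword(sample, "home", "bash"))
print(two_keyword(sample, "home", "bash", from_line_start=True))
print(two_keyword(sample, "home", "zsh"))
```

The last call returns None because the second keyword is missing, so the line would be filtered out.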

You can also search from the 2nd, 3rd, etc. occurrence of the buzzword…

grawk buzzword 3 $ file

Sometimes you might not want the last character, so I added an option (exc) to exclude it…

grawk jonny 1 : 6 /etc/passwd
output:
jonny:x:1000:1000:jonny,,,:/home/jonny:

grawk jonny 1 : 6 exc /etc/passwd
output:
jonny:x:1000:1000:jonny,,,:/home/jonny

I also added inc, the opposite of exc, which includes one extra character.
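The exc/inc adjustment amounts to moving the end of the slice by one character. Here is a minimal Python sketch under that assumption; slice_to_nth and its mode parameter are hypothetical names, and the sample line is made up to resemble the passwd entry above.

```python
def slice_to_nth(line, start, end, end_n, mode=None):
    """From the first `start` to the end_n-th `end`; mode "exc"/"inc" shifts the cut by one."""
    s = line.find(start)
    if s == -1:
        return None
    pos = s - 1
    for _ in range(end_n):
        pos = line.find(end, pos + 1)
        if pos == -1:
            return None
    stop = pos + len(end)      # default: include the end match itself
    if mode == "exc":
        stop -= 1              # drop the last character
    elif mode == "inc":
        stop += 1              # take one extra character
    return line[s:stop]

sample = "jonny:x:1000:1000:jonny,,,:/home/jonny:/bin/bash"
print(slice_to_nth(sample, "jonny", ":", 6))
print(slice_to_nth(sample, "jonny", ":", 6, "exc"))
print(slice_to_nth(sample, "jonny", ":", 6, "inc"))
```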

The Code

grawk can be found here

Do as you wish with the comments; I’ve added them to be helpful. It won’t take long to understand this programme, and it’s quite straightforward to use.

  
#!/usr/bin/gawk -f
#
# Usage:
# on the commandline
#
# Put grawk into a directory on your $PATH (e.g. /usr/local/bin), and call it without ./
# grawk [start keyword] [start position integer] [end character/word] [end position integer (or $ for end of line)] ["inc"/"exc" to include 1 or exclude 1 character] file
# 
# ./grawk CRON 1 ")" 2 /var/log/syslog 
# OUTPUT: This will search for the first instance of CRON as a starting string, up to the second instance of ")" from syslog
# 
# ./grawk CRON 1 $ /var/log/syslog 
# OUTPUT: Search for first instance of CRON to end of the line from syslog
#
# ./grawk root 1 "/" 1 exc /etc/passwd
# OUTPUT:
# root:x:0:0:root:
# root:x:524288:524288::
#
# ./grawk root 1 "/" 2 exc /etc/passwd
# root:x:0:0:root:/root:
# root:x:524288:524288::/nonexistent:
#
# ./grawk dns 1 "," 1 exc /etc/passwd
# OUTPUT:
# dnsmasq:x:112:65534:dnsmasq
# dnsmasq:x:132:141:Libvirt Dnsmasq
# 
# cat /etc/passwd | ./grawk root $
# OUTPUT:
# root:x:0:0:root:/root:/bin/bash
# root:/usr/sbin/nologin
# root:x:524288:524288::/nonexistent:/usr/bin/false

BEGIN{
	# start string search
	start = ARGV[1]
	delete ARGV[1]
	
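	# does start have a numeric occurrence option (flag), e.g. 1-9? if so add a num1 counter to offset the other ARGVs
	# the match is anchored so that a keyword or filename merely containing a digit is not taken as the flag
	if ( ARGV[2] ~ /^[1-9][0-9]*$/ ) {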
		startappear = ARGV[2]
		num1 += 1
		delete ARGV[2]
		}
	
	# Last string pattern to extract up to, from the start string
	last = ARGV[2 + num1]
	
	# length of the last string to be used at the end to calculate string size
	len=length(last)
	delete ARGV[2 + num1]
	
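	# instance number for end character/word: a whole number, or $ for end of line
	# anchored so that a filename containing a digit or $ is not consumed as the flag
	if (ARGV[3 + num1] ~ /^([$]|[1-9][0-9]*)$/ ) {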
		lastappear = ARGV[3 + num1]
		delete ARGV[3 + num1]
		num1 += 1
		}

	# Including an exc argument at the end of the command line will exclude the last character, unless last == "$"
	# If last == "$" the line printed will begin at start, finish at the end of the line.
	# If the last character is a /, and $ is not included with exc, then a search of /home/user/ will turn into /home/user 
	# inc is the opposite of exc
	if (ARGV[3 + num1] == "exc") {
		len -= 1
		delete ARGV[3 + num1]
		}
	if (ARGV[3 + num1] == "inc") {
		len += 1
		delete ARGV[3 + num1]
		}
}
{
	# What this section does, is look for the start flag value, i.e. grawk root 2 $ /etc/passwd will use the second instance root is found in a line,
	# and with the $ flag, it'll print to the end of line.
	# grawk root 2 "/" 1 will look for the second instance of root in a line, and print up to the first "/"
	# grawk root 2 "/" 1  exc , will look for the second instance of root in a line, and print up to the first "/", excluding the last character "/"
	# There is another hacky thing you can do if you just want the whole line, and place a period before the start search and using the $
	# Note: == 1 is not included because the first instance is the default, but the 1 flag is still required
	
	for (m=2 ; m<=startappear ; m++) {
		$0 = gensub(start,"",1)
		}


	# keep only lines that match both the start and the end pattern
	if ($0 ~ start && $0 ~ last) b[lines++] = $0

	# delimit with infrequent characters to help separate and reintroduce into the final output.
	# pick the first of ¬, ¶, ¥ that does not appear in the current line
	if (! /¬/ ) {
		delim="¬"
	} else if (! /¶/ ) {
		delim="¶"
	} else if (! /¥/ ) {
		delim="¥"
	}
}

# This below uses the data above to index and format the desired (hopefully) string output.
END{
	# walk b in input order; for (i in b) does not guarantee any particular order
	for (i = 0; i < lines; i++) {
		if ( last == "$" || lastappear == "$") {
			n=index(b[i],start)
			z=substr(b[i],n)
			if (z != "") {
				print "\033[33m"z"\033[0m"    		
			}    		
		} else {
			n=index(b[i],start)
			t=substr(b[i],n)
			# This section needs to occur once the start of the string has been established, i.e. indexed and substr.
			if ( lastappear == 1 ) {
				c=index(t,last)
				z=substr(t,1,c+len-1)
				if (z != "") print "\033[33m"z"\033[0m"
				continue
			}
			g = gensub(last,delim,1,t)
			for (m=3 ; m<=lastappear ; m++) {
				g = gensub(last,delim,1,g)
				}
			c=index(g,last)
			z=substr(g,1,c+len-1)
			gsub(delim,last,z)
			if (z != "") {
				print "\033[33m"z"\033[0m"    		
			}
		}
	}
}
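The trick used in the END block above (hide the earlier end characters behind a rare placeholder, find the next real one, then put the hidden ones back) can be sketched in Python like this. It is an illustration of the idea only: pick_mask and cut_with_mask are hypothetical names, and end is treated as a plain string.

```python
def pick_mask(text, candidates="¬¶¥"):
    """Pick the first placeholder character that does not appear in text."""
    for ch in candidates:
        if ch not in text:
            return ch
    raise ValueError("no free placeholder character for this line")

def cut_with_mask(text, end, end_n):
    """Return text up to and including the end_n-th `end`, by masking the earlier ones."""
    mask = pick_mask(text)
    masked = text.replace(end, mask, end_n - 1)   # hide the first end_n - 1 occurrences
    pos = masked.find(end)                        # the next visible one is the end_n-th
    if pos == -1:
        return None
    return masked[:pos + len(end)].replace(mask, end)  # restore the hidden ones

print(cut_with_mask("a/b/c/d", "/", 2))
print(cut_with_mask("a/b/c/d", "/", 1))
```

Counting occurrences directly with a loop would avoid needing a placeholder at all, but the masking version stays closer to what the awk code does.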

Improvements

Well, I would like to make this a bit more dynamic, but the more dynamic I make it, the more likely I am to move it to Python. Some improvements I’d like to add are…

Case insensitive
Potentially some regex
If an instance occurs more than once in a line, have an option to print all instances
More colours, with colour options
And i’m sure i’ll think of more…
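As a rough idea of what two of these could look like in Python (case insensitivity and printing every instance on a line), here is a small sketch. It is not taken from pygrep; the function name all_instances and the sample line are hypothetical, and both keywords are escaped so they are matched as plain text.

```python
import re

def all_instances(line, start, end, ignore_case=False):
    """Return every 'start ... end' span on the line, shortest match first."""
    flags = re.IGNORECASE if ignore_case else 0
    # Non-greedy .*? stops at the first `end` after each `start`.
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), flags)
    return pattern.findall(line)

sample = "CRON job (a) and cron task (b)"
print(all_instances(sample, "cron", ")", ignore_case=True))
print(all_instances(sample, "cron", ")"))
```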
