Analyze a line with sed using a regular expression

 3 years ago
source link: https://www.codesd.com/item/analyze-a-line-with-sed-using-a-regular-expression.html

Using sed, I want to parse a line of Heroku's log-runtime-metrics output like this one:

2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages

The desired output is:

worker.2: 54.01MB (where 54.01MB is the value of memory_total)

I could not get this to work, although I tried several alternatives, including:

sed -E 's/.+source=(.+) .+memory_total=(.+) .+/\1: \2/g'

What is wrong with my command? How can it be corrected?


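In your command, each (.+) is greedy and happily matches across spaces, so both captures swallow far more than the single field you want. I'd go for the old-fashioned, reliable, non-extended sed expressions and make sure that the patterns are not too greedy by using [^ ]*, which stops at the first space: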

sed -e 's/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/'

The -e is not the opposite of -E. The -E option enables extended regular expressions and is primarily a Mac OS X (BSD) sed option; the traditional option for GNU sed is -r instead (recent GNU sed versions accept -E as well). The -e simply means that the next argument is an expression in the script.

This produces your desired output from the given line of data:

worker.2: 54.01MB
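
As a rough illustration of the greediness point, the sketch below runs the original attempt and the bracket-expression version against the sample line from the question. The variable names (line, greedy, fixed, ere) are made up for this demonstration, and the -E variant assumes a sed that accepts that option.

```shell
#!/bin/sh
# Sample line copied from the question.
line='2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages'

# Original attempt: each (.+) is greedy, so the captures run across spaces
# and pick up neighbouring fields as well.
greedy=$(printf '%s\n' "$line" | sed -E 's/.+source=(.+) .+memory_total=(.+) .+/\1: \2/g')

# Basic-regex version from the answer: [^ ]* stops at the first space.
fixed=$(printf '%s\n' "$line" | sed -e 's/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/')

# Same idea in extended syntax (assumes sed supports -E): no backslashes
# are needed on the grouping parentheses.
ere=$(printf '%s\n' "$line" | sed -E 's/.*source=([^ ]*) .*memory_total=([^ ]*) .*/\1: \2/')

printf 'greedy: %s\n' "$greedy"
printf 'fixed:  %s\n' "$fixed"
printf 'ere:    %s\n' "$ere"
```

The fixed and ere lines should both read worker.2: 54.01MB, while the greedy line carries extra fields in both captures.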


Bonus question: There are some odd lines in the stream. I can usually filter them out with a grep pipe such as | grep memory_total, but when I combine it with the sed command, it does not work. No output is produced with this:

 heroku logs -t -s heroku | grep memory_total | sed.......

Sometimes grep | sed is necessary, but it is often redundant (unless you are using a grep feature that isn't readily supported by sed, such as Perl regular expressions).

You should be able to use:

sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p'

The -n means "don't print by default". The /memory_total=/ address selects the lines you're after; the s/// content is the same as before. I removed the g suffix that was in your original command; the pattern consumes the whole line, so it can never match more than once per line anyway. I added the p flag to print the line when the substitution occurs.
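
To see the filtering behave as described, here is a minimal sketch that feeds two lines through the command. The first line is an invented "odd" line with no memory_total field (purely hypothetical, not real Heroku output); the second is the sample line from the question. The variable names input and out are also just for this demonstration.

```shell
#!/bin/sh
# Line 1: hypothetical odd line without memory_total (invented for the demo).
# Line 2: the metrics line from the question.
input='2016-01-29T00:38:40.000000+00:00 heroku[worker.2]: some unrelated message
2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages'

# -n suppresses default printing; the /memory_total=/ address restricts the
# substitution to matching lines; the p flag prints only substituted lines.
out=$(printf '%s\n' "$input" |
  sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p')

printf '%s\n' "$out"
```

Only the metrics line should come through, so the odd line is dropped without a separate grep stage.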
