

Advanced Ruby gsub with regular expressions | Steve Fenton
source link: https://www.stevefenton.co.uk/2022/08/advanced-ruby-gsub-with-regular-expressions/

Advanced Ruby gsub with regular expressions
This post is really about the Ruby language gsub string method. It does contain a tiny bit of Jekyll hooks, but they are important to me and perhaps not to you. If you just want to know how to extract a match in gsub and use it in the output, scroll down to the bottom for the “final revelation”.
Let’s set the scene for the problem. I’m processing a custom Markdown block on a Jekyll site during a hook that fires after conversion, but before the result is written to disk.
The custom Markdown block looks like this:
:::hint
Some content to be shown in a hint box.
:::
By the time the text is handed to me, it’s already been mostly processed by Jekyll’s Markdown parser, so what we’re dealing with is something like:
<p>:::hint</p>
<p>Some content to be shown in a hint box.</p>
<p>:::</p>
However, what we really want is for the ::: syntax to trigger a container with a class called “hint” (or whatever text has been added by the author), like this:
<div class="hint">
<p>Some content to be shown in a hint box.</p>
</div>
Jekyll hooks
We’re running inside a Jekyll hook, so there is a file named custom_html.rb inside my _plugins directory with a simple hook defined…
Jekyll::Hooks.register :pages, :post_convert do |item|
  # Do something with the item
end
This is where item comes from in the examples below, and I’ll leave out the hook-specific code to keep the examples short.
gsub basics
You can do a simple replace using strings and gsub, like this:
content = " <p>:::hint</p> <p>Test</p> <p>:::</p>"
content = content.gsub(':::', '<div>')
puts content
Basic gsub usage looks for every occurrence of the first string and replaces each one with the second.
You can see from the output that this replaces the ::: strings, but this isn’t enough to solve our requirement just yet.
<p><div>hint</p> <p>Test</p> <p><div></p>
Our problems are:
- We can’t tell the difference between opening and closing tags if we just use ‘:::’
- We still have those pesky paragraph tags that shouldn’t be there
- We are missing the class name and the text for it is now content
We can use our problem to explore some more advanced use cases for gsub.
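Before moving on to regular expressions, here is a small sketch of how the plain-string form behaves. It contrasts gsub with its sibling sub, and shows that neither changes the original string. The variable names all_replaced and first_replaced are just for illustration.

```ruby
content = "<p>:::hint</p> <p>Test</p> <p>:::</p>"

# gsub replaces every occurrence of the search string.
all_replaced = content.gsub(':::', '<div>')

# sub replaces only the first occurrence.
first_replaced = content.sub(':::', '<div>')

puts all_replaced   # <p><div>hint</p> <p>Test</p> <p><div></p>
puts first_replaced # <p><div>hint</p> <p>Test</p> <p>:::</p>

# Both methods return a new string; the receiver is left untouched.
puts content        # <p>:::hint</p> <p>Test</p> <p>:::</p>
```

This is why the examples assign the result back to content each time.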
gsub with regular expressions
We can tell the difference between a start and end tag using a regular expression. Don’t shudder, it’s not going to be that bad. The syntax for using a regular expression is shown below.
We use one gsub to find the opening tag, including the surplus paragraphs, and one to find the closing tag, replacing them as appropriate.
content = " <p>:::hint</p> <p>Test</p> <p>:::</p>"
content = content
  .gsub(/<p>:::[a-z]+<\/p>/, '<div>')
  .gsub(/<p>:::<\/p>/, '</div>')
puts content
The key part of the regular expression is that [a-z]+ part, which explains that we expect to find some extra text on the opening tag that isn’t there on the closing tag.
Here’s the output.
<div> <p>Test</p> </div>
Our output is now valid HTML, but our class name is still missing. We’ll tackle that next.
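To check that the two patterns really do tell the opening and closing markers apart, here is a quick sketch using Regexp#match?. The names opening and closing are mine, and the last line is an assumption about input the post doesn't cover: a class name with a hyphen falls outside [a-z]+ and would be left unconverted.

```ruby
opening = /<p>:::[a-z]+<\/p>/
closing = /<p>:::<\/p>/

# The opening pattern needs at least one lowercase letter after the colons.
puts opening.match?('<p>:::hint</p>') # true
puts opening.match?('<p>:::</p>')     # false

# The closing pattern allows nothing between the colons and the closing tag.
puts closing.match?('<p>:::</p>')     # true
puts closing.match?('<p>:::hint</p>') # false

# Anything other than a-z (hyphens, digits, capitals) is not matched.
puts opening.match?('<p>:::hint-box</p>') # false
```

If you need class names like hint-box, the character class would have to be widened to suit.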
Using a match from the regular expression in the output
We just need to fine tune our regular expression now to get hold of that class name, so we can use it in the output.
The first part of the change is to wrap parentheses around the text match, to say we want to capture the text that is found. To put it another way, [a-z]+ already finds the text we want, but ([a-z]+) will keep hold of it for later use. Who knew brackets were so meaningful?
The second part of our update is to use the text we found in the output. The syntax for this is \1. Where you have multiple capture groups, they are numbered in order, so the next one is \2 and so on.
As we want to use the text as the class name, we’ll use '<div class="\1">'
content = " <p>:::hint</p> <p>Test</p> <p>:::</p> "
content = content
  .gsub(/<p>:::([a-z]+)<\/p>/, '<div class="\1">')
  .gsub(/<p>:::<\/p>/, '</div>')
puts content
Our output is now exactly what we want. We’re converting a Markdown block into an HTML block with the appropriate class name.
<div class="hint"> <p>Test</p> </div>
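Two details about \1 are worth a short sketch. The first is numbering with more than one group. The second is quoting: inside double quotes Ruby treats "\1" as an escape sequence, so the backslash has to be doubled. The two-group pattern and the title attribute below are hypothetical, invented only to show \2 in action, and the block form with $1 and $2 is an alternative that the post doesn't use.

```ruby
line = '<p>:::hint Read this first</p>'

# Hypothetical pattern: group 1 is the class name, group 2 is a title.
pattern = /<p>:::([a-z]+) ([^<]+)<\/p>/

# Single quotes keep the backslash, so \1 and \2 work as written.
single = line.gsub(pattern, '<div class="\1" title="\2">')

# Double quotes need the backslash escaped.
double = line.gsub(pattern, "<div class=\"\\1\" title=\"\\2\">")

# The block form reads the groups from $1 and $2 instead.
block = line.gsub(pattern) { "<div class=\"#{$1}\" title=\"#{$2}\">" }

puts single # <div class="hint" title="Read this first">
puts single == double && double == block # true
```

The block form is handy when the match needs to be transformed before it is used, which a replacement string cannot do.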
The key parts
To summarise, here’s the line of code with important bits called out:
content
  .gsub(/<p>:::([a-z]+)<\/p>/, '<div class="\1">')
#       ^ / starts and ends the regular expression
#              ^ brackets create the capture group
#                                           ^ \1 uses the first match in the output
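Putting both replacements together, a minimal sketch might wrap them in a method. The name convert_containers is hypothetical, and the commented-out hook usage assumes the item passed to the hook exposes a writable content attribute.

```ruby
# Hypothetical helper; the name is not from the original post.
def convert_containers(html)
  html
    .gsub(/<p>:::([a-z]+)<\/p>/, '<div class="\1">') # opening marker
    .gsub(/<p>:::<\/p>/, '</div>')                   # closing marker
end

html = "<p>:::hint</p>\n<p>Test</p>\n<p>:::</p>"
puts convert_containers(html)
# <div class="hint">
# <p>Test</p>
# </div>

# Inside the hook, this could be called as follows (assumption:
# item.content is readable and writable at this point):
#
#   Jekyll::Hooks.register :pages, :post_convert do |item|
#     item.content = convert_containers(item.content)
#   end
```

Keeping the replacement logic in a plain method means it can be tried in irb without running a Jekyll build.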