
Advanced Ruby gsub with regular expressions | Steve Fenton

source link: https://www.stevefenton.co.uk/2022/08/advanced-ruby-gsub-with-regular-expressions/


This post is really about the Ruby language gsub string method. It does touch on Jekyll hooks a little, but they matter to me and perhaps not to you. If you just want to know how to capture a match in gsub and use it in the output, scroll down to the bottom for the “final revelation”.

Let’s set the scene for the problem. I’m processing a custom Markdown block on a Jekyll site during a hook that fires after conversion, but before the result is written to disk.

The custom Markdown block looks like this:

:::hint

Some content to be shown in a hint box.

:::

By the time the text is handed to me, it’s already been mostly processed by Jekyll’s Markdown parser, so what we’re dealing with is something like:

<p>:::hint</p>

<p>Some content to be shown in a hint box.</p>

<p>:::</p>

However, what we really want is for the ::: syntax to trigger a container with a class called “hint” (or whatever text has been added by the author), like this:

<div class="hint">

<p>Some content to be shown in a hint box.</p>

</div>

Jekyll hooks

We’re running inside a Jekyll hook, so there is a file named custom_html.rb inside my _plugins directory with a simple hook defined…

Jekyll::Hooks.register :pages, :post_convert do |item|
    # Do something with the item
end

The HTML to work on is in item.content, but I’ll leave out the hook-specific code and use a plain content string to keep the examples below short.

gsub basics

You can do a simple replace using strings and gsub, like this:

content = "
<p>:::hint</p>

<p>Test</p>

<p>:::</p>"

content = content.gsub(':::', '<div>')

puts content

In its basic form, gsub (the “g” is for “global”) finds every occurrence of the first string and replaces each one with the second.

You can see from the output that this replaces the ::: strings, but this isn’t enough to solve our requirement just yet.

<p><div>hint</p>

<p>Test</p>

<p><div></p>
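Because gsub replaces every occurrence, both ::: strings changed. Ruby’s sub method, by contrast, replaces only the first match. Here is a small sketch comparing the two on the same input. The variable names are just for illustration.

```ruby
# sub replaces only the first match; gsub ("global" sub) replaces every match.
content = "<p>:::hint</p>\n\n<p>Test</p>\n\n<p>:::</p>"

first_only = content.sub(':::', '<div>')
every_one  = content.gsub(':::', '<div>')

puts first_only  # the closing ::: is left untouched
puts '---'
puts every_one   # both ::: strings are replaced
```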

Our problems are:

  • We can’t tell the difference between opening and closing tags if we just use ‘:::’
  • We still have those pesky paragraph tags that shouldn’t be there
  • The class name is missing, and its text (“hint”) has been left behind in the content

We can use our problem to explore some more advanced use cases for gsub.

gsub with regular expressions

We can tell the difference between a start and end tag using a regular expression. Don’t shudder, it’s not going to be that bad. In Ruby, a regular expression literal goes between forward slashes, as shown below.

We use one gsub to find the opening tag, including the surplus paragraph tags around it, and another to find the closing tag, replacing each as appropriate.

content = "
<p>:::hint</p>

<p>Test</p>

<p>:::</p>"

content = content
    .gsub(/<p>:::[a-z]+<\/p>/, '<div>')
    .gsub(/<p>:::<\/p>/, '</div>')

puts content

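The key part of the regular expression is the [a-z]+ part, which matches one or more lowercase letters after the :::. That text is there on the opening tag but not on the closing tag, so the two patterns can’t be confused. The forward slash in </p> is written as <\/p> because an unescaped / would end the regular expression.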
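If escaping the slash in <\/p> gets fiddly, Ruby also accepts %r{...} as a regular expression literal, and inside it forward slashes need no escaping. The quick sketch below checks that both spellings match the opening tag but not the closing one. It assumes Ruby 2.4 or later, which added String#match?.

```ruby
# Two ways to write the same pattern: the slash-delimited form used above
# (where the / in </p> must be escaped) and Ruby's %r{...} form (where it need not be).
slash_form   = /<p>:::[a-z]+<\/p>/
percent_form = %r{<p>:::[a-z]+</p>}

tags = ['<p>:::hint</p>', '<p>:::</p>']

tags.each do |tag|
  # [a-z]+ needs at least one lowercase letter, so only the opening tag matches.
  puts "#{tag}: #{tag.match?(slash_form)} / #{tag.match?(percent_form)}"
end
```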

Here’s the output.

<div>

<p>Test</p>

</div>

Our output is now valid HTML, but our class name is still missing. We’ll tackle that next.

Using a match from the regular expression in the output

We just need to fine-tune our regular expression now to get hold of that class name, so we can use it in the output.

The first part of the change is to wrap parentheses around the text match, to say we want to capture the text that is found. To put it another way, [a-z]+ already finds the text we want, but ([a-z]+) will keep hold of it for later use. Who knew brackets were so meaningful.

The second part of our update is to use the captured text in the output. The syntax for this is \1. Where you have multiple capture groups, they are numbered from left to right, so the next one is \2 and so on.

As we want to use the text as the class name, our replacement string is '<div class="\1">'.
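The single quotes around that replacement matter. In a single-quoted Ruby string, \1 reaches gsub as a backslash followed by 1, which gsub treats as a back-reference. In a double-quoted string, "\1" is read as an octal escape first, so you would need "\\1" instead. The small sketch below shows the difference, plus a two-group example of \2. The sample strings are made up for illustration.

```ruby
html    = '<p>:::hint</p>'
pattern = /<p>:::([a-z]+)<\/p>/

# Single quotes: \1 reaches gsub intact and is treated as a back-reference.
single = html.gsub(pattern, '<div class="\1">')

# Double quotes: the backslash itself must be escaped, so write \\1.
double = html.gsub(pattern, "<div class=\"\\1\">")

# Pitfall: in double quotes, "\1" is an octal escape (the character U+0001), not a back-reference.
broken = html.gsub(pattern, "<div class=\"\1\">")

# With two capture groups, \2 refers to the second one.
swapped = 'hint box'.gsub(/([a-z]+) ([a-z]+)/, '\2 \1')

puts single
puts double
p broken
puts swapped
```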

content = "
<p>:::hint</p>

<p>Test</p>

<p>:::</p>

"

content = content
    .gsub(/<p>:::([a-z]+)<\/p>/, '<div class="\1">')
    .gsub(/<p>:::<\/p>/, '</div>')

puts content

Our output is now exactly what we want. We’re converting a Markdown block into an HTML block with the appropriate class name.

<div class="hint">

<p>Test</p>

</div>
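Numbered groups are fine for a single capture, but Ruby offers two alternatives worth knowing. A named group, written (?<name>...), can be referenced in the replacement string as \k<name>. gsub also accepts a block instead of a replacement string, with $1 holding the first capture of each match. The minimal sketch below uses both and produces the same HTML as above. The group name css_class is an arbitrary choice.

```ruby
content = "\n<p>:::hint</p>\n\n<p>Test</p>\n\n<p>:::</p>\n\n"

# Named capture: (?<css_class>...) labels the group, \k<css_class> reuses it.
named = content
  .gsub(/<p>:::(?<css_class>[a-z]+)<\/p>/, '<div class="\k<css_class>">')
  .gsub(/<p>:::<\/p>/, '</div>')

# Block form: inside the block, $1 holds the first capture group of the current match.
with_block = content
  .gsub(/<p>:::([a-z]+)<\/p>/) { %(<div class="#{$1}">) }
  .gsub(/<p>:::<\/p>/, '</div>')

puts named
puts named == with_block
```

The block form is handy when the captured text needs further processing in Ruby before it is inserted.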

The key parts

To summarise, here’s the line of code with important bits called out:

content
    .gsub(/<p>:::([a-z]+)<\/p>/, '<div class="\1">')
#         ^ / starts and ends the regular expression
#                ^ brackets create the capture group
#                                             ^ \1 inserts the first capture group in the output
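To put this back into the hook from earlier, a sketch of _plugins/custom_html.rb might look like the following. The helper name convert_hint_blocks is hypothetical. The sketch assumes, as in the hook shown above, that the :post_convert hook is available for pages and that item.content holds the converted HTML and can be reassigned.

```ruby
# Hypothetical helper name; the post itself puts the gsub calls straight into the hook.
def convert_hint_blocks(html)
  html
    .gsub(/<p>:::([a-z]+)<\/p>/, '<div class="\1">')
    .gsub(/<p>:::<\/p>/, '</div>')
end

# Guarded so this file can also be loaded outside Jekyll (for example, to test the helper).
if defined?(Jekyll::Hooks)
  Jekyll::Hooks.register :pages, :post_convert do |item|
    item.content = convert_hint_blocks(item.content)
  end
end
```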
