Regular Expression (Regex) Tutorial for Matching a URL

A regular expression (sometimes called a rational expression) is a sequence of characters that defines a search pattern, used mainly for pattern matching in strings, as in “find and replace”-like operations. Regular expressions are a general way to describe patterns in sequences of characters. They are supported in most programming languages, including C++, Java, Python and JavaScript.

Summary

This tutorial describes a regex that matches a URL given as a string. It walks through the components of the regex: anchors, quantifiers, character classes, grouping and capturing, bracket expressions, and greedy and lazy matching.

Regex code snippet: /^(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .-]*)\/?$/
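As a rough illustration of how such a pattern could be used, here is a minimal Python sketch. It assumes the dot between the domain name and the top-level domain is meant literally and that the path class repeats with " * ". The names URL_PATTERN and is_url are hypothetical and chosen only for this example. Python raw strings do not need the forward slashes escaped.

```python
import re

# Python translation of the tutorial's regex (assumption: literal dot
# before the top-level domain, and "*" after the path character class).
URL_PATTERN = re.compile(r"^(https?://)?([\da-z.-]+)\.([a-z.]{2,6})([/\w .-]*)/?$")


def is_url(text: str) -> bool:
    """Return True when the whole string matches the URL pattern."""
    return URL_PATTERN.match(text) is not None


print(is_url("https://www.example.com/docs/index.html"))  # True
print(is_url("example.com"))                              # True
print(is_url("not a url"))                                # False
```

The pattern only accepts lowercase letters in the domain, so an input such as an all-uppercase URL would need to be lowercased first or matched with a case-insensitive flag.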

Regex Components

Anchors

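Anchors match positions rather than characters: they mark where a match must start and end in the string. In this regular expression, the opening anchor is the symbol " ^ ", which matches the start of the string, and the closing anchor is the symbol " $ ", which matches the end. Together they require the entire string to be a URL, not just a part of it.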
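The effect of the anchors can be sketched in Python with a deliberately simplified pattern (an assumption made for brevity, not the full URL regex). The names ANCHORED and UNANCHORED are hypothetical.

```python
import re

# Simplified stand-in for the URL regex: lowercase letters followed by ".com".
ANCHORED = re.compile(r"^[a-z]+\.com$")   # must span the whole string
UNANCHORED = re.compile(r"[a-z]+\.com")   # may match anywhere in the string

text = "visit example.com today"

anchored_hit = ANCHORED.search(text)
unanchored_hit = UNANCHORED.search(text)

print(anchored_hit)              # None: the string has text around the URL
print(unanchored_hit.group())    # example.com
print(ANCHORED.search("example.com") is not None)  # True
```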

Quantifiers

In regular expressions, quantifiers specify how many times the preceding character, character class or group may occur. In the above regex, the quantifier " ? " is used three times: twice in (https?:\/\/)? and once in \/?. Whatever immediately precedes each " ? " is optional: the letter s, the whole protocol group and the trailing slash may or may not appear in a potential URL. The regex also applies " + " (one or more), {2,6} (between two and six) and " * " (zero or more) to its three character classes: [\da-z.-]+, [a-z.]{2,6} and [\/\w .-]*.
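A small Python sketch of two of these quantifiers in isolation follows. The fragments are taken from the URL regex, while the helper name fully_matches is hypothetical.

```python
import re


def fully_matches(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# "?" makes the preceding "s" optional.
print(fully_matches(r"https?://", "http://"))    # True
print(fully_matches(r"https?://", "https://"))   # True
print(fully_matches(r"https?://", "httpss://"))  # False

# "{2,6}" requires between two and six characters from the class.
print(fully_matches(r"[a-z.]{2,6}", "c"))           # False
print(fully_matches(r"[a-z.]{2,6}", "com"))         # True
print(fully_matches(r"[a-z.]{2,6}", "co.uk"))       # True
print(fully_matches(r"[a-z.]{2,6}", "toolongtld"))  # False
```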

Character Classes

In regular expressions, a character class matches any one character from a defined set. Character classes are defined using brackets []. A class can list individual characters, ranges of characters, or shorthand classes such as \d. For example, in the current regex, [\da-z.-] follows the optional protocol and matches any digit (\d), any lowercase letter from a to z (a-z), a period or a hyphen.
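To see which characters a class accepts, the following Python sketch tests each character of a sample string against [\da-z.-]. The names HOST_CHAR and allowed_chars and the sample input are hypothetical.

```python
import re

# The first character class from the URL regex, matching one character.
HOST_CHAR = re.compile(r"[\da-z.-]")


def allowed_chars(text: str) -> str:
    """Keep only the characters that the class matches."""
    return "".join(ch for ch in text if HOST_CHAR.fullmatch(ch))


# The underscore and the uppercase letters are not in the class.
print(allowed_chars("my-site_01.COM"))  # my-site01.
```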

Grouping and Capturing

Regular expressions use parentheses to indicate groupings. As with parentheses in mathematics, a group is treated as a single unit, so a quantifier placed after it applies to the whole group. Each group also captures the text it matched so that it can be retrieved separately. In our URL matching regex, the groups correspond to the parts of a URL: the protocol, the domain name, the top-level domain and the path. There are four groupings in total: (https?:\/\/), ([\da-z.-]+), ([a-z.]{2,6}) and ([\/\w .-]*).
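The captured groups can be inspected as in the Python sketch below. It assumes a Python translation of the regex with a literal dot before the top-level domain. URL_PATTERN and split_url are hypothetical names.

```python
import re

# Python translation of the tutorial's regex (assumption: literal dot
# before the top-level domain).
URL_PATTERN = re.compile(r"^(https?://)?([\da-z.-]+)\.([a-z.]{2,6})([/\w .-]*)/?$")


def split_url(text: str):
    """Return the four captured groups, or None when the text does not match."""
    match = URL_PATTERN.match(text)
    return match.groups() if match else None


print(split_url("https://www.example.com/docs/index.html"))
# ('https://', 'www.example', 'com', '/docs/index.html')

# An optional group that did not take part in the match is reported as None.
print(split_url("example.com"))
# (None, 'example', 'com', '')
```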

Bracket Expressions

In regular expressions, bracket expressions are used to match one character out of a set or range. For example, the snippet [a-z.] tells the regex to match any single character listed within the brackets: any lowercase letter between a and z, or a period. Inside brackets the period is a literal character, not the "any character" wildcard.
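The difference between a period inside and outside brackets can be checked with a short Python sketch. The helper name is_single_match is hypothetical.

```python
import re


def is_single_match(pattern: str, char: str) -> bool:
    """Return True when the pattern matches exactly this one character."""
    return re.fullmatch(pattern, char) is not None


# Inside brackets, "." only matches a literal period.
print(is_single_match(r"[a-z.]", "q"))  # True
print(is_single_match(r"[a-z.]", "."))  # True
print(is_single_match(r"[a-z.]", "7"))  # False

# Outside brackets, "." matches any character except a newline.
print(is_single_match(r".", "7"))       # True
```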

Greedy and Lazy Match

In regular expressions, the " * " and " + " quantifiers are greedy by default: they match as many characters as possible while still allowing the rest of the pattern to match. For lazy matching, a " ? " symbol must be added after the quantifier, as in *? or +?. A lazy quantifier matches as few characters as possible. The URL matching regex has two greedy expressions: ([\/\w .-]*) and ([\da-z.-]+).
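The contrast between greedy and lazy matching can be sketched in Python with a simplified pattern applied to a sample path. Both the pattern and the path are assumptions for illustration and are not part of the URL regex.

```python
import re

path = "/docs/api/index.html"

# Greedy: ".+" grabs as much as it can, so the match ends at the last slash.
greedy = re.match(r"/.+/", path).group()

# Lazy: ".+?" grabs as little as it can, so the match ends at the next slash.
lazy = re.match(r"/.+?/", path).group()

print(greedy)  # /docs/api/
print(lazy)    # /docs/
```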

Author

I am Charles Dawkins, a budding web developer taking a Full Stack Web Development course at the University of Toronto. View my work on my GitHub profile.
