Regular Expression (Regex) Tutorial for Matching a URL
source link: https://gist.github.com/DawkC/cb036f082e94772d05567c7e65c01e9b
A regular expression (sometimes called a rational expression) is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, i.e. “find and replace”-like operations. Regular expressions are a general way to describe patterns in sequences of characters. They are supported in most mainstream programming languages, including C++, Java, Python and JavaScript.
Summary
This tutorial describes a regex that checks whether a string is in the form of a URL. It walks through the components of the URL regex: its anchors, quantifiers, character classes, grouping and capturing, bracket expressions, and greedy and lazy matching.
Regex code snippet: /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
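As a quick check of the snippet above, the following minimal sketch runs the pattern against a few strings in JavaScript. The sample inputs and the name `urlPattern` are made up for illustration and are not part of the original tutorial.

```javascript
// The URL pattern from this tutorial, written as a JavaScript regex literal.
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical sample inputs, chosen only for illustration.
const samples = [
  "https://www.example.com/docs/page.html",
  "example.org",
  "not a url",
];

for (const sample of samples) {
  // test() returns true when the whole string fits the pattern.
  console.log(sample, "->", urlPattern.test(sample));
}
```

The first two samples are accepted and the third is rejected, because the text before the required period contains a space, which the host character class does not allow.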
Regex Components
Anchors
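Anchors mark the start and the end of the string being matched; they do not match any characters themselves. In this regular expression, the opening anchor is the symbol " ^ " and the closing anchor is the symbol " $ ", so the pattern has to match the entire string rather than just a part of it.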
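To see what the anchors change, here is a small sketch that compares an anchored and an unanchored pattern. Both patterns are simplified stand-ins written for this example, not the full URL regex.

```javascript
// Simplified, hypothetical patterns used only to illustrate anchors.
const anchored = /^https?:\/\/example\.com$/;
const unanchored = /https?:\/\/example\.com/;

const sentence = "visit http://example.com today";

// With ^ and $, the whole string must be the URL.
console.log(anchored.test(sentence));             // false
console.log(anchored.test("http://example.com")); // true

// Without anchors, a URL anywhere inside the string is enough.
console.log(unanchored.test(sentence));           // true
```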
Quantifiers
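In regular expressions, quantifiers specify how many times the preceding character, class or group may repeat. In the above regex, the quantifier " ? " is used three times: twice in (https?:\/\/)? and once in \/?. Whatever comes immediately in front of each " ? " is entirely optional, meaning that it may or may not appear in a potential URL: the letter s, the whole (https?:\/\/) group, and the trailing slash. The regex also uses " + " (one or more), " * " (zero or more) and {2,6} (between two and six repetitions), which are applied to the three character classes in the URL matching regex: [\da-z\.-], [a-z\.] and [\/\w \.-].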
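The quantifiers can be tried in isolation. The sketch below pulls small pieces out of the URL regex and anchors each one so that it can be tested on its own; the variable names and inputs are illustrative assumptions.

```javascript
// "s?" makes the letter s optional.
const scheme = /^https?:\/\/$/;
console.log(scheme.test("http://"));   // true
console.log(scheme.test("https://"));  // true
console.log(scheme.test("httpss://")); // false, only one optional s is allowed

// "{2,6}" requires between two and six characters from the class.
const tld = /^[a-z\.]{2,6}$/;
console.log(tld.test("io"));      // true  (2 characters)
console.log(tld.test("museum"));  // true  (6 characters)
console.log(tld.test("c"));       // false (too short)
console.log(tld.test("toolong")); // false (7 characters)

// "+" requires at least one character from the class.
const host = /^[\da-z\.-]+$/;
console.log(host.test("my-site1")); // true
console.log(host.test(""));         // false
```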
Character Classes
In regular expressions, character classes match any single character from a defined set. Character classes are defined using brackets []. A character class can contain a range of characters, specific single characters, or both. For example, in the current regex, [\da-z\.-] follows the optional 'https://' and matches any digit (\d), any lowercase letter from a to z (a-z), a period, or a dash.
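As an illustration, this sketch checks single characters against the host character class. The list of characters is arbitrary and chosen only to show both accepted and rejected cases.

```javascript
// The host character class from the URL regex, anchored to test one character at a time.
const hostChar = /^[\da-z\.-]$/;

// Hypothetical sample characters.
for (const ch of ["7", "q", ".", "-", "Q", "_", "/"]) {
  console.log(JSON.stringify(ch), "->", hostChar.test(ch));
}
// "7", "q", "." and "-" are accepted.
// "Q", "_" and "/" are rejected: the class has no uppercase range, underscore or slash.
```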
Grouping and Capturing
Regular expressions use parentheses to indicate groupings. A group treats the pattern inside the parentheses as a single unit, so a quantifier placed after the closing parenthesis applies to the whole group. The text matched by each group is also captured, so it can be retrieved after a match. In our URL matching regex, the groups correspond to the main parts of a URL: the optional scheme, the host name, the top-level domain, and the optional path. There are four groupings in the URL matching regex: (https?:\/\/), ([\da-z\.-]+), ([a-z\.]{2,6}), and ([\/\w \.-]*).
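The captured groups can be inspected after a match. The following sketch uses `String.prototype.match` with the tutorial's regex; the sample URLs and variable names are assumptions made for the example.

```javascript
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical sample URL.
const parts = "https://www.example.com/docs/page.html".match(urlPattern);

console.log(parts[1]); // "https://"        (scheme)
console.log(parts[2]); // "www.example"     (host name before the last period)
console.log(parts[3]); // "com"             (top-level domain)
console.log(parts[4]); // "/docs/page.html" (path)

// When an optional group does not take part in the match, its slot is undefined.
const bare = "example.org".match(urlPattern);
console.log(bare[1]); // undefined
console.log(bare[2]); // "example"
```

Note that the second group stops before the final period of the host, because the regex requires a literal period between the second and third groups.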
Bracket Expressions
In regular expressions, bracket expressions (another name for character classes) match one character out of a set or range of characters. For example, the snippet [a-z\.] tells the regex to match any single character listed within the brackets: any lowercase letter between a and z, or a period.
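One detail worth trying out is that a period inside brackets stands for a literal period, while a bare period outside brackets matches almost any character. A minimal sketch, with pattern names invented for this example:

```javascript
// Inside brackets, the period is a literal period.
const insideBrackets = /^[a-z.]$/;
// Outside brackets, the period matches any single character except a line break.
const bareDot = /^.$/;

console.log(insideBrackets.test("k")); // true
console.log(insideBrackets.test(".")); // true
console.log(insideBrackets.test("5")); // false, digits are not in the set
console.log(bareDot.test("5"));        // true
```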
Greedy and Lazy Match
In regular expressions, the " * " and " + " quantifiers match as many characters as possible while still allowing the rest of the pattern to match. This is called greedy matching. For lazy matching, a " ? " symbol is added directly after the quantifier, as in " *? " or " +? ". A lazy match takes as few characters as possible. The URL matching regex has two greedy expressions: ([\/\w \.-]*) and ([\da-z\.-]+).
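The difference between greedy and lazy matching is easiest to see side by side. The sketch below uses a made-up path string and two small patterns written for this example; the lazy variants are not part of the URL regex itself.

```javascript
// Hypothetical sample path.
const path = "/docs/guide/page.html";

// Greedy: ".*" takes as much as it can, so the match runs to the last slash.
console.log(path.match(/\/.*\//)[0]);  // "/docs/guide/"

// Lazy: ".*?" takes as little as it can, so the match stops at the next slash.
console.log(path.match(/\/.*?\//)[0]); // "/docs/"

// The same contrast with the path character class from the URL regex.
console.log(path.match(/[\/\w \.-]+/)[0]);  // "/docs/guide/page.html"
console.log(path.match(/[\/\w \.-]+?/)[0]); // "/"
```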
Author
I am Charles Dawkins, a budding web developer taking a Full Stack Web Development course at the University of Toronto. View my work on my GitHub profile.