Regular Expressions

A regular expression, also known as regex, is a pattern that describes a set of strings. To put it another way, a regex accepts a specific set of strings and rejects all others.

In this article, we will look at regular expressions as covered in the GATE syllabus for Computer Science Engineering (CSE).

What are Regular Expressions?

A regular expression, also known as a rational expression, is a string of characters that defines a search pattern. It is typically used in operations like “find and replace” when matching patterns against strings (Wikipedia).

Regular expressions are a general method of matching patterns against character sequences. Most widely used programming languages, including C++, Java, and Python, support them.

Importance of Regular Expressions

Many widely used editors, including Notepad++, Sublime, Brackets, Google Docs, and Microsoft Word, support search and replace using regex. Regex is also used for URL matching in Google Analytics.

For example, a simple regular expression for an email address would be:

^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

The regular expression given above can be used to check whether a string of characters has the general form of an email address.
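As a minimal sketch of how the pattern above might be applied, the following Python snippet wraps it in a helper. The function name looks_like_email and the sample addresses are assumptions made for illustration, and the pattern remains a simplified check rather than a complete email validator.

```python
import re

# Pattern copied from the text above; a simplified check, not a full validator.
EMAIL_RE = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def looks_like_email(text: str) -> bool:
    """Return True if text matches the simplified email pattern."""
    return EMAIL_RE.match(text) is not None


print(looks_like_email("user@example.com"))  # True
print(looks_like_email("user@example"))      # False: no dot-separated ending
print(looks_like_email("user example.com"))  # False: no @ sign
```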

How to Write Regular Expressions?

The asterisk symbol ( * )

It instructs the computer to match the previous character or group of characters zero or more times, with no upper limit.

For example, a regular expression xy*z will match xz, xyz, xyyz, xyyyz, … and so on.
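The behaviour of the asterisk can be checked with a short Python sketch. The candidate strings are assumptions chosen for illustration; fullmatch is used so that the whole string has to fit the pattern.

```python
import re

star = re.compile(r"xy*z")  # x, then zero or more y, then z

candidates = ["xz", "xyz", "xyyyz", "xyy", "abbc"]
star_matches = [s for s in candidates if star.fullmatch(s)]

print(star_matches)  # ['xz', 'xyz', 'xyyyz']
```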

Repeaters : * , + and { }

These symbols serve as repeaters and tell the computer how many times the character or group before them may occur.

The Plus symbol ( + )

It instructs the computer to match the previous character or group of characters one or more times, with no upper limit.

For example, a regular expression xy+z will give xyz, xyyz, xyyyz, … and so on.
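A small Python sketch of the plus repeater follows. The candidate strings are illustrative assumptions; note that xz is rejected because at least one y is required.

```python
import re

plus = re.compile(r"xy+z")  # x, then one or more y, then z

candidates = ["xz", "xyz", "xyyz", "xyyyz"]
plus_matches = [s for s in candidates if plus.fullmatch(s)]

print(plus_matches)  # ['xyz', 'xyyz', 'xyyyz']
```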

Wildcard – ( . )

The dot symbol is known as the “wildcard character” because it can stand in for any other single character.

For example, a regular expression .* would tell a computer that any character can occur any number of times, including zero.
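The following Python sketch illustrates the wildcard on its own and combined with the asterisk. The helper name matches and the sample strings are assumptions; in Python's re module the dot does not match a newline unless the DOTALL flag is set.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text fits the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches(r"x.z", "xyz"))     # True: the dot stands in for y
print(matches(r"x.z", "x-z"))     # True: the dot stands in for -
print(matches(r"x.z", "xz"))      # False: the dot needs exactly one character
print(matches(r"x.*z", "x123z"))  # True: .* covers any run of characters
print(matches(r"x.z", "x\nz"))    # False: the dot skips newline by default
```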

The curly braces {…}

The value inside the braces tells the computer how many times to repeat the previous character or set of characters.

For example, {3} would mean that the preceding character is to be repeated exactly three times, and {min,} would mean that the preceding character is repeated min or more times. {min,max} would mean that the preceding character is repeated at least min and at most max times.
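The counted forms of the curly braces can be tried out with a short Python sketch. The helper name full and the strings of a characters are assumptions made for illustration.

```python
import re


def full(pattern: str, text: str) -> bool:
    """Return True if the whole text fits the pattern."""
    return re.fullmatch(pattern, text) is not None


print(full(r"a{3}", "aaa"))       # True: exactly three
print(full(r"a{3}", "aa"))        # False: too few
print(full(r"a{2,4}", "aaaa"))    # True: between two and four
print(full(r"a{2,4}", "aaaaa"))   # False: more than four
print(full(r"a{2,}", "aaaaaa"))   # True: two or more
print(full(r"a{2,}", "a"))        # False: fewer than two
```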

Optional character – ( ? )

This symbol tells the computer that the previous character may or may not be present in the string to be matched.

For example, the format for a document file can be written as “docx?”. Here, the ‘?’ tells a computer that the x may or may not be present in the file format’s name.
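As an illustration of the optional character, the Python sketch below assumes the pattern docx?, in which the final x is optional. Both the pattern and the extension strings are assumptions picked for this example.

```python
import re

optional = re.compile(r"docx?")  # "doc" followed by an optional "x"

extensions = ["doc", "docx", "docxx", "do"]
optional_matches = [e for e in extensions if optional.fullmatch(e)]

print(optional_matches)  # ['doc', 'docx']
```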

The dollar ( $ ) symbol

It instructs the computer that the match must occur at the end of the string, or just before \n at the end of the line or string.

For example, -\d{3}$ would match patterns such as “-555” in the string “-602-555”.
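A brief Python sketch of the end anchor is shown below, using the pattern from the example. The variable names and the extra input strings are assumptions; search is used because the match is allowed to start anywhere in the string.

```python
import re

end_anchor = re.compile(r"-\d{3}$")

at_end = end_anchor.search("-602-555")
before_newline = end_anchor.search("-602-555\n")  # $ also matches before a final \n
not_at_end = end_anchor.search("-602-555-")

print(at_end.group())          # -555
print(before_newline.group())  # -555
print(not_at_end)              # None
```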

The caret ( ^ ) symbol

It tells the computer that the match must begin at the start of the string or line.

For example, ^\d{3} would match patterns such as “602” in the string “602-555-”.
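The start anchor can be sketched in Python as follows. The sample strings are assumptions; the last call uses the re.MULTILINE flag, under which ^ also matches at the start of each line.

```python
import re

start_anchor = re.compile(r"^\d{3}")

at_start = start_anchor.search("602-555-")
not_at_start = start_anchor.search("a602-555-")
per_line = re.findall(r"^\d{3}", "602-555\n415-222", re.MULTILINE)

print(at_start.group())  # 602
print(not_at_start)      # None
print(per_line)          # ['602', '415']
```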

Character Classes

A character class matches exactly one character out of a set of characters. It is used to match a language’s most fundamental building blocks, such as a letter, digit, space, or symbol.

\S : It matches any non-whitespace character

\s : It matches any whitespace character, such as the space and the tab

\D : It matches any non-digit character

\d : It matches any digit character

\W : It matches any non-word character

\w : It matches any word character (basically alphanumeric characters and the underscore)

\b : It matches a word boundary, that is, the position between a word character and a non-word character (such as a dash, space, semi-colon, or comma) or the start or end of the string

[set_of_characters] – It matches any single character in set_of_characters. The match, by default, is case-sensitive.

For example, [pqr] would match characters p, q, and r in any string.

[^set_of_characters] – It matches any single character that is not in set_of_characters. The match, by default, is case-sensitive.

For example, [^pqr] will match any character except p, q and r.

[first-last] – It matches any single character in the range from first to last.

For example, [n-sN-S] would match all the characters from n to s or N to S.
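The character classes described in this section can be explored with a short Python sketch. The sample strings and variable names are assumptions made for illustration; findall returns every non-overlapping match in the text.

```python
import re

text = "Room 42, Wing-B"

digits = re.findall(r"\d", text)         # single digit characters
words = re.findall(r"\w+", text)         # runs of word characters
spaces = re.findall(r"\s", text)         # whitespace characters
in_set = re.findall(r"[pqr]", "parquet")           # only p, q, or r
not_in_set = re.findall(r"[^pqr]", "parquet")      # anything except p, q, r
in_range = re.findall(r"[n-sN-S]", "Python")       # n to s or N to S
whole_word = re.findall(r"\bcat\b", "cat concat cat-nap")  # cat as a whole word

print(digits)      # ['4', '2']
print(words)       # ['Room', '42', 'Wing', 'B']
print(spaces)      # [' ', ' ']
print(in_set)      # ['p', 'r', 'q']
print(not_in_set)  # ['a', 'u', 'e', 't']
print(in_range)    # ['P', 'o', 'n']
print(whole_word)  # ['cat', 'cat']
```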

The Escape Symbol: \

Add a backslash (\) before the character if you wish to match the actual “+”, “.”, and similar characters. This will instruct the computer to treat the next character as a literal search character rather than as a pattern-matching symbol.

For example, \d+[\+-x\*]\d+ would match patterns, such as “4+4” and “6*8” in “(4+4) * 6*8”. Note that inside the square brackets, \+-x is read as a range running from + to x, so this class also accepts many other characters, including digits and letters.
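Escaping can be sketched in Python as follows. The first call uses the pattern from the example; the remaining strings are assumptions. The last lines use re.escape, a standard-library helper that adds the backslashes automatically.

```python
import re

# Pattern from the example: digits, an operator from the class, digits.
operations = re.findall(r"\d+[\+-x\*]\d+", "(4+4) * 6*8")
print(operations)  # ['4+4', '6*8']

# An escaped dot matches only a literal dot.
literal_dot = re.fullmatch(r"3\.14", "3x14") is None
unescaped_dot = re.fullmatch(r"3.14", "3x14") is not None
print(literal_dot)    # True: "3x14" is rejected
print(unescaped_dot)  # True: the bare dot accepts any character

# re.escape builds a pattern that matches the given text literally.
escaped = re.fullmatch(re.escape("1+1"), "1+1") is not None
print(escaped)  # True
```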

Grouping Characters ( )

To make several symbols of a regular expression operate as a single unit, enclose them in parentheses ( ).

For example, ([N-S]\w+) groups two elements, [N-S] and \w+, into one unit. Such an expression would match any pattern that contains an uppercase letter from N to S followed by one or more word characters.
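A minimal Python sketch of grouping follows. The input strings and variable names are assumptions; the second part shows how a repeater applies to a whole group rather than to a single character.

```python
import re

# The group captures an uppercase letter N to S plus the word characters after it.
found = re.search(r"([N-S]\w+)", "Hello Python")
print(found.group(1))  # Python

# With parentheses, + repeats the whole block "ab"; without them, only "b".
grouped = re.fullmatch(r"(ab)+", "ababab") is not None
ungrouped = re.fullmatch(r"ab+", "ababab") is not None
print(grouped)    # True
print(ungrouped)  # False
```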

Vertical Bar ( | )

It matches any one of the alternatives separated by the vertical bar (|) character.

For example, th(e|is|at) would match words – this, the, and that.
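The alternation example can be checked with a short Python sketch. The word list is an assumption made for illustration; fullmatch requires the entire word to fit one of the alternatives.

```python
import re

alternation = re.compile(r"th(e|is|at)")

candidate_words = ["the", "this", "that", "those", "then"]
alt_matches = [w for w in candidate_words if alternation.fullmatch(w)]

print(alt_matches)  # ['the', 'this', 'that']
```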

\number

It allows a previously matched sub-expression (a sub-expression is an expression captured within parentheses) to be matched again later in the same regular expression. Here, \n means that the text matched by the n-th parenthesised group must be repeated at the current position.

For example, ([a-z])\1 would match “ee” in Geek since the character at the second position is the same as the character at the first position of the match.
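A backreference can be sketched in Python as shown below, using the pattern from the example. The word bookkeeper is an assumption added to show several doubled letters in one string.

```python
import re

doubled = re.compile(r"([a-z])\1")  # a lowercase letter followed by itself

in_geek = doubled.search("Geek")
in_gek = doubled.search("Gek")
pairs = [m.group() for m in doubled.finditer("bookkeeper")]

print(in_geek.group())  # ee
print(in_gek)           # None
print(pairs)            # ['oo', 'kk', 'ee']
```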

Comment: (?# comment)

Inline comment: The comment ends at the first closing parenthesis.

# [to end of line]

Extended-mode (x-mode) comment: When the extended option is enabled, the comment starts at an unescaped # and continues to the end of the line.
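Both comment styles are available in Python's re module, as the sketch below illustrates. The phone-number-like pattern and the variable names are assumptions made for illustration; the extended mode is enabled in Python with the re.VERBOSE flag, which also ignores whitespace in the pattern.

```python
import re

# Inline comment: ignored by the engine, ends at the first closing parenthesis.
inline = re.compile(r"\d{3}(?# three-digit prefix)-\d{4}")

# Extended mode: whitespace is ignored and # starts a comment to end of line.
extended = re.compile(
    r"""
    \d{3}   # three-digit prefix
    -       # literal hyphen
    \d{4}   # four-digit number
    """,
    re.VERBOSE,
)

inline_ok = inline.fullmatch("555-1234") is not None
extended_ok = extended.fullmatch("555-1234") is not None

print(inline_ok)    # True
print(extended_ok)  # True
```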