
Regular expressions in TeXworks Manual

Updated on July 18, 2010

Dalriada Books

Regular expressions

As TeXworks is built on Qt4, the regular expressions (often called regexps) it offers are a subset of those available in Qt4. See the Qt4 site for more complete information. Other information about regexps can be found on the net or in books, but note that not all systems (programming languages, editors,…) use the same set of instructions; there is no “standard set”.

Introduction

When searching and replacing, one has to define the text to be found. This can be the text itself, such as “Abracadabra”, but often it is necessary to define the strings in a more powerful way, to avoid repeating the same operation many times with only small changes from one time to the next. For example, suppose one wants to replace sequences of the letter a by one o, but only the sequences of 3, 4, 5, 6 and 7 a; this would require 5 separate replace operations. Another example: replacing the vowels by § again takes 5 replace operations.

Here come the regular expressions!

A simple character (a or 9) represents itself. But a set of characters can be defined: [aeiou] will match any vowel, [abcdef] the letters a b c d e f; this last set can be shortened as [a-f] using “-” between the two ends of the range.

To define a set of characters to be excluded, one uses “^”: the caret negates the character set if it occurs as the first character, i.e. immediately after the opening square bracket. [^abc] matches anything except a b c.
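The set notation can also be tried outside TeXworks. The following is a minimal sketch in Python, whose `re` module is a different regexp engine from the Qt4 one used by TeXworks; for plain sets, ranges and negated sets the two are assumed to behave alike. The sample strings are made up for illustration.

```python
import re

# Replace every vowel with "§" in one pass, using a character set.
vowels_replaced = re.sub(r"[aeiou]", "§", "abracadabra")

# A range: [a-f] is shorthand for [abcdef].
range_hits = re.findall(r"[a-f]", "regexp")

# A negated set: [^abc] matches any character except a, b or c.
negated = re.sub(r"[^abc]", "-", "abcdef")

print(vowels_replaced)  # §br§c§d§br§
print(range_hits)       # ['e', 'e']
print(negated)          # abc---
```

One replace operation with a set does the work that would otherwise take one operation per vowel.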

Codes to represent special sets

When using regexps, one very often has to write an expression that stands for a whole family of strings. For example, if you are looking for email addresses, the letters and symbols will vary from one address to the next; still, you can search for any string which has the shape of an email address (roughly text@text.text). So there are abbreviations to represent letters, figures, symbols,…

These codes replace and facilitate the definition of sets; for example to mean the set of digits [0-9], one can use “\d”. The following table lists the replacement codes.

Element	Meaning

c	Any character represents itself unless it has a special regexp meaning. Thus c matches the character c.

\c	A character that follows a backslash matches the character itself, except where mentioned below. For example, to match a literal caret at the beginning of a string, you would write “\^”.

\n	This matches the ASCII line feed character (LF, Unix newline, used in T_EXworks).

\r	This matches the ASCII carriage return character (CR).

\t	This matches the ASCII horizontal tab character (HT).

\v	This matches the ASCII vertical tab character (VT).

\xhhhh	This matches the Unicode character corresponding to the hexadecimal number hhhh (between 0x0000 and 0xFFFF).

\0ooo	(i.e., zero-ooo) matches the ASCII/Latin-1 character corresponding to the octal number ooo (between 0 and 0377).

.	(dot) This matches any character (including newline). So if you want to match a literal dot, you have to escape it: “\.”.

\d	This matches a digit.

\D	This matches a non-digit.

\s	This matches a white space.

\S	This matches a non-white space.

\w	This matches a word character (including “_”).

\W	This matches a non-word character.

\n	(here n stands for a number) The n-th back-reference, e.g. \1, \2, etc.

Using these abbreviations is better than describing the set, because the abbreviations remain valid in different alphabets.

Pay attention that the end of line is often taken as a white space. Under TeXworks the end of line is referred to by “\n”.
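A few of the codes from the table can be sketched as follows. This uses Python's `re` module as a stand-in, which is an assumption: it is not the Qt4 engine of TeXworks, and some codes differ between the two (for instance, in Python the dot does not match a newline by default). Only codes expected to behave the same way are used here, and the sample text is invented.

```python
import re

sample = "Room 42, floor 3.\tCall 555-0199"

digits = re.findall(r"\d", sample)      # every single digit
words = re.findall(r"\w+", sample)      # runs of word characters
no_space = re.sub(r"\s", "_", sample)   # each white space, including the tab
dots = re.findall(r"\.", sample)        # an escaped dot matches only a literal dot
doubled = re.search(r"(\w)\1", sample)  # \1 repeats whatever group 1 matched

print(digits)
print(words)            # ['Room', '42', 'floor', '3', 'Call', '555', '0199']
print(no_space)         # Room_42,_floor_3._Call_555-0199
print(dots)             # ['.']
print(doubled.group(0)) # oo
```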

Repetition

One rarely works on a single letter, digit or symbol; most of the time these are repeated (e.g. a number is a repetition of digits and symbols, in the right order).

To specify the number of repetitions, one uses a so-called “quantifier”: a{1,1} means exactly one a, a{3,7} means between 3 and 7 a; {1,1} can be dropped, so a{1,1} = a.

This can be combined with the set notation: [0-9]{1,2} corresponds to at least one digit and at most two, i.e. the integers between 0 and 99. But this will match any group of 1 or 2 figures within a string; if we want it to match the whole string (the string contains only 1 or 2 figures), we write the regular expression as ^[0-9]{1,2}$. Here ^ says that the match must start at the beginning of the string and $ that it must end at the end of the string, so the string holds only one or two figures (^ and $ are “assertions”; see later for more).
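The two examples above (runs of 3 to 7 a, and a string of one or two digits) can be sketched in Python. This is only an illustration under the assumption that Python's `re` module treats these constructs like the Qt4 engine does; one known difference is that Python's $ also matches just before a trailing newline. The helper name `is_one_or_two_digits` is hypothetical.

```python
import re

# Runs of 3 to 7 "a" become a single "o"; the run of 2 is left alone.
text = "baab b" + "a" * 3 + "b b" + "a" * 7 + "b"
replaced = re.sub(r"a{3,7}", "o", text)

# Without anchors, the pattern finds 1 or 2 digits anywhere in the string.
unanchored = re.search(r"[0-9]{1,2}", "abc123").group(0)

def is_one_or_two_digits(s):
    """Return True only if the whole string is one or two digits."""
    return re.search(r"^[0-9]{1,2}$", s) is not None

print(replaced)                     # baab bob bob
print(unanchored)                   # 12
print(is_one_or_two_digits("42"))   # True
print(is_one_or_two_digits("123"))  # False
```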

Here is a table of quantifiers. E represents an expression (a letter, an abbreviation or a set).

E?

Matches zero or one occurrence of E. This quantifier means the previous expression is optional. It is the same as E{0,1}.

E+

Matches one or more occurrences of E. This is the same as E{1,MAXINT}.

E*

Matches zero or more occurrences of E. This is the same as E{0,MAXINT}. The * quantifier is often used by mistake for the + quantifier. Since it matches zero or more occurrences, it also matches where there is no occurrence at all.

E{n}

Matches exactly n occurrences of the expression. This is the same as repeating the expression n times.

E{n,}

Matches at least n occurrences of the expression. This is the same as E{n,MAXINT}.

E{,m}

Matches at most m occurrences of the expression. This is the same as E{0,m}.

E{n,m}

Matches at least n occurrences of the expression and at most m occurrences of the expression.

MAXINT

depends on the implementation, minimum 1024.
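The difference between the quantifiers is easiest to see by testing whole strings. The sketch below uses Python's `re.fullmatch` as a stand-in for the Qt4 engine (an assumption; the quantifier syntax shown is expected to be common to both). The helper `whole` and the test words are made up for illustration.

```python
import re

def whole(pattern, s):
    """Return True if the pattern matches the entire string s."""
    return re.fullmatch(pattern, s) is not None

# E? : the "u" is optional.
print(whole(r"colou?r", "color"), whole(r"colou?r", "colour"))  # True True

# E+ needs at least one digit, E* is satisfied by none at all.
print(whole(r"\d+", ""), whole(r"\d*", ""))                     # False True

# E{n}, E{n,} and E{n,m}
print(whole(r"a{3}", "aaa"))      # True
print(whole(r"a{2,}", "aaaaa"))   # True
print(whole(r"a{2,}", "a"))       # False
print(whole(r"a{2,4}", "aaaaa"))  # False
```

The second line of output shows why using * where + was meant is a common source of surprising matches.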

Alternatives and assertions

When searching, it is often necessary to search for alternatives, e.g. apple, pear, cherry, but not pineapple. To separate the alternatives, one uses |: apple|pear|cherry. But this will not prevent pineapple from being found, so we have to specify that apple should be standalone, a whole word (as it is often called in search dialog boxes).

To specify that a string should be considered standalone, we specify that it is surrounded by word separators/boundaries (begin/end of sentence, space), like \bapple\b. For our alternatives example we group them with parentheses and add the boundaries: \b(apple|pear|cherry)\b. Apart from \b we have already seen ^ and $.
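The fruit example can be checked with a short Python sketch. Python's `re` module is used here as an assumed stand-in for the Qt4 engine; alternation and \b are expected to work the same way in both. The sample sentence is invented.

```python
import re

basket = "apple pineapple pear cherry"

# Plain alternation also finds the "apple" hidden inside "pineapple".
loose = re.findall(r"apple|pear|cherry", basket)

# With word boundaries only the standalone words are found.
strict = re.findall(r"\b(apple|pear|cherry)\b", basket)

print(loose)   # ['apple', 'apple', 'pear', 'cherry']
print(strict)  # ['apple', 'pear', 'cherry']
```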

Here is a table of the “assertions”, which do not correspond to characters and will never be part of the result of a search.

^

The caret signifies the beginning of the string. If you wish to match a literal ^ you must escape it by writing \^.

$

The dollar signifies the end of the string. If you wish to match a literal $ you must escape it by writing \$.

\b

A word boundary.

\B

A non-word boundary. This assertion is true wherever \b is false.

(?=E)

Positive lookahead. This assertion is true if the expression matches at this point in the regexp.

(?!E)

Negative lookahead. This assertion is true if the expression does not match at this point in the regexp.

Notice the different meanings of ^ as assertion and as negation inside a set!
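The assertions can be illustrated with a few small searches. Again this is a sketch in Python, assuming its `re` module handles lookahead, \B and the caret like the Qt4 engine does for these simple cases; the sample strings are made up.

```python
import re

# Positive lookahead: numbers followed by "px"; "px" is not part of the result.
px_numbers = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookahead: replace "foo" only when it is not followed by "bar".
not_bar = re.sub(r"foo(?!bar)", "X", "foobar foobaz")

# \B: only the "apple" that is not at a word boundary, i.e. inside "pineapple".
inside = re.findall(r"\Bapple", "apple pineapple")

# The two meanings of ^: first as an assertion, then as negation inside a set.
leading_non_digits = re.findall(r"^[^0-9]+", "abc123")

print(px_numbers)          # ['10', '30']
print(not_bar)             # foobar Xbaz
print(inside)              # ['apple']
print(leading_non_digits)  # ['abc']
```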

Final notes

Using regexps is very powerful, but also very dangerous: you could change your text in places you did not notice, and sometimes reverting to the previous state is not fully possible. If you see the error immediately, you can undo the change with Ctrl+Z.

Showing how to exploit the full power of regexps would require much more than this extremely short summary; in fact it would require a full manual of its own.

Also note that there are some limits in the implementation of regexps in TeXworks; in particular, the assertions ^ and $ only consider the whole file, not individual lines.
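What this limit means in practice can be sketched with Python, whose default behaviour happens to be similar: without a multi-line flag, ^ anchors only at the very start of the text. This is an analogy under that assumption, not a description of TeXworks internals. Matching “\n” explicitly is one way to reach the start of later lines; the sample text is invented.

```python
import re

file_text = "first line\nsecond line\nthird line"

# ^ anchors only at the very beginning of the whole text.
at_start = re.findall(r"^\w+", file_text)

# Matching the newline explicitly reaches the first word of each later line.
after_newline = re.findall(r"\n(\w+)", file_text)

print(at_start)       # ['first']
print(after_newline)  # ['second', 'third']
```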

Finally, do not forget to tick the regexp option when using regexps in the Find and Replace dialogs, and to un-tick it when not using them.

(Excerpt from TeXWorks Manual by Alain Delmotte)
