
Regular expressions in TEXworks Manual

Updated on July 18, 2010
Texworks

Regular expressions

TEXworks is built on Qt4, so the regular expressions (often called regexps) available in it are a subset of those provided by Qt4. See the Qt4 site for more complete information. Further information about regexps can be found on the net or in books, but be aware that different systems (programming languages, editors, …) do not all use the same set of instructions; there is no “standard set”.

Introduction

When searching and replacing, one has to define the text to be found. This can be the text itself, such as “Abracadabra”, but it is often necessary to define the strings in a more powerful way, to avoid repeating the same operation many times with only small changes from one time to the next. For example, suppose one wants to replace sequences of the letter a by a single o, but only the sequences of 3, 4, 5, 6 or 7 a; this would require five separate replace operations. Another example: replacing each of the vowels by § again takes five replace operations.

Here come the regular expressions!

A simple character (a or 9) represents itself. But a set of characters can be defined: [aeiou] will match any vowel, [abcdef] the letters a b c d e f; this last set can be shortened as [a-f] using “-” between the two ends of the range.

To define a set of characters to exclude, one uses “^”: the caret negates the character set if it occurs as the first character, i.e. immediately after the opening square bracket. [^abc] matches anything except a b c.
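As a rough illustration of character sets, here is a minimal sketch using Python's `re` module. Python is only a stand-in here: TEXworks uses the Qt4 engine, whose syntax differs in some details, and the sample text is made up for the example.

```python
import re

text = "abcdefg hijk"  # hypothetical sample text

# [aeiou] matches any single vowel.
vowels = re.findall(r"[aeiou]", text)      # ['a', 'e', 'i']

# [a-f] is shorthand for [abcdef].
a_to_f = re.findall(r"[a-f]", text)        # ['a', 'b', 'c', 'd', 'e', 'f']

# [^abc] matches any character except a, b or c (the space included).
not_abc = re.findall(r"[^abc]", text)

# All vowels replaced by § in one operation instead of five.
replaced = re.sub(r"[aeiou]", "§", text)   # '§bcd§fg h§jk'

print(vowels, a_to_f, not_abc, replaced)
```

The last line performs in one pass the vowel replacement that the introduction says would otherwise take five operations.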

Codes to represent special sets

When using regexps, one very often has to write patterns which stand for a whole family of strings. For instance, if you are looking for an email address, the letters and symbols will vary, yet you can still search for any string which has the shape of an email address (roughly text@text.text). So there are abbreviations to represent letters, figures, symbols, …

These codes replace and facilitate the definition of sets; for example to mean the set of digits [0-9], one can use “\d”. The following table lists the replacement codes.

Element	Meaning

c	Any character represents itself unless it has a special regexp meaning. Thus c matches the character c.
\c	A character that follows a backslash matches the character itself, except where mentioned below. For example, to match a literal caret at the beginning of a string you would write “\^”.
\n	This matches the ASCII line feed character (LF, Unix newline, used in TEXworks).
\r	This matches the ASCII carriage return character (CR).
\t	This matches the ASCII horizontal tab character (HT).
\v	This matches the ASCII vertical tab character (VT).
\xhhhh	This matches the Unicode character corresponding to the hexadecimal number hhhh (between 0x0000 and 0xFFFF).
\0ooo	(i.e., zero-ooo) This matches the ASCII/Latin-1 character corresponding to the octal number ooo (between 0 and 0377).
.	(dot) This matches any character (including newline). To match a literal dot, you have to escape it: “\.”.
\d	This matches a digit.
\D	This matches a non-digit.
\s	This matches a white space.
\S	This matches a non-white space.
\w	This matches a word character (a letter, a digit or “_”).
\W	This matches a non-word character.
\n	Where n is a number: the n-th back-reference, e.g. \1, \2, etc.
 

Using these abbreviations is better than describing the set, because the abbreviations remain valid in different alphabets.

Note that the end of line is often treated as white space. In TEXworks the end of line is referred to by “\n”.
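The abbreviations can be tried out with a short sketch. It again uses Python's `re` module as an assumed stand-in for the Qt4 engine of TEXworks (the two agree on `\d`, `\s`, `\w` and `\1` for this purpose, but not on every code in the table), and the sample strings are invented.

```python
import re

line = "Order 42 shipped\tto room_7\n"  # hypothetical sample line

digits = re.findall(r"\d", line)     # ['4', '2', '7']
words = re.findall(r"\w+", line)     # "_" counts as a word character
spaces = re.findall(r"\s", line)     # tab and end of line count as white space

# \w keeps working with accented letters, while an explicit range does not.
with_abbreviation = re.findall(r"\w+", "été")   # ['été']
with_range = re.findall(r"[a-z]+", "été")       # ['t']

# \1 refers back to whatever the first parenthesised group matched,
# so (\w)\1 finds a doubled character.
doubled = re.search(r"(\w)\1", "letter").group(0)   # 'tt'

print(digits, words, spaces, with_abbreviation, with_range, doubled)
```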

Repetition

One rarely works on a single letter, digit or symbol; most of the time these are repeated (e.g. a number is a repetition of digits and symbols, in the right order).

To specify the number of repetitions, one uses a so-called “quantifier”: a{1,1} means exactly one a, a{3,7} means between 3 and 7 a; {1,1} can be dropped, so a{1,1} = a.

This can be combined with the set notation: [0-9]{1,2} matches at least one digit and at most two, i.e. the integers between 0 and 99. But this will match any group of 1 or 2 digits within a string; if we want it to match the whole string (the string contains only 1 or 2 digits) we write the regular expression as ^[0-9]{1,2}$. Here ^ says that the match must start at the beginning of the string and $ that it must end at the end of the string, so the string contains only one or two digits (^ and $ are “assertions”; see later for more).
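The difference between matching inside a string and matching the whole string can be sketched as follows. The sketch assumes Python's `re` module rather than the Qt4 engine of TEXworks, and the sample strings are made up.

```python
import re

# Without assertions, [0-9]{1,2} matches groups of 1 or 2 digits anywhere.
inside = re.findall(r"[0-9]{1,2}", "room 123, floor 7")   # ['12', '3', '7']

# With ^ and $, the whole string must consist of 1 or 2 digits.
whole = r"^[0-9]{1,2}$"
accepts_42 = re.search(whole, "42") is not None     # True
accepts_123 = re.search(whole, "123") is not None   # False
accepts_a7 = re.search(whole, "a7") is not None     # False

# a{3,7}: a run of 3 to 7 letters a replaced by one o; "aa" is too short.
replaced = re.sub(r"a{3,7}", "o", "baaad caad")     # 'bod caad'

print(inside, accepts_42, accepts_123, accepts_a7, replaced)
```

Note that a run longer than 7 a would still be matched in part (its first 7 letters), so the pattern alone does not exclude longer runs.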

Here is a table of quantifiers. E represents an expression (letter, abbreviation, set).

E?

Matches zero or one occurrence of E. This quantifier means the previous expression is optional. It is the same as E{0,1}.

E+

Matches one or more occurrences of E. This is the same as E{1,MAXINT}.

E*

Matches zero or more occurrences of E. This is the same as E{0,MAXINT}. The * quantifier is often used by mistake where + is intended. Since it matches zero or more occurrences, it also succeeds where E does not occur at all (it then matches an empty string).

E{n}

Matches exactly n occurrences of the expression. This is the same as repeating the expression n times.

E{n,}

Matches at least n occurrences of the expression. This is the same as E{n,MAXINT}.

E{,m}

Matches at most m occurrences of the expression. This is the same as E{0,m}.

E{n,m}

Matches at least n occurrences of the expression and at most m occurrences of the expression.

MAXINT

depends on the implementation, minimum 1024.
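The quantifiers in the table can be compared side by side with a small sketch. It assumes Python's `re` module (not the Qt4 engine used by TEXworks); the helper `matches_whole` and the word lists are hypothetical names introduced only for this example.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

def keep(pattern: str, candidates: list[str]) -> list[str]:
    """Keep only the candidates that the pattern matches entirely."""
    return [c for c in candidates if matches_whole(pattern, c)]

optional = keep(r"colou?r", ["color", "colour", "colouur"])  # E?
one_or_more = keep(r"ab+c", ["ac", "abc", "abbbc"])          # E+
zero_or_more = keep(r"ab*c", ["ac", "abc", "abbbc"])         # E*
exactly_three = keep(r"a{3}", ["aa", "aaa", "aaaa"])         # E{n}
at_least_two = keep(r"a{2,}", ["a", "aa", "aaaa"])           # E{n,}
at_most_two = keep(r"a{0,2}", ["", "a", "aa", "aaa"])        # E{0,m}

# The * pitfall: \d* "finds" an empty match in a text without digits,
# whereas \d+ finds nothing.
star_hit = re.search(r"\d*", "abc")
plus_hit = re.search(r"\d+", "abc")

print(optional, one_or_more, zero_or_more)
print(exactly_three, at_least_two, at_most_two)
print(repr(star_hit.group(0)), plus_hit)
```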

Alternatives and assertions

When searching, it is often necessary to search for alternatives, e.g. apple, pear, cherry, but not pineapple. To separate the alternatives, one uses |: apple|pear|cherry. But this will not prevent pineapple from being found, so we have to specify that apple should be standalone, a whole word (as it is often called in search dialog boxes).

To specify that a string should be considered standalone, we specify that it is surrounded by word separators/boundaries (beginning/end of sentence, space), like \bapple\b. For our alternatives example we group the alternatives with parentheses and add the boundaries: \b(apple|pear|cherry)\b. Apart from \b we have already seen ^ and $.
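The effect of the word boundaries can be seen in a minimal sketch, assuming Python's `re` module in place of the Qt4 engine used by TEXworks and an invented sample sentence.

```python
import re

text = "apple pie, pineapple juice, a pear and a cherry"  # hypothetical

# Plain alternatives also hit the "apple" inside "pineapple".
loose = [m.group(0) for m in re.finditer(r"apple|pear|cherry", text)]

# With \b on both sides only whole words are found.
strict = [m.group(0) for m in re.finditer(r"\b(apple|pear|cherry)\b", text)]

print(loose)   # ['apple', 'apple', 'pear', 'cherry']
print(strict)  # ['apple', 'pear', 'cherry']
```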

Here is a table of the “assertions”, which do not correspond to characters and will never be part of the result of a search.

^

The caret signifies the beginning of the string. If you wish to match a literal ^ you must escape it by writing \^.

$

The dollar signifies the end of the string. If you wish to match a literal $ you must escape it by writing \$.

\b

A word boundary.

\B

A non-word boundary. This assertion is true wherever \b is false.

(?=E)

Positive lookahead. This assertion is true if the expression matches at this point in the regexp.

(?!E)

Negative lookahead. This assertion is true if the expression does not match at this point in the regexp.

Notice the different meanings of ^ as assertion and as negation inside a set!
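Lookaheads are easier to grasp on an example. The following sketch assumes Python's `re` module, which writes lookaheads the same way but is not the engine used by TEXworks; the sample strings are invented.

```python
import re

text = "TeXworks TeXmaker TeXshop"  # hypothetical sample text

# (?=E): "TeX" only where it is followed by "works".
# The lookahead text itself is not part of the match.
positive = re.search(r"TeX(?=works)", text)
matched_text = positive.group(0)   # 'TeX'
matched_end = positive.end()       # 3, "works" is not consumed

# (?!E): "TeX" only where it is NOT followed by "works".
negative_starts = [m.start() for m in re.finditer(r"TeX(?!works)", text)]

# The two meanings of ^: assertion outside a set, negation inside one.
# ^[^#] means "a first character that is not #".
starts_without_hash = re.search(r"^[^#]", "#comment") is None

print(matched_text, matched_end, negative_starts, starts_without_hash)
```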

Final notes

Using regexps is very powerful, but also very dangerous; you could change your text in places you did not see, and sometimes reverting to the previous situation is not fully possible. If you see the error immediately, you can use Ctrl+Z.

Showing how to exploit the full power of regexps would require much more than this extremely short summary; in fact it would require a full manual of its own.

Also note that there are some limits in the implementation of regexps in TEXworks; in particular, the assertions (^ and $) only consider the whole file.
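To picture what it means for ^ and $ to consider the whole text rather than each line, the sketch below uses Python's `re` module, whose default mode behaves in a comparable way. This is an analogy under that assumption, not a description of the TEXworks implementation; the `re.MULTILINE` flag shown for contrast is a Python feature.

```python
import re

text = "one\ntwo\nthree"  # three lines, as in a small file

# Default mode: ^ is the start of the whole text, $ its end.
first_words = re.findall(r"^\w+", text)    # ['one']
last_words = re.findall(r"\w+$", text)     # ['three']

# For contrast, Python's MULTILINE flag makes ^ match at every line start.
per_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(first_words, last_words, per_line)
```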

Finally, do not forget to “tick” the regexp option when using them in the Find and Replace dialogs and to un-tick the option when not using regexps.

(Excerpt from TeXWorks Manual by Alain Delmotte)
