Regular Expressions with HTML and LaTeX Code
When searching and replacing, one has to define the text to be found. This can be the text itself, but often it is necessary to define the strings in a more powerful way to avoid repeating the same operation many times. Regular expressions (also 'regex') are powerful but sometimes very tricky. One of the many reasons is the variety of text formats, and there seems to be no single best way to write a regular expression, as long as you get the job done.
Below are some tips I jotted down when I was dealing with LaTeX and HTML code; they might be useful to you.
To remove the \section command and its curly braces using regex:
\\section\{([^}]*)\}
Use \ to escape the backslash, the left { and the right }. [^}]* matches any run of characters other than }, so the match stops at the first closing brace.
replace with:
\1
\1 is a back reference to the contents captured inside the ().
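The same search and replace can be tried outside an editor. Below is a minimal sketch in Python's re module; the sample string is made up for illustration, and the pattern and replacement are the ones from the tip above.

```python
import re

# Hypothetical sample input, not taken from a real document.
latex = r"\section{Introduction} Some text. \section{Methods}"

# \\section\{ matches the literal "\section{", ([^}]*) captures everything
# up to the next "}", and \} matches the closing brace itself.
stripped = re.sub(r"\\section\{([^}]*)\}", r"\1", latex)

print(stripped)
```

Raw strings (the r prefix) keep Python from interpreting the backslashes before the regex engine sees them.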
To turn something like "I. First year." into \section{I. First year.}:
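(\n\n)([IVX]+\.[\s\w]+\.)
Two newlines, then a Roman numeral made of one or more of I, V or X, then a dot, then one or more whitespace or word characters, then the closing dot. Note that a comma inside [] is matched literally, so the letters are listed without separators.
replace with:
\1\\section{\2}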
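Here is a small Python sketch of this replacement, assuming the headings use only the numerals I, V and X. The numeral is matched with the class [IVX]+, and the sample text is invented for the example.

```python
import re

# Hypothetical sample text with two Roman-numeral headings.
text = "Preface text.\n\nI. First year.\n\nBody.\n\nII. Second year.\n\nMore."

# Group 1: the two newlines before the heading.
# Group 2: numeral, dot, words and spaces, closing dot.
pattern = re.compile(r"(\n\n)([IVX]+\.[\s\w]+\.)")

# In the replacement, \\ produces one literal backslash in front of "section".
result = pattern.sub(r"\1\\section{\2}", text)

print(result)
```

The line "Body." is left alone because it does not start with a Roman numeral followed by a dot.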
To replace straight quotation marks:
Quotation marks are a well-known problem in TeX and LaTeX: an opening double quote has to be typed as `` and a closing one as ''. To find a quoted string:
"([^"]*)"
replace with:
``\1''
To find an uppercase string:
Now I have some text like this:
right and wrong, and which condemns all fear or hope of an unknown and unseen world.
[three newline characters here: \n\n\n]
RESPECT FOR THE WRITTEN CHARACTER
One of the most curious and harmless customs of the Chinese is that of
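I need to find the uppercase string and add the HTML tag <h2> at the start and </h2> at the end.
\n\n\n([A-Z][A-Z ]*)\n\n
Three newlines, then one uppercase letter followed by zero or more uppercase letters or spaces, then two newlines.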
to replace with:
\n\n\n<h2>\1</h2>\n\n
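A quick way to check this is a short Python sketch. It assumes the heading consists only of uppercase letters and spaces, matched with [A-Z][A-Z ]*, and uses a shortened copy of the sample text above.

```python
import re

# Shortened version of the sample: three newlines before the heading,
# two newlines after it.
text = (
    "and unseen world.\n\n\n"
    "RESPECT FOR THE WRITTEN CHARACTER\n\n"
    "One of the most curious and harmless customs"
)

# Capture a run of uppercase letters and spaces sitting alone on its line.
pattern = re.compile(r"\n\n\n([A-Z][A-Z ]*)\n\n")

# \n in the replacement is turned back into a real newline by re.sub.
html = pattern.sub(r"\n\n\n<h2>\1</h2>\n\n", text)

print(html)
```

Headings containing digits or punctuation would not match this class and would need extra characters added to it.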