Strange Behavior of the Global Regex Flag
So, for some time I’ve been building a jQuery plugin called jQuery.validity. I’ll post more on that later, but suffice it to say for now that the plugin uses regexes to examine the format of strings. A few days ago, Jeff (another of the Validity developers) caught Validity acting strangely when he used it with some custom regexes.
What was happening was that the regex would give seemingly random results when testing the same string with the same regex. This would result in Validity allowing invalid inputs and disallowing valid ones: pretty much the opposite of what it’s supposed to do.
Eventually, we discovered that the problem was related to the global flag that he was applying to his regexes. Removing the global flag caused the regexes to act the way we would expect them to.
Validity worked again. I added notes to the documentation entreating developers not to use regexes with global flags. This is okay, I guess, for a temporary solution, but I wanted to actually solve the problem and understand where it was coming from.
The following code snippet illustrates this strange behavior:
var
    // Regex to search for the word duck
    // without the global flag set.
    nonGlobal = /duck/,
    // The same regex,
    // except with the global flag set.
    global = /duck/g,
    string = "duck duck goose";
// Use the regex to test the string:
nonGlobal.test(string); // True.
// We can test these two as many times as we like.
// This regex tested against this string
// will always return true.
nonGlobal.test(string); // True.
nonGlobal.test(string); // True.
nonGlobal.test(string); // True.
// However:
global.test(string); // True.
global.test(string); // True, again.
global.test(string); // False!?!?
global.test(string); // True again!
global.test(string); // True.
global.test(string); // Now we're back to false.
// This global regex,
// tested many times against the same string
// will not always yield the same result.
To Jeff and me, this behavior seemed like JavaScript was misbehaving (even though we got the same strange behavior in all browsers). It seemed to us that one regex tested against one string should elicit the same result every time.
I set forth to research the topic. When I finally figured out what was going on, it turned out that we both had incomplete understandings of how regexes were supposed to work in JavaScript.
(Well, we won’t take all the blame. My feeling is that good, complete documentation on how JavaScript handles the global flag is hard to find and confusing when you do find it. In this article, I’ll try to lay it out as clearly as possible.)
What Exactly is the Global Flag For?
Simply put, the global flag tells regexes to find and capture all possible matches in a string (not just the first one). The global flag is only useful when you attempt to capture matches out of a string, not when you’re testing a string’s format.
(Really, JavaScript has long supported just three regex flags, though newer engines have added more. Before doing research on this problem I didn’t really know exactly what they did. I’ll cover them in what I believe to be plain terms in this article’s appendix.)
What Does JavaScript Do With a Global Regex?
Normally, the global flag is useful for the “string.match(regex)” method, which will return an array of all the matches. However, even if you are not attempting to locate matches, the global flag will still affect the “regex.test(string)” method!
If the regex is global, then “regex.test(string)” will only examine the string until the conditions of the regex are satisfied, and then return true. But it will also set the “regex.lastIndex” property to the character index just past the end of the match. When “regex.test(string)” is run a second time, it will examine the string starting at “lastIndex”. This will happen even if you’re testing a totally different string, of any length!
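Here is a minimal sketch of what the flag does for matching, reusing the duck string from the example above (the variable names are made up for illustration). Note that “match” is called on the string, with the regex passed in as the argument.

```javascript
var sentence = "duck duck goose";

// Without the global flag, match returns only the first match
// (along with details such as the index where it was found).
var firstOnly = sentence.match(/duck/);

// With the global flag, match returns every match as a plain array.
var allMatches = sentence.match(/duck/g);

console.log(firstOnly[0], firstOnly.index); // duck 0
console.log(allMatches.length);             // 2
```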
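The carry-over to a different string is the part that surprised us most, so here is a small sketch of it. The variable names are only for illustration.

```javascript
var duckFinder = /duck/g;

// The first test matches "duck" at index 0 and leaves lastIndex at 4.
var firstResult = duckFinder.test("duck duck goose");
var indexAfterFirst = duckFinder.lastIndex;

// A completely different string, but the search still starts at index 4,
// so the "duck" sitting at index 0 is never examined.
var secondResult = duckFinder.test("duck");
var indexAfterSecond = duckFinder.lastIndex;

console.log(firstResult, indexAfterFirst);   // true 4
console.log(secondResult, indexAfterSecond); // false 0
```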
What Happened in the Example
As you can see in the code example above, the first time “global.test(string)” was run, the global regex found the word “duck” within the string, then set the “global.lastIndex” property to “4”.
The second time the test was run, JavaScript started from position four and found the word “duck” a second time, returning true. At this point “global.lastIndex” was equal to “9”.
The third time the regex was run, JavaScript started from the ninth position and encountered the word “goose” followed by the end of the string. Since “goose” does not match the regex, false was returned. However, since the end of the string was reached without satisfying the regex, “global.lastIndex” was set to “0”, effectively restarting this unpleasant ordeal at the beginning of the string.
As you might have guessed, the next two tests would return true, followed by a false on the third.
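To watch the cycle happen, you can record “lastIndex” after each test. This is just a sketch for poking at the behavior; the names are invented for the example.

```javascript
var tracer = /duck/g;
var text = "duck duck goose";
var trace = [];

// Run the same test six times and note where lastIndex ends up each time.
for (var i = 0; i < 6; i++) {
    var result = tracer.test(text);
    trace.push(result + " (lastIndex " + tracer.lastIndex + ")");
}

console.log(trace.join("\n"));
// true (lastIndex 4)
// true (lastIndex 9)
// false (lastIndex 0)
// true (lastIndex 4)
// true (lastIndex 9)
// false (lastIndex 0)
```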
It’s possible that this mechanism was meant for iterating over matches within a string, but that functionality would seem to make more sense in the “regex.exec(string)” method than in “test”.
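For what it’s worth, “exec” does use “lastIndex” in the same way on a global regex, and there the behavior is genuinely handy. A minimal sketch of iterating over matches, with illustrative names:

```javascript
var finder = /duck/g;
var input = "duck duck goose";
var positions = [];
var found;

// Each call to exec resumes from lastIndex and returns null
// once no further match exists (resetting lastIndex to 0).
while ((found = finder.exec(input)) !== null) {
    positions.push(found.index);
}

console.log(positions);        // [ 0, 5 ]
console.log(finder.lastIndex); // 0
```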
(It’s my opinion that the test method shouldn’t be doing this. Test means test! It’s testing! Not iterating or finding matches. Testing!)
The Solution
The solution to this dilemma is painfully simple: if the regex is global, just set the “lastIndex” property to zero before you test. Observe:
if (regex.global) {
    regex.lastIndex = 0;
}
That was easy. I put this code into Validity so that it runs before a regex is used to test a string. In this way Validity will be able to make use of regexes that have global flags.
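If you want the same protection in your own code, one way is to wrap the reset in a small helper. This is a sketch only: “testFromStart” is a made-up name, not Validity’s actual code.

```javascript
// Hypothetical helper: always test from the beginning of the string,
// regardless of whether the regex carries the global flag.
function testFromStart(regex, string) {
    if (regex.global) {
        regex.lastIndex = 0;
    }
    return regex.test(string);
}

var duckRegex = /duck/g;
var helperResults = [];
for (var n = 0; n < 4; n++) {
    helperResults.push(testFromStart(duckRegex, "duck duck goose"));
}

console.log(helperResults); // [ true, true, true, true ]
```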
Conclusion
The morals of this story are:
- Only use the global flag (“/g”) on your regexes if you’re actually going to be finding several matches within a string.
- If you use a global regex for testing anyway, don’t count on the anchors (“^” and “$”) to save you: a successful match still moves “lastIndex” to the end of the string, so the very next test with that regex will fail.
- Instead, set the “lastIndex” to zero before every test that uses a global regex.
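One caveat about anchors is worth checking for yourself: an anchored global regex still updates “lastIndex” after a successful match, so anchors alone do not make repeated tests consistent. The sketch below (names invented for illustration) shows that, and shows that resetting “lastIndex” does.

```javascript
var anchored = /^duck$/g;

// Anchors alone: the results still alternate.
var anchoredResults = [
    anchored.test("duck"), // true; lastIndex is now 4
    anchored.test("duck"), // false; lastIndex is reset to 0
    anchored.test("duck")  // true again
];

// Resetting lastIndex before each test gives a stable answer.
var resetResults = [];
for (var r = 0; r < 3; r++) {
    anchored.lastIndex = 0;
    resetResults.push(anchored.test("duck"));
}

console.log(anchoredResults); // [ true, false, true ]
console.log(resetResults);    // [ true, true, true ]
```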
Appendix
I feel that most of the documentation and tutorials that cover regexes in JavaScript don’t explain flags terribly well. Here, I’ll attempt to explain them concisely and clearly.
The three classic regex flags supported by JavaScript’s regex engine are:
- /g: As we’ve seen, this flag will instruct the regex engine to search for all possible matches within a string. In some cases, it will cause the engine to behave iteratively.
- /i: This flag enables case-insensitivity. Using this flag is a simpler way of allowing upper and lower case than spelling out both cases in bracketed character classes (such as “[Dd]”).
- /m: This flag enables multiline searching. Specifically, it causes the “^” and “$” anchors to match at the start and end of every line within the string, in addition to the start and end of the entire string. Without it, they match only the start and end of the entire string.
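A quick sketch of the flags in action may help. The sample string and variable names are made up for the example.

```javascript
var lines = "Duck\nduck\ngoose";

// /g alone is case-sensitive; adding /i also picks up the capitalized "Duck".
var caseSensitive = lines.match(/duck/g);
var caseInsensitive = lines.match(/duck/gi);

// Without /m, ^ and $ match only at the start and end of the whole string.
var wholeStringOnly = /^duck$/.test(lines);
// With /m, ^ and $ also match at the start and end of each line.
var perLine = /^duck$/m.test(lines);

console.log(caseSensitive.length);   // 1
console.log(caseInsensitive.length); // 2
console.log(wholeStringOnly);        // false
console.log(perLine);                // true
```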
Comments
Nice article, yet another regex quirk to watch out for.
One correction though: with the /m multiline flag the “^” and “$” anchors match the beginning and end of any line *in addition to* matching the beginning and end of the entire string. Without the flag they just match the beginning and end of the entire string.