I'm trying to match a Pascal string literal in the input with the following pattern: @"^'([^']|(''))*'$", but that's not working. What is wrong with the pattern?
    public void Run()
    {
        using (StreamReader reader = new StreamReader(string.Empty))
        {
            var lineNumber = 0;
            var lineContent = string.Empty;

            while (null != (lineContent = reader.ReadLine()))
            {
                lineNumber++;

                // Replace (* ... *) comments with a space, then split the line on spaces.
                string[] inputWords = new Regex(@"\(\*(?:\w|\d)*\*\)")
                    .Replace(lineContent.TrimStart(' '), @" ")
                    .Split(' ');

                foreach (string word in inputWords)
                {
                    scanner.Scan(word);
                }
            }
        }
    }
I search the input string for Pascal comment entries, replace them with whitespace, then split the input into substrings and match them against the following:
    private void Initialize()
    {
        matchingTable = new Dictionary<TokenUnit.TokenType, Regex>();

        matchingTable[TokenUnit.TokenType.Identifier] = new Regex
        (
            @"^[_a-zA-Z]\w*$",
            RegexOptions.Compiled | RegexOptions.Singleline
        );

        matchingTable[TokenUnit.TokenType.NumberLiteral] = new Regex
        (
            @"(?:^\d+$)|(?:^\d+\.\d*$)|(?:^\d*\.\d+$)",
            RegexOptions.Compiled | RegexOptions.Singleline
        );
    }

    // ... here comes

    public TokenUnit Scan(string input)
    {
        // Return the first token type whose pattern matches the whole word.
        foreach (KeyValuePair<TokenUnit.TokenType, Regex> node in this.matchingTable)
        {
            if (node.Value.IsMatch(input))
            {
                return new TokenUnit { Type = node.Key };
            }
        }

        return new TokenUnit { Type = TokenUnit.TokenType.Unsupported };
    }
The pattern appears to be correct, although it can be simplified:
^'(?:[^']+|'')*'$
Explanation:

    ^        # match start of string
    '        # match the opening quote
    (?:      # match either...
     [^']+   # one or more characters except the quote character
    |        # or
     ''      # two quote characters (= escaped quote)
    )*       # any number of times
    '        # match the closing quote
    $        # match end of string
This regex will fail if the input you're checking it against contains anything besides the Pascal string (say, surrounding whitespace).

So if you want to use the regex to find Pascal strings within a larger text corpus, you need to remove the ^ and $ anchors.
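A minimal sketch of the unanchored use, assuming a hypothetical helper `FindPascalStrings` (the class and method names are illustrative and not part of the code in the question). It uses the single-character form `[^']` from the question rather than `[^']+`. That is my own choice, since a `+` nested inside `*` can backtrack heavily when the text contains an unmatched quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PascalStringFinder
{
    // Pascal string literal without the ^ and $ anchors, so it can match
    // anywhere inside a longer line. '' is an escaped quote.
    private static readonly Regex PascalString = new Regex(@"'(?:[^']|'')*'");

    // Hypothetical helper: returns every single-quoted literal found in the text,
    // in order of appearance, including the surrounding quotes.
    public static List<string> FindPascalStrings(string text)
    {
        var result = new List<string>();
        foreach (Match match in PascalString.Matches(text))
        {
            result.Add(match.Value);
        }
        return result;
    }
}
```

For example, `PascalStringFinder.FindPascalStrings("writeln('it''s here', 'b');")` returns the two literals `'it''s here'` and `'b'`. Searching the unsplit line also matters for the code in the question: `Run()` splits each line on spaces, so a literal that contains a space reaches `Scan` as separate fragments, and none of them can match the anchored pattern.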
And if you want to allow double quotes, too, you need to augment the regex:
^(?:'(?:[^']+|'')*'|"(?:[^"]+|"")*")$
In C#:

    foundMatch = Regex.IsMatch(subjectString, "^(?:'(?:[^']+|'')*'|\"(?:[^\"]+|\"\")*\")$");
This regex will match strings like

    'This matches.'
    'This too, though it ''contains quotes''.'
    "Mixed quotes aren't a problem."
    ''

It won't match strings like

    'The quotes aren't balanced or escaped.'
    There is something 'before or after' the quotes.
     "Even whitespace is a problem." 
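The examples above can be checked with a small wrapper around the anchored pattern. This is only a sketch: the class name `QuotedLiteralCheck` and the method `IsQuotedLiteral` are made up for illustration, and the pattern is the one from the C# line above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedLiteralCheck
{
    // Anchored pattern: the whole input must be one single-quoted or
    // double-quoted literal, with doubled quotes as the escape.
    private const string Pattern =
        "^(?:'(?:[^']+|'')*'|\"(?:[^\"]+|\"\")*\")$";

    // Hypothetical helper: true only if the input is exactly one quoted literal.
    public static bool IsQuotedLiteral(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }
}
```

With this helper, `IsQuotedLiteral("''")` and `IsQuotedLiteral("'This too, though it ''contains quotes''.'")` return true, while `IsQuotedLiteral("'The quotes aren't balanced or escaped.'")` and any input with leading or trailing whitespace return false.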