Errors processing bullet point via regex replace in VB to clean up XML file
I'm trying to clean an XML file that may have UTF-8 characters, and I'm having issues with a bullet point. Some files have a bullet point in them. If I remove these characters by hand, the rest of the regex replace works fine, but the replace doesn't seem to remove this specific bullet character. Looking at the hex, it is 0x07, and in Unicode \u0007, but using neither of these resolved the error ("hexadecimal value 0x07, is an invalid character").
Here is some of the regex replace code (VB script in SSIS) I'm using, with the several iterations I've tried. Any help is appreciated.
    xmlString = FileIO.FileSystem.ReadAllText(fileLocation)
    'Dim rgx As Regex = New Regex("[\x00-\x08\x0b-\x0c\x0e-\x1f\u0000-\u0007]", RegexOptions.None)
    'Dim rgx As Regex = New Regex("[^0-9a-zA-Z]", RegexOptions.None)
    'Dim rgx As Regex = New Regex("[[:^print:]]", RegexOptions.None)
    'Dim rgx As Regex = New Regex("[[:^print:][\u0007]]", RegexOptions.None)
    Dim rgx As Regex = New Regex("[^\x09\x0a\x0d\x20-\xd7ff\xe000-\xfffd\x10000-x10ffff]", RegexOptions.None)
    'Dim rgx As Regex = New Regex("[\x00-\x1f\x7f-\xff]+", RegexOptions.None)
    rgx.Replace(xmlString, "")
Thanks.
One thing you need to know is whether the regular expression is being applied against a string of bytes or a string of characters. (In Perl there is an explicit difference; I'm not sure about VB, where it may be controlled by the way you read the data in.) The two points below are not "rules" as such, more good form.
- If you are running against bytes, you should use \xXX escape sequences (and XX can only be 2 "digits").
- If you are running against characters, you should use \uXXXX escape sequences (\xXXXX is the same thing in some languages).
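For what it's worth, VB scripts in SSIS run on .NET, where a String is always a sequence of UTF-16 characters. As far as I know, the .NET regex engine reads \x as exactly two hex digits and \u as exactly four, so a pattern like \xd7ff would be read as the character U+00D7 followed by the literal text "ff". A small sketch to check this, assuming a plain console program (the module name EscapeDemo is made up for illustration):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EscapeDemo
    Sub Main()
        ' \x takes exactly two hex digits: "\xd7ff" means U+00D7 followed by "ff".
        Dim timesSignPlusFf As String = ChrW(&HD7).ToString() & "ff"
        Console.WriteLine(Regex.IsMatch(timesSignPlusFf, "^\xd7ff$"))   ' True

        ' The single character U+D7FF is therefore NOT matched by "\xd7ff"...
        Dim d7ff As String = ChrW(&HD7FF).ToString()
        Console.WriteLine(Regex.IsMatch(d7ff, "^\xd7ff$"))              ' False

        ' ...but it is matched by the four-digit \u form.
        Console.WriteLine(Regex.IsMatch(d7ff, "^\ud7ff$"))              ' True

        ' The BEL character from the error message can be written either way.
        Dim bel As String = ChrW(7).ToString()
        Console.WriteLine(Regex.IsMatch(bel, "^\x07$"))                 ' True
        Console.WriteLine(Regex.IsMatch(bel, "^\u0007$"))               ' True
    End Sub
End Module
```

If that reading is right, the ranges in a character class built from \xd7ff, \xe000 and \x10000 will not mean what they appear to mean, and anything above U+00FF should be written with \u escapes instead.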
Looking at your uncommented regex, it seems that you're looking at characters. That would imply your file must be in a valid character encoding (probably one of UTF-8, UTF-16LE, or CP1252). What your regex is doing is stripping out valid UTF-8 characters that are not allowed according to the XML spec: http://www.w3.org/TR/xml/#charsets. This should be fine.
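As a rough sketch of that idea in VB.NET, written with \u escapes: the pattern below matches the characters the XML 1.0 Char production excludes, and because .NET strings are UTF-16 it treats a properly paired surrogate as a valid character above U+FFFF and only removes unpaired ones. The names XmlCharCleaner and StripInvalidXmlChars are invented for this example, and I have not run it inside SSIS.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module XmlCharCleaner
    ' Matches UTF-16 code units that cannot appear in XML 1.0 text:
    '   - C0 control characters other than tab, LF and CR
    '   - U+FFFE and U+FFFF
    '   - a high surrogate not followed by a low surrogate
    '   - a low surrogate not preceded by a high surrogate
    Private Const InvalidXmlPattern As String = _
        "[\u0000-\u0008\u000B\u000C\u000E-\u001F\uFFFE\uFFFF]" & _
        "|[\uD800-\uDBFF](?![\uDC00-\uDFFF])" & _
        "|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]"

    Private ReadOnly InvalidXmlChars As New Regex(InvalidXmlPattern)

    ' Returns a copy of the input with every invalid character removed.
    Function StripInvalidXmlChars(ByVal input As String) As String
        Return InvalidXmlChars.Replace(input, "")
    End Function

    Sub Main()
        ' ChrW(7) is the BEL character from the error message; the tab is legal XML.
        Dim dirty As String = "<a>" & ChrW(7) & " item" & vbTab & "one</a>"
        Console.WriteLine(StripInvalidXmlChars(dirty))
    End Sub
End Module
```

Listing the forbidden characters rather than negating the allowed ranges keeps every escape within four hex digits, which avoids needing anything like \x10000 in the pattern.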
But if your string is a stream of bytes, and you are trying to ensure that it is valid UTF-8, that is harder to do with a regex. Other than stripping out the non-ASCII, I don't know how.
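In .NET specifically, one way to check raw bytes without a regex is to decode them with a strict UTF8Encoding, which throws on malformed sequences instead of silently substituting U+FFFD. A minimal sketch, assuming the bytes are already in memory (the helper name TryDecodeUtf8 is hypothetical):

```vbnet
Imports System
Imports System.Text

Module Utf8Check
    ' Returns the decoded text, or Nothing if the bytes are not valid UTF-8.
    Function TryDecodeUtf8(ByVal bytes As Byte()) As String
        ' First argument: do not emit a BOM when encoding.
        ' Second argument: throw on invalid byte sequences when decoding.
        Dim strictUtf8 As New UTF8Encoding(False, True)
        Try
            Return strictUtf8.GetString(bytes)
        Catch ex As DecoderFallbackException
            Return Nothing
        End Try
    End Function

    Sub Main()
        ' "a" followed by U+2022 BULLET encoded as UTF-8 (E2 80 A2).
        Dim good As Byte() = {&H61, &HE2, &H80, &HA2}
        ' 0x95 is the bullet in CP1252, but a lone 0x95 is not valid UTF-8.
        Dim bad As Byte() = {&H61, &H95}

        Console.WriteLine(TryDecodeUtf8(good) IsNot Nothing)   ' True
        Console.WriteLine(TryDecodeUtf8(bad) IsNot Nothing)    ' False
    End Sub
End Module
```

For a real file the bytes could come from System.IO.File.ReadAllBytes, and a Nothing result would suggest the file is in some other encoding such as CP1252.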
One other point: shouldn't you be setting the global attribute of the regex before doing the replace? Could that be the problem? Is it fixing only the first occurrence and not the whole file?
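On that last question, my understanding is that .NET has no global flag: Regex.Replace replaces every match unless you pass a count. What it does not do is change the input, because .NET strings are immutable, so the result has to be assigned back. The code in the question appears to discard the return value of rgx.Replace, which may be why nothing seems to change. A minimal sketch (the module name ReplaceDemo is made up):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplaceDemo
    Sub Main()
        ' Two BEL characters embedded in a five-character string.
        Dim xmlString As String = "a" & ChrW(7) & "b" & ChrW(7) & "c"
        Dim rgx As New Regex("\u0007")

        ' Return value ignored: xmlString itself is unchanged.
        rgx.Replace(xmlString, "")
        Console.WriteLine(xmlString.Length)   ' 5

        ' Assign the result back: every match is removed, not just the first.
        xmlString = rgx.Replace(xmlString, "")
        Console.WriteLine(xmlString)          ' abc
    End Sub
End Module
```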