I am trying to parse HTML using NekoHTML.
The problem is that when the code snippet below is executed on Sun JDK 1.5.0_01, it works fine (this is when using Eclipse with the Sun JRE). When the same thing is executed on the IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323 (JIT enabled)), it does not work (this is when using IBM RAD for development).
    NodeList tags = doc.getElementsByTagName("td");
    for (int i = 0; i < tags.getLength(); i++) {
        Element elem = (Element) tags.item(i);
        // process elem
    }
By "working fine" I mean that I get the list of "td" elements, which I can then process further. In the case of J9, execution never enters the for loop.
I am using the latest version of NekoHTML (along with the bundled Xerces jars). doc in the above code is of type org.w3c.dom.Document (the runtime class used is org.apache.html.dom.HTMLDocumentImpl).
The IBM J9 details are as follows:
    java version "1.5.0"
    Java(TM) 2 Runtime Environment, Standard Edition (build pwi32devifx-20070323 (ifix 117674: sr4 + 116644 + 114941 + 116110 + 114881))
    IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323 (JIT enabled)
    J9VM - 20070322_12058_lhdsmr
    JIT - 20070109_1805ifx3_r8
    GC - wasifix_2007)
    JCL - 20070131
Any idea, suggestion or workaround is appreciated. Thanks.
I have 2 ideas.
- I have verified that Xerces is part of the JRE installation, and I believe it arrives on the classpath of your application from there. Sun and IBM ship different versions of Xerces. So the first approach is to check which one is loaded and try to replace the one you have under IBM with Sun's version. If that helps, you have 2 options: continue running on IBM Java with Sun's Xerces, or continue to investigate what is wrong with IBM's Xerces.
- Are there other differences between the dev and production environments? Are they the same operating system? Is there a chance that you are using (for example) Windows for development and Unix for production, and the XML was written on Windows with \r\n as the new line? Or, further: if the XML contains Unicode characters and was written on Windows, it can contain a special (invisible) prefix that indicates Unicode (a byte order mark). This prefix may cause the parser to fail.
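Both ideas can be checked with a few lines of plain JDK code. The following is only a sketch, not a confirmed fix: the class name ParserDiagnostics and its helper methods are made up for illustration, and it assumes the input is UTF-8 when looking for the invisible prefix. originOf reports which jar or directory a class was loaded from, and hasUtf8Bom / stripUtf8Bom detect and drop a leading byte order mark before the bytes reach the parser.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper class; none of these names come from NekoHTML or Xerces.
final class ParserDiagnostics {

    /** Returns the location a class was loaded from, or a note if the JVM reports none. */
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes loaded by the JRE's own bootstrap loader usually have no code source.
            return "bootstrap/JRE (no code source)";
        }
        return source.getLocation().toString();
    }

    /** True if the data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** Returns the data without a leading UTF-8 byte order mark (same array if none is present). */
    static byte[] stripUtf8Bom(byte[] data) {
        if (!hasUtf8Bom(data)) {
            return data;
        }
        byte[] out = new byte[data.length - 3];
        System.arraycopy(data, 3, out, 0, out.length);
        return out;
    }

    public static void main(String[] args) {
        // Shows which XML parser implementation this JVM picks up by default.
        Object factory = DocumentBuilderFactory.newInstance();
        System.out.println(factory.getClass().getName()
                + " <- " + originOf(factory.getClass()));
    }
}
```

In the application itself, calling ParserDiagnostics.originOf(doc.getClass()) on both the Sun JRE and the IBM J9 VM and comparing the two results should show whether the DOM classes come from the bundled Xerces jars or from the JRE.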