So I wrote a parser routine to take one XML file and reparse it into another one. The code was later modified to split a large XML file into many small XML files.
I am having a problem with the output. The parsing works fine; the only thing is that the output includes unwanted strings such as HASH(0x19f9b58). I am not sure why, and I need a set of friendly eyes.
use Encode;
use XML::Parser;

$parser = XML::Parser->new( Handlers => {
    Start => \&handle_elem_start,
    End   => \&handle_elem_end,
    Char  => \&handle_char_data,
});
$record;
$file = shift @ARGV;
if( $file ) { $parser->parsefile( $file ); }
exit;

sub handle_elem_start {
    my( $expat, $name, %atts ) = @_;
    if ($name eq 'articles') { $file = "_data.xml"; unlink($file); }
    $record .= "<";
    $record .= "$name";
    foreach $key (keys %atts) { $record .= " $key=\"$atts{$key}\""; }
    $record .= ">";
}

sub handle_char_data {
    my( $expat, $text ) = @_;
    $text = decode_utf8( $text );
    $record .= "$text";
}

sub handle_elem_end {
    my( $expat, $name ) = @_;
    $record .= "</$name>";
    if( $name eq 'article' ) {
        open (MYFILE, '>>'.$file);
        print MYFILE $record;
        close (MYFILE);
        print $record;
        $record = {};
    }
    return unless( $name eq 'article' );
}
Sample output:
...
</article>HASH(0x19f9b40)
<article doi="10.1103/physrevseriesi.9.304">
<journal short="phys. rev. (series i)" jcode="pri">physical review (series i)</journal>
<volume>9</volume>
<issue printdate="1899-11-00">5</issue>
<fpage>304</fpage>
<lpage>309</lpage>
<seqno>1</seqno>
<price></price><tocsec>articles</tocsec>
<arttype type="article"></arttype><doi>10.1103/physrevseriesi.9.304</doi>
<title>an investigation of magnetic qualities of building brick</title>
<authgrp>
<author><givenname>o.</givenname><middlename>a.</middlename><surname>gage</surname></author>
<author><givenname>h.</givenname><middlename>e.</middlename><surname>lawrence</surname></author>
</authgrp>
<cpyrt>
<cpyrtdate date="1899"></cpyrtdate><cpyrtholder>the american physical society</cpyrtholder>
</cpyrt>
</article>HASH(0x19f9b58)
...
The HASH strings are not wanted; please advise.
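$record = {};
sets $record to contain a reference to an empty hash. Everywhere else, you treat $record as a string and append to it. When you treat a hashref as a string, you get a string like HASH(0x19f9b58) (the number varies), and the text you append afterwards lands behind that prefix.
You meant
$record = q{};
which sets $record to the empty string (q{} is just '' written with alternate quotes).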
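To see the difference in isolation, here is a minimal sketch that is separate from the parser code. The helper name append_after_reset is made up for illustration; it only mimics what handle_elem_end does when it resets $record and the next handler appends to it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): reset a record
# to the given value, then append text to it the way the handlers do.
sub append_after_reset {
    my ( $reset_value, $text ) = @_;
    my $record = $reset_value;
    $record .= $text;    # a hash reference is stringified before the append
    return $record;
}

# Resetting with {} leaves a stringified reference in front of the text.
my $bad = append_after_reset( {}, '<article>' );
print "$bad\n";          # something like HASH(0x19f9b58)<article>

# Resetting with q{} (the empty string) gives only the appended text.
my $good = append_after_reset( q{}, '<article>' );
print "$good\n";         # <article>
```

Note that use strict and use warnings do not flag the first case, since stringifying a reference is legal Perl, which is why the stray prefix shows up silently in the output.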