A word to those doing SAX-based XML parsing and implementing the org.xml.sax.ContentHandler interface: don’t assume that all the characters between a “begin” and “end” tag are returned in a single call to characters(). Take the following XML snippet:
<mycoolelement>My precious data</mycoolelement>
I’m talking about the “My precious data” part. Yes, most of the time, characters() will smile and give you “My precious data” with a bow on top. But don’t be fooled! Sometimes it’ll give you “My precio” in one call, and “us data” in another. And you have no idea when: the parser is free to split the text into chunks however it likes.
The solution, then, is to accumulate the data you receive from characters() until endElement() tells you the end tag has been reached, and THEN, and ONLY THEN, process the data.
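Here is roughly what that accumulation can look like. Treat it as a minimal sketch rather than the one true way: the class name AccumulatingHandler, the getValues() and parse() helpers, and the choice to collect only mycoolelement text are all made up for illustration. It also resets the buffer on every start tag, which is fine for simple leaf elements like the one above but would need more care for mixed content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of each <mycoolelement>.
public class AccumulatingHandler extends DefaultHandler {
    // Holds whatever characters() has delivered since the last start tag.
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // A new element begins: throw away anything left over.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append, never assign.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the text known to be complete.
        if ("mycoolelement".equals(qName)) {
            values.add(buffer.toString());
        }
        buffer.setLength(0);
    }

    public List<String> getValues() {
        return values;
    }

    // Convenience helper (an assumption of this sketch): parse a string with the default JAXP parser.
    public static List<String> parse(String xml) {
        try {
            AccumulatingHandler handler = new AccumulatingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getValues();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

With this in place, it no longer matters whether the parser hands over “My precious data” in one piece or as “My precio” followed by “us data”: the buffer holds the whole string by the time endElement() fires.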
Let’s just say I ran into this, and it’s caused me several hours of grief. Thankfully I was able to learn from others’ experiences (see question, reply, followup and answer). Yay mailing lists and Google!