Mental note: Never, ever, design a website in a proprietary
development environment again. I finally grok the genuine clue behind
XML: EXPLICIT STATE. No more “oh, that’s that way because it’s the 839th
bit with two NULLs before it.” No more “it’s that way because the operation
eighty cycles ago was a NOP.” Forget eXtensible, or eXtreme, or whatever.
XML is explicit, headache-free state management.
Why do I say all this? Because I’m presently going through some ridiculous
hacks to extract the structure out of data formatted by NetObjects Fusion.
So far, the best method seems to be:
- Use lynx -dump -nolist to create a reasonably legible
text version of my documents. It’s not as utterly beautiful as links, which
has suffered for no other reason than it’s homonymous with lynx, but it’s
got a dump mode which I haven’t hacked into links yet.
- Use txt2html to intelligently extract the structure that
lynx found from the original “fused” document into reasonably simple HTML.
- If necessary, convert back into text,
and use the RAW mode David gave me
to automatically emit a paragraph tag any time two carriage returns are
detected in sequence.
- Manually add the various images, boldfaces, italicizations, and
preformatting tags as needed.
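The paragraph-tag step is simple enough to sketch on its own. What follows is not the RAW mode mentioned above, just a minimal Python illustration of the same idea, assuming the input is plain text where a blank line separates paragraphs. The function name paragraphs_to_html is made up for this example.

```python
import html
import re


def paragraphs_to_html(text):
    """Wrap each blank-line-separated chunk of plain text in <p> tags.

    Hypothetical helper for illustration only; not part of lynx or txt2html.
    """
    # Normalize Windows (\r\n) and bare carriage-return line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Two newlines in sequence, possibly with whitespace between them,
    # mark the end of a paragraph.
    chunks = re.split(r"\n\s*\n", text.strip())
    # Escape &, < and > so the text is safe to drop into HTML,
    # and skip any chunk that turns out to be empty.
    return "\n".join(
        "<p>" + html.escape(chunk.strip(), quote=False) + "</p>"
        for chunk in chunks
        if chunk.strip()
    )


if __name__ == "__main__":
    sample = "First paragraph,\nstill the first.\n\nSecond paragraph."
    print(paragraphs_to_html(sample))
```

Single line breaks inside a paragraph are left alone, so the wrapped text from the lynx dump keeps its shape; only a run of two or more breaks starts a new paragraph.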
This is, of course, as they say in the industry, a ridiculous hack that
barely has enough elegance to be mentioned in public, but it works. And
in the end…that’s more than some people can say (sigh).