I have a very large file of over 5,000 poems in the outliner software Aquaminds NoteTaker 4 for Mac. Each poem is on a separate page in NoteTaker. I can export the poems to a number of formats, including plain text, OPML, NoteTaker’s own XML format (called NTML), Microsoft Word doc, comma-delimited, tab-delimited, and rich text. I can export the file as one large output or as individual documents, and I can choose whether to include the page title in the export. The challenge is how to format the data for import into my Panorama database. Each poem sits in a single cell of its NoteTaker page, and the page title is in most cases the title of the poem, so there are really just two “fields” that need to be generated. I am not a programmer at all, so I could use some help.
Here is a sample of one poem’s NTML output:
<page title="The Winds of Time" entryCt="1" ident="2282" exportedToServices="0" hasTab="0" prevPageIdent="2283" nextPageIdent="2281">
<heading><SPAN class="&pageStyle;">The Winds of Time</SPAN></heading>
<pageView pageNum="4271" backgroundColor="#fffcf6" showControls="1" showPageBreaks="1" showTagIcons="0" entryNumberingStyle="NO_ENTRY_NUMBERING" pageStyle="NO_STYLE" entryIdent="0" creationTime="545967653" backgroundImageStyle ="BGIMAGE_DEFAULT"/>
<entry ident="2282.1" timestamp="545967653" modificationTime="545967665" priority="0" state="0" isExpanded="0" isEditable="1" isHighlighted="0" isShowingAllLines="1" isPageBreak="0">
<cell><SPAN class="style3">THE WINDS OF TIME
The winds of time stream gently now,
But roared upon a when,
From fairy tale to nightmare's row,
They circle back again.
Kevin Johnson ©2018</SPAN>
</cell>
</entry>
</page>
---
As you can see, the page title is contained in the title attribute here:
<page title="The Winds of Time" entryCt="1" ident="2282" exportedToServices="0" hasTab="0" prevPageIdent="2283" nextPageIdent="2281">
---
The poem body is contained here:
<cell><SPAN class="style3">THE WINDS OF TIME
The winds of time stream gently now,
But roared upon a when,
From fairy tale to nightmare's row,
They circle back again.
Kevin Johnson ©2018
There is also a creation date for each poem (the creationTime and timestamp values) in what looks like Unix timestamp format, except that when I read them that way the dates are way off, by something like a decade, so they’re useless to me.
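A possible explanation for the odd dates, offered only as an assumption: Apple's Cocoa frameworks count seconds from 1 January 2001 rather than from 1 January 1970, and NoteTaker is a Mac application, so it may store its timestamps that way. The sketch below converts the sample creationTime value under both readings. The function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption: NoteTaker may count from the Cocoa reference date (2001-01-01 UTC).
COCOA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

def as_unix(seconds):
    """Interpret the value as seconds since 1970-01-01 UTC."""
    return UNIX_EPOCH + timedelta(seconds=seconds)

def as_cocoa(seconds):
    """Interpret the value as seconds since 2001-01-01 UTC."""
    return COCOA_EPOCH + timedelta(seconds=seconds)

creation_time = 545967653  # creationTime from the sample page
print(as_unix(creation_time).date())   # 1987-04-21
print(as_cocoa(creation_time).date())  # 2018-04-21
```

Under the 2001 reading, the sample lands in April 2018, which agrees with the ©2018 line in the poem. Whether that holds for every page would need checking against a few poems with known dates.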
Could someone show me how to convert this XML data into two importable fields, or help with the approach I thought of, below?
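For the XML route, here is a minimal sketch of what such a conversion might look like. It rests on several assumptions: every poem is wrapped in a `<page title="...">` ... `</page>` block as in the sample, the poem text sits inside `<cell>` tags, and Panorama X can import a CSV file in which multi-line fields are wrapped in quotes. It uses pattern matching rather than a strict XML parser because the sample contains `&pageStyle;`, an entity a strict parser may reject unless it is defined elsewhere in the file. The function names are hypothetical.

```python
import csv
import html
import io
import re

# One <page ...>...</page> block; captures the title attribute and the inner text.
PAGE_RE = re.compile(r'<page\b[^>]*?\btitle="([^"]*)"[^>]*>(.*?)</page>', re.DOTALL)
CELL_RE = re.compile(r'<cell>(.*?)</cell>', re.DOTALL)
TAG_RE = re.compile(r'<[^>]+>')  # any remaining markup such as <SPAN ...>

def extract_poems(ntml_text):
    """Return a list of (title, body) pairs found in the NTML text."""
    poems = []
    for title, inner in PAGE_RE.findall(ntml_text):
        cells = CELL_RE.findall(inner)
        # Strip markup from each cell; line breaks inside the cell are kept.
        body = "\n".join(TAG_RE.sub("", cell) for cell in cells)
        poems.append((html.unescape(title), html.unescape(body).strip()))
    return poems

def poems_to_csv(poems):
    """Build CSV text with Title and Poem columns; multi-line bodies are quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Title", "Poem"])
    writer.writerows(poems)
    return buf.getvalue()

# Trimmed copy of the sample page (most attributes omitted).
SAMPLE = '''<page title="The Winds of Time" entryCt="1" ident="2282">
<heading><SPAN class="&pageStyle;">The Winds of Time</SPAN></heading>
<entry ident="2282.1">
<cell><SPAN class="style3">THE WINDS OF TIME
The winds of time stream gently now,
But roared upon a when,
From fairy tale to nightmare's row,
They circle back again.
Kevin Johnson ©2018</SPAN>
</cell>
</entry>
</page>'''

poems = extract_poems(SAMPLE)
print(poems[0][0])          # The Winds of Time
print(poems_to_csv(poems))
```

For the real export, the text of the NTML file would be read in place of `SAMPLE` and the CSV text saved to a file. If the first line of each cell merely repeats the title in capitals, it could be dropped from the body at this stage.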
An alternative I thought of would be to export the poems as individual text files, in which the first line of the text would be the page title and the rest of the lines would be the poem body. Perhaps a script could parse line 1 into a Title field and the rest of the lines into the Poem body field. Line breaks would, of course, need to be preserved. Then all the resulting data would need to be compiled as a single importable file so I could pull it into Panorama X.
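The text-file approach described above can be sketched as follows. This is an illustration under assumptions, not a tested recipe: it assumes each exported file ends in `.txt`, is encoded as UTF-8, and has the page title on its first line, and that Panorama X accepts a CSV file with quoted multi-line fields. The function names and the folder and file names in the usage note are hypothetical.

```python
import csv
from pathlib import Path

def split_poem(text):
    """Split exported text into (title, body); the first line is the title."""
    lines = text.splitlines()
    title = lines[0].strip() if lines else ""
    # Rejoin the remaining lines so the poem's line breaks are preserved.
    body = "\n".join(lines[1:]).strip("\n")
    return title, body

def compile_folder(folder, out_path):
    """Combine every .txt file in folder into one CSV with Title and Poem columns."""
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        rows.append(split_poem(path.read_text(encoding="utf-8")))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Title", "Poem"])
        writer.writerows(rows)
    return len(rows)  # number of poems written
```

Calling `compile_folder("exported_poems", "poems.csv")` would then produce a single file with one row per poem. If the export turns out to use a different encoding, the `encoding` argument in `read_text` would need to change.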
It means a lot to me to get my writings into a more indexable, categorized, searchable form which is suited to curating my work for future publication. Looking forward to some assistance here.
Thanks!