#clojure logs

2008-02-22

14:14Chouserrhickey: fyi, I adjusted xml.clj slightly to use tagsoup instead of java's sax parser, and it's working quite nicely.
14:15rhickeycool, want to put it on the group?
14:19rhickeyThe author of TagSoup, John Cowan, is on the Clojure group
14:23Chouserheh, coo.
14:23Chousercool
14:23Chouserwell, what do you think of adding an optional parameter to xml.clj's parse to allow specifying a parser?
14:24ChouserI haven't tried to do that yet, but I assume it would be easy
14:24rhickeyIs that all it takes? sure
14:24Chouserok, if I get that working, I'll post it to the group.
14:24rhickeygreat
14:51Chouserrhickey: there. what could be easier?
14:52rhickeythanks
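
A rough sketch of the "optional parser parameter" idea discussed above. It assumes a current `clojure.xml`, where `parse` takes an optional second argument: a function of the source and the SAX content handler. `startparse-custom` and `parse-str` are hypothetical names used only for illustration, and the stock JDK SAX parser stands in for TagSoup so the example needs no extra dependency. This is not the patch posted to the group.

```clojure
(require '[clojure.xml :as xml])
(import '(javax.xml.parsers SAXParserFactory)
        '(java.io ByteArrayInputStream))

;; Hypothetical startparse function: receives the input source and the
;; content handler that clojure.xml builds, and runs a parser of our choosing.
;; A different SAX-compatible parser could be driven from here instead.
(defn startparse-custom [s ch]
  (let [factory (doto (SAXParserFactory/newInstance)
                  (.setNamespaceAware false)
                  (.setValidating false))]
    (.parse (.newSAXParser factory) s ch)))

;; Hypothetical helper: parse a string with the custom parser function.
(defn parse-str [text]
  (xml/parse (ByteArrayInputStream. (.getBytes text "UTF-8"))
             startparse-custom))

(:tag (parse-str "<td>x</td>"))
;; => :td
```

Callers who omit the second argument keep the default parser, so existing code would be unaffected.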
16:08Chouserhuh. I think I just found a bug in xml.clj
16:09Chouser<td>some <b>bold</b> text</td> when parsed includes neither "some" nor "text", only "bold"
16:18rhickeyI'll look at it
16:18Chouserok, thanks. I can see the problem, but I'm not sure how best to fix it.
16:19Chousercharacters can be called when *state* is :between, and usually that should be just fine.
16:20ChouserstartElement would have to handle pushing an *sb* like endElement does
16:23rhickeyyes on the startElement
16:24rhickeybetween is kind of broken notion, I put it in to deal with junk ws/nl stuff which I get from the SAX parser where no one would consider there to be interleaved text, and didn't want to create content entries for it
16:24Chouserok
16:24rhickeyI'll have to dump ws-only character content to avoid that
16:31Chouserwell, I don't mind the whitespace for now.
16:31ChouserI've got a sufficiently patched-up version I can proceed...
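
The mixed-content problem above can be illustrated with a small standalone SAX handler. This is only a sketch under assumptions, not the code in xml.clj: it uses atoms and a local StringBuilder rather than `*state*` and `*sb*`, and `parse-mixed` is a hypothetical name. The point it shows is the one from the discussion: buffered text has to be flushed in startElement as well as in endElement, and whitespace-only runs can be dropped at flush time.

```clojure
(require '[clojure.string :as string])
(import '(javax.xml.parsers SAXParserFactory)
        '(org.xml.sax.helpers DefaultHandler)
        '(java.io ByteArrayInputStream))

(defn parse-mixed
  "Parses XML text into nested {:tag ... :content [...]} maps, keeping
  text that is interleaved with child elements."
  [text]
  (let [stack (atom [{:tag :root :content []}]) ; open elements, innermost last
        sb    (StringBuilder.)                  ; text seen since the last flush
        ;; Append a finished child (string or element map) to the innermost element.
        add-child! (fn [child]
                     (swap! stack
                            (fn [s]
                              (conj (pop s)
                                    (update (peek s) :content conj child)))))
        ;; Move buffered text into the tree; whitespace-only runs are dropped.
        flush-text! (fn []
                      (let [s (str sb)]
                        (.setLength sb 0)
                        (when-not (string/blank? s)
                          (add-child! s))))
        ^DefaultHandler handler
        (proxy [DefaultHandler] []
          (startElement [uri local-name q-name attrs]
            (flush-text!) ; the step the log identifies as missing
            (swap! stack conj {:tag (keyword q-name) :content []}))
          (endElement [uri local-name q-name]
            (flush-text!)
            (let [el (peek @stack)]
              (swap! stack pop)
              (add-child! el)))
          (characters [ch start length]
            (.append sb ^chars ch (int start) (int length))))]
    (.parse (.newSAXParser (SAXParserFactory/newInstance))
            (ByteArrayInputStream. (.getBytes text "UTF-8"))
            handler)
    (first (:content (peek @stack)))))

(parse-mixed "<td>some <b>bold</b> text</td>")
;; => {:tag :td, :content ["some " {:tag :b, :content ["bold"]} " text"]}
```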
16:37Chouserwhee! Ok, so to do the equivalent of the xpath: //td[b = 'Listing #']/node()[position() = last()]
16:37ChouserI can say: (seq-filter html flatten :td [:b "Listing #"] #(first (reverse (% :content))))
16:39rhickeyseq-filter?
16:39Chouserwhere "flatten" is a function that means "//"
16:39albinorhickey: Are you the principal creator of clojure?
16:39ChouserUm, yeah, lousy name. All the names are lousy, but it works.
16:39rhickeyyes
16:40albinorhickey: do you get paid to do it?
16:40rhickeyno
16:41albinorhickey: does anyone else make core contributions or are you pretty much on your own?
16:41rhickeyjust me
16:42albinorhickey: very impressive, thanks for letting me take some of your time
16:42rhickeysure
16:43Chouserseq-filter is a macro that mainly applies mapcat to each expr, passing the result to the next expr.
16:44Chouserthen sprinkle in a little sugar for tag names (:td), sub-queries ([...]), and content-matching for strings ("Listing #"), and you've got most of what you need for a flexible query system for xml.clj-produced vector/maps.
16:45Chouserand it's all lazy
16:45rhickeyneat
16:45Chouseryeah, once I actually use it a bit more so as to wear down the rough edges, I hope to share it.
16:45ChouserGot any better ideas for the name?
16:46Chousermapcat->
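
A guess at how the mapcat pipeline described above might look, written as plain functions rather than a macro. Everything here is an assumption for illustration: `flatten-nodes`, `step->fn`, `query`, and the sample `html` data are hypothetical, and the real seq-filter was not shown in the log. Keywords select child elements by tag, vectors are sub-queries that keep a node when they match something, strings match a node whose only content is that string, and any other step is called as a function.

```clojure
(declare query)

;; Stand-in for the log's "flatten" (the XPath "//"): the node and all descendants.
(defn flatten-nodes [node]
  (tree-seq map? :content node))

;; Turn one query step into a function from a node to a seq of results.
(defn step->fn [step]
  (cond
    (keyword? step) (fn [node] (filter #(= step (:tag %)) (:content node)))
    (string? step)  (fn [node] (when (= [step] (:content node)) [node]))
    (vector? step)  (fn [node] (when (seq (query [node] step)) [node]))
    :else           (fn [node]
                      (let [r (step node)]
                        (cond
                          (nil? r)        []
                          (sequential? r) r
                          :else           [r]))))) ; wrap single results for mapcat

;; Apply mapcat for each step, feeding the result to the next step (lazy).
(defn query [nodes steps]
  (reduce (fn [acc step] (mapcat (step->fn step) acc)) nodes steps))

;; Hypothetical sample data in the vector/map shape that xml.clj produces.
(def html
  {:tag :table
   :content [{:tag :tr
              :content [{:tag :td :content [{:tag :b :content ["Listing #"]} "12345"]}
                        {:tag :td :content [{:tag :b :content ["Price"]} "$10"]}]}]})

(query [html] [flatten-nodes :td [:b "Listing #"] #(last (:content %))])
;; => ("12345")
```

Because every step goes through mapcat, the pipeline stays lazy, which matches the "it's all lazy" remark above.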
16:59rhickeyattempted fix for xml.clj is up
17:03Chouserthanks!
17:05Chouserworks for me, and thanks for including my little patch. :-)