An Old Idea

I’ve been giving some thought to parsing microformats lately. A few threads seem to be converging…

The first is that it’s hard to parse microformats. You can hand-write a parser in a little bit of time that’s 80% right. But getting all of the rules encoded (those for hcard, for example) is tricky. It’s reasonable to assume, therefore, that there are a lot of 80% parsers out there like the one I wrote for my Ray Ozzie Clipboard example.

The second issue relates to hatom, which uses different class names for the same concept at different scopes. For example, the entry title is called “entry-title” not “title”. I asked Ryan about this when I saw him at www2006, and he told me that they vacillated on this decision, but they settled on “entry-title” because people can nest other microformats inside hatom, and so it would be easier for the parser writers if there were no colliding class names, even in different microformats. In fact, he suggested that they’d probably made a mistake with hcard, since the class names were so likely to collide with other microformats. Ok, so in other words “entry-title” is a hack around the problem of it being hard to parse microformats, and we can expect more of these.

When I bumped into Brian at the same event, I commented that microformats really have a problem with nesting. He agreed. He said it put a burden on the parser writer to potentially have to understand all microformats in order to reliably parse web pages that contain them.

So,

  1. It’s a lot of trouble to write a parser
  2. Bad parsers will proliferate
  3. Microformats are evolving toward being easier to parse, not easier to create
  4. It’s not clear how you can nest microformats w/o knowing how parsers will behave
  5. Users are discouraged from inventing their own specialized microformats, presumably because of the risk of collisions and difficulty others will have in parsing them

My proposal is that we employ a very old solution to this problem: create proper, machine-readable schemas or grammars for each microformat.

The schema…

  1. is a formal specification of the microformat
  2. can be used to generate parsers (like yacc)
  3. can be used to dynamically parse new microformats
  4. is language-neutral

Here’s a fragment of a schema for hcard in a BNF-inspired syntax:

{vcard} ::= {fn} {n} [{org}] [{url}] [{email}] [{photo}] [{tel}]
{n} ::= {fn}
{tel} ::= ({tel-entry})
{tel-entry} ::= [{type}] {value}
{url} ::= a@href
{email} ::= a@href
{photo} ::= img@src | object@data
{fn} ::= body
{org} ::= body
{type} ::= body
{value} ::= body

Note that it has domain-knowledge of HTML (e.g., “img@src”, which means pull the value out of the src attribute of an img tag, and “body” means pull the body of the tag). This syntax doesn’t encode all of the kinds of rules you’ll find in the hcard spec, but it probably could be extended to do so. (Note that a link could be added to the header of web pages pointing to the schema.)

So in addition to making it trivial to generate or find correct parsers for microformats in any language or environment, how does this solve the nesting problem? First, the parser will only “find” data that matches the schema. So if you stick an hcard inside an hatom entry, then the hatom parser wouldn’t be looking for the “title” beneath the “author”, since that’s not in the schema. Second, if you wanted a rule such as using DOM depth to disambiguate two “title” properties, you could enforce it once at the parser-generator level, not in every parser in the world. Third, it’s actually possible to use link tags to refer to every schema inside the web page, making it feasible that the parser would understand all of the microformats contained in the page without any additional work.

The other thing that’s interesting is that this specification actually implies a json-compatible data-model. The “( … )” notation refers to a list, the terminals refer to values, and each of the labels (e.g., “fn”) refers to a key in a name/value pair list. So we’d expect to parse,

<a class="url fn" href="http://smackman.com">Steve</a>

to

{vcard: {fn: "Steve", url: "http://smackman.com" }}

in JSON-syntax. (Don’t confuse JSON-syntax with JSON-data-model. The latter can be represented in (almost?) any programming language using built-in language constructs while the former is a serialization format).
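To make the mapping concrete, here is a minimal sketch of schema-driven extraction. Everything in it is an assumption made for illustration: the `hcardSchema` encoding, the `extractValue` and `parseWithSchema` helpers, and the plain-object stand-in for a DOM element (so the sketch runs without a browser). It covers only the “tag@attr” and “body” rules from the fragment above, not lists or nesting.

```javascript
// Hypothetical encoding of part of the hcard grammar: each property name maps
// to a list of extraction rules, either "tag@attr" or "body".
const hcardSchema = {
  root: "vcard",
  properties: new Map([
    ["fn", ["body"]],
    ["org", ["body"]],
    ["url", ["a@href"]],
    ["email", ["a@href"]],
    ["photo", ["img@src", "object@data"]],
  ]),
};

// Apply the first rule that matches the node; return undefined if none does.
function extractValue(node, rules) {
  for (const rule of rules) {
    if (rule === "body") return node.text;
    const [tag, attr] = rule.split("@");
    const attrs = node.attrs || {};
    if (node.tag === tag && Object.prototype.hasOwnProperty.call(attrs, attr)) {
      return attrs[attr];
    }
  }
  return undefined;
}

// Walk the tree and collect only the properties the schema names.
// Class names that are not in the schema are ignored entirely.
function parseWithSchema(node, schema, out = {}) {
  for (const cls of node.classes || []) {
    const rules = schema.properties.get(cls);
    if (rules && !(cls in out)) {
      const value = extractValue(node, rules);
      if (value !== undefined) out[cls] = value;
    }
  }
  for (const child of node.children || []) parseWithSchema(child, schema, out);
  return out;
}

// Stand-in for <a class="url fn" href="http://smackman.com">Steve</a>
const link = {
  tag: "a",
  classes: ["url", "fn"],
  attrs: { href: "http://smackman.com" },
  text: "Steve",
  children: [],
};

const result = { [hcardSchema.root]: parseWithSchema(link, hcardSchema) };
console.log(JSON.stringify(result));
```

Because the walker only looks up class names that appear in the schema, a class such as “entry-title” from another microformat is skipped rather than misread, which is the nesting behaviour argued for above.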

So this means that the schema spec allows you to parse from HTML to a JSON-data-model. This means that, in contrast to yacc, there isn’t a need to have application-specific instructions in the spec. I’d also point out that the process of going in the opposite direction—from JSON-data-model to HTML—is exactly what microtemplates buy you.

That’s the gist of the idea… a lot more details to be worked out, of course.

Comments (3)

Fried pizza, really?

I’m off to Edinburgh, Scotland today for WWW2006. I have a paper in the tagging workshop. Here’s my presentation (done with S5). I used microtemplates to generate a lot of the tag visualizations.

Comments (2)

Promoting microtemplates

It was cool to see that Elias blogged about microtemplates and got straight to the point: it’s easy.

My goal now is to try to make this point to a few of the right people, have them get it and say something about it, and then others will pay attention. It’s, honestly, kind of a funny position to be in. I guess I do promote my ideas, at least inside the four walls of my workplace, but not usually so deliberately.

I’ve started by approaching the microformats folks, since microtemplates are, to some extent, derivative of microformats, and also because the adoption of microtemplates greatly facilitates the adoption of microformats. I’ve gotten a few “very promising” remarks, but not the whoah! that I kinda expected. But I think I was being a little optimistic: it will take some time and some compelling examples for the potential to be apparent. Also, I don’t know if the microformat people are generally as concerned with creating dynamic or ajax web applications as others might be, so there’s a bit of a mismatch.

Ok, so what about the Rails folks? I started to dissect an example I found of rails programming at OnLamp and make some recommendations on the microtemplates wiki. I’ll find the discussion list and forward this to them…. but I probably still need to implement what I describe and show some examples. It would be particularly compelling if I did the same dynamic table as in this example.

The other item on my agenda is the ROCB. One idea is that if you drop a vcard on my web page, I want to be able to create the rendering of that vcard using microtemplates. I met Ray once… maybe I can drop him a note when this demo works?

Comments

hCalendar and timezones

I was thinking that hCalendar might help with timezones. The basic idea, just as ecmanaut says, is to send the time in GMT and let the browser do the conversion. So I’m thinking that if we use the microformat for dates, hcalendar, then the date gets formatted as,

<abbr class="dtstart" title="2006-05-01T12:15:03.0Z">5:15am</abbr>

where the “title” attribute is machine readable and in GMT, and the body is human readable and in, presumably, the time zone of the page author. All that’s needed is a script (or greasemonkey plugin) like this one that walks the DOM, finds these hCalendar fragments, and replaces the human-readable part of the date with the time in the user’s timezone. So,

5:15am

gets displayed,

5:15am

Ok… but there are a couple of problems. The first is that the formatting has changed. The resolution to this would be to write a function that deduces the format from the example, and then fills in that format with the time in the local timezone. Seems doable, at least in a way that works 80% of the time (and fails gracefully with a generic date format). The second is that the intent of the time has, in fact, been changed a little bit. The user needs to know that this has happened (the greenish background is a hint at that), and needs to be able to see the original string to compare. The user might also prefer to see the date formatted as the author intended, but to be able to hover over it and see it transformed into his own timezone. This also seems doable.
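The conversion step itself is small. Below is a minimal sketch, assuming the “h:mmam” display format from the example; the function name `formatLocalTime` and its `offsetMinutes` parameter are hypothetical. In a browser the offset would come from `-new Date().getTimezoneOffset()`; it is passed in explicitly here so the result does not depend on the machine running the code. Walking the DOM and deducing the author’s format are left out.

```javascript
// Format a UTC timestamp (as found in the abbr title attribute) for a zone
// given as minutes east of UTC. Returns null when the timestamp cannot be
// parsed, so the caller can leave the author's text untouched.
function formatLocalTime(isoUtc, offsetMinutes) {
  const utcMillis = Date.parse(isoUtc);
  if (Number.isNaN(utcMillis)) return null;

  // Shift the instant by the offset, then read the UTC fields of the
  // shifted value to get the wall-clock time in the target zone.
  const shifted = new Date(utcMillis + offsetMinutes * 60 * 1000);
  const hours24 = shifted.getUTCHours();
  const minutes = String(shifted.getUTCMinutes()).padStart(2, "0");
  const suffix = hours24 < 12 ? "am" : "pm";
  const hours12 = hours24 % 12 === 0 ? 12 : hours24 % 12;
  return `${hours12}:${minutes}${suffix}`;
}

const title = "2006-05-01T12:15:03.0Z";
console.log(formatLocalTime(title, -7 * 60)); // reader at UTC-7
console.log(formatLocalTime(title, 2 * 60));  // reader at UTC+2
```

Returning null on a bad timestamp is one way to get the “fails gracefully” behaviour: the page simply keeps showing whatever the author wrote.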

Comments (2)

application/atom+json

JSON looks to be an extremely useful data format for Ajax (client-side web) applications because it is javascript, and so it can be parsed efficiently and loaded from any URL (not just the host that served the web page), opening up the door to a new class of applications that do client-side data integration in the browser. Yahoo has a great writeup of how to use JSON services.

Now, most of the services that are currently coming out are in the lingua franca, XML. XML is great, but it doesn’t have the specific advantages that JSON does for Ajax (ironically, since the “x” in Ajax is XML… but then Ajax sounds better than Ajaj). So what to do? There does exist a universal mapping from XML to JSON called Badgerfish. This is good, but the problem is that the JSON output is funky. Who wants to have variables named “$”?

Can we do a nicer mapping in the case of a particular XML schema, namely Atom? Atom doesn’t use attributes too much, and attributes are the stumbling block when mapping from XML to JSON. What if we came up with a JSON representation of Atom that was as similar to the XML as possible, but was actually a different representation? Let’s call it application/atom+json.

Looking at the canonical Atom example, we can picture the atom+json starting something like this:

feed={
  title:"Example Feed",
  link:{href:"http://example.org"},
  updated:"2003-12-13T18:30:02Z",
  author:{name:"John Doe"},
  id:"urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6", ...

Ok, but with entry we have a bit of a problem, because there is typically more than one. So do we rename it to entries, and have it refer to a list? Seems reasonable…

  entries: [{
    title:"Atom-Powered Robots Run Amok",
    link:{href:"http://example.org/2003/12/13/atom03"},
    id:"urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a",
    updated:"2003-12-13T18:30:02Z",
    summary:"Some text."
   }, ...]
}

But wait, it can’t be that simple? Well, I guess there can be multiple authors, so do we make authors always a list (frequently with only one element), or do we allow author as a shorthand for authors with only one entry? The other issue is that in this example, link had no body, and none of the tags with bodies had attributes. The reason badgerfish goes into “$” and “@attr” syntax is because it’s possible to have both. But why pollute the Atom mapping with awkward constructs that rarely occur? An alternate mapping might be to say that if you wanted to put an attribute on the title tag, you’d write title_attr:value.
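One way a consumer could cope with both open questions is sketched below. It is only an illustration of the options discussed here, not a defined format: the helper names `authorsOf` and `attrOf`, the `title_type` key, and the second author name are all made up for the example.

```javascript
// Accept either a single `author` object or an `authors` list and always
// hand back a list, so calling code does not care which form was used.
function authorsOf(node) {
  if (Array.isArray(node.authors)) return node.authors;
  if (node.author !== undefined) return [node.author];
  return [];
}

// Read an attribute stored under the hypothetical "<element>_<attr>" key,
// e.g. title_type for the type attribute of the title element.
function attrOf(node, element, attr) {
  return node[`${element}_${attr}`];
}

// Illustrative data only.
const singleAuthorFeed = {
  title: "Example Feed",
  title_type: "text",
  author: { name: "John Doe" },
};
const multiAuthorFeed = {
  title: "Example Feed",
  authors: [{ name: "John Doe" }, { name: "Jane Roe" }],
};

console.log(authorsOf(singleAuthorFeed).map((a) => a.name));
console.log(authorsOf(multiAuthorFeed).map((a) => a.name));
console.log(attrOf(singleAuthorFeed, "title", "type"));
```

The cost of allowing both spellings is that every consumer needs a helper like `authorsOf`; always emitting a list moves that cost to the producer instead.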

Hmmm….

Comments (4)

Microtemplates

Microtemplates are a way of creating templates in HTML that can be evaluated in the browser. More info here: microtemplates.org.

Comments