Wednesday, October 17, 2012

Semantic Web Presentation

This post is dedicated to all who took part in my recent Semantic Web presentation.

There was no time to cover everything I wanted. I will use this post to touch on some things I should have included but did not.

Here is the link to my presentation:

I mentioned that there are no good books on the subject, or at least none that I could find. I ended up reading A Developer's Guide to the Semantic Web. I cannot say I recommend it, but it is better than the other one I tried: Semantic Web Programming. I also briefly looked at the O'Reilly book, but it seemed to provide less comprehensive coverage of the subject.

One large topic I decided I would not have time to talk about during my presentation is microformats. It appears that microformats are liked by the REST community, while the semantic/linked data community is not crazy about them. To me, a microformat is 'hardcoding' semantic data; RDF is not. I do see the flexibility of RDF as a big advantage. If microformats are new to you, the idea is simply to add class attributes to your HTML and carry the semantic information in class names that the community has agreed upon. The catch is that these names have to be what the community has agreed to. That agreement takes time and is often not easy (maybe not even possible).
A quick intro to this idea can be found here: http://microformats.org/get-started
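
To make the class-attribute idea concrete, here is a rough sketch of how a consumer might read microformat-style markup. The class names fn and org come from the hCard microformat; the sample HTML, the MicroformatParser class and the extract_hcard helper are my own illustrations, not part of any microformats library, and the parsing is deliberately naive.

```python
from html.parser import HTMLParser

# A small subset of hCard property class names (illustrative, not complete).
HCARD_PROPERTIES = {"fn", "org", "url"}


class MicroformatParser(HTMLParser):
    """Collects the text of elements whose class is a known hCard property."""

    def __init__(self):
        super().__init__()
        self._current = None  # property name of the element being read, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matches = [c for c in classes if c in HCARD_PROPERTIES]
        self._current = matches[0] if matches else None

    def handle_data(self, data):
        # Only the first non-empty text chunk of a matching element is kept.
        if self._current and data.strip():
            self.found[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def extract_hcard(html):
    """Hypothetical helper: return {property: text} for the markup given."""
    parser = MicroformatParser()
    parser.feed(html)
    parser.close()
    return parser.found


# Made-up sample markup: the semantics live entirely in the class names.
SAMPLE = (
    '<div class="vcard">'
    '<span class="fn">Jane Doe</span>, '
    '<span class="org">Example Org</span>'
    '</div>'
)

if __name__ == "__main__":
    print(extract_hcard(SAMPLE))
```

Notice that the parser only works because the set of class names is fixed in advance, which is exactly the 'hardcoding' I was referring to.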

An interesting (and relevant) example is the species microformat: http://microformats.org/wiki/species
It was proposed in 2006 and still has straw-man status. There are several good reasons why it is hard to agree on a microformat for species. Consider this simple question: how would you identify a species? Would you use, say, the scientific name or the common name? Both are likely to change, and do change... If only microformats had owl:sameAs... But if they did, they would not be microformats, would they?
Now add more complex questions, such as how to handle several different taxonomies.
The species microformat is a good example of why the RDF direction may be a better choice for creating semantic information about biology and wildlife.
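
The owl:sameAs wish above can be sketched in a few lines. This is only an illustration, assuming triples are plain Python tuples and that every ex: identifier is made up; a real project would use an RDF library and a reasoner rather than this hand-rolled union-find.

```python
def same_as_groups(triples):
    """Return a find(x) function mapping each identifier to a group
    representative, where groups are formed by owl:sameAs links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)
    return find


def facts_about(node, triples):
    """All (predicate, object) pairs stated about node or any of its aliases."""
    find = same_as_groups(triples)
    root = find(node)
    return {
        (p, o)
        for s, p, o in triples
        if p != "owl:sameAs" and find(s) == root
    }


# Hypothetical data: two publishers chose different names for the same species.
TRIPLES = [
    ("ex:BaldEagle", "owl:sameAs", "ex:Haliaeetus_leucocephalus"),
    ("ex:BaldEagle", "ex:habitat", "ex:Lake"),
    ("ex:Haliaeetus_leucocephalus", "ex:eats", "ex:Fish"),
    ("ex:Osprey", "ex:eats", "ex:Fish"),
]

if __name__ == "__main__":
    # Facts recorded under the common name show up under the scientific name.
    print(facts_about("ex:Haliaeetus_leucocephalus", TRIPLES))
```

Neither publisher had to rename anything; a single sameAs statement connects the two vocabularies after the fact.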

During the presentation, I decided to focus on publishing semantic data. Another big and important topic is how to take advantage of semantic data published internally or by other entities. How do you consume semantic info? That would open a conversation about programming libraries like Jena, Pellet and Sesame, and tools like Protégé and NeOn.
The classic example in this direction would be: find all publications which talk about the Bald Eagle needing Water as a resource. Now, a publication may be tagged under Bald Eagle and River, or Bald Eagle and Lake, or Raptors and Fish. Not your everyday DB query, is it?
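
Here is a toy version of that Bald Eagle query, just to show why a plain tag lookup is not enough. Everything in it is an assumption for illustration: the ex: terms, the tiny SUBCLASS and LIVES_IN tables standing in for an ontology, and the rule that a tag covers a concept if it is narrower, broader, or lives in it. A real solution would use a reasoner (Pellet, for example) over real ontologies.

```python
# Hypothetical mini-ontology: child -> parent (think rdfs:subClassOf).
SUBCLASS = {
    "ex:River": "ex:Water",
    "ex:Lake": "ex:Water",
    "ex:BaldEagle": "ex:Raptor",
}
# Hypothetical extra relation: organism -> where it lives.
LIVES_IN = {"ex:Fish": "ex:Water"}


def ancestors(term):
    """The term itself plus everything above it in SUBCLASS."""
    seen = {term}
    while term in SUBCLASS and SUBCLASS[term] not in seen:
        term = SUBCLASS[term]
        seen.add(term)
    return seen


def tag_covers(tag, concept):
    """True if the tag is the concept, narrower than it, broader than it,
    or names something that lives in it."""
    if concept in ancestors(tag) or tag in ancestors(concept):
        return True
    home = LIVES_IN.get(tag)
    return home is not None and concept in ancestors(home)


def find_publications(publications, concepts):
    """Titles whose tags cover every requested concept, sorted by title."""
    return sorted(
        title
        for title, tags in publications.items()
        if all(any(tag_covers(t, c) for t in tags) for c in concepts)
    )


# Made-up publications and tags, mirroring the example in the text.
PUBLICATIONS = {
    "Eagles on the river": {"ex:BaldEagle", "ex:River"},
    "Lake nesting": {"ex:BaldEagle", "ex:Lake"},
    "Raptor diet": {"ex:Raptor", "ex:Fish"},
    "Desert owls": {"ex:Raptor", "ex:Desert"},
}

if __name__ == "__main__":
    print(find_publications(PUBLICATIONS, ["ex:BaldEagle", "ex:Water"]))
```

None of the three matching publications is tagged with Water itself; the match comes entirely from the relationships between terms.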

And finally, there is the 'semantic first' versus 'semantic last' discussion. It may seem like a topic for later, but it is not. The point is that we would design software differently if part of the thought process involved semantic considerations. It is hard to design for something you do not understand.
One aspect of such consideration is: is there a relevant ontology out there (for example, Dublin Core for published works)? If so, should I design my software in a similar way (even if I do not intend to publish RDF in the near term)? The broader question in this area is: how do we design the software so that adding a semantic presence will not be hard in the future?
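
As a small illustration of designing with an existing ontology in mind: if the internal record for a publication already lines up with Dublin Core terms, publishing it later is mostly a mapping exercise. The sketch below assumes a made-up record layout (title, author, published) and a hypothetical to_ntriples helper; the dcterms properties title, creator and issued are real Dublin Core terms, and the output is simple N-Triples with only minimal string escaping.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical internal field names mapped to Dublin Core terms.
FIELD_TO_DC = {
    "title": "title",
    "author": "creator",
    "published": "issued",
}


def escape(value):
    """Minimal escaping for an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def to_ntriples(uri, record):
    """Emit one N-Triples line per mapped field; unmapped fields are skipped."""
    lines = []
    for field, term in FIELD_TO_DC.items():
        if field in record:
            literal = escape(str(record[field]))
            lines.append(f'<{uri}> <{DCTERMS}{term}> "{literal}" .')
    return lines


if __name__ == "__main__":
    # Made-up record; "pages" has no mapping and is ignored.
    record = {"title": "Eagle Survey", "author": "J. Doe", "pages": 12}
    for line in to_ntriples("http://example.org/pub/1", record):
        print(line)
```

The closer the internal field names already are to the ontology, the smaller this mapping table gets.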

Lastly, lots of people think of the semantic web/linked data as one big database. Let me reiterate that: the whole web is one big database! Will this be a game changer? Will we see 'Macro' computing (if that is even a valid term)? Looking at things from a very large scale point of view could be big (so what is the relationship between the number of cancer cases and geographical latitude? No problem, let me run a query...).
Such uniformity (one database!) will not be achieved with SOAP web services. How about with microformats? Except for the few markups the community agrees on, unlikely! REST is an idea which may provide the needed uniformity (but I doubt it will; see my previous posts on why). With RDF and ontologies, we may get uniformity without uniformity: we do not need to agree on using the same terms as long as the relevant terms are connected. As long as owl:sameAs, rdfs:seeAlso, rdfs:isDefinedBy, foaf:isPrimaryTopicOf, etc. are used, there may still be uniformity within all the diversity.
