Showing posts with label Semantic Web. Show all posts

April 14, 2008

Microformats, A Primer - Simple Semantics for the Web

In a recent write-up about the "Semantic Web", I ended my thoughts with the following:

"A fully realized Semantic Web will be quite amazing indeed, but it is going to take a long time to get to the point where the technology regularly intersects with our daily lives.

It is going to take a long time to annotate the world's information and then to capture personal information in the right way in order to really make it work the way it is supposed to.

We are a few years away before we really start to see real traction in terms of Semantic Web technology."

Introducing A Different Kind of Meaning

So, we already know that the term "semantic" stands for "meaning of." We also know that to achieve a "true" semantic Web, a great deal of work is going to need to get done.

This will require not only an outpouring of time and money, but also education, training and changes in operations in hundreds of thousands of systems world-wide.

Ouch!

What stinks about this is two-fold:
  1. If those behind the W3C's master theory of a Semantic Web have their way, this huge investment in business and technology change will be inevitable and someone is going to have to pay for it.

  2. The whole concept of a Semantic Web, and the user experiences it will enable, is WAY TOO COOL to wait around for!

Enter Microformats


True Semantic Web technology would enable computers to exchange/share, read and understand the meaning of data, and provide a mechanism for developers to create applications that provide for truly "next generation" user experiences. The concept of The Semantic Web is mostly about machine-to-machine data exchange and "behind the scenes" indexing, searching, storing and sharing.

Microformats, however, allow information that is actually intended for and consumed by human users to also be understood by software applications, similar to this Semantic Web concept.

Microformats emerged due to necessity.

For a quick peek back at the roots of Microformats, let's think back about 10 years ago when the "browser wars" were full-on and Web developers finally began to think of Web site user interfaces as they should be.

This ideological approach to the Web is what we essentially have today: A "markup" applied to underlying data to give it a specific look and feel or visual design. Back then, Cascading Style Sheets were the next big thing, and XML was blessed by the W3C as something official.

Both CSS and XML addressed the desire to separate underlying data from display or use of that data. Necessity is indeed the mother of invention, and the necessity for applications that share meaningful data caused the invention of Microformats, which can be looked at as a unique merger of XML and CSS.

Here + Now = Good

Perhaps the reason that a semantic concept like Microformats has become "real" at this stage in the game stems from the fact that implementing Microformats doesn't require a savvy Web developer to learn anything that they don't already know.
  • Microformats are not vaporware.
  • They are simple to implement.
  • They are based on familiar, standards-based technologies.
Microformats are the end result of marking up existing Web content with CSS-like class names that describe the content's metadata. This approach uses nothing more than ordinary HTML and XHTML class names and attributes.

Tagging content in this manner allows information that was created and published online (and intended for end-users) to just as easily be understood by software applications.

Since the inception of the World Wide Web, it has been possible to load, scan and parse HTML documents.

We call this "screen scraping", and in reality, it is pretty clunky and never really a perfect solution. Programmers can create software applications that follow URLs, download page content, read through that content, and either store it or act upon it.

Even if a program successfully scrapes content from a Web page, the content itself doesn't really have any independent meaning to it. A screen-scraping application just knows that text is text and knows where to get it, and where to store it, and maybe what to do with specific things that it finds. It's a dumb technology really. (stupid screen scrapers!)
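
To make that limitation a bit more concrete, here is a minimal sketch of a screen scraper in Python, using only the standard library. The page content is a hypothetical inline string (it borrows the calendar example used later in this post) rather than a real download, and the names PAGE, TextScraper and scrape are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a URL.
PAGE = """
<html><body>
<h1>My Calendar</h1>
<p>Write Blog Post About Microformats: 20080414, at My Office, Chicago, IL</p>
</body></html>
"""


class TextScraper(HTMLParser):
    """Collects every run of visible text, with no idea what any of it means."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def scrape(html):
    """Return the visible text chunks found in an HTML string."""
    scraper = TextScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.chunks


chunks = scrape(PAGE)
for chunk in chunks:
    print(chunk)
```

The scraper gets the text back, but nothing in the page tells it which part is a title, which is a date and which is a place. That knowledge would have to be hard-coded, site by site.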

The traditional means of creating Web pages so that they display nicely in Web browsers doesn't do much of anything to help software understand their content (or context of that content). There is simply no meaning to the data.

Microformats are intended to change all of that. They do so by enabling the attachment of semantics (semantic data) to online content as we currently know it. (Or think of it).

By using Microformats, data can be indexed & stored, searched and cross-referenced allowing information from many places about many things to be reused and recycled.

Using Microformats

Microformats started as a grass-roots effort, and have been defined based on common needs of those involved in the development community.

Because of this, it is no surprise that the most widespread uses of Microformats today mirror the most pervasive types of online applications and the most common types of data found in them.

The Microformats that we currently see implemented include those that describe:
  • Event Listings
  • Atom Feeds
  • Contact Information
  • Addresses
  • Geographic Information
  • Content Reviews
  • Resumes / CVs
  • Social Networks
  • Lists and Outlines
  • Currency
  • Species (Living Things)
  • Measurements
Mozilla's Firefox 3 Web browser has native implementation of Microformat handling and implements a global Microformats object and associated API that provide developers an easy way to find and consume Microformats.

The out-of-the-box Microformat support in Firefox includes:
  • Addresses (adr - street or mailing address).
  • Geography (geo - geographical locations: latitude & longitude)
  • Human Contact Information (hCard - contact information for a person)
  • Events/Calendar (hCalendar - calendar appointment entries)
Firefox 3 also allows you to extend things a bit and add tags to other Microformats using one named... "tag".

It isn't just the Open Source community that is embracing Microformats. Community rumblings have led to almost-certain speculation that Microsoft will offer native support for Microformats within Internet Explorer 8.0 and other future software application releases.

This isn't meant to be a technical article, so I am not going to get into the specifics of implementing Microformats. That said, a simple example will add a bit of context and make it easier to understand how Microformats can be implemented.

How About A Date?

Sounds good to me hot stuff!

Let's use the example of a calendar appointment.

Let's say I made an item in my calendar that reminded me that I needed to write this blog posting today. If I were to look at the underlying structure of that appointment in my calendar, it might look something like this:
BEGIN:VCALENDAR
PRODID:-//myCalendar//EN
VERSION:2.0
BEGIN:VEVENT
URL:http://whatanexperience.org/
DTSTART:20080414
DTEND:20080414
SUMMARY:Write Blog Post About Microformats
LOCATION:My Office\, Chicago\, IL
END:VEVENT
END:VCALENDAR
Now, let's look at HTML that could be displayed in a Web browser that represents the exact same information:

whatanexperience.org Write Blog Post About Microformats: 20080414- 20080414, at My Office, Chicago, IL

Finally, let's look at the hCalendar Microformat markup behind that, which not only displays as above in the user interface, but contains the same computer-readable data as the Mac Calendar appointment file:


<div class="vevent">
<a class="url" href="http://whatanexperience.org/">http://whatanexperience.org/</a>
<span class="summary">Write Blog Post About Microformats</span>:
<span class="dtstart">20080414</span>-
<span class="dtend">20080414</span>, at <span class="location">My Office, Chicago, IL</span>
</div>
This example is about as simple as it gets, but I hope that it opens your mind to the future possibilities of what Microformats, or a technology like Microformats, could bring to the publishing and consumption of web-based data.
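
For the curious, here is a rough sketch of how a program might read that markup back out. It uses only Python's standard library, and it is not how Firefox or any real Microformats parser does it. The names VEventParser, parse_vevent and to_icalendar are my own inventions, and the sketch assumes each property sits in its own element with no nested tags inside it.

```python
from html.parser import HTMLParser

# The hCalendar markup from the example above.
HCALENDAR = """
<div class="vevent">
<a class="url" href="http://whatanexperience.org/">http://whatanexperience.org/</a>
<span class="summary">Write Blog Post About Microformats</span>:
<span class="dtstart">20080414</span>-
<span class="dtend">20080414</span>, at <span class="location">My Office, Chicago, IL</span>
</div>
"""

# The hCalendar property names this sketch looks for.
PROPERTIES = {"url", "summary", "dtstart", "dtend", "location"}


class VEventParser(HTMLParser):
    """Collects values from elements whose class names an hCalendar property."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self.current = None  # property whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        for name in (attributes.get("class") or "").split():
            if name not in PROPERTIES:
                continue
            if name == "url" and tag == "a":
                # For links, the property value is the href, not the link text.
                self.event[name] = attributes.get("href") or ""
            else:
                self.current = name
                self.event[name] = ""

    def handle_data(self, data):
        if self.current:
            self.event[self.current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the property being read.
        self.current = None


def parse_vevent(html):
    """Return a dict of hCalendar properties found in the markup."""
    parser = VEventParser()
    parser.feed(html)
    parser.close()
    return {name: value.strip() for name, value in parser.event.items()}


def to_icalendar(event):
    """Render the extracted properties as an iCalendar VEVENT block."""
    lines = ["BEGIN:VEVENT"]
    for name in ("url", "dtstart", "dtend", "summary", "location"):
        if name not in event:
            continue
        value = event[name]
        if name in ("summary", "location"):
            # iCalendar text values escape commas with a backslash.
            value = value.replace(",", "\\,")
        lines.append(f"{name.upper()}:{value}")
    lines.append("END:VEVENT")
    return "\n".join(lines)


event = parse_vevent(HCALENDAR)
print(to_icalendar(event))
```

Unlike a screen scraper, this code never has to guess where the date or the location is. The class names say so, which is why the same page can serve both the person reading it and the software consuming it.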


Where does this go?

If Microformats were the holy grail of online data, then the chatter about the Semantic Web and other competing theories would be almost silent. There is, however, a loud argument that XML by itself makes more sense than Microformats, and that ultimately "Web" data will be radically different once new technologies related to the Semantic Web become more widely understood (and used).

These additional technologies include XML and RDF along with associated schemas, as well as OWL (Web Ontology Language), SPARQL (a query language) and business rules driven by RIF (Rule Interchange Format).

The adoption of Microformats by companies like Microsoft and Mozilla is encouraging, but as we've seen before... just because they put the functionality in the browser doesn't mean it is going to be the next big thing. (Remember "channels" in IE 4 or "push content?" Ouch.)

It is ultimately the hordes of Web application developers and users who will decide whether or not Microformats secure their spot as a reliable and widely used bit of technology.
Users are thirsty for the additional functionality that Microformats could enable in online experiences, but developers have to bite the bullet first and implement Microformats prior to user adoption.

In order to do so, technology leaders, software developers, information architects and user experience professionals need to educate business owners and client stake-holders about the inevitable approach of the Semantic Web, the benefits of structuring data in this fashion, and how Microformats can be immediately leveraged to improve their customer/user experiences.

Improved user experiences mean happier, more loyal, and engaged customers, clients, partners and employees... And that's a fact Jack.

For more information about Microformats, the Semantic Web and groovy things like that, check out the following:

  • Microformats.org - The "official" site of the Microformat community.

  • spacenamespace - An interesting site about annotating space with metadata, building semantic models of places, and exchanging geospatial data in RDF.

  • Magpie - a plugin for web browsers and application development framework for emerging Semantic Web tools.

  • OTN Semantic Web Beta from Oracle - A proof-of-concept Web application that demonstrates the use of RDF-based technology as the basis for a rich user experience that relies on dynamic relational navigation.

  • Alex Faaborg @ Mozilla - Alex has a great 4-part series on Microformats, UI issues, and implementation in Firefox 3.

April 07, 2008

More than Buzz -- The Semantic Web

Sir Tim Berners-Lee, Director of the World Wide Web Consortium (W3C), has been evangelizing his vision for "The Semantic Web" for quite some time. Heck, a lot of people in the industry have been talking about it... and like most things, when people start talking, the "buzz" is created and hits the mainstream.

I typically recognize that "the buzz" has escaped the halls of tech-geekdom and entered the mainstream when my clients start asking me about the specifics of this or that. With this as my unscientific buzz-measuring tool, I can clearly see "the buzz" of "The Semantic Web" being on the rise.

As silly as it sounds, I've recently been asked by folks in meetings "How can we be more Semantic?" I guess they look to me for answers, as it is my job to answer these types of questions for them. On this subject specifically, it is a really hard question to answer at this stage in the Semantic Web game. (And a rather silly question if I do say so myself).

It reminds me of a conversation almost two years ago when I heard someone say "Our online stuff is old. We need to 'Ajax' it".

Yikes!

I did make an attempt to answer them the best I could while being put on the spot. Much to their dismay, my answer really centered on "give it some time... "

So, while thoughts about The Semantic Web are fresh in my head, I figured I'd hammer out some ideas related to it, and make an attempt to demystify things the best I can. (Without writing a giant paper on the subject.) Let me preface the rest of this posting by saying that this is so, so, so high-level. (For other thoughts on technologies related to the Semantic Web, check out my post on Microformats).


What's the Semantic Web Buzz? Is it a better "Web 2.0?"


So, Semantic Web as a Buzzword: I have a hard time lumping something that is potentially so revolutionary into the "buzzword" category.

"Web 2.0" is a buzz-word. The Semantic Web is a fundamental change to how we create, consume and integrate content, of one form or another, into our online applications and into the lives of users (consumers).

The "Web 2.0" that people refer to is nothing more than a better "Web 1.0".

It took about 15 years, but the industry has finally learned that user experience really does matter, therefore existing technologies (JavaScript, CSS and DHTML) were evolved to make Web sites more engaging, interactive, real-time and "software-like".

Don't get me wrong, everything that people consider to be facets of "Web 2.0" absolutely thrill me. From much cooler and fun to use user interfaces to social content and collaboration... The innovations (and implementations) over the last several years have really made much of the Web a much better experience for users.

I admit it, I use my "Web 2.0" apps as much as anyone and find a lot of the social content stuff like Twitter, Pownce, and Digg to be essential to my daily information consuming activities.

Before I get off track here, my point is really that while we call things "Web 2.0", all we've really done is put lipstick on the Web 1.0 pig.

It's better, it's kissable... but it isn't a fundamental transformation of the way people (and computers) communicate. We've made applications better, but we haven't changed the way we think about content.

Cases in point:

  • In 1996 we had: Personal Web Pages, IRC, ICQ, PowWow, Web Rings, etc, etc
  • In 2008, we have: Blogs, Jabber, Twitter, Social Networks, etc, etc.
  • We've gotten smarter through experience, but we haven't yet really seen a revolution...

Enter "The Semantic Web"

At its root, the term "semantic" stands for "meaning of."

The "semantic of 'X'" = The "meaning of 'X'".

With this in mind, we can say that the Semantic Web is a Web that is able to describe things (content: text, data, video, audio, etc) in a way that computer software can understand and that allows computers (software) to interpret and relate content in a specific fashion (to a specific user, or company, or subject, etc).

It is easy for people to understand the meaning of content, as our brains don't need to be programmed to do so.

We understand the meaning of things by learning. Unfortunately, software hasn't evolved to the point where AI is just "built in" and computer programs aren't yet sophisticated enough to learn solely on their own. They need help from people, and more so from the way that people tell them to behave. (That's you I'm talking about)

As an example, a few specific sentences worth of information can give a person a fairly well-rounded understanding of a topic:
  • Barack Obama is a Democrat.
  • Barack Obama is running for president in the Democratic Primary against Hillary Clinton.
  • Both Barack Obama and Hillary Clinton are US Senators.
  • The Democratic Primary winner will run against John McCain in the General Election.
  • I live in Chicago as does Barack Obama.
  • Hillary Clinton is From Chicago also, but spent most of her time in Arkansas and is now living in New York.
  • My sister lives in New York.

These sentences can easily be understood by people, but not so easily by computer systems. The theory behind the Semantic Web is to allow computers to understand information like this, and put it into context related to other information.
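
One way to picture what "putting it into context" could mean is to write a few of those sentences down as simple subject / relationship / object facts and let a program connect them. This is only a toy sketch in Python. The predicate names (member_of, lives_in and so on) and the helper functions are made up for illustration and are not part of any Semantic Web standard.

```python
# Each fact is a (subject, predicate, object) triple.
# The predicate names are made up for this sketch.
FACTS = [
    ("Barack Obama", "member_of", "Democratic Party"),
    ("Barack Obama", "candidate_in", "Democratic Primary"),
    ("Hillary Clinton", "candidate_in", "Democratic Primary"),
    ("Barack Obama", "holds_office", "US Senator"),
    ("Hillary Clinton", "holds_office", "US Senator"),
    ("Barack Obama", "lives_in", "Chicago"),
    ("Me", "lives_in", "Chicago"),
    ("Hillary Clinton", "lives_in", "New York"),
    ("My sister", "lives_in", "New York"),
]


def match(facts, subject=None, predicate=None, obj=None):
    """Return the facts that agree with every part of the pattern that is given."""
    return [
        (s, p, o)
        for (s, p, o) in facts
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def neighbors_of(facts, person):
    """Find everyone who lives in the same place as `person`."""
    places = {o for (_, _, o) in match(facts, subject=person, predicate="lives_in")}
    return sorted(
        s
        for (s, p, o) in match(facts, predicate="lives_in")
        if o in places and s != person
    )


print(neighbors_of(FACTS, "Me"))         # ['Barack Obama']
print(neighbors_of(FACTS, "My sister"))  # ['Hillary Clinton']
```

Nobody told the program that my sister and Hillary Clinton have anything in common. It worked that out because the facts were stored in a shape it could follow.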

We can understand these sentences because we understand the syntax of the English language. All sentences are constructed with the same type of rules / syntax. The syntax of a language defines the rules used to construct meaningful statements that can be understood and put into context with other statements.

This is what the Semantic Web is all about: defining a way that things can be described so that computer applications can understand them.

According to the W3C, The Semantic Web is about two things:
  1. It is about common formats for integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents.

  2. It is about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.
Also, it is worth being clear about what the Semantic Web is not, and what it describes instead:
  1. The Semantic Web is not about links between web pages.

  2. The Semantic Web describes the relationships between things (like A is a part of B and Y is a member of Z) and the properties of things (like size, weight, age, and price)

Let's break it down


It is a lot to take in, and it is a relatively hard thing to understand unless specific examples of how semantics actually come into play are presented in a practical way.

While the march towards a Semantic Web is highly technical in nature (hey... someone has to build this stuff)... The real output of the effort to make this happen is all about user experience.

Computers don't care about content, people do. When all is said and done, this is about delivering better content to people. Yes, "systems" will benefit from this. But what good is a system if it doesn't provide value to end users at some point?

The first time I think the impact of the semantic Web concept became clear to me was at the end of 2004 when I stumbled across a video called "Epic 2014". I went from being confused and skeptical about the whole concept to thrilled about the possibility of a Web that could deliver on the future that Tim Berners-Lee and the W3C were proposing.

Since then, my understanding of what a Semantic Web means has evolved, but for the sake of fun (and a brief intermission), check out the Epic 2014 and Epic 2015 videos below. They should provide some context around the rest of this blog post.

Epic 2014 (The Original Video)




Epic 2015 - The Video Updated


"EPIC 2014" was created by Robin Sloan and Matt Thompson and based on a presentation that they gave at the Poynter Institute in the spring of 2004. The "Museum of Media History" is a fictitious organization, and as you can clearly see, the actual scenario in the video is also made up to demonstrate their point.

Sloan and Thompson were inspired to create their movies after a speech in 2003 by Martin Nisenholtz, CEO of New York Times Digital. While not a direct representation of Nisenholtz's speech, the film producers borrowed from his general concept and ran with it in their own direction.

True or not, it is quite thought stimulating and really does explore the potential long-term evolution of news aggregators like Google News and Newsbot with other Web 2.0 technologies such as blogging, social networking and user-generated content.

The second video, "EPIC 2015", takes things a little bit further and incorporates additional "Web 2.0" concepts such as podcasting, GPS and web-based mapping services.


"Mash-Ups Vs. The Semantic Web"

We've all seen Mash-Up Web sites. I am assuming that if you are reading this, then you know what a mash-up is. For the edge-case users that happen upon this posting, a "mash-up" is a term to describe a Web site that takes data from different sources and "mashes" them together to create an application.

For the sake of being lazy, I've included this link to Wikipedia that lays out the whole Mash-up concept.

So, what's different about the Semantic Web concept and the Mash-ups that we currently have?
The easiest way for me to digest this is to go back to the W3C's description of what the Semantic Web "is not" (from above):

1) The Semantic Web is NOT ABOUT LINKS between Web pages.

It is also not about how we currently see "mash-ups" being executed by Web developers. In a mash-up, we are relating information (data) to other data based on simple look-up values. We click on an item in a Web page and the Web server returns related information to the screen as that information is pulled from other data sources.

2) The Semantic Web describes the relationships between things (like A is a part of B and Y is a member of Z) and the properties of things (like size, weight, age, and price)

In your typical mash-up, we combine one or more sources of relational data to provide additional content to users. For example, we know that if a user is looking at Real Estate in Brooklyn, that we can also present a Map of Brooklyn with points plotted out on that map for each property for sale, as well as crime statistics from the NYPD.

This is the result of doing relational lookups of existing "dumb" data and presenting them to the user in a usable manner.
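
Here is a small sketch, in Python, of what that kind of relational lookup boils down to. Every value in it is invented for illustration: the addresses, the neighborhood names and the crime figures are placeholders, not real Brooklyn or NYPD data, and mash_up is just a name I picked.

```python
# Two independent, hypothetical data sources. All values are placeholders.
LISTINGS = [
    {"address": "12 Example St", "neighborhood": "Neighborhood A", "price": 750000},
    {"address": "34 Sample Ave", "neighborhood": "Neighborhood B", "price": 620000},
]
CRIME_STATS = {
    "Neighborhood A": {"incidents_per_1000": 11},
    "Neighborhood B": {"incidents_per_1000": 14},
}


def mash_up(listings, crime_stats):
    """Join each listing to crime data by looking up a shared key."""
    combined = []
    for listing in listings:
        # The only link between the sources is a matching string.
        stats = crime_stats.get(listing["neighborhood"], {})
        combined.append({**listing, **stats})
    return combined


for row in mash_up(LISTINGS, CRIME_STATS):
    print(row)
```

The join only works because a programmer knew ahead of time that both sources happen to use the same neighborhood string. Nothing in the data itself says what a neighborhood is or how a listing relates to one.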

Under the rules of the Semantic Web, the data itself is much richer and in a format that not only describes the data itself, but the relationship between items and all of the different properties that individual data points could potentially have.

In his article "The Road to the Semantic Web", Alex Iskold explains his thinking that the core idea of the Semantic Web is to create the metadata describing data, which will enable computers to process the meaning of things. "Once computers are equipped with semantics, they will be capable of solving complex semantical optimization problems."


So, What does It Mean Now?

Let's be clear. The full realization of true Semantic Web applications is quite some time away.

Data consumed by Semantic Web applications is described using a framework known as RDF (Resource Description Framework). RDF was invented and evolved by individuals with academic backgrounds in Artificial Intelligence and Logic, and as a result, it is not so easy for your typical Web developer (or even an experienced programmer) to understand and implement. There is quite a learning curve!

It is complicated stuff, not 100% perfected, and will definitely continue to change over time until it is more fully baked.
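
To give a small taste of what RDF data looks like, the basic unit is a statement made of a subject, a predicate and an object. The sketch below writes a few such statements in N-Triples, one of the simplest text formats the W3C defines for RDF. It is plain Python with no RDF library involved, and the http://example.org/ addresses and the property names (memberOf, livesIn, name) are placeholders I made up, not a real vocabulary.

```python
# Hypothetical namespace; example.org is reserved for documentation.
BASE = "http://example.org/"


def uri(name):
    """Wrap a local name as an absolute URI reference."""
    return f"<{BASE}{name}>"


def literal(text):
    """Wrap a plain string as a literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(statements):
    """Serialize (subject, predicate, object) statements, one per line."""
    return "\n".join(f"{s} {p} {o} ." for (s, p, o) in statements)


STATEMENTS = [
    (uri("BarackObama"), uri("memberOf"), uri("DemocraticParty")),
    (uri("BarackObama"), uri("livesIn"), uri("Chicago")),
    (uri("Chicago"), uri("name"), literal("Chicago")),
]

print(to_ntriples(STATEMENTS))
```

Even this tiny example hints at the learning curve: everything, including the relationships themselves, has to be given a globally unique name before a machine can reason about it.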

There is somewhat of a shortcut though. While not as robust as a future vision of RDF, it is quite possible to use the very popular RSS format to build Semantic applications today. I love how we've learned to take a technology intended for one use, and expand upon it to do other things. For a tutorial on using RSS in this manner, check this lesson brought to you by Eric van der Vlist and the folks at O'Reilly.

A fully realized Semantic Web will be quite amazing indeed, but it is going to take a long time to get to the point where the technology regularly intersects with our daily lives.

It is going to take a long time to annotate the world's information and then to capture personal information in the right way in order to really make it work the way it is supposed to.

We are a few years away before we really start to see real traction in terms of Semantic Web technology.

We have to start somewhere though, and there are a variety of interesting companies out there working to be early adopters and technology "shapers" of the Semantic Web.

I'd like to wrap this one up by saying that if you have additional information, thoughts or opinions about where The Semantic Web is headed, please drop me an email with your thoughts. Also, check out this other post about Microformats and how they relate to the concept of a Semantic Web.

I'd be more than happy to incorporate them and share them with the community!