Introducing Schema.org

Google, Bing, and Yahoo! have announced the formation of schema.org. The current schema.org type hierarchy is available on the organization’s website. OpenCyc concepts corresponding to this hierarchy may be explored at sw.opencyc.org. See, for example, the entry for movies.

Smartest Machine on Earth

Tune in Wednesday, February 9, 2011 for the premiere of Smartest Machine on Earth. The NOVA documentary explores IBM’s Watson, the computer that may do for Jeopardy! what Deep Blue did for the game of chess. The program airs at 10pm EST on your local PBS station.

A Progress Report

We’re pleased to bring you an update on several recent activities related to OpenCyc and the Semantic Web.

UMBEL
The lightweight UMBEL ontology is finally live. Mike Bergman and Fred Giasson deserve a big round of applause for the tremendous effort they’ve put into this release. They’ve meticulously selected 20,000 of the most relevant concepts from the more than 300,000 in the Cyc KB. What’s more, relationships between these concepts have been simplified to facilitate discovery of related concepts and alignment with external ontologies.

One can use UMBEL to describe things, to help develop new ontologies and to put individuals in context. As an example of the first (from the UMBEL documentation), suppose you wanted to describe Muhammad Ali. Using the FOAF ontology, you could only say that he is a foaf:Person. We know he is, more specifically, a boxer, but we can’t find a boxer ontology. UMBEL has the subject concept sc:Boxer, which comes from the OpenCyc ontology. Using UMBEL, we can use both of these ontologies at once, thus employing the properties of foaf:Person (name, gender, birthday) as well as the class hierarchy above sc:Boxer (cyc:Person, cyc:SocialBeing, cyc:Athlete) to create a detailed representation of Muhammad Ali that relates to all other ontologies mapped into UMBEL.
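The Muhammad Ali example can be sketched as a few lines of Turtle generated from Python. This is only an illustration: the FOAF namespace is the published one, but the `sc:` namespace URI, the `ex:` namespace, and the helper name `describe_person` are assumptions made for this sketch rather than anything taken from the UMBEL documentation.

```python
# Namespace URIs. FOAF is the published namespace; the other two are
# assumptions for illustration only.
FOAF = "http://xmlns.com/foaf/0.1/"
SC = "http://umbel.org/umbel/sc/"    # assumed UMBEL subject-concept namespace
EX = "http://example.org/people/"    # hypothetical namespace for our own data


def describe_person(local_id, name, subject_concept):
    """Return Turtle text typing one resource as both a foaf:Person
    and an UMBEL subject concept (hypothetical helper)."""
    lines = [
        f"@prefix foaf: <{FOAF}> .",
        f"@prefix sc: <{SC}> .",
        f"@prefix ex: <{EX}> .",
        "",
        # One resource, two types: FOAF supplies the person properties,
        # the subject concept supplies the class hierarchy.
        f"ex:{local_id} a foaf:Person, sc:{subject_concept} ;",
        f'    foaf:name "{name}" .',
    ]
    return "\n".join(lines)


turtle = describe_person("MuhammadAli", "Muhammad Ali", "Boxer")
print(turtle)
```

Typing the resource with both classes is what lets a consumer use the foaf:Person properties and still reach the hierarchy above sc:Boxer.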

Wikipedia and OpenCyc Alignment
We collaborated with Olena Medelyan and Catherine Legg of the University of Waikato, New Zealand, in their effort to automatically identify ontologically equivalent concepts in Cyc and Wikipedia. The work was presented at this year’s AAAI and represents the highest quality mapping to date. It will prove useful for connecting with other open datasets like DBpedia and Freebase.

Details and downloadable versions of the mappings can be found at the project website.

OpenCyc Update
An updated OpenCyc ontology has been released featuring:

  • a new Creative Commons license
  • new simplified relations between concepts in addition to the existing relationships
  • new URIs complying with the latest Linked Data principles
  • a cleaner internal structure free of legacy concepts related to internal Cycorp projects that were of little use to the general community

This OpenCyc update represents a significant usability improvement over the previous ontology and incorporates feedback from Cyc Foundation members, the UMBEL effort, and the community at large.

Thank you all for your continued interest and support.

Games With a Purpose

Luis von Ahn and his team have launched gwap.com, a collection of fun games which also capture machine-readable knowledge.

UMBEL

Over the weekend, Mike Bergman began an interesting series about the history and motivations of the UMBEL project. Stay tuned.

New Cyc Foundation Facebook Group

The Cyc Foundation has a new Facebook group. If you are a member, please show your support and invite a friend!

TreeJuxtaposer

Also on the topic of ontology alignment, TreeJuxtaposer is a wonderful comparison tool by Tamara Munzner.

You Say - We Say

You Say - We Say is an interesting visualization of folksonomy/ontology alignment.

Batch importing into Cyc

How do you efficiently import a large number of terms and assertions into an OpenCyc instance? I recently looked at a few approaches and found that Cyc’s “KEText” file format works well; see the KEText documentation for details. The following SubL command loads a file in KEText format:

(load-ke-text-file #$CycAdministrator "C:/my.ketext.txt" :agenda t)

To load very large amounts of data you’ll need to break your KEText files up into fairly small chunks, as there are limits to how much data the SubL command can process.
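One way to do the chunking is sketched below in Python. It assumes each KEText entry is self-contained (it carries its own Constant and In Mt directives), so entries can be grouped freely. The chunk size of 500, the file-name prefix, and the helper names are assumptions for illustration; the actual limit of the SubL command is not stated here and would need to be found by experiment.

```python
def chunk_ketext(entries, max_entries=500):
    """Group self-contained KEText entries into chunks of at most
    max_entries entries each. The default of 500 is an arbitrary guess."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    return [
        "\n\n".join(entries[i:i + max_entries]) + "\n"
        for i in range(0, len(entries), max_entries)
    ]


def load_commands(num_chunks, prefix="C:/ketext/chunk"):
    """Build one load-ke-text-file call per chunk file.
    The path prefix and numbering scheme are hypothetical."""
    return [
        f'(load-ke-text-file #$CycAdministrator "{prefix}-{n:04d}.txt" :agenda t)'
        for n in range(num_chunks)
    ]


# Five small sample entries, split two per chunk.
entries = [
    f"Constant: SampleTerm{i}.\nIn Mt: BaseKB.\nisa: Thing."
    for i in range(5)
]
chunks = chunk_ketext(entries, max_entries=2)
commands = load_commands(len(chunks))
for command in commands:
    print(command)
```

Each chunk would be written to its own file and loaded with the matching command, one at a time.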

You could alternatively use the OpenCyc Java API, which provides everything you need to make additions and changes to a Cyc image. In my case I had a bunch of information in XML already, so transforming the XML into KEText format was an easier way to go.
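A transformation of this kind might look like the following Python sketch. The XML layout (`<terms>`, `<term name=... mt=...>`, and child elements named after KEText directives) is invented for the example and is not the format I actually had. The KEText directives used (Constant, In Mt, isa, genls, comment) follow my reading of the KEText documentation, so check the output against the docs before loading it.

```python
import xml.etree.ElementTree as ET

# Hypothetical input format: one <term> per constant, with child
# elements named after the KEText directive they should become.
SAMPLE_XML = """
<terms>
  <term name="ToyPoodle" mt="BaseKB">
    <isa>Collection</isa>
    <genls>Dog</genls>
    <comment>A small poodle.</comment>
  </term>
</terms>
"""


def term_to_ketext(term):
    """Convert one <term> element into a KEText entry."""
    lines = [
        f"Constant: {term.get('name')}.",
        f"In Mt: {term.get('mt', 'BaseKB')}.",
    ]
    for child in term:
        text = (child.text or "").strip()
        if child.tag == "comment":
            # Comments are strings, so quote them and escape embedded quotes.
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        # Every KEText expression ends with a period.
        lines.append(f"{child.tag}: {text}.")
    return "\n".join(lines)


def xml_to_ketext(xml_text):
    """Convert a whole <terms> document into KEText."""
    root = ET.fromstring(xml_text.strip())
    return "\n\n".join(term_to_ketext(t) for t in root.iter("term")) + "\n"


ketext = xml_to_ketext(SAMPLE_XML)
print(ketext)
```

Because every entry repeats its own Constant and In Mt lines, the output can be split into smaller files at entry boundaries without losing context.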

A Bridge Between Rich Semantic Reasoning and Theorem Provers

In the history of the Cyc project, Cyc’s knowledge base and inference engine have evolved in a direction far different from most other automated theorem provers. Cyc concentrated on solving problems in very large knowledge spaces (i.e., millions of facts), using higher-order logic, although the problem solutions were often not very deep. The automated theorem proving community, on the other hand, looked at relatively small knowledge spaces (or theorems), but focused on becoming very fast at finding very deep solutions.

To date, there has been fairly little cross-pollination between the two communities. In part, this has been because there was no corpus of problems accessible by both Cyc and automated theorem provers. Now, however, just such a problem corpus has been released and made available in the TPTP (Thousands of Problems for Theorem Provers) format that is the standard for automated theorem proving researchers. More information about this problem suite can be found at
http://www.opencyc.org/doc/tptp_challenge_problem_set.html.