We’ve moved to http://www.haleyAI.com

April 16, 2008

The Semantic Arms Race: Facebook vs. Google

As I discussed in Over $100m in 12 months backs natural language for the semantic web, Radar Networks’ Twine is one of the more interesting semantic web startups.  Their founder, Nova Spivack, is funded by Vulcan and others to provide “interest-driven [social] networking”.  I’ve been participating in the beta program at modest bandwidth for a while.  Generally, Nova’s statements about where they are and where they are going are fully supported by what I have experienced.  There are obvious weaknesses, which they are improving.  Overall, the strategy of gradually bootstrapping functionality and content by controlling the ramp-up in users, from a clearly alpha-stage implementation to what is still not quite beta (in my view), seems perfect.

Recently, Nova recorded a video of a few minutes in which he makes three short-term predictions:

  1. Yahoo’s indexing of RDF will start the Semantic Web 3.0 arms race involving Google and Microsoft.
  2. The web will transition from pages to linked data. 
  3. Facebook “has to compete” with Google.

Beyond his first prediction, Nova was a little on the spot (see for yourself in the video).  Personally, I liked his “the web becomes a database” comment more than the Berners-Lee reiteration of linked data.  The notion of the entire web being a database is the right perspective on the semantic web (i.e., RDF), in my view.  Linked data is boring (try the Tabulator if linked data excites you).  The action (and opportunity) is in doing something with it!  When asked about ten years out, however, Nova displayed more of his deep insight and vision.  (See below.)

I love the pithy #3 that he decided to throw in there.  He did not invent that on the spot but found his legs just before being asked about his longer-term vision.  It makes sense, of course.  Google is attacking with OpenSocial (so is the rest of the world, including all the bookmarkers and even Nova’s Twine).  Facebook has to shift direction, and the only target big enough given its size is search and advertising.

In his longer term vision he mentions the intelligent web that reasons and helps make decisions.  

This is where the battleground is for artificial intelligence and Semantic Web 4.0 (his term for the 4th decade of the web starting circa 2020).

Personally, I think natural language should have been in his first three.  Powerset will demonstrate that, and all the action around Reuters/ClearForest/Calais (which he mentions and expects Google to compete with) indicates that natural language is critical to populating the semantic web (of course, we have the database approach of DBpedia and Freebase, too).  In general, people are not going to tag sentences or paragraphs.  Machines will.  The only RDF people are going to add is meta-tags at the page level for search engine optimization, given Yahoo’s move (and the expected response from Google that Nova mentions).

Certainly, natural language understanding is a prerequisite for the Semantic Web 4.0.  We will be talking more and typing less long before then.

Learning from the Future with Nova Spivack from Maarten on Vimeo.

April 3, 2008

Cyc is more than encyclopedic

I had the pleasure of visiting with some fine folks at Cycorp in Austin, Texas recently.  Cycorp is interesting for many reasons, but chiefly because they have expended more effort developing a deeper model of common world knowledge than any other group on the planet.  They are different from current semantic web startups.  Unlike Metaweb’s Freebase, for example, Cycorp is defining the common sense logic of the world, not just populating databases (which is an unjust simplification of what Freebase is doing, but is proportionally fair when comparing their ontological schemata to Cyc’s knowledge).  Not only does Cyc have the largest and most practical ontology on earth, they have almost incomprehensible numbers of formulas[1] describing the world.  (more…)

March 28, 2008

Harvesting business rules from the IRS

Does your business have logic that is more or less complicated than filing your taxes?

Most business logic is at least as complicated.  But most business rule metaphors are not up to expressing tax regulations in a simple manner.  Nonetheless, the tax regulations are full of great training material for learning how to analyze and capture business rules.

For example, consider the earned income credit (EIC) for federal income tax purposes in the United States.  This tutorial uses the guide for 2003, which is available here. There is also a cheat sheet that attempts to simplify the matter, available here. (Or click on the pictures.)

[Image: eitc-publication-596-fy-2003.jpg]  [Image: eitc-eligibility-checklist-for-tax-year-2003.jpg]

What you will see here is typical of what business analysts do to clarify business requirements, policies, and logic.  Nothing here is specific to rule-based programming.  (more…)

March 11, 2008

Over $100m in 12 months backs natural language for the semantic web

Radar Networks is accelerating down the path towards the world’s largest body of knowledge about what people care about using Twine to organize their bookmarks.  Unlike social bookmarking sites, Twine uses natural language processing technology to read and categorize people’s bookmarks in a substantial ontology.  Using this ontology, Twine not only organizes their bookmarks intelligently but also facilitates social networking and collaborative filtering that result in more relevant suggestions of others’ bookmarks than other social bookmarking sites can provide.

Twine should rapidly eclipse social bookmarking sites, like Digg and Reddit.  This is no small feat!

The underlying capabilities of Twine present Radar Networks with many other opportunities, too.  Twine could spider out from bookmarks and become a general competitor to Google, as Powerset hopes to become.  Twine could become the semantic web’s Wikipedia, to which Metaweb’s Freebase aspires. (more…)

March 3, 2008

Oracle should teach Siebel CRM about location and money

Not long ago I posted on the need to understand common concepts well. My example then concerned the need to understand time well enough to answer a question like, “How much did IBM’s earnings change last quarter?”. Recently, in contemplating some training issues related to the integration of Haley Authority within Siebel, I came across example phrasings from the documentation on Siebel’s web site, including:

  • if an account’s location contains “CA” then add 50000 in “USD” for the account
  • if an account’s location contains “CA” then add 70000 in “USD” on today for the account

Two things are immediately obvious.

  1. Oracle does not understand location.
  2. Oracle has an interesting but nonetheless poor understanding of money.

Of course, I am intimately familiar with Authority’s understanding of money. However, Siebel needs more than Authority understands. (more…)

February 19, 2008

Understanding events and processes takes time

We have been teaching a computer to answer questions like, “How much did IBM’s earnings change last quarter?”  It takes a fair bit of knowledge, including how to understand English, to answer this question.  But teaching it what a “quarter” is brought back memories of debates with some former CMU colleagues about what units are and how to model time.  Since quite a few people ask me for help with knowledge engineering and ontological matters, I thought some might be interested in parts of those debates.

As you will see, a strong upper ontology of common knowledge is required to understand common business knowledge.  Leveraging such an ontology is the only way to deliver business rules for under $50.

Sentences like “do something if more than a number of possibly related things have happened within a timeframe of something else happening” or “do something if nothing happens within a timeframe following something happening” are extremely common in business process management (BPM), complex event processing (CEP), and workflow.  With a sense of time, a business rules management system (BRMS) can support BPM, CEP, and workflow applications almost trivially.  Without a sense of time, most BRMS force users to perform computations.  
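
To give a rough idea of the computations a user ends up performing for these two sentence patterns, here is a minimal Python sketch.  The function names are hypothetical, and treating the timeframe as an inclusive window that follows the anchoring event is an assumption made for illustration, not a description of how any particular BRMS works.

```python
from datetime import datetime, timedelta


def count_within(event_times, anchor, window):
    """Count events that fall in the inclusive window [anchor, anchor + window]."""
    return sum(1 for t in event_times if anchor <= t <= anchor + window)


def more_than_n_within(event_times, anchor, window, n):
    """Pattern 1: more than n things happened within a timeframe of the anchor."""
    return count_within(event_times, anchor, window) > n


def nothing_within(event_times, anchor, window):
    """Pattern 2: nothing happened within a timeframe following the anchor."""
    return count_within(event_times, anchor, window) == 0


# Hypothetical data: an anchoring event and four later events.
anchor = datetime(2008, 2, 19, 9, 0)
events = [anchor + timedelta(minutes=m) for m in (5, 20, 50, 90)]

print(more_than_n_within(events, anchor, timedelta(hours=1), 2))
print(nothing_within(events, anchor, timedelta(minutes=4)))
```

Even in this small form, the rule author has to decide where the window starts, whether its ends are inclusive, and where the event times come from.  Those are the decisions a tool with a sense of time would make on the author's behalf.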

For example, without a sense of time and an infrastructure that supports it, the sentence “call a customer if no response is received within 30 days of notifying the customer of a delinquency” has to be transformed into something like “if a notice is mailed on a date and the notice is a delinquency and the date of notification has a day number then compute the date for checking by adding 30 to the day number and check for a response to the delinquency notice on the date for checking”.  The checking on a date for a response to a notice must also be implemented as a database (or persistent queue) of events to be polled or triggered by application code.  Then a second rule is required to implement the check, as in “if checking whether a response has been received to a notice and the notice was given on a date of notice and the notice was given to a customer and there exists no record of communication with the customer since the date of notice then call the customer”.  (Note that this is actually how most BRMS products would implement this.  The natural language approach I prefer handles the original sentence.)
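
The two-rule decomposition described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the Notice and Communication records, the function names, and the in-memory lists standing in for the database (or persistent queue) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RESPONSE_WINDOW = timedelta(days=30)  # the "within 30 days" of the sentence


@dataclass
class Notice:
    customer: str
    kind: str      # e.g. "delinquency"
    sent_on: date


@dataclass
class Communication:
    customer: str
    received_on: date


def check_date(notice):
    """First rule: compute the date for checking by adding 30 days."""
    return notice.sent_on + RESPONSE_WINDOW


def customers_to_call(notices, communications, today):
    """Second rule: call the customer if the check date has arrived and
    there is no record of communication since the date of notice."""
    calls = []
    for notice in notices:
        if notice.kind != "delinquency" or today < check_date(notice):
            continue
        responded = any(
            c.customer == notice.customer and c.received_on >= notice.sent_on
            for c in communications
        )
        if not responded:
            calls.append(notice.customer)
    return calls


# Hypothetical records standing in for the persistent store of events.
notices = [
    Notice("acme", "delinquency", date(2008, 1, 1)),
    Notice("bolt", "delinquency", date(2008, 1, 1)),
    Notice("cog", "reminder", date(2008, 1, 1)),
]
communications = [Communication("bolt", date(2008, 1, 15))]

print(customers_to_call(notices, communications, date(2008, 2, 1)))
```

Something still has to invoke customers_to_call on the right day, which is the polling or triggering by application code mentioned above.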

The discussion here reflects the general structure and content that a usable ontology for business process management requires.  Most users of business rules management tools will find the need to understand and engineer this discussion in their tool of choice.  As my Haley Systems customers know, much of this is reflected in Authority’s built-in ontology and English vocabulary, but quite a few of the points discussed here reflect improvements, especially concerning the confusion between units and amounts.

As you will see, the discussion takes careful thinking.  Some readers may find it onerous.  If at any time you have had enough (or if you simply cannot take any more!), please skip to the end and decide whether to fill in the conclusions by revisiting the body.

(more…)
