Collaborative Filtering and Social Capital

November 21st, 2008 | Peter Ferne, Jiva Technology

Recently I have been thinking about how we can apply the techniques of collaborative filtering to people search and recommendation. Indeed, we have built Beanbag Learning as a research testbed to explore possible approaches in this space. A number of threads are starting to come together: collaborative filtering, the social graph, peer to peer, Whuffie and solar magnitude, openness and mutability. I'll outline them below and try to point to some of the wide range of conversations that connect these seemingly disparate areas.

Collaborative Filtering

Research into collaborative filtering and recommender systems has been given a huge boost by the Netflix Prize and by the availability of datasets and web APIs from people such as Last.fm, del.icio.us and MovieLens.

I attended this year's Recommender Systems conference in Lausanne, RecSys 2008, which was kicked off by a great overview of recent progress in the field by Yehuda Koren, part of the BellKor team leading the pack in the Netflix Prize, who has recently left AT&T Research for Yahoo.

Although most of the talks were presentations of academic papers there was also a strong industry presence from both bigger players like BT, Yahoo and Last.fm and from startups such as ourselves. Strands, who offer their recommender system as a web service, even ran a $100k prize competition for recommender startups, won by Gravity R&D (although my vote would have gone to Iletken).

Interesting and useful as the conference was, it focused on people recommending things, usually movies, music or reading matter. What lifts people search above a directory listing is the recommendation of people by other people.

The Social Graph

The whole notion of mapping and opening up the social graph is very much in the air at the moment. To mention only a few highlights, in the last year or so we have had: Brad Fitzpatrick and David Recordon's Thoughts on the Social Graph, Six Apart Opening the Social Graph, the launch of the Social Graph API by Google, a Foo Camp and two conferences, Graphing Social Patterns West and East, from O'Reilly.

Of course the roots of this are deep, reaching back at least as far as the Semantic Web FOAF project in 2000 and XFN in 2003.

You may have seen the recent VentureBeat coverage of Aardvark.im, still in closed beta at the time of writing, described as Yahoo Answers meets Twitter -- but better. This is particularly interesting for me as it's quite close to what we are doing with HTH.

Buzz Andersen did a nice post last week on Ambient Recommendation, which put me in mind of something I had read a few years back in Ben Russell's seminal Headmap Manifesto, which also envisaged using the social graph in the context of location-aware devices:

..show me all the restaurants my friends like within a mile radius.

... the information you are receiving is not just a broadcast that lists the nearest Starbucks and McDonalds, but information based on your personal profile and the suggestions and opinions of your peers.

Peer to Peer

At the O'Reilly P2P Conference in Washington in 2001 I was fortunate enough to attend a BoF ('Birds of a Feather' session) on NeuroGrid by Sam Joseph, then at the University of Tokyo, now at the University of Hawaii. His simple but powerful notion of Semantic Query Routing struck me then as brilliant and is something which has stayed with me, lurking in the nether regions of my brain, ever since.

Abstract. NeuroGrid is an adaptive decentralized search system. NeuroGrid nodes support distributed search through semantic routing (forwarding of queries based on content), and a learning mechanism that dynamically adjusts metadata describing the contents of nodes and the files that make up those contents.

The idea is based on the observation that this is how we typically handle queries amongst our social circle. If I want to know about Glitch Hop or Processing I will ask Stefan, but if I want to know about World of Warcraft I'll ask Alice. I won't ask everybody I know indiscriminately, except perhaps as a last resort.
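The routing habit described above can be sketched in a few lines. This is only an illustration of the general idea, not NeuroGrid's actual algorithm: the `Peer` class, its `record_answer` and `route` methods and the `fanout` parameter are all hypothetical names invented for the example.

```python
from collections import defaultdict


class Peer:
    """Hypothetical node that remembers which contacts answered which topics."""

    def __init__(self, name, contacts):
        self.name = name
        self.contacts = list(contacts)
        # topic -> contact name -> number of useful answers seen so far
        self.hits = defaultdict(lambda: defaultdict(int))

    def record_answer(self, topic, contact):
        """Learn that `contact` gave a useful answer about `topic`."""
        self.hits[topic][contact] += 1

    def route(self, topic, fanout=1):
        """Return the contacts to ask about `topic`, best-known first."""
        known = self.hits.get(topic, {})
        ranked = sorted(known, key=lambda c: (-known[c], c))
        if ranked:
            return ranked[:fanout]
        # Last resort: nothing is known about this topic, so ask everybody.
        return sorted(self.contacts)


me = Peer("me", ["Alice", "Stefan"])
me.record_answer("Glitch Hop", "Stefan")
me.record_answer("World of Warcraft", "Alice")

print(me.route("Glitch Hop"))  # ['Stefan']
print(me.route("knitting"))    # ['Alice', 'Stefan'] (no knowledge, so everyone)
```

The learning step here is just a counter per topic and contact; a real system would also need to forward queries onward and to forget stale knowledge.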

Sam's work, and the work of people like Roger Dingledine and Zooko on reputation and accountability, for example in the O'Reilly Peer-to-Peer book and at the first Emerging Technology conference the following year, was developed in the context of ensuring equitable access to resources in p2p file storage and sharing networks but has wider relevance today.

Whuffie and Solar Magnitude

Of course it was around this time that Cory Doctorow published Down and Out in the Magic Kingdom and coined the term Whuffie, recently given currency by Tara Hunt whose book The Whuffie Factor is available for pre-order.

He had a lot of left-handed Whuffie; respect garnered from people who shared very few of my opinions. I expected that. What I didn’t expect was that his weighted Whuffie score, the one that lent extra credence to the rankings of people I respected, was also high — higher than my own. I regretted my nonlinear behavior even more. Respect from [Tim] would carry a lot of weight in every camp that mattered.

When I reread this passage recently it put me in mind of the idea of the Solar Magnitude Forum put forward by Juliette Melton and Philip Greenspun.

You look up at the night sky with its infinitude of stars (like the 3.5 million discussion forum posts at photo.net). What objects do you see? Those that are either very close (Earth's moon) or those that are very bright (a supernova in a galaxy far far away). This is how the Solar Magnitude Forum should work.

Although that was written in the context of moving beyond the limitations of the typical threaded discussion forum, it is fundamentally about how we allocate attention. The concept of combining some abstract notion of 'brightness' with one of 'distance' is an important and valuable one, also captured by Cory's idea of left-handed and weighted Whuffie.
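One way to make the combination of 'brightness' and 'distance' concrete is sketched below. It assumes, purely for illustration, that distance is the number of hops between two people in the social graph and that apparent brightness falls off with the square of that distance, by analogy with starlight. Neither assumption comes from the Solar Magnitude proposal itself, and the function names and sample data are hypothetical.

```python
from collections import deque


def hop_distances(graph, start):
    """Breadth-first search: number of hops from `start` to each reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def apparent_brightness(brightness, distance):
    """Assumed inverse-square falloff: close or very bright people stay visible."""
    return brightness / (distance ** 2)


def visible(graph, brightness, viewer, threshold):
    """People whose apparent brightness, as seen by `viewer`, meets the threshold."""
    scores = {
        person: apparent_brightness(brightness[person], d)
        for person, d in hop_distances(graph, viewer).items()
        if d > 0 and person in brightness
    }
    return sorted(
        (p for p, s in scores.items() if s >= threshold),
        key=lambda p: (-scores[p], p),
    )


# Hypothetical data: ann is a direct contact, star is three hops away but very bright.
graph = {"me": ["ann"], "ann": ["bob"], "bob": ["star"]}
brightness = {"ann": 2, "bob": 2, "star": 90}

print(visible(graph, brightness, "me", threshold=1))  # ['star', 'ann']
```

In this example bob drops out of view: he is no brighter than ann but twice as far away, while the distant star still outshines both.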

In order to be widely adopted and immediately accessible it is almost certainly necessary to be able to express social capital as a single number, a Celebdaq style personal stock price, as suggested half-jokingly by Jeff Ward in his recent post The Whuffie Index. However it is important to remember two things it has in common with traditional stock prices for companies. Firstly, it is a potentially volatile number and its history conveys a huge amount of information, not least implications for the future: whose stock is rising or falling, for example. Secondly, it contains within it a great deal of complexity. You can drill down behind the share price quoted for YHOO or GOOG to see market capitalisation, sales, profit and loss and even examine filed accounts. A similar level of complexity must underlie any usable notion of Whuffie. In a simple but indicative way Twinfluence does a great job of making the underlying complexity evident and accessible, unlike Twitterank, which fails badly in this regard.

Jeff's scoring algorithm may not have been proposed seriously and is clearly far too crude to be useful as it stands but it does capture one essential notion: that any measure of social capital (Whuffie or whatever we call it) needs to be based on your activity over time in a wide range of different spheres, including both explicit ratings pulled from services such as Twinfluence, Intense Debate, Digg or wherever; and implicit factors derived from your activity on FriendFeed, Blip.fm, Dopplr etc.
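A minimal sketch of such a measure follows, assuming that each signal arrives as a (source, value, day) tuple, that every source has a hand-picked weight, and that older activity decays with a fixed half-life. The weights, the half-life and the sample events are all made up for illustration; this is not Jeff's algorithm or anyone else's.

```python
def social_capital(events, weights, now, half_life_days=90.0):
    """Return (score, breakdown) for a list of (source, value, day) events.

    `breakdown` maps each source to its contribution, so the single
    headline number can still be drilled into.
    """
    breakdown = {}
    for source, value, day in events:
        age_days = max(0.0, now - day)
        # Exponential decay: a signal loses half its weight every half-life.
        decay = 0.5 ** (age_days / half_life_days)
        contribution = weights.get(source, 0.0) * value * decay
        breakdown[source] = breakdown.get(source, 0.0) + contribution
    return sum(breakdown.values()), breakdown


# Hypothetical inputs: one explicit rating today, one implicit signal 90 days old.
events = [("digg", 10, 100), ("friendfeed", 4, 10)]
weights = {"digg": 1.0, "friendfeed": 0.5}

score, breakdown = social_capital(events, weights, now=100)
print(score)      # 11.0
print(breakdown)  # {'digg': 10.0, 'friendfeed': 1.0}
```

Recomputing the score for successive values of `now` gives the price history discussed above, and the breakdown is the simplest possible version of drilling down behind the number.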

Open and Mutable

For a reputation system to be trustworthy it must itself exhibit a significant degree of openness. As hinted at by Hans Granqvist in his recent post, a usable web reputation service needs to build on top of established services such as OpenID and OAuth and to be freely and widely available. It needs the kind of approach favoured by the DiSo project (and not XRIs).

I also believe, and I haven't yet seen any discussion of this, that the algorithms at the heart of a sustainable open social capital infrastructure must themselves be able to mutate and evolve in response to their environment. Certainly as new social software networks rise and fall their contributions to calculations of social capital need to follow a similar trajectory. But just as important is the need to evolve in the face of changes to the way that the very notion of social capital is perceived, used and abused.
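As a very rough illustration of the first point, the weight given to each network could simply track that network's share of recent activity. The sketch below assumes the weights start out summing to one and that activity is a count per network for the latest period; the `update_weights` function, its `rate` parameter and the network names are hypothetical, and this says nothing about the harder problem of evolving the notion of social capital itself.

```python
def update_weights(weights, activity, rate=0.2):
    """Nudge each network's weight towards its share of recent activity."""
    total = sum(activity.values())
    if total == 0:
        # No activity observed this period, so leave the weights unchanged.
        return dict(weights)
    updated = {}
    for network in sorted(set(weights) | set(activity)):
        share = activity.get(network, 0) / total
        # Exponential moving average between the old weight and the new share.
        updated[network] = (1 - rate) * weights.get(network, 0.0) + rate * share
    return updated


# Hypothetical networks: "old" has gone quiet while "new" is taking off.
weights = {"old": 0.5, "new": 0.5}
for _ in range(3):
    weights = update_weights(weights, {"old": 0, "new": 10})

print(weights)  # "old" shrinks each period while "new" grows
```

A network that falls silent fades gradually rather than vanishing overnight, and a network that did not exist in the previous period enters with a small weight of its own.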

Pulling all of these threads together is going to be a huge undertaking, and it will be a long journey, but it will be immensely rewarding.

View the original post, with comments, at http://blog.scrump.com/2008/11/21/collaborative-filtering-social-capital/.
