Friday 5 October 2012

Overview

Remember the old Microsoft tagline: “Where do you want to go today?” The implication is that information technology can help you achieve your goals, whatever they are. Indeed, in today’s world that is often true.

Whatever your project, whether professional or personal, there are certain common issues that you are likely to encounter:

Most projects could use some additional resources

In most projects you will need to find relevant information

Most project participants would benefit from improving their expertise, which means not just finding information but learning new concepts and methods

Most projects require some analytical decisions



While existing technology is already very helpful for all these things, it can do a lot better. And the better it does, the more likely your project is to succeed.

On this website I’ve outlined some ideas about promising next steps to take in this direction:

Knowledge formalization for the masses

Summary

One of the greatest achievements of the Internet is to have dramatically decreased the time it takes to find information. However, this capacity has not greatly improved over the past decade, due in part to the explosion of unstructured content. The problem is even worse on enterprise intranets, where information is much more fragmented and lacks the interconnectivity of the Web that improves navigation and search result ranking.

The situation can be improved if content actors (producers, managers, and consumers) can combine efforts to better formalize information, making it easier to process by computers. While technologies to support such activity have been evolving, much still remains to be done. Lacking in particular are systems that assist content consumers, by far the largest segment of content actors.

Problem

Despite Apple’s Siri, we are still pretty far away from the Star Trek computer, i.e., a computer that is capable of answering any question based on the available information. While content search technology has gradually improved over time, it has not been able to keep pace with the information explosion that we are experiencing (aka Big Data). In the end, you still get a (usually large) collection of documents that may or may not contain what you are looking for.

Even when information is semantically structured and we can ask a computer for a specific information object, there still remains a problem of searching across multiple structured data sets with different data models. One cannot specify query parameters if these parameters are not the same across datasets.

What’s needed is a way to formalize and merge semantic information structure:
  • Give semantic structure to content during its creation
  • Structure existing content through
      ◦ Adding external structured properties called metadata (e.g., topic, author, date, etc.)
      ◦ Extracting structured facts from unstructured content as an alternative knowledge representation
  • Unify and interlink the resultant structure across all data sets

The result would be much easier for computers to process and search automatically, with powerful consequences for information consumers. They would then have their Star Trek computer.
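To make the last step (unifying structure across data sets) a bit more concrete, here is a minimal sketch. It assumes two hypothetical collections that describe documents with different field names; the field names, the mapping table, and the `unify` and `search` helpers are all illustrative assumptions, not part of any existing system.

```python
# Hypothetical mapping from each data set's own field names to one shared schema.
FIELD_MAP = {
    "library": {"creator": "author", "subject": "topic", "issued": "date"},
    "intranet": {"written_by": "author", "category": "topic", "created": "date"},
}


def unify(dataset_name, record):
    """Rename a record's fields to the shared schema; unmapped fields are kept as-is."""
    mapping = FIELD_MAP[dataset_name]
    return {mapping.get(key, key): value for key, value in record.items()}


def search(records, **criteria):
    """Return the unified records whose fields match every criterion exactly."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


# Two made-up data sets with different data models.
library = [{"creator": "A. Smith", "subject": "tagging", "issued": "2011"}]
intranet = [{"written_by": "A. Smith", "category": "search", "created": "2012"}]

unified = [unify("library", r) for r in library] + [unify("intranet", r) for r in intranet]

# One query now works across both data sets.
hits = search(unified, author="A. Smith")
print(len(hits))  # 2
```

Once both collections share the same property names, a single query parameter (here `author`) applies to all of them, which is exactly what is impossible when each data set keeps its own model.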

Unfortunately, this is a very complex undertaking that requires a lot of investment on the part of information creators, managers, and consumers. The supporting technology therefore faces the major challenge of boosting the ROI enough to reach the tipping point of mass adoption.

So far, the technology has generally taken two opposing approaches:
  1. The content formalization work is carried out by dedicated trained people who are referred to as Information Architects, Content Curators, etc. These people create domain-specific data models, interconnect different models, and use them to formalize new and existing content. Some of this work can be automated to a certain degree, but automation usually introduces a significant amount of noise.
  2. The content formalization is carried out by content consumers via so-called “free tagging”, whereby users can add whatever metadata they wish to content. Free tags are just simple short phrases that add a bit of semantic structure. While content consumers can be motivated to improve semantic structure for better retrieval, the resultant degree of formalism is very weak.
Sophisticated software and algorithms now exist that help experts create, manage, consolidate, and reuse metadata and data models, such as linguistic rules and reference vocabularies. However, no tools are available that empower content consumers and consolidate their contributions with those made by experts.

Solution

What’s needed is a platform that makes it possible to effectively crowdsource the content formalization task to those who use the content. After all, this formalization is done for the benefit of content users, so it makes perfect sense that they should have a say in how the content is formalized. Just as one can crowdsource production of data, one can crowdsource production of data models and metadata.

Many research papers have been written on this subject (see an example), but strangely no effective commercial tools exist that would support such a process. Yet, the underlying concept is fairly simple:
  • Allow users to create, define, and modify tags, split them into properties and values, and create semantic links between tags (e.g., synonyms, translation, sub-terms).
  • Allow users to discuss and evaluate modifications proposed by others (e.g., using voting). Merge identical modification proposals and count them as votes.
  • Allow automatic acceptance based on the number of votes, user profile, etc.
  • Allow different moderation rights based on user profiles (e.g., certain users can have expert status and reject modifications made by others).
  • Assist and guide metadata creation and consolidation by using search keywords and expert-generated data models.
As research papers indicate, there are a lot of details to work out, but conceptually the main issues have already been resolved and modeled. All that’s needed is to create a viable commercial product.
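To illustrate how little machinery the core concept needs, here is a minimal sketch of the proposal, voting, and moderation loop listed above. Everything in it is an assumption made for illustration: the `Proposal` and `TagRegistry` names, the rule that identical proposals are matched case-insensitively, and the fixed vote threshold for automatic acceptance. A real product would need user profiles, discussion threads, and richer acceptance rules.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """A proposed change to a tag, e.g. adding a synonym."""
    tag: str
    kind: str    # e.g. "synonym", "translation", "sub-term"
    value: str
    voters: set = field(default_factory=set)
    status: str = "pending"  # "pending", "accepted", or "rejected"


class TagRegistry:
    def __init__(self, accept_threshold=3, experts=()):
        self.accept_threshold = accept_threshold  # assumed vote count for auto-acceptance
        self.experts = set(experts)               # users with moderation rights
        self.proposals = {}                       # (tag, kind, value) -> Proposal
        self.links = {}                           # tag -> set of accepted (kind, value) links

    def propose(self, user, tag, kind, value):
        """Submit a proposal; identical proposals are merged and counted as votes."""
        key = (tag.lower(), kind, value.lower())
        if key not in self.proposals:
            self.proposals[key] = Proposal(*key)
        proposal = self.proposals[key]
        if proposal.status == "pending":
            proposal.voters.add(user)
            if len(proposal.voters) >= self.accept_threshold:
                proposal.status = "accepted"
                self.links.setdefault(proposal.tag, set()).add((kind, proposal.value))
        return proposal

    def reject(self, user, tag, kind, value):
        """Only users with expert status may reject a pending proposal."""
        if user not in self.experts:
            raise PermissionError(f"{user} has no moderation rights")
        proposal = self.proposals[(tag.lower(), kind, value.lower())]
        if proposal.status == "pending":
            proposal.status = "rejected"
        return proposal


registry = TagRegistry(accept_threshold=2, experts={"carol"})
registry.propose("alice", "js", "synonym", "javascript")
merged = registry.propose("bob", "JS", "synonym", "JavaScript")
print(merged.status)  # accepted
```

In this sketch the second, identical proposal is not stored separately: it is merged into the first one and counted as a vote, which is what triggers the automatic acceptance.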

Market Overview

Content is consumed through a huge number of diverse software systems. Many prominent systems already use free tagging:
  • Enterprise collaboration platforms such as SharePoint, Drupal, and Confluence.
  • Public collaboration platforms, such as Twitter, Stack Exchange, and Delicious.
A few of these systems are starting to offer a basic level of tag management functionality. For example, in Stack Overflow users with enough reputation can specify and edit tag definitions as well as suggest and validate tag synonyms. While this is a move in the right direction, it is far from sufficient.

At the other extreme of the spectrum, Google has recently launched Knowledge Graph, which is the most complete public collection of expert-structured knowledge. This collection can be used by external services via an API, thereby providing a good basis for semi-automated enrichment of free tags.

Business model

The goal of the proposed solution is to enhance the functionality of existing content platforms, which can be accomplished in two ways:
  • Sale of a software component to content platform providers (OEM license)
  • Sale of a platform plugin to content platform users (software license or SaaS subscription)

Go-to-market strategy

Many of the platforms using free tagging provide API access and application marketplaces (see an example). The best starting strategy would be to develop tag management plugins for such platforms.

Moreover, those platforms that chose not to implement free tagging have done so in the knowledge of its limitations, and so may change their mind once a better system is in place.

Friday 28 September 2012

Edupedia

Summary

While people have been sharing more and more of their knowledge and the amount of available information has been continuously exploding, this knowledge is truly only available to those capable of understanding it. There is a strong need for complementary educational resources that would unlock the full information potential to all readers.

Edupedia is a kind of Wikipedia, where concepts are not simply described, but explained. Many complementary explanations are provided, varying in language, level (e.g., novice, expert), media (e.g., video, text, slides), mode (e.g., conceptual, by example, through a story, via a game), degree of detail (e.g., overview, short, long), and usage rights. The explanations would come from two sources: 1) semi-automated aggregation of online educational materials, and 2) direct contributions from people as in Wikipedia.

Problem

As the saying goes, “live and learn”. Indeed, education, in the broad sense of the word, pervades all our lives. Before we even embark on a project, we need to have learned enough to choose one. And once we have chosen it, its success clearly depends on our know-how, or in other words, on the level of our “education”. By education I don’t mean just formal schooling or training. I mean everything we have learned, both in our professional and personal lives.

There’s lots of online educational content already and the domain of e-learning has been booming. However, if one wants to understand or learn a particular concept, it’s not easy to find just the right lesson for the desired level of expertise, learning mode, and quality. Moreover, there are a lot of things to learn, and much of the educational content is not online. Yet, there is no centralized coordinated way for people to contribute new educational material.

For example, if someone reads a Wikipedia description of a mathematical derivative without already knowing what it is, they would have a very hard time understanding or learning it. Yet, there are many simple and fun ways to explain what it is and why it is interesting (the “why” part is crucial but is missing from most explanations). So if we can get a set of explanations that can be categorized and rated, then people can quickly find an explanation suitable for them.

Solution

What’s needed is a platform that 1) semi-automatically identifies and processes all of the available educational content (aggregating, interlinking, classifying, evaluating, and indexing), and 2) lets people contribute new educational materials in a Wikipedia-like fashion.

The goal of Edupedia is to provide explanations, not facts. The line is quite blurry, and so it’s very important to define it as precisely as possible to constrain Edupedia’s scope. The challenge here is similar to the one faced by Wikipedia which provides detailed criteria for admission of an article into its collection.

The technology underlying Edupedia is similar to that of Freebase. Freebase was acquired by Google and is the basis for Google’s new product Knowledge Graph. Freebase aggregated open databases from the Web into a single semantic repository, where people could also contribute directly. This approach would have to be customized for Edupedia to support:
  • Education-specific data model
  • Specific functionalities such as material ranking and generation of personalized lessons
  • Specific business model (see below)
Note that Edupedia will not store content from other websites. It will only store the content metadata, such as its type, language, rating, comments, etc. These metadata will then be linked to the original content.
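As a rough illustration, a metadata record and a lookup over such records might look like the sketch below. The `Explanation` fields are taken from the properties mentioned in this post (language, level, media, rating, link to the original content), but the class itself, the `find_explanations` helper, and the example.org URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Explanation:
    """Metadata about one explanation; the content itself stays on the original site."""
    concept: str
    url: str       # link to the original content
    language: str
    level: str     # e.g. "novice", "expert"
    media: str     # e.g. "video", "text", "slides"
    rating: float  # average user rating


def find_explanations(catalog, concept, **filters):
    """Return explanations of a concept that match all filters, best rated first."""
    matches = [
        e for e in catalog
        if e.concept == concept
        and all(getattr(e, name) == wanted for name, wanted in filters.items())
    ]
    return sorted(matches, key=lambda e: e.rating, reverse=True)


# Made-up catalog entries for the derivative example.
catalog = [
    Explanation("derivative", "https://example.org/a", "en", "novice", "video", 4.5),
    Explanation("derivative", "https://example.org/b", "en", "expert", "text", 4.8),
    Explanation("derivative", "https://example.org/c", "en", "novice", "text", 3.9),
]

best = find_explanations(catalog, "derivative", level="novice")
print(best[0].url)  # https://example.org/a
```

A novice looking for the derivative would thus be pointed to the best-rated novice-level explanation rather than to the highest-rated explanation overall.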

Here’s a rough outline of how Edupedia can be built:

Stage 1: Bootstrapping the system with metadata
  1. Identify websites that offer free educational materials and allow users to suggest new ones. Organize them into several top-level categories for higher search precision.
  2. On the basis of the collected URLs, create several Google Custom Search Engines (CSEs) that will provide a unified access to these websites (this would be a new service in and of itself).
  3. Ingest CSE search results into a Learning Management System (e.g., the open-source LMS provided by edX.org).
  4. Use the standard education-related vocabularies (terminologies, taxonomies, etc.) to automatically organize the content.
  5. Auto-detect other parameters such as user rating, material type, and presentation length.

Stage 2: Collaboration and refinement
  1. Allow registered users to rate content, propose classification changes, and contribute new material (new material could be as simple as a good example that explains why a given concept is of interest).
  2. Generate custom courses based on what people already know.
  3. Request and import richer metadata from content providers.
  4. Improve the auto-classification quality by using text mining and metadata alignment algorithms.
If the crowdsourcing approach worked for Wikipedia, it should work even better for Edupedia. First of all, there is a huge motivated community of teachers and students out there who would be happy to contribute. Second, it will be in the interest of content publishers to make sure that their content in Edupedia is well organized for easy access.

Note that explanations can often be contradictory, which can lead to structured debates as described in another post. Edupedia would also benefit from the free tagging approach described in another post.

Market overview

There are several platforms that aggregate educational content. The most notable one is YouTube for Schools which gives access to “hundreds of thousands of free educational videos … from well-known organizations like Stanford, PBS and TED as well as from up-and-coming YouTube partners with millions of views, like Khan Academy, Steve Spangler Science and Numberphile.” Of course, YouTube for Schools remains extremely limited in terms of the type of material (video lectures), its scope (only from partner organizations), and the level of its organization (very basic hierarchy without search).

Another interesting service, Udemy, adopts a Wikipedia-like approach by allowing anyone to submit educational materials. It remains limited since it’s organized into courses (as opposed to explanations), does not attempt to reference the wealth of educational materials already available on the Web, and has weak search and navigation. Note that Udemy raised millions in venture capital and has sold over $2 million worth of courses.

Another notable attempt at aggregating educational materials was made by iSEEK Education, which is “a targeted search engine that compiles hundreds of thousands of authoritative resources from university, government, and established noncommercial providers.” iSEEK indexes not only video but also other types of materials. However, it remains very limited, its design is poor, and it seems to have stopped development.

It's certainly also important to mention Wikipedia's sister site Wikiversity. This site's goal is to assemble educational materials Wikipedia-style. However, this approach is unfortunately failing: 10 randomly chosen schools (out of 60) got the grade F for failing to meet Wikiversity's own standards. Firstly, the MediaWiki platform is not the right system for managing and displaying the multimedia materials that permeate online learning. Secondly, the task at hand is much more complex than a simple encyclopedia, and without a business model it is difficult to have enough resources to manage this complexity and meaningfully coordinate the community's efforts. Nevertheless, the project shows that there is a willing community of contributors that managed to create hundreds of thousands of articles in multiple languages.

Finally, the Open Educational Resource (OER) initiative is an important movement to provide standardized and open access to educational resources. OER Commons catalogues over 40 thousand OERs. OERs are well-organized according to usage rights, media type, level, etc. Standards, however, are notoriously difficult to put in place. OER Commons has existed since 2007, yet its collection is far smaller than those of the portals cited above. Lack of a business model (and hence, sufficient resources) does not help either.

Business model

Similarly to Udemy, Edupedia would reference not only free public content, but also content that requires a fee to access. Edupedia would keep a percentage of the collected revenue and would also make money from targeted advertisement (similarly to YouTube).

Go-to-market strategy

As suggested in the Solution section, the first step is to create comprehensive, education-specific Google Custom Search Engines on several high-level topics (science, technology, etc.), and promote them in education communities. This is the Minimum Viable Product, which can later be substantially improved as outlined earlier.

Friday 14 September 2012

Community Enterprise

Summary

Wouldn't it be great to get a crowd of experts to invest a bit of time in your enterprise in an effective and meaningful way, especially when traditional investment is not available?

Community Enterprise (CE) is a platform that allows enterprises to be built and developed by a community. More concretely, it is a work crowdsourcing platform that compensates work by equity, thereby providing new means of enterprise financing.

The idea of enterprise crowdsourcing is in the air. While this goal is not easy to achieve, all the components are already there to make it a reality.


Problem


Enterprises are frequently in need of external resources to support their development. Traditionally, such resources have been supplied by various types of investors, investment funds, banks, and government grants. However, this traditional form of financing is in very limited supply, and therefore, hard to obtain. The various limitations of the traditional investment paradigm are an important bottleneck of progress in the private sector. The situation is especially difficult for startup companies, most of which fail because the necessary resources and expertise are not available.

On the other hand, there are lots of people who have some free time, money, and interest to invest into a company that they believe in. It could be just a few hours a month and a few hundred bucks, but such resources can be more than sufficient if enough people invest. For this to happen, a new conceptual, legal, and technological platform needs to be created.


Solution


"In the long history of humankind...those who learned to collaborate and improvise most effectively have prevailed - Charles Darwin"


Enterprise equity is routinely used as a means of work compensation. However, in most cases it is used as a complement to monetary compensation. Most people cannot afford to work just for equity, unless they can do that while earning money at another job. That is where crowdsourcing comes in. Crowdsourcing is becoming an increasingly popular way of outsourcing work to a community on a task-by-task basis. This methodology would allow individuals or organizations to invest whatever resources they can spare (e.g., a few hours per month) into an enterprise in exchange for equity. An enterprise can crowdsource any type of work, be it management, development, marketing, sales, or support.

Lots of people would like to be involved in cool innovative projects. Most have to earn their living, but have a few hours a month to spare on a side project. CE allows enterprises to tap into this enormous resource that is currently inaccessible.

For this idea to succeed, a new conceptual, legal, and technological platform needs to be created.
Creation of such a platform is not a simple matter, but things are made easier by several related projects and experiences. These include:
  1. Open collaborative platforms / projects such as open source projects (e.g., OpenOffice), content communities (e.g., Wikipedia), or social games (e.g., eRepublik).
  2. Legal frameworks for sharing intellectual property such as open-source licenses and Creative Commons.
  3. Investment platforms for monetary exchange between individuals and groups, such as Kickstarter and FriendsClear.
  4. Task crowdsourcing platforms such as Amazon's Mechanical Turk.
The challenge is to put all these parts into a complete easy-to-use solution. As usual, the devil is in the details. The platform requires a fine balance of flexibility and simplicity as well as control and freedom in order to work. It needs to get this just right, but it is possible to achieve.

Once the CE platform is available, it would provide a rich alternative source of investment to all kinds of startup projects, for profit or not for profit. It would also be used by existing companies to raise investment from individuals or to distribute internal resources more effectively. CE would do for time investment what microfinancing did for monetary investment.

In addition to the resource aspect, crowdsourcing can add the extra dimension of "the wisdom of crowds". Finally, a CE would naturally tend to be more democratic and agile than the traditional enterprise structures.

Business model

The business model behind CE could be based on a mix of: 1) targeted advertisement, 2) license or subscription fees, and 3) transaction fees.
  1. Effectively, CE is a novel enterprise collaboration platform. Hence, its users are the perfect customers for complementary collaboration products. Moreover, CE fosters highly distributed collaborative organizations, thereby actually creating additional market for this type of technology. CE can make money by advertising and becoming a distribution channel for these technologies.
  2. The CE platform could be licensed or offered as a service for internal collaboration within existing organizations. In this context, CE could serve to gather support for internal projects from the pool of the organization's employees, thereby improving internal resource management and cross-team collaboration. Alternatively, CE could be sold at a reduced rate to non-profit organizations as a platform for recruiting and managing volunteers. This would be a great way to increase public involvement in social causes.
  3. CE could charge its user companies a percentage of the investment that they receive through the platform. The fees could be a combination of equity and cash, depending on the situation.
Given the strong interest of governments in fostering entrepreneurship, the first and third sources of revenue could be co-financed by governments.

It would be natural to build CE from the beginning in the spirit of what it is trying to achieve, i.e., by inviting as many members as is practical to contribute to the project. Since the right platform is not available at this time, the openness of the project will have to be limited at first but can increase in phases as the platform is developed. At the beginning, a core team would have to put together the most critical aspects of the platform into a prototype that would allow the project to launch. Such a prototype could be assembled by a quick & dirty integration of already existing pieces of the puzzle.

Go-to-market strategy

Due to the novelty and importance of the concept, CE should get easy publicity from mainstream press. Moreover, CE is subject to a significant viral effect as enterprises themselves will be promoting the platform in their effort to find new investment. In addition, governments can serve as a great free marketing channel since they maintain constant communication with local enterprise communities and it is in their interest to promote CE. The above can be supplemented by the common on-line and off-line marketing channels. Such a strategy should ensure rapid user growth that will in turn lead to rapid monetization of the platform.

Thursday 13 September 2012

Real Debate

Summary

Making the right decisions quickly is key to everything we do. However, any analytical decision process will generally lead to suboptimal results and take longer without an effective formal structure for assessing the situation, evidence, and argumentation.

Real Debate is a structured discussion platform that enables participants to arrive at the best choice efficiently. It allows participants to carry out rigorous and democratic discussions and computes the best outcome using an advanced weighted rating system.

Problem

It seems evident that a decision that requires any significant analysis calls for a certain degree of rigor and formal structure. Just as computers are a great help during complex (or even not so complex) calculations, they can help us compute a decision based on specified parameters. Moreover, they can help us take into account analysis by multiple participants consistently and without bias.

The traditional group meeting paradigm is a subject of many jokes for a reason. Virtual discussions conducted in forums or by email suffer from similar issues as well. What’s needed is a structured discussion platform that allows participants to formally define the discussion objectives and constraints as well as formally evaluate the available evidence and argumentation.

Solution

The proposed platform is a discussion forum with the following additional features:

Topic definition

Discussions often last longer than necessary because of misunderstandings, which can start at the very beginning. Sometimes people argue simply because they understood the issue differently. That’s why it’s important from the start to clearly define what the topic is. Often a topic is actually a composition of several issues that should be treated one at a time.

This feature allows participants to define and agree upon the topic of the discussion. If there’s more than one issue or variation at hand, a simple vote will allow users to choose the topic and proceed with the discussion.

Related topics

Links can be created to sub-topics, equivalent topics (duplicates), or other topics that can be relevant (these can be suggested automatically using text analytics).

Explicit hypotheses

It’s important in a structured discussion to clearly identify the possible hypotheses. A hypothesis is a possible avenue of discussion, such as an answer or an opinion. To save time, it’s important to eliminate duplicate hypotheses. If a duplicate is detected too late (e.g., in a large forum discussion), it’s important to be able to merge the two hypotheses if the participants agree.

Comment types

Participants can comment on any discussion statement, including other comments. Since a discussion follows the typical hierarchical tree model, a discussion statement always applies to another preceding statement (the topic, a hypothesis, or a comment). There are 3 comment types that are explicitly identified:

  • Clarification comments point out problems with the meaning of a statement or suggest an alternative formulation. A clarification comment has an acceptance property whose value indicates whether the commented statement is acceptable but can be improved as suggested, or is unacceptable unless modified (Accept/Reject). An overall rejection threshold can be set to specify the percentage of participants rejecting a statement that would be required in order to make the statement invalid.

  • Argumentation comments point out problems in the logical or factual validity and relevance of a statement in relation to the preceding statement.

  • Evidence comments propose evidence that either supports or contradicts the hypothesis. An evidence comment has a required relevance property whose value indicates whether the evidence supports or contradicts the commented statement (Pro/Con). An evidence comment also has a 0-4 rating parameter which any participant can use to rate the strength of the proposed evidence. The strength of the evidence depends on two things: its estimated accuracy and its relevance to the statement. The parameter then reflects the average rating. If the rating is less than 4, participants must explain any issues they observe using an argumentation comment. Finally, an evidence comment also has a reference source parameter that must explicitly identify the source of the indicated evidence. Note that the same reference source can be used as evidence in multiple places in a discussion. Evidence comments that use the same reference source can be inter-linked, manually and automatically, in order to coordinate the evidence evaluation across multiple places of a discussion.

To avoid confusion, only a single problem or a piece of evidence can be reported in a comment whenever possible (an argumentation comment linked to an evidence rating may contain multiple problems).
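Two of the mechanics described above, the averaged 0-4 evidence rating and the rejection threshold, can be sketched in a few lines. The `EvidenceComment` class, the `is_invalid` helper, and the choice that a statement becomes invalid when the rejecting share reaches (rather than exceeds) the threshold are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class EvidenceComment:
    source: str      # reference source of the evidence (required)
    relevance: str   # "pro" or "con" with respect to the commented statement
    ratings: dict = field(default_factory=dict)  # participant -> rating from 0 to 4

    def rate(self, participant, rating):
        """Record one participant's rating; a later rating replaces an earlier one."""
        if not 0 <= rating <= 4:
            raise ValueError("rating must be between 0 and 4")
        self.ratings[participant] = rating

    def average_rating(self):
        """The displayed rating is the average over all participants who rated."""
        return mean(self.ratings.values()) if self.ratings else None


def is_invalid(rejecting, participants, threshold_percent):
    """Assumed rule: a statement is invalid once the rejecting share reaches the threshold."""
    if not participants:
        return False
    share = 100 * len(set(rejecting) & set(participants)) / len(participants)
    return share >= threshold_percent


evidence = EvidenceComment(source="hypothetical survey report", relevance="pro")
evidence.rate("ann", 4)
evidence.rate("ben", 2)
print(evidence.average_rating())  # 3

everyone = {"ann", "ben", "cy", "dee"}
print(is_invalid({"ann", "ben"}, everyone, threshold_percent=50))  # True
```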

Automatic computation of the relative hypothesis strength

A sophisticated configurable algorithm is used to automatically calculate the relative strengths of the presented hypotheses based on the provided comments. A weight is computed for each discussion statement that takes into account unresolved clarification and argumentation comments as well as the combined rating of the evidence comments. 

The algorithm also computes ratings for each participant which are then used to adjust the weight of their statements.

If, nevertheless, the algorithm yields an insignificant difference between two hypotheses, the decision can be taken using a simple vote.
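The exact algorithm is deliberately left open here, but a toy version helps show the idea. In the sketch below, every formula is an assumption: each piece of evidence contributes its average rating (scaled to 0-1), signed by Pro/Con and multiplied by the rating of the participant who posted it, and each unresolved clarification or argumentation comment subtracts a fixed penalty. The `margin` parameter stands for what counts as an "insignificant difference".

```python
def evidence_score(evidence):
    """Sum signed evidence; each item is (relevance, average_rating, participant_weight)."""
    total = 0.0
    for relevance, average_rating, participant_weight in evidence:
        sign = 1 if relevance == "pro" else -1
        total += sign * (average_rating / 4) * participant_weight
    return total


def hypothesis_weight(evidence, unresolved_comments, penalty=0.25):
    """Assumed formula: evidence score minus a fixed penalty per unresolved comment."""
    return evidence_score(evidence) - penalty * unresolved_comments


def rank(hypotheses, margin=0.1):
    """Return (names ordered by weight, weights, whether a simple vote is needed)."""
    weights = {
        name: hypothesis_weight(h["evidence"], h["unresolved"])
        for name, h in hypotheses.items()
    }
    ordered = sorted(weights, key=weights.get, reverse=True)
    needs_vote = len(ordered) > 1 and weights[ordered[0]] - weights[ordered[1]] < margin
    return ordered, weights, needs_vote


# Made-up discussion with two hypotheses.
hypotheses = {
    "A": {"evidence": [("pro", 4, 1.0), ("con", 2, 0.5)], "unresolved": 0},
    "B": {"evidence": [("pro", 3, 1.0)], "unresolved": 1},
}

ordered, weights, needs_vote = rank(hypotheses)
print(ordered, weights, needs_vote)  # ['A', 'B'] {'A': 0.75, 'B': 0.5} False
```

With these made-up numbers hypothesis A wins by 0.25, which is above the margin, so no vote is needed. Raising the margin to 0.3 would send the decision to a simple vote.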

Decision criteria

It may be that each hypothesis must be evaluated along specific axes. In such a case, the axes can be defined in a single place and then applied for all the hypotheses.

Another possibility is to constrain the discussion duration based on time or on the relative hypothesis strength (i.e., to define a margin by which one hypothesis has to win in order for the discussion to be finished).
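A stopping rule of this kind could be as simple as the following sketch. The function name, its parameters, and the choice to end the discussion as soon as either condition is met are assumptions for illustration.

```python
from datetime import datetime, timedelta


def discussion_finished(started_at, now, strengths, max_duration=None, required_margin=None):
    """Return True if the time limit has passed or the leading hypothesis wins by the margin."""
    if max_duration is not None and now - started_at >= max_duration:
        return True
    if required_margin is not None and len(strengths) >= 2:
        top, runner_up = sorted(strengths.values(), reverse=True)[:2]
        return top - runner_up >= required_margin
    return False


start = datetime(2012, 9, 13, 9, 0)
one_hour_later = start + timedelta(hours=1)
strengths = {"A": 0.75, "B": 0.5}  # made-up relative hypothesis strengths

# The week-long time limit has not passed, but A leads by 0.25, which meets the 0.2 margin.
print(discussion_finished(start, one_hour_later, strengths,
                          max_duration=timedelta(days=7), required_margin=0.2))  # True
```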

Although the above features may seem too complex from the description, I believe that a good design can make the platform accessible to the majority of current forum users.

Market overview

Online discussion forums have existed from the start of the Web. Forums 2.0 improved on the simple message hierarchy by adding more structure and formalism. Forums such as Stack Overflow and Quora have become very popular because their content is easier to search and understand. Real Debate takes this idea further.

There exist numerous forum platforms officially dedicated to debates (many are listed in this review). However, they originate from the face-to-face debate tradition of opponents deciding the winner between two alternatives. So their definition of debate is generally too restrictive. As a result, these platforms have not been nearly as popular as the ones cited above (they are invisible on Google Trends).

There are many open-source forum platforms available that can serve as the basis for the first Real Debate MVP (Minimum Viable Product). The objective of the MVP is to demonstrate the two primary advantages of the system: 1) clarity of the debate, and 2) precision of the automated hypothesis ranking.

What about the risk that an established solution will implement the missing features and put Real Debate out of business? This is highly unlikely. Real Debate goes well beyond the relatively simple Q&A forum model of Quora and Stack Overflow. They would need to develop and launch an entirely new product to compete with Real Debate. If Real Debate gains traction, it would be able to corner the market before major competitors emerge, just as Quora and Stack Overflow have done in their sectors.

Business model

Real Debate comes in two flavors:
  1. A hosted forum platform, where any registered user can start a private or a public discussion
  2. A white-label forum solution, either hosted or stand-alone.
In either case, the most obvious business model seems to be “freemium”. A basic free version would be supported by advertisement, with multiple additional options available for a fee.

Go-to-market strategy

To get initial visibility, it would make sense to start with the hosted forum platform before launching the white-label solution. The hosted platform can be used to host public debates on hot topics of the day (e.g., global warming). This could quickly raise the visibility of the platform. Participants can be recruited from ongoing online discussions that mostly do not lead to any kind of consensus or conclusions.

Another way to raise the platform visibility is to promote it in communities where structured debate would bring the most obvious value and in which the audience is highly accustomed to in-depth debates. Such communities are found in a variety of domains such as law, science, medicine, journalism, history, philosophy, politics, business, and sports. Particularly noteworthy are legal proceedings, as well as scientific and philosophical debates. While the participants in these domains are well-trained in formal structured debates, they lack the tools to carry them out efficiently (I can attest to that as an ex-scientist).