Tuesday, 18 June 2013

Rise of Linguistics – IBM, Google & Microsoft in $100 billion+ Race

Big Data and Analytics dominate column inches, blogs, industry events, office conversations, business strategies, and the like, but Tech Giants IBM, Microsoft and Google are already busy uncovering the next big gem in IT. Step forward Linguistics.

Linguistics is a relatively young scientific discipline that seeks to understand the true building blocks of human language – something that has eluded even the most respected evolutionary biologists. Some consider the origins of human language almost impossible to explain, but that has not stopped the budding Linguistics community from asking a series of searching questions. How and where did language originate? How did we come to understand it? What is the role of the brain? What are the intricate rules of language?

The Computer Science discipline attempts to consider all of this in the context of computer consumption – no mean feat for such an elusive specialism. To be more specific, the aim is to reverse engineer human language so that computers can understand, act upon and respond to human intent. Genius. Long the realm of science fiction, two-way conversation between man and intelligent machine is fast moving towards science fact. Artificial Intelligence is on its way, and boy, is it going to be worth the wait.

Natural Language Processing Edges Closer

The Linguistics field of study, often referred to as natural language processing (NLP) or natural language understanding (NLU) in computer science, is rapidly evolving after decades spent in R&D labs at the cost of a small fortune. Whilst it is true that NLP exists in part today, it is exactly that: partial. “Approximations” best describes current capabilities – a combination of statistical analysis, rule-based methods and heuristics (a type of shortcut technique) that arrives at estimated results. Mathematics, not Science, if you will.
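To make the “approximation” point concrete, here is a toy sketch – purely illustrative Python, not any vendor’s actual system – of the statistical approach: a bigram model that predicts the next word from raw frequency counts alone, with no grasp of meaning whatsoever.

```python
from collections import Counter, defaultdict

# A toy corpus; real systems train on billions of words, but the
# principle is the same: count, then predict from the counts.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count which word follows which.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict(word):
    """Return the word most frequently seen after `word`, or None."""
    counts = bigrams[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict("sat"))  # "on" -- frequency, not comprehension
```

The model “knows” that “on” tends to follow “sat”, but it has no idea what sitting is – which is precisely the gap between estimation and understanding described above.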

The problem is that it fundamentally lacks “intelligence” and is therefore not fit for purpose – at least not for mainstream consumption. Current methods fail to grasp the true “meaning” of language, a prerequisite for Artificial Intelligence. To do so, you must understand grammar (structure), morphology (the formation of words), syntax (the formation of sentences from words) and, most importantly of all, semantics (the relationships between words and sentences that form “meaning”).
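As a rough illustration of those layers – hypothetical Python, with deliberately naive rules – crude tokenisation can stand in for syntax and a one-line suffix-stripper for morphology, while semantics, the layer that matters most, admits no such shortcut.

```python
# A toy walk through the first layers of analysis; all rules are illustrative.
sentence = "The dogs chased the cats"

# Syntax-lite: split the sentence into word tokens.
tokens = sentence.lower().split()

# Morphology-lite: strip a plural "s" to recover a crude stem.
def stem(word):
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

stems = [stem(t) for t in tokens]
print(stems)  # ['the', 'dog', 'chased', 'the', 'cat']

# Semantics has no one-line rule: nothing here "knows" that chasing
# involves a chaser and something chased -- which is exactly the problem.
```

Real morphology is vastly messier (“mice”, “ran”, “geese”), and semantics messier still – hence the decades in the R&D labs.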

And if you didn’t have to read that twice, you are doing well. Very well. Add in the complexities of language ambiguity, idiosyncrasy and sheer global variety and you’d be forgiven if your head started spinning. Take the sentence “I saw the man with the telescope”: did I use the telescope, or did the man have it? Further consider that exact details are not explicitly coded in the language we use – in fact, much of our understanding is formed from our knowledge of the real world, learned over time – and you quickly come to realise that language is only part of the problem. The point being, this is an immensely complex subject domain, offering some explanation as to why the field remains very much in R&D mode.

Nonetheless, NLP is moving forward with speed as key players look beyond the traditional statistical modelling approach and turn towards natural science for answers, including biology, anthropology, psychology and neuroscience amongst others. The complex understanding of linguistics, gained through decades of interdisciplinary scientific study, is about to give Natural Language Processing a serious facelift.

Race For Glory

When the Tech Giants ramp up efforts, you can be sure the space is hotting up, each trying to reach the sea of gold that awaits. And that’s exactly what the likes of IBM, Microsoft and, now, Google have been busy doing. IBM and Microsoft have long been doing battle in the labs, whilst Google is a relative latecomer to the party.

So, what of the challengers?

Well, IBM leads the way with its creation of “Watson”, the quiz-winning supercomputer that beats humans at their own game. Using a unique combination of natural language interpretation, cognitive-style learning and hypothesis-based decisioning, Watson has the potential to define a revolutionary new service category – “experts-as-a-service” – in highly specialist domains where decision-making is vital.

For example, Watson is currently “in training” as a medical practitioner, with the end goal of providing physicians with decision support in the patient-diagnosis phase. Watson will help interpret symptoms through the mass analysis of medical research data and the application of patient history.

Whilst IBM could be considered the most advanced in terms of an end-to-end decision-support platform, other players are focusing on key components in the decision chain, such as NLP in its entirety. That can be said for Microsoft which, not to be outdone, recently wowed a Chinese audience with a demo of real-time speech translation based on its own NLP developments. The Language Translation market alone is projected to reach $100 billion by 2020, signalling the immense potential of NLP technology, especially when you consider this is merely one application of many.

And if IBM and Microsoft thought they were the only two players in town, then Google will no doubt have something to say. The recent high-profile hire of Artificial Intelligence thought-leader Ray Kurzweil to head up Google’s Natural Language Processing Group is a surefire statement of intent, as is the follow-up acquisition of summarisation vendor Wavii – perhaps a sign of further things to come.

Yet that only tells half of the story, at least if you hail from Russia. In what appears to be a classic tale of Russia vs America in yet another race for glory, the Russians boast a historic association with linguistics research dating back to the 1950s.

And it doesn’t end there. Niche Russian software vendor ABBYY, better known for its market-leading OCR and data capture capabilities, is understood to have made significant progress on what it calls a “universal linguistic platform” – claimed to be a first of its kind – having been in stealth mode since its founding days, some 18 years prior. That’s a rather large research project. Information remains at a premium, with the company rumoured to be gearing up for launch.

In my next article I will focus more specifically on the coming impact of Linguistics, including use cases of emerging semantic technologies and how these applications will completely redefine the computing landscape. Stay tuned. 
