Automated systems powered by new breakthroughs in Artificial Intelligence will soon begin to have an impact on the web industry. People working on the web will have to learn new design disciplines and tools to stay relevant. Based on the talk "OK Computer" that I gave at a number of conferences in Autumn 2015.
In 1996 Intel began work on a supercomputer called ASCI Red. Over the lifetime of its development it cost £43m (adjusted for today's rate), and at its peak it was capable of processing 1.3 teraflops. ASCI Red was retired in 2006, the same year that the PlayStation 3 was launched. The PS3 cost £425, and its GPU was capable of processing 1.8 teraflops.
IBM's Watson is a computer for learning. It was initially built with the aim of beating the human champion of the US game show Jeopardy, and in 2011 it succeeded, by a lot. (In the picture below, Watson is the one in the middle.)
It's hard to find development costs for Watson, but a conservative estimate would put them at £12m over ten years. Four years after the Jeopardy victory, the Cognitoys Dino, a smart toy for children, will go on sale. It costs £80, and is powered by Watson.
In 2012 Google used a network of 16,000 connected processors to teach a computer how to recognise photographs of cats. Three years later, the same technology can now successfully identify a photo of a "man using his laptop while his cat looks at the screen".
I've told these three stories to make a point: cheap and plentiful processing has allowed Artificial Intelligence to improve dramatically, and very quickly.
There are lots of different strands to A.I., but the big recent breakthroughs have been in deep learning using neural networks. Very broadly, a neural network is a series of nodes, each of which performs a single action of analysis or classification on an input. The result is passed on to another set of nodes for further processing, until the network returns a final output stating, with some degree of certainty, what the input is. It can return a rough answer quickly, or a more precise answer slowly. It's sort of how our brains work.
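The forward pass described above can be sketched in a few lines of Python. This is a toy with hand-picked weights rather than a trained network, but it shows the shape of the idea: each node weighs its inputs and passes a squashed result on to the next layer.

```python
import math

def neuron(inputs, weights, bias):
    """A single node: weight its inputs, then squash to a 0-1 'certainty'."""
    activation = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-activation))  # sigmoid

def classify(inputs):
    """Two layers of nodes: each layer's output feeds the next."""
    hidden = [
        neuron(inputs, [2.0, -1.0], 0.5),
        neuron(inputs, [-1.5, 2.5], -0.5),
    ]
    # The output node combines the hidden layer into a final certainty.
    return neuron(hidden, [3.0, -2.0], 0.0)

confidence = classify([1.0, 0.0])
print(round(confidence, 3))  # a certainty between 0 and 1
```

A real network differs mainly in scale (millions of nodes) and in that the weights are learned from data rather than written by hand.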
As an illustration of the potential of deep learning, computer scientists in Germany have used neural networks to analyse the painting styles of the Old Masters and apply them to photographs. This is not simply applying image filters: the system detects sky, water, trees, buildings, and paints them according to the styles of artists such as Turner, Van Gogh, and (shown here) Munch.
And having conquered the world of art, now these A.I. systems are coming for your job.
This graph from an article on Gartner.com shows a rough approximation of the likelihood of your job being automated by learning systems in the near future. It works on two axes: routine, and structure.
To summarise it, if your job deals with abstractions and varies greatly from day to day, you're probably safe. But if you deal with hard data, like numbers, and you do the same thing every day, it's probably time to start nervously looking over your shoulder. As a stark illustration of this, recent figures show that the number of employees in the finance departments of major US companies has dropped by 40% since 2004.
More directly related to the Web industry, a recent study by Oxford University and Deloitte indicated that it's "not very likely" that a web design and development professional will lose their job to automation in the next twenty years. However, their definition of "not very likely" is a 21% chance; to put that more bluntly, one in five of us could be out of work, due to an overall shrinking of the available job market.
There are already signs that automated systems could replace programmers in the future. MuScalpel, a system developed by University College London, can transplant code from one codebase to another without prior knowledge of either; in one test it copied the H.264 media codec between projects in 26 hours, a feat which took a team of human engineers 20 days. And Helium, from MIT and Adobe, can learn the function of code without guidance and optimise it, providing efficiencies of between 75 and 500 percent in tests.
These systems are a few years away from market, but weâre already starting to see automation move into the industry in smaller ways. Services such as DWNLD, AppMachine, and The Grid offer users a tailored mobile app or website within minutes, with styles and themes based on the content and information pulled from existing social profiles and brand assets. These services, and others like them, will become smarter and more available, skimming away a whole level of brochure sites usually offered by small digital agencies or individuals.
A common criticism of services like The Grid is that they can only produce identikit designs, with no flair or imagination. But look at the collection of websites below; people designed these, and flair and imagination are nowhere to be seen.
These screenshots are taken from Travis Gertz' excellent article, Design Machines, in which he highlights the problem:
The work we produce is repeatable and predictable. It panders to a common denominator. We build buckets and templates to hold every kind of content, then move on to the next component of the system.
Digital design is a human assembly line.
Looking back at the Gartner chart on the likelihood of automation, I'd say that "a human assembly line" would be somewhere near the bottom left. And we've only ourselves to blame. Gertz again:
While we've been streamlining our processes and perfecting our machine-like assembly techniques, others have been watching closely and assembling their own machines.
We've designed ourselves right into an environment ripe for automation.
All of the workflows we've built, the component libraries, the processes and frameworks we've made… they make us more efficient, but they make us more automatable.
However, brilliant writer that he is, Gertz doesn't only identify the problem; he also offers a solution. And that solution is:
Humans are unpredictable mushy bags of irrationality and emotion.
This is a good thing, because a computer can never be this; it can never make judgements of taste or intuition. Many people are familiar with the Turing test, where a human operator has to decide if they're talking to another human or a bot. But there's a lesser-known test, the Lovelace test, which sets creativity as the benchmark of human intelligence. To pass Lovelace, an artificial agent must create an original program (such as music, or a poem) that it was never engineered to produce. Further, that program must be reproducible, and impossible for the agent's original creator to explain.
The idea is that Lovelace should be impossible for an artificial agent to pass. Creativity should be impossible for a computer. And it's this, not tools, that offers us the opportunity to make our roles safe from automation.
Andrew Ng, who helped develop Google's deep learning systems and now works at the Chinese search company Baidu, has serious concerns that automation is going to be responsible for many job losses in the future, and that the best course of action is to teach people to be unlike computers:
We need to enable a lot of people to do non-routine, non-repetitive tasks. Teaching innovation and creativity could be one way to get there.
But as well as learning to be creative, we should also become centaurs; that is, learn to enhance our abilities by combining our instincts with an intelligent use of artificial intelligence. Many smart people have begun considering the implications of this; Cennydd Bowles wrote:
A.I. is becoming a cornerstone of user experience. This is going to be interesting (read: difficult) for designers.
To the current list of design disciplines we already perform (visual, interaction, service, motion, emotion, experience) we will need to add one more: intelligence design.
Previously I said that A.I. is improving very quickly, and this also means that it's becoming much more available very quickly. Services that were once only available to the internet giants are now available to everyone through APIs and products, at reasonable-to-free prices.
Remember Watson? All of its power is available to use through IBM's Developer Cloud; their Bluemix cloud platform and a Node SDK give you access to powerful and sophisticated image and text services via RESTful APIs. IBM wants Watson to be the ubiquitous platform for AI, as Windows was for the home PC and Android is for mobile; as a result, Developer Cloud is free for developers, and reasonably priced for businesses.
What do you get with Watson? For a start, some visual recognition tools, like the Google one I mentioned at the beginning of this piece. Upload an image, and Watson will make an educated guess at explaining its content. It works well in most cases (although it was bizarrely convinced that a portrait of me contained images of wrestling).
These identification errors should reduce in frequency as you give Watson more data, because deep learning thrives on training and data. That's why all the major online photo tools, from Google Photos to Flickr, entice you with huge storage limits, in many cases practically unlimited; they want you to upload more, because it makes their services better for everyone. These services include automatic tagging of photos and content-based search; Google Photos in particular is very good at this, easily finding pictures of animals, places, or even abstract concepts like art.
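Once the hard part, the auto-tagger, has labelled each photo, content-based search itself is simple. A minimal sketch, with invented tags and filenames standing in for a real photo library:

```python
from collections import defaultdict

# Hypothetical tags, as an auto-tagger might produce for a photo library.
photo_tags = {
    "IMG_001.jpg": {"cat", "laptop", "indoor"},
    "IMG_002.jpg": {"beach", "sea", "sunset"},
    "IMG_003.jpg": {"cat", "garden", "outdoor"},
}

# Build an inverted index: tag -> set of photos carrying that tag.
index = defaultdict(set)
for photo, tags in photo_tags.items():
    for tag in tags:
        index[tag].add(photo)

def search(query):
    """Return photos matching every word in the query."""
    results = set(photo_tags)
    for word in query.lower().split():
        results &= index.get(word, set())
    return sorted(results)

print(search("cat"))  # ['IMG_001.jpg', 'IMG_003.jpg']
```

The cleverness lives entirely in the tagging model; the more photos it has seen, the better its tags, which is exactly why these services want your uploads.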
Eventually these offerings will raise the expectations of users; if your photo service doesn't offer smart search, it's going to seem very dumb by comparison.
Watson can also find themes in batches of photos, offering insights into users' interests and allowing for better targeting. This is another reason why photo services want you in: because you become more attractive to advertisers.
I should add that Watson is not the only game in town for image recognition; alternatives include startups like Clarifai, MetaMind, and SkyMind; and Microsoft's Project Oxford, which powers their virtual assistant Cortana. Project Oxford has best-in-class face APIs, able to detect, recognise, verify, deduce the age of, and find similar faces; how you feel about that will largely depend on your level of trust in Microsoft.
While image recognition is interesting and useful, the "killer app" of AI is natural language understanding. The ability to comprehend conversational language is so useful that every major platform is adopting it; Spotlight in OS X El Capitan allows you to search for "documents I worked on last week", while asking Google "how long does it take to drive from here to the capital of France?" returns directions to Paris.
If you want to add natural language understanding to your own apps, one of the best tools around is the Alchemy API, originally a startup but now part of the Watson suite. This offers sentiment analysis, entity and keyword extraction, concept tagging, and much more.
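As a rough illustration of what sentiment analysis does, here is a toy lexicon-based scorer; real services like the Alchemy API use trained models and handle negation, sarcasm, and context far better than a word list ever could.

```python
# Toy sentiment lexicon; a real service learns these associations from data.
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "hate", "terrible", "poor"}

def sentiment(text):
    """Score a text as positive, negative, or neutral by word counts."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great service"))  # positive
```

The input/output shape, text in, a label and score out, is what the real APIs offer; everything between is where the deep learning happens.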
Natural language understanding is a key component in a new wave of recommendation engines, such as those used in Pocket and Apple News. Existing recommendation engines tend to use "neighbourhood modelling", basing recommendations on social graph interaction; but the new AI-powered engines understand the concepts contained in text content, allowing it to be better matched with other, similar content.
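The concept-matching approach can be sketched with cosine similarity over concept scores. The articles and scores below are invented, standing in for what an NLU service might extract from real text:

```python
import math

# Hypothetical concept scores, as an NLU service might extract per article.
articles = {
    "a": {"ai": 0.9, "design": 0.4},
    "b": {"ai": 0.8, "design": 0.5},
    "c": {"football": 0.9, "transfer": 0.6},
}

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def recommend(read_id):
    """Rank the other articles by conceptual similarity to the one just read."""
    read = articles[read_id]
    ranked = sorted(((cosine(read, v), k)
                     for k, v in articles.items() if k != read_id),
                    reverse=True)
    return [k for _, k in ranked]

print(recommend("a"))  # ['b', 'c']: 'b' shares concepts, 'c' shares none
```

Notice that no social graph is involved: two articles match because they are *about* the same things, not because the same people clicked both.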
Where AI really excels, however, is when applied to conversation. Talking to a computer is nothing new; to give just one example, IKEA have had a customer service chatbot called Anna on their website since at least 2008. But although Anna can answer a straightforward question, she has no memory; if you don't provide the same information in a follow-up question as you did in the previous one, you'll get a different answer. This isn't really a conversation, which has requirements as defined here by Kyle Dent:
A conversation is a sequence of turns where each utterance follows from what's already been said and is relevant to the overall interaction. Dialog systems must maintain a context over several turns.
Maintained context is what today's AI conversations offer that was previously missing. Google are using this to trial automated support bots, trained on thousands of previously recorded support calls. (The same bots, trained on old movies, can be surprisingly philosophical.) And it's what powers the new wave of virtual assistants: Cortana, Siri, Google's voice search, and Amazon Echo. The last of these is particularly interesting because it lives in your home, not your phone, car or TV; it's the first move into this space, soon to be joined by robots with personality, like Jibo or Mycroft.
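The difference maintained context makes can be shown in a few lines. This sketch assumes an upstream NLU step has already extracted slots from each utterance, and the "Billy bookcase" example is purely illustrative:

```python
class DialogSystem:
    """A minimal dialog manager that carries context across turns."""

    def __init__(self):
        self.context = {}

    def turn(self, slots):
        # 'slots' is what an NLU service would extract from the utterance;
        # merge it into the running context so follow-ups can omit details.
        self.context.update(slots)
        product = self.context.get("product")
        question = self.context.get("question")
        if product and question:
            return f"Looking up {question} for {product}"
        return "Could you tell me more?"

bot = DialogSystem()
# "How much is the Billy bookcase?"
first = bot.turn({"product": "Billy bookcase", "question": "price"})
# "And is it in stock?" -- never names the product; context supplies it.
second = bot.turn({"question": "stock"})
print(first)
print(second)
```

A stateless bot like Anna would fail the second turn, because the product was never repeated; the stored context is what turns two questions into one conversation.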
All of these virtual assistants share another feature, which is that you can interact with them by voice. Voice isn't essential to conversation, but it helps; it's much easier than using a keyboard in most cases, especially in countries like China, which has very complicated character input and high levels of rural illiteracy.
Making computers recognise words has been possible for a surprisingly long time; Bell Labs came up with a voice recognition system back in 1952, although it could only recognise numbers, and only when spoken by a specific person. There was a further breakthrough in the 1980s with the "hidden Markov model", but deep learning has hugely improved voice recognition in the past three years; Google says its voice detection error rate has dropped from around 25% (one misidentified word in every four) in 2012 to 8% (one in twelve) earlier this year, and it's improving all the time. Baidu says their error rate is 6% (one in seventeen), and they handle some 500 million voice searches every day.
Voice recognition is available in some browsers, namely Chrome and Firefox, through the Web Speech API, and many other products including Watson and Project Oxford offer speech-to-text services. These all require a microphone input, of course, which unfortunately rules out using Safari or any iOS browser at all.
But while voice recognition can identify the individual words in an utterance, it doesn't in any way understand the meaning of those words, or the intent behind them. That's where the previously mentioned breakthroughs in natural language understanding come in. There are a growing number of voice-based language understanding products now available, including Project Oxford's LUIS (Language Understanding Intelligence Service; I love a good acronym) and the startup api.ai. The market leader in parsing complex sentences is Houndify, from SoundHound, but the service I like for its ease of use is Wit.
Wit was once a startup but is now owned by Facebook. It's free to use, although all of your data belongs to Facebook (which may be a deal breaker for some) and is available to every other user of the service, because, as I said earlier, more data gives deep learning systems more power. It has SDKs for multiple platforms, but where it wins for me is in its training system, which makes it very easy to create an intent framework and correct misinterpreted words.
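What an intent framework boils down to can be sketched with a toy matcher. Wit learns these mappings from labelled training examples; the hand-written keyword lists here are a stand-in for that training:

```python
# Toy intent definitions; a real service learns these from labelled examples.
INTENTS = {
    "weather": {"weather", "rain", "sunny", "forecast"},
    "booking": {"book", "reserve", "table", "appointment"},
}

def detect_intent(utterance):
    """Return the intent whose keyword set best overlaps the utterance."""
    words = {w.strip("?!.,") for w in utterance.lower().split()}
    scores = {intent: len(words & keywords)
              for intent, keywords in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(detect_intent("Will it rain tomorrow?"))  # weather
```

The training UI that makes Wit pleasant to use is, in essence, a fast way of correcting exactly this mapping: utterance in, intent out.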
Wit is the power behind M, Facebook's entry into the virtual assistant market. M is notable because it lives only inside Facebook Messenger, a pattern I'm sure we're going to see much more of in the future: the AI-powered shift from the graphical user interface to the conversational; from GUI to CUI.
There's a reason that Facebook paid an estimated £15 billion for WhatsApp, and it's not solely their £6.5 billion in sales: it's because messaging apps are huge, and WhatsApp is the biggest of all, with some 900 million monthly active users. What's more, messaging apps are growing really quickly; they're the fastest-growing online behaviour of the last five years, and an estimated 1.1 billion new users are set to come on board in the next three years.
And messaging apps as we know them in the West are actually very limited compared to messaging apps in Asia, especially China, where they are more like platforms than apps: you can shop, bank, book a doctor's appointment… basically, anything you can do on the web today. Messaging apps are a huge growth area, and they're going to be powered by conversational assistants.
We can see this beginning already with apps like chatShopper (currently Germany only), which lets you talk to a personal shopper to make purchases through WhatsApp; and Digit, a savings app that communicates almost entirely by text message. These currently use a mix of automatic and human operators (this is also how Facebook's M works right now), but as AI becomes more intelligent the bots will take over from the humans in many cases.
More advanced, fully automated services include x.ai's "Amy", or the apparently very similar Clara. These are meeting assistants that work by email; you ask them to find a suitable time and place for a meeting, and they communicate with all the participants until the arrangements are made, then email you back with the final details.
Conversational UI is an idea whose time has come, enabled only now by AI and natural language understanding. To add it to your own apps you could look at Watson's Dialog service (a similar service is also apparently coming to Wit in the near future), or a startup such as re:infer. But it's not only a case of plumbing in a service; it will also require an addition to the list of design disciplines I mentioned previously: conversation design.
I should note that new interaction models are still prone to old problems; security, privacy and trust should always be paramount in your applications. Remember the Samsung TV scandal earlier this year? Do we really want a repeat of this but with an artificially intelligent Barbie in children's bedrooms?
The ready availability of deep learning services has come upon us so quickly that we've barely realised it; many of the services I've mentioned didn't exist even 18 months ago. This is a little bit scary, and a huge opportunity. There's little doubt that AI's going to take routine jobs from the web industry; so as AI improves, we need to improve with it. The way to do that is to harness AI for our own use, and apply creative, irrational, human thinking to it.
Cross-posted to Medium.