OK Computer: how to work with automation and AI on the web

Automated systems powered by new breakthroughs in Artificial Intelligence will soon begin to have an impact on the web industry. People working on the web will have to learn new design disciplines and tools to stay relevant. Based on the talk “OK Computer” that I gave at a number of conferences in Autumn 2015.

In 1996 Intel began work on a supercomputer called ASCI Red. Over the lifetime of its development it cost £43m (adjusted for today’s rate), and at its peak was capable of processing 1.3 teraflops. ASCI Red was retired in 2006, the same year that the PlayStation 3 was launched. The PS3 cost £425, and its GPU was capable of processing 1.8 teraflops.

IBM’s Watson is a computer for learning. It was initially built with the aim of beating the human champion of the US game show Jeopardy, and in 2011 it did just that — by a lot. (In the picture below, Watson is the one in the middle).

The Watson computer on the gameshow, Jeopardy. Credit: IBM

It’s hard to find development costs for Watson, but a conservative estimate would put them at £12m over ten years. Four years after the Jeopardy victory, the Cognitoys Dino, a smart toy for children, will go on sale. It costs £80, and is powered by Watson.

In 2012 Google used a network of 16,000 connected processors to teach a computer how to recognise photographs of cats. Three years later, the same technology can now successfully identify a photo of a “man using his laptop while his cat looks at the screen”.

Man using his laptop while his cat looks at the screen

I’ve told these three stories to make a point: that cheap and plentiful processing power has allowed Artificial Intelligence to improve a great deal, very quickly.

There are lots of different strands to A.I., but the big recent breakthroughs have been in deep learning using neural networks. Very broadly, a neural network is a series of nodes that each perform a single action of analysis or classification on an input. The result of that action is passed on to another set of nodes for further processing, until the network returns a final output, stating with some degree of certainty what the input is. It can return a rough answer quickly, or a more precise answer slowly. It’s sort of how our brains work.
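
To make that a little more concrete, here’s a toy sketch of a single layer of such a network in TypeScript: a handful of nodes, each weighing its inputs and squashing the result before passing it on. It’s illustrative only; real deep learning systems stack many such layers and learn their weights from enormous amounts of data.

```typescript
// A minimal sketch of one "layer" of a neural network: each node takes the
// previous layer's outputs, weighs them, adds a bias, and squashes the result
// before passing it on. Real systems stack many such layers and learn the
// weights from data rather than hard-coding them.
type Layer = { weights: number[][]; biases: number[] };

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

function forward(layer: Layer, inputs: number[]): number[] {
  return layer.weights.map((nodeWeights, i) => {
    const sum = nodeWeights.reduce((acc, w, j) => acc + w * inputs[j], layer.biases[i]);
    return sigmoid(sum); // a confidence-like value between 0 and 1
  });
}

// Toy example: two inputs feeding three nodes, with made-up weights.
const layer: Layer = {
  weights: [[0.2, 0.8], [0.5, -0.3], [-0.9, 0.1]],
  biases: [0.1, 0.0, -0.2],
};
console.log(forward(layer, [1.0, 0.5]));
```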

As an illustration of the potential of deep learning, computer scientists in Germany have used neural networks to analyse the painting styles of the Old Masters and apply them to photographs. This is not simply applying image filters — the system detects sky, water, trees, buildings, and paints them according to the styles of artists such as Turner, Van Gogh, and (shown here) Munch.

A photo of a street, and the same image rendered in the style of Munch

And having conquered the world of art, now these A.I. systems are coming for your job.

This graph from an article on Gartner.com shows a rough approximation of the likelihood of your job being automated by learning systems in the near future. It works on two axes: routine and structure.

Graph showing the likelihood of a job being automated

To summarise it, if your job deals with abstractions and varies greatly from day to day, you’re probably safe. But if you deal with hard data, like numbers, and you do the same thing every day, it’s probably time to start nervously looking over your shoulder. As a stark illustration of this, recent figures show that the number of employees in the finance departments of major US companies has dropped by 40% since 2004.

More directly related to the Web industry, a recent study by Oxford University and Deloitte indicated that it’s “not very likely” that a web design and development professional will lose their job to automation in the next twenty years. However, their definition of “not very likely” is a 21% chance; to put that more bluntly, one in five of us could be out of work, due to an overall shrinking of the available job market.

There are already signs that automated systems could replace programmers in the future. MuScalpel, a system developed by University College London, can transplant code from one codebase to another without prior knowledge of either; in one test it copied the H.264 media codec between projects, taking 26 hours to do so — a feat which took the team of human engineers 20 days. And Helium, from MIT and Adobe, can learn (without guidance) the function of code and optimise it, providing efficiencies of between 75 and 500 percent in tests.

These systems are a few years away from market, but we’re already starting to see automation move into the industry in smaller ways. Services such as DWNLD, AppMachine, and The Grid offer users a tailored mobile app or website within minutes, with styles and themes based on the content and information pulled from existing social profiles and brand assets. These services, and others like them, will become smarter and more available, skimming away a whole level of brochure sites usually offered by small digital agencies or individuals.

A common criticism of services like The Grid is that they can only produce identikit designs, with no flair or imagination. But look at the collection of websites below; people designed these, and flair and imagination are nowhere to be seen.

Screenshots of homogenous website design

These screenshots are taken from Travis Gertz’ excellent article, Design Machines, in which he highlights the problem:

The work we produce is repeatable and predictable. It panders to a common denominator. We build buckets and templates to hold every kind of content, then move on to the next component of the system.

Digital design is a human assembly line.

Looking back at the Gartner chart on the likelihood of automation, I’d say that “a human assembly line” would be somewhere near the bottom left. And we’ve only ourselves to blame. Gertz again:

While we’ve been streamlining our processes and perfecting our machine-like assembly techniques, others have been watching closely and assembling their own machines.

We’ve designed ourselves right into an environment ripe for automation.

All of the workflows we’ve built, the component libraries, the processes and frameworks we’ve made… they make us more efficient, but they make us more automatable.

However, brilliant writer that he is, Gertz doesn’t only identify the problem; he also offers a solution. And that solution is:

Humans are unpredictable mushy bags of irrationality and emotion.

This is a good thing, because a computer can never be this; it can never make judgements of taste or intuition. Many people are familiar with the Turing test, where a human operator has to decide if they’re talking to another human, or a bot. But there’s a lesser-known test, the Lovelace test, which sets creativity as the benchmark of human intelligence. To pass Lovelace, an artificial agent must create an original program — such as music, or a poem — that it was never engineered to produce. Further, that program must be reproducible, and impossible for the agent’s original creator to explain.

The idea is that Lovelace should be impossible for an artificial agent to pass. Creativity should be impossible for a computer. And it’s this, not tools, that offers us the opportunity to make our roles safe from automation.

Andrew Ng, who helped develop Google’s deep learning systems and now works at Chinese search company Baidu, has serious concerns that automation is going to be responsible for many job losses in the future, and that the best course of action is to teach people to be unlike computers:

We need to enable a lot of people to do non-routine, non-repetitive tasks. Teaching innovation and creativity could be one way to get there.

But as well as learning to be creative, we should also become centaurs — that is, learn to enhance our abilities by combining our instincts with an intelligent use of artificial intelligence. Many smart people have begun considering the implications of this; Cennydd Bowles wrote:

A.I. is becoming a cornerstone of user experience. This is going to be interesting (read: difficult) for designers.

To the current list of design disciplines we already perform — visual, interaction, service, motion, emotion, experience — we will need to add one more: intelligence design.

Previously I said that A.I. is improving very quickly, and this also means that it’s becoming much more available very quickly. Services that were once only available to the internet giants are now available to everyone through APIs and products, at reasonable-to-free prices.

Remember Watson? All of its power is available to use through IBM’s Developer Cloud; their Bluemix cloud platform and a Node SDK give you access to powerful and sophisticated image and text services via some RESTful APIs. IBM wants Watson to be the ubiquitous platform for AI, as Windows was to the home PC and Android is to mobile; as a result, Developer Cloud is free for developers, and reasonably priced for businesses.

What do you get with Watson? For a start, some visual recognition tools, like the Google one I mentioned at the beginning of this piece. Upload an image, and Watson will make an educated guess at explaining the content of that image. It works well in most cases (although it was bizarrely convinced that a portrait of me contained images of wrestling).
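
To give a flavour of what using such a service looks like, here’s a rough sketch of asking a classification endpoint about an image from Node. The endpoint URL, the credential handling, and the response shape below are placeholders rather than IBM’s actual API; the real paths and parameters are in the Developer Cloud documentation.

```typescript
// A hedged sketch of calling an image-classification REST endpoint with
// Node's built-in fetch. The URL, auth scheme, and response shape are
// placeholders; check the Watson Developer Cloud docs for the real API.
const API_KEY = process.env.WATSON_API_KEY ?? ""; // hypothetical credential
const ENDPOINT = "https://example-image-service.test/v1/classify"; // placeholder

interface ClassifyResult {
  label: string;      // e.g. "cat"
  confidence: number; // 0..1
}

async function classifyImage(imageUrl: string): Promise<ClassifyResult[]> {
  const res = await fetch(`${ENDPOINT}?url=${encodeURIComponent(imageUrl)}`, {
    headers: { Authorization: `Bearer ${API_KEY}` },
  });
  if (!res.ok) throw new Error(`Classification failed: ${res.status}`);
  // Assume the service returns an array of { label, confidence } objects.
  return (await res.json()) as ClassifyResult[];
}

classifyImage("https://example.com/cat-and-laptop.jpg")
  .then((labels) => labels.forEach((l) => console.log(`${l.label}: ${l.confidence}`)));
```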

These identification errors should reduce in frequency as you give Watson more data, because deep learning thrives on training and data. That’s why all the major online photo tools, from Google Photos to Flickr, entice you with huge storage limits, in many cases practically unlimited; they want you to upload more, because it makes their services better for everyone. These services include automatic tagging of photos and content-based search; Google Photos in particular is very good at this, easily finding pictures of animals, places, or even abstract concepts like art.

Google Photos search results for ‘street art’

Eventually these offerings will raise the expectations of users; if your photo service doesn’t offer smart search, it’s going to seem very dumb by comparison.

Watson can also find themes in batches of photos, offering insights into users’ interests and allowing for better targeting. This is another reason why photo services want you in: because you become more attractive to advertisers.

I should add that Watson is not the only game in town for image recognition; alternatives include startups like Clarifai, MetaMind, and SkyMind; and Microsoft’s Project Oxford, which powers their virtual assistant Cortana. Project Oxford has best-in-class face APIs, able to detect, recognise, verify, deduce age, and find similar faces; how you feel about that will largely depend on your level of trust in Microsoft.

While image recognition is interesting and useful, the ‘killer app’ of AI is natural language understanding. The ability to comprehend conversational language is so useful that every major platform is adopting it; Spotlight in OS X El Capitan allows you to search for “documents I worked on last week”, while asking Google “how long does it take to drive from here to the capital of France?” returns directions to Paris.

If you want to add natural language understanding to your own apps, one of the best tools around is the Alchemy API, originally a startup but now part of the Watson suite. This offers sentiment analysis, entity and keyword extraction, concept tagging, and much more.

Natural language understanding is a key component in a new wave of recommendation engines, such as those used in Pocket and Apple News. Existing recommendation engines tend to use ‘neighbourhood modelling’, basing recommendations on social graph interaction; but the new AI-powered engines understand the concepts contained in text content, allowing it to be better matched with other, similar content.
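
As a toy illustration of the difference (and not a claim about how Pocket or Apple News are actually built), concept-based matching might look something like this: reduce each article to a weighted set of extracted concepts, then recommend whatever overlaps most with what the reader has just finished.

```typescript
// Toy sketch: recommend articles by comparing the concepts a language-
// understanding service has extracted from them, rather than by social-graph
// neighbours. The concept weights here are made up for illustration.
type ConceptWeights = Record<string, number>;

function cosineSimilarity(a: ConceptWeights, b: ConceptWeights): number {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  let dot = 0, magA = 0, magB = 0;
  for (const k of keys) {
    const x = a[k] ?? 0, y = b[k] ?? 0;
    dot += x * y;
    magA += x * x;
    magB += y * y;
  }
  return magA && magB ? dot / (Math.sqrt(magA) * Math.sqrt(magB)) : 0;
}

const justRead: ConceptWeights = { "deep learning": 0.9, "photography": 0.4 };
const candidates: Record<string, ConceptWeights> = {
  "Neural nets and art": { "deep learning": 0.8, "painting": 0.6 },
  "Knitting for beginners": { "crafts": 0.9 },
};

for (const [title, concepts] of Object.entries(candidates)) {
  console.log(title, cosineSimilarity(justRead, concepts).toFixed(2));
}
```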

Where AI really excels, however, is when applied to conversation. Talking to a computer is nothing new; to give just one example, IKEA have had a customer service chatbot called Anna on their website since at least 2008. But although Anna can answer a straightforward question, she has no memory; if you don’t provide the same information in a follow-up as you did in the previous question, you’ll get a different answer. This isn’t really a conversation, which has requirements as defined here by Kyle Dent:

A conversation is a sequence of turns where each utterance follows from what’s already been said and is relevant to the overall interaction. Dialog systems must maintain a context over several turns.
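
As a toy sketch of what maintaining that context means in code, a bot only needs to remember what was mentioned in an earlier turn to make sense of a follow-up like “how tall is it?”. The product, the pattern matching, and the answer below are all invented for illustration; a real dialog system tracks far richer state than a single remembered entity.

```typescript
// Toy sketch of "maintained context": the bot remembers the last entity
// mentioned, so a follow-up that only says "it" can still be resolved.
interface DialogContext {
  lastEntity?: string;
}

function respond(utterance: string, ctx: DialogContext): string {
  if (/billy bookcase/i.test(utterance)) ctx.lastEntity = "BILLY bookcase";

  if (/how (big|tall)/i.test(utterance)) {
    return ctx.lastEntity
      ? `The ${ctx.lastEntity} is 202cm tall.` // illustrative answer only
      : "Which product do you mean?";
  }
  return ctx.lastEntity
    ? `What would you like to know about the ${ctx.lastEntity}?`
    : "How can I help?";
}

const ctx: DialogContext = {};
console.log(respond("Do you sell the Billy bookcase?", ctx)); // remembers the product
console.log(respond("How tall is it?", ctx));                 // resolved via context
```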

Maintained context is what today’s AI conversations offer that was previously missing. Google are using this to trial automated support bots, trained on thousands of previously recorded support calls. (The same bots, trained on old movies, can be surprisingly philosophical.) And it’s what powers the new wave of virtual assistants: Cortana, Siri, Google’s voice search, and Amazon Echo. The latter is particularly interesting because it lives in your home, not your phone, car or TV; it’s the first move into this space, soon to be joined by robots with personality, like Jibo, or Mycroft.

All of these virtual assistants share another feature, which is that you can interact with them by voice. Voice isn’t essential to conversation, but it helps; it’s much easier than using a keyboard in most cases, especially in countries like China, which have very complicated character input and high levels of rural illiteracy.

Making computers recognise words has been possible for a surprisingly long time; Bell Labs came up with a voice recognition system back in 1952, although it could only recognise numbers, and only spoken by a specific person. There was a further breakthrough in the 1980s with the ‘hidden Markov model’, but deep learning has hugely improved voice recognition in the past three years; Google says its voice detection error rate has dropped from around 25% (one misidentified word in every four) in 2012 to 8% (one in twelve) earlier this year — and that’s improving all the time. Baidu says their error rate is 6% (one in seventeen), and they handle some 500 million voice searches every day.

Voice recognition is available in some browsers, namely Chrome and Firefox, through the Web Speech API, and many other products including Watson and Project Oxford offer speech-to-text services. These all require a microphone input, of course, which unfortunately rules out using Safari or any iOS browser at all.
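
In a supporting browser, the recognition side looks roughly like this; the constructor is vendor-prefixed in Chrome, so feature-detect rather than assume it exists.

```typescript
// Minimal in-browser speech-to-text with the Web Speech API.
// Feature-detect first: the constructor is vendor-prefixed in Chrome and
// absent entirely in browsers without microphone access.
const SpeechRecognitionCtor =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

if (SpeechRecognitionCtor) {
  const recognition = new SpeechRecognitionCtor();
  recognition.lang = "en-GB";
  recognition.interimResults = false;

  recognition.onresult = (event: any) => {
    // Each result holds one or more alternatives; take the most confident.
    const transcript = event.results[0][0].transcript;
    console.log("Heard:", transcript);
  };
  recognition.onerror = (event: any) => console.warn("Recognition error:", event.error);

  recognition.start(); // prompts the user for microphone permission
} else {
  console.log("No speech recognition in this browser; fall back to typing.");
}
```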

But while voice recognition can identify the individual words in an utterance, it doesn’t understand in any way the meaning of those words, the intent behind them. That’s where the previously-mentioned breakthroughs in natural language understanding come in. There are a growing number of voice-based language understanding products now available, including Project Oxford’s LUIS (Language Understanding Intelligent Service — I love a good acronym) and the startup api.ai. The market leader in parsing complex sentences is Houndify, from SoundHound, but the service I like for its ease of use is Wit.

Wit was once a startup but is now owned by Facebook. It’s free to use, although all of your data belongs to Facebook (which may be a deal breaker for some) and is available to every other user of the service — because, as I said earlier, more data gives deep learning systems more power. It has SDKs for multiple platforms, but where it wins for me is in its training system, which makes it very easy to create an intent framework and correct misinterpreted words.
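
Under the hood it’s a plain HTTP API: you send an utterance to the message endpoint and get back Wit’s best guess at the intents and entities it has been trained on. The sketch below simplifies the response handling, and the access token is a placeholder for your own app’s server token.

```typescript
// Hedged sketch of querying Wit's message endpoint with a raw utterance.
// The access token is a placeholder, and the response is treated as loosely
// typed because the exact shape depends on your app's trained intents.
const WIT_TOKEN = process.env.WIT_TOKEN ?? ""; // your server access token

async function parseUtterance(text: string): Promise<unknown> {
  const url = `https://api.wit.ai/message?q=${encodeURIComponent(text)}`;
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${WIT_TOKEN}` },
  });
  if (!res.ok) throw new Error(`Wit request failed: ${res.status}`);
  return res.json(); // intents/entities extracted from the sentence
}

parseUtterance("Wake me up at 7 tomorrow")
  .then((result) => console.log(JSON.stringify(result, null, 2)));
```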

Wit is the power behind M, Facebook’s entry into the virtual assistant market. M is notable because it only lives inside Facebook Messenger, which is a pattern I’m sure we’re going to see much more of in the future: the AI-powered shift from the graphical user interface to the conversational; from GUI to CUI.

There’s a reason that Facebook paid an estimated £15 billion for WhatsApp, and it’s not solely their £6.5 billion in sales: it’s because messaging apps are huge, and WhatsApp is the biggest of all, with some 900 million monthly active users. What’s more, messaging apps are growing very quickly; they’re the fastest-growing online behaviour of the last five years, and an estimated 1.1 billion new users are set to come on board in the next three years.

And messaging apps as we know them in the West are actually very limited compared to messaging apps in Asia, especially China, where they are more like platforms than apps: you can shop, bank, book a doctor’s appointment… basically, anything you can do on the web today. Messaging apps are a huge growth area, and they’re going to be powered by conversational assistants.

We can see this beginning already with apps like chatShopper (currently Germany only) which lets you talk to a personal shopper to make purchases through WhatsApp; and Digit, a savings app that communicates almost entirely by text message. These currently use a mix of automatic and human operators (this is also how Facebook’s M works right now), but as AI becomes more intelligent the bots will take over from the humans in many cases.

More advanced, fully automated services include x.ai’s ‘Amy’ and the apparently very similar Clara. These are meeting assistants that work by email; you ask them to find a suitable time and place for a meeting, and they communicate with all the participants until the arrangements are made, then email you back with the final details.

Conversational UI is an idea whose time has come, enabled only now by AI and natural language understanding. To add it to your own apps you could look at Watson’s Dialog service (a similar service is also apparently coming to Wit in the near future), or a startup such as re:infer. But it’s not only a case of plumbing in a service; it will also require an addition to the list of design disciplines I mentioned previously: conversation design.

I should note that new interaction models are still prone to old problems; security, privacy and trust should always be paramount in your applications. Remember the Samsung TV scandal earlier this year? Do we really want a repeat of this but with an artificially intelligent Barbie in children’s bedrooms?

The ready availability of deep learning services has come upon us so quickly that we’ve barely realised; many of the services I’ve mentioned didn’t exist even 18 months ago. This is a little bit scary, and a huge opportunity. There’s little doubt that AI’s going to take routine jobs from the web industry; so as AI improves, we need to improve with it. The way to do that is to harness AI for our own use, and apply creative, irrational, human thinking to it.

Cross-posted to Medium.
