Artificial Intelligence: Multimillennial Data Transmitted To Machines With Brains

Artificial intelligence feeds on data, and data is piling up from increasingly cheap sensors and surging Internet use: videos, images, text; time series data, machine data; structured, unstructured and semi-structured data. And while AI is currently confined to narrow problems in discrete domains, the ambition of machine-learning researchers globally is to write algorithms that can cross domains, transferring learning from one kind of data to another.

When that eventually happens, AI systems will no longer focus on their little data hill, but crawl along the entire mountain range.

The first place this is likely to happen is on data that has been annotated by humans. While there are many streams of AI, the one that has carved the deepest ravine is called supervised learning. Just as humans learn in school by repeatedly being told the correct answer to a question, supervised learning systems are given thousands, even millions, of examples of what they are being trained to recognize until they can spot those things in data they have never seen before.

Show a computer vision system millions of X-rays of confirmed lung cancer patients and the system will become an expert at diagnosing lung cancer from X-rays.
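
A minimal sketch of that supervised-learning loop, using scikit-learn on synthetic stand-ins for labeled scan features; the feature matrix, labels and model choice here are illustrative assumptions, not the pipeline of any real diagnostic system:

```python
# Minimal supervised-learning sketch: learn from labeled examples,
# then predict labels for examples the model has never seen.
# The data is synthetic and stands in for hand-labeled X-ray features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# 1,000 "scans", each reduced to 16 numeric features; label 1 = confirmed cancer.
X = rng.normal(size=(1000, 16))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Hold out data the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)          # learn from labeled examples

print("accuracy on unseen data:", model.score(X_test, y_test))
```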

Around the world, hordes of unskilled workers are annotating data used to train such machine-learning models. Images, videos, audio and text are being labeled by working mothers in Madagascar, migrant workers in Beijing, uneducated young men in India and otherwise unemployed autistic adults in the United States. But what are they doing, exactly?

Besides tagging objects in an image – this is a car, that is a person – or flagging child pornography in videos, or identifying verbs in a block of text or instruments in music, this growing yet disparate army, soon to be millions of people, is filling vast data lakes with meaning. These lakes are not yet connected, but once filled, they will remain indefinitely. Eventually, canals will be dug between them and, at some point, the lakes will become seas and then oceans of human understanding in digital form. That data will inform ever more sophisticated machine-learning models, which are already drinking in knowledge and making decisions based on what they learn. It’s a remarkable endeavor that will change human life forever.
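
As a concrete illustration of what one of those labels might look like, here is a hypothetical annotation record for a single image; the field names and schema are invented for illustration and do not belong to any particular labeling tool:

```python
# A hypothetical annotation record for one image: a human labeler has drawn
# bounding boxes and attached class names, turning raw pixels into meaning.
annotation = {
    "image_id": "street_000123.jpg",
    "labeler_id": "worker_42",
    "objects": [
        {"label": "car",    "bbox": [112, 80, 310, 220]},   # [x_min, y_min, x_max, y_max]
        {"label": "person", "bbox": [400, 95, 455, 260]},
    ],
}

for obj in annotation["objects"]:
    print(obj["label"], "at", obj["bbox"])
```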

Meaning is a relationship between two kinds of things: signs and the things they express or signify. To an infant, an elephant is not an ‘elephant’ until it is named, and only then does it take on meaning. To a computer, an elephant is even less: nothing more than an array of light waves hitting a digital image sensor that converts those waves into numbers stored on a memory chip. It isn’t until a human tells the computer what those numbers represent that a supervised learning system can begin to use that information in a meaningful way.
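
To make that concrete: to the machine an image is only an array of numbers, and it is the human-supplied label that pairs those numbers with a sign. A tiny sketch, with a made-up array and label:

```python
import numpy as np

# To the computer, the "elephant" is just numbers: an 8x8 grayscale patch.
pixels = np.random.default_rng(1).integers(0, 256, size=(8, 8), dtype=np.uint8)
print(pixels.shape, pixels.dtype)   # (8, 8) uint8 -- no meaning yet

# Meaning arrives only when a human attaches a sign to those numbers.
labeled_example = {"pixels": pixels, "label": "elephant"}
print(labeled_example["label"])
```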

So, the woman in Madagascar, the worker in Beijing, the man in India and the autistic adult in the U.S. are effectively encoding human knowledge click by click so that that knowledge can be transmitted to rudimentary electronic brains. The brains, made up of massive blocks of computer code, may still be rudimentary, but they can already recognize patterns or identify features – that spot on a lung in an X-ray image, for example – faster and more accurately than any human.

AI systems, meanwhile, are being built to manufacture labeled data synthetically, creating virtual cities, for example, to train computer-vision systems for autonomous vehicles, or generating endless strings of virtual time series to train financial-market prediction models. Such synthesizers can spin up unlimited amounts of data, particularly for so-called corner cases that are rare in real life. In time, there will be many times more synthetic data, which is cheaper and quicker to produce, than so-called ground-truth, hand-labeled data.
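
A minimal sketch of that idea for the time-series case: generate as many labeled series as you like, deliberately over-sampling a rare “corner case”. The generation process and the 30% corner-case ratio below are illustrative assumptions:

```python
# Sketch of a synthetic time-series generator: unlimited labeled examples,
# with rare "corner case" crashes over-represented on purpose.
import numpy as np

rng = np.random.default_rng(0)

def synthesize_series(length=250, corner_case=False):
    """Random-walk 'price' series; corner cases get a crash injected."""
    series = np.cumsum(rng.normal(0, 1, size=length)) + 100.0
    if corner_case:
        crash_at = rng.integers(50, length - 50)
        series[crash_at:] -= 30.0      # rare event, cheap to manufacture
    return series

# Build a training set in which 30% of examples are corner cases,
# far more often than they would occur in real market data.
data   = [synthesize_series(corner_case=(i % 10 < 3)) for i in range(1000)]
labels = [int(i % 10 < 3) for i in range(1000)]
print(len(data), "synthetic series,", sum(labels), "corner cases")
```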

But hand-labeled data will continue to be the gold standard: knowledge painstakingly transferred from human to machine on training data platforms, software designed to allow people scattered around the world to work on the same data sets. Lakes become seas and seas become oceans.

As algorithms improve, what computers can do with that reservoir of labeled data will expand exponentially. It's already starting to happen: transfer learning algorithms can apply what they’ve learned from one dataset to another. The unaddressed challenge is building models that can cross modalities, learning from video, audio and text. 
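
A minimal sketch of the transfer-learning idea using PyTorch and torchvision: reuse a network pretrained on one dataset (ImageNet) and retrain only its final layer for a new task. The two-class target task and the dummy batch are assumptions made for illustration:

```python
# Transfer learning sketch: reuse features learned on ImageNet,
# retrain only the final layer for a new two-class problem.
import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on one dataset (ImageNet)...
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# ...freeze everything it already learned...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the last layer so it can learn a new, unrelated task.
model.fc = nn.Linear(model.fc.in_features, 2)   # e.g. cancer / no cancer

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch (stand-in for real labeled data).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("loss:", loss.item())
```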

Labeled data ties modalities together: natural language processing to computer vision, for example. Show a computer-vision model an image and it can give you the correct natural-language label, or show the model a word and it can give you a correct corresponding image. Researchers are working on multimodal systems that can fuse meaning between images and text, learning from visual data and applying that learning to language or vice versa.
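
One common way such systems tie modalities together is to map images and text into a shared embedding space and match them by similarity, roughly the idea behind contrastive image-text models. The toy “encoders” below are random projections, used purely to illustrate the mechanics rather than to produce meaningful matches:

```python
# Toy sketch of tying two modalities together: project image features and
# text features into one shared space, then match them by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)

# Pretend encoders: in a real system these would be trained neural networks.
W_image = rng.normal(size=(2048, 128))   # image features -> shared space
W_text  = rng.normal(size=(300, 128))    # word vectors   -> shared space

def embed(features, W):
    v = features @ W
    return v / np.linalg.norm(v)

image_features = rng.normal(size=2048)                               # stand-in for a CNN's output
text_features  = {w: rng.normal(size=300) for w in ["elephant", "car", "person"]}

img = embed(image_features, W_image)
# Pick the word whose embedding lies closest to the image's embedding.
scores = {w: float(img @ embed(v, W_text)) for w, v in text_features.items()}
print(max(scores, key=scores.get), scores)
```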

Supervised learning is constrained to relatively narrow domains defined largely by labeled data.

Humans, of course, learn mostly without labels. Most researchers agree that computers will have to go beyond supervised learning to reach the Holy Grail of human-level intelligence.

There is reinforcement learning, which does not rely on labeled data and is modeled after reward-driven learning in the brain. Set a goal for a reinforcement learning system and it will work toward that goal through trial and error until it is consistently receiving a reward, like a rat pushing a lever to receive a pellet of food.
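
A minimal sketch of that reward-driven trial and error: an agent (like the rat at the lever) learns which of two levers pays off more often simply by trying them and tracking the rewards. The two-lever setup and the epsilon-greedy rule are illustrative assumptions, not any particular RL library:

```python
# Reward-driven trial and error: an agent learns which "lever" pays off.
import random

random.seed(0)
reward_prob = [0.2, 0.8]          # lever 1 rewards far more often than lever 0
value = [0.0, 0.0]                # the agent's running estimate of each lever
counts = [0, 0]

for step in range(1000):
    # Mostly exploit the best-known lever, occasionally explore the other.
    action = random.randrange(2) if random.random() < 0.1 else value.index(max(value))
    reward = 1.0 if random.random() < reward_prob[action] else 0.0

    # Update the estimate toward the reward just received.
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]

print("learned values:", [round(v, 2) for v in value])  # lever 1 should score higher
```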

There is self-supervised learning, which depends on massive amounts of unlabeled data to accumulate enough background knowledge that some sort of common sense can emerge. 
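
A minimal sketch of the self-supervised idea: no human labels, only a pretext task in which the “label” is carved out of the raw data itself, here predicting the next word from the previous one using simple co-occurrence counts. The corpus and the pretext task are made up for illustration:

```python
# Self-supervised sketch: no human labels. The training signal is a word
# hidden from the model, and it comes from the raw text itself.
from collections import Counter, defaultdict

corpus = "the elephant walked to the river and the elephant drank the water".split()

# Pretext task: given the previous word, predict the next one.
next_word = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word[prev][nxt] += 1          # the data supervises itself

def predict(prev):
    return next_word[prev].most_common(1)[0][0] if prev in next_word else None

print(predict("the"))        # most likely word after "the" in this tiny corpus
```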

But so far supervised learning works best, and so the data mountains will continue to be worked into labeled data, with training data platforms acting as the ore crushers, sluice boxes and smelters of machine-readable understanding.

The great minds behind the algorithms win awards and are recorded in the history books, but the hard labor of artificial intelligence is provided anonymously by a global army of human labelers: mothers and sons and fathers and sisters, filling ponds and lakes and seas with meaning. If mankind and machines ever reach the fabled singularity, the oceans of knowledge that these labelers have filled are what will lead us first to human-level intelligence.

Manu Sharma is an aerospace engineer who previously worked at computer vision companies DroneDeploy and Planet Labs, where he spent much of his time building in-house infrastructure for deep learning models. He is now co-founder of Labelbox, a training data platform for deep learning systems.

Previously published at https://cognitiveworld.com/articles/2021/3/25/encode
