Artificial Intelligence: Multimillennial Data Transmitted To Machines With Brains

Artificial intelligence feeds on data, and data is piling up from increasingly cheap sensors and surging Internet use: videos, images, text; time series data, machine data; structured, unstructured and semi-structured data. And while AI is currently confined to narrow problems in discrete domains, the ambition of machine-learning researchers globally is to write algorithms that can cross domains, transferring learning from one kind of data to another.

When that eventually happens, AI systems will no longer focus on their little data hill, but crawl along the entire mountain range.

The first place this is likely to happen is on data that has been annotated by humans. While there are many streams of AI, the one that has carved the deepest ravine is called supervised learning. Just as humans learn in school by repeatedly being told the correct answer to a question, supervised learning systems are given thousands, even millions, of examples of what they are being trained to recognize until they can spot those things in data they have never seen before.

Show a computer vision system millions of X-rays of confirmed lung cancer patients and the system will become an expert at diagnosing lung cancer from X-rays.
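
A minimal sketch of that supervised-learning loop, using scikit-learn on synthetic stand-ins for labeled scan features; the feature matrix, labels and model choice here are illustrative assumptions, not the pipeline of any real diagnostic system:

```python
# Minimal supervised-learning sketch: learn from labeled examples,
# then predict labels for examples the model has never seen.
# The data is synthetic and stands in for hand-labeled X-ray features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# 1,000 "scans", each reduced to 16 numeric features; label 1 = confirmed cancer.
X = rng.normal(size=(1000, 16))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Hold out data the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)          # learn from labeled examples

print("accuracy on unseen data:", model.score(X_test, y_test))
```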

Around the world, hordes of unskilled workers are annotating data used to train such machine-learning models. Images, videos, audio and text are being labeled by working mothers in Madagascar, migrant workers in Beijing, uneducated young men in India and otherwise unemployed autistic adults in the United States. But what are they doing, exactly?

Besides tagging objects in an image – this is a car, that is a person – or flagging child pornography in videos, or identifying verbs in a block of text or instruments in music, this growing yet disparate army, soon to be millions of people, is filling vast data lakes with meaning. These lakes are not yet connected, but once filled, they will remain indefinitely. Eventually, canals will be dug between them and, at some point, the lakes will become seas and then oceans of human understanding in digital form. That data will inform ever more sophisticated machine-learning models, which are already drinking in knowledge and making decisions based on what they learn. It’s a remarkable endeavor that will change human life forever.
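
As a concrete illustration of what one of those labels might look like, here is a hypothetical annotation record for a single image; the field names and schema are invented for illustration and do not belong to any particular labeling tool:

```python
# A hypothetical annotation record for one image: a human labeler has drawn
# bounding boxes and attached class names, turning raw pixels into meaning.
annotation = {
    "image_id": "street_000123.jpg",
    "labeler_id": "worker_42",
    "objects": [
        {"label": "car",    "bbox": [112, 80, 310, 220]},   # [x_min, y_min, x_max, y_max]
        {"label": "person", "bbox": [400, 95, 455, 260]},
    ],
}

for obj in annotation["objects"]:
    print(obj["label"], "at", obj["bbox"])
```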

Meaning is a relationship between two kinds of things: signs and the things they express or signify. To an infant, an elephant is not an ‘elephant’ until it is named, and only then does it take on meaning. To a computer, an elephant is even less: nothing more than an array of light waves hitting a digital image sensor that converts those waves into numbers stored on a memory chip. It isn’t until a human tells the computer what those numbers represent that a supervised learning system can begin to use that information in a meaningful way.
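
To make that concrete: to the machine an image is only an array of numbers, and it is the human-supplied label that pairs those numbers with a sign. A tiny sketch, with a made-up array and label:

```python
import numpy as np

# To the computer, the "elephant" is just numbers: an 8x8 grayscale patch.
pixels = np.random.default_rng(1).integers(0, 256, size=(8, 8), dtype=np.uint8)
print(pixels.shape, pixels.dtype)   # (8, 8) uint8 -- no meaning yet

# Meaning arrives only when a human attaches a sign to those numbers.
labeled_example = {"pixels": pixels, "label": "elephant"}
print(labeled_example["label"])
```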

So, the woman in Madagascar, the worker in Beijing, the man in India and the autistic adult in the U.S. are effectively encoding human knowledge click by click so that that knowledge can be transmitted to rudimentary electronic brains. The brains, made up of massive blocks of computer code, may still be rudimentary, but they can already recognize patterns or identify features – that spot on a lung in an X-ray image, for example – faster and more accurately than any human.

AI systems, meanwhile, are being built to manufacture labeled data synthetically, creating virtual cities, for example, to train computer-vision systems for autonomous vehicles, or generating endless strings of virtual time series to train financial-market prediction models. Such synthesizers can spin up unlimited amounts of data, particularly for so-called corner cases that are rare in real life. In time, there will be many times more synthetic data, which is cheaper and quicker to produce, than so-called ground-truth, hand-labeled data.
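
A minimal sketch of that idea for the time-series case: generate as many labeled series as you like, deliberately over-sampling a rare “corner case”. The generation process and the 30% corner-case ratio below are illustrative assumptions:

```python
# Sketch of a synthetic time-series generator: unlimited labeled examples,
# with rare "corner case" crashes over-represented on purpose.
import numpy as np

rng = np.random.default_rng(0)

def synthesize_series(length=250, corner_case=False):
    """Random-walk 'price' series; corner cases get a crash injected."""
    series = np.cumsum(rng.normal(0, 1, size=length)) + 100.0
    if corner_case:
        crash_at = rng.integers(50, length - 50)
        series[crash_at:] -= 30.0      # rare event, cheap to manufacture
    return series

# Build a training set in which 30% of examples are corner cases,
# far more often than they would occur in real market data.
data   = [synthesize_series(corner_case=(i % 10 < 3)) for i in range(1000)]
labels = [int(i % 10 < 3) for i in range(1000)]
print(len(data), "synthetic series,", sum(labels), "corner cases")
```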

But hand-labeled data will continue to be the gold standard: knowledge painstakingly transferred from human to machine on training data platforms, software designed to allow people scattered around the world to work on the same data sets. Lakes become seas and seas become oceans.

As algorithms improve, what computers can do with that reservoir of labeled data will expand exponentially. It's already starting to happen: transfer learning algorithms can apply what they’ve learned from one dataset to another. The unaddressed challenge is building models that can cross modalities, learning from video, audio and text. 
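
A minimal sketch of the transfer-learning idea using PyTorch and torchvision: reuse a network pretrained on one dataset (ImageNet) and retrain only its final layer for a new task. The two-class target task and the dummy batch are assumptions made for illustration:

```python
# Transfer learning sketch: reuse features learned on ImageNet,
# retrain only the final layer for a new two-class problem.
import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on one dataset (ImageNet)...
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# ...freeze everything it already learned...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the last layer so it can learn a new, unrelated task.
model.fc = nn.Linear(model.fc.in_features, 2)   # e.g. cancer / no cancer

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch (stand-in for real labeled data).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("loss:", loss.item())
```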

Labeled data ties modalities together: natural language processing to computer vision, for example. Show a computer-vision model an image and it can give you the correct natural-language label, or show the model a word and it can give you a correct corresponding image. Researchers are working on multimodal systems that can fuse meaning between images and text, learning from visual data and applying that learning to language or vice versa.
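
One common way such systems tie modalities together is to map images and text into a shared embedding space and match them by similarity, roughly the idea behind contrastive image-text models. The toy “encoders” below are random projections, used purely to illustrate the mechanics rather than to produce meaningful matches:

```python
# Toy sketch of tying two modalities together: project image features and
# text features into one shared space, then match them by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)

# Pretend encoders: in a real system these would be trained neural networks.
W_image = rng.normal(size=(2048, 128))   # image features -> shared space
W_text  = rng.normal(size=(300, 128))    # word vectors   -> shared space

def embed(features, W):
    v = features @ W
    return v / np.linalg.norm(v)

image_features = rng.normal(size=2048)                               # stand-in for a CNN's output
text_features  = {w: rng.normal(size=300) for w in ["elephant", "car", "person"]}

img = embed(image_features, W_image)
# Pick the word whose embedding lies closest to the image's embedding.
scores = {w: float(img @ embed(v, W_text)) for w, v in text_features.items()}
print(max(scores, key=scores.get), scores)
```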

Supervised learning is constrained to relatively narrow domains defined largely by labeled data.

Humans, of course, learn mostly without labels. Most researchers agree that computers will have to go beyond supervised learning to reach the Holy Grail of human-level intelligence.

There is reinforcement learning, which does not rely on labeled data and is modeled after reward-driven learning in the brain. Set a goal for a reinforcement learning system and it will work toward that goal through trial and error until it is consistently receiving a reward, like a rat pushing a lever to receive a pellet of food.
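
A minimal sketch of that reward-driven trial and error: an agent (like the rat at the lever) learns which of two levers pays off more often simply by trying them and tracking the rewards. The two-lever setup and the epsilon-greedy rule are illustrative assumptions, not any particular RL library:

```python
# Reward-driven trial and error: an agent learns which "lever" pays off.
import random

random.seed(0)
reward_prob = [0.2, 0.8]          # lever 1 rewards far more often than lever 0
value = [0.0, 0.0]                # the agent's running estimate of each lever
counts = [0, 0]

for step in range(1000):
    # Mostly exploit the best-known lever, occasionally explore the other.
    action = random.randrange(2) if random.random() < 0.1 else value.index(max(value))
    reward = 1.0 if random.random() < reward_prob[action] else 0.0

    # Update the estimate toward the reward just received.
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]

print("learned values:", [round(v, 2) for v in value])  # lever 1 should score higher
```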

There is self-supervised learning, which depends on massive amounts of unlabeled data to accumulate enough background knowledge that some sort of common sense can emerge. 
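
A minimal sketch of the self-supervised idea: no human labels, only a pretext task in which the “label” is carved out of the raw data itself, here predicting the next word from the previous one using simple co-occurrence counts. The corpus and the pretext task are made up for illustration:

```python
# Self-supervised sketch: no human labels. The training signal is a word
# hidden from the model, and it comes from the raw text itself.
from collections import Counter, defaultdict

corpus = "the elephant walked to the river and the elephant drank the water".split()

# Pretext task: given the previous word, predict the next one.
next_word = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word[prev][nxt] += 1          # the data supervises itself

def predict(prev):
    return next_word[prev].most_common(1)[0][0] if prev in next_word else None

print(predict("the"))        # most likely word after "the" in this tiny corpus
```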

But so far supervised learning works best, and so the data mountains will continue to be worked into labeled data, with training data platforms acting as the ore crushers, sluice boxes and smelters of machine-readable understanding.

The great minds behind the algorithms win awards and are recorded in the history books, but the hard labor of artificial intelligence is provided anonymously by a global army of human labelers: mothers and sons and fathers and sisters, filling ponds and lakes and seas with meaning. If mankind and machines ever reach the fabled singularity, the oceans of knowledge that these labelers have filled are what will lead us first to human-level intelligence.

Manu Sharma is an aerospace engineer who previously worked at computer vision companies DroneDeploy and Planet Labs, where he spent much of his time building in-house infrastructure for deep learning models. He is now co-founder of Labelbox, a training data platform for deep learning systems.

Previously published at https://cognitiveworld.com/articles/2021/3/25/encode
