- CAMBRIDGE - Scientists at
the Massachusetts Institute of Technology have created the first realistic
videos of people saying things they never said - a scientific leap that
raises unsettling questions about falsifying the moving image.
-
- In one demonstration, the researchers taped a woman speaking
into a camera, and then reprocessed the footage into a new video that showed
her speaking entirely new sentences, and even mouthing words to a song
in Japanese, a language she does not speak. The results were enough to
fool viewers consistently, the researchers report.
-
- The technique's inventors say it could be used in video
games and movie special effects, perhaps reanimating Marilyn Monroe or
other dead film stars with new lines. It could also improve dubbed movies,
a lucrative global industry.
-
- But scientists warn the technology will also provide
a powerful new tool for fraud and propaganda - and will eventually cast
doubt on everything from video surveillance to presidential addresses.
-
- "This is really groundbreaking work," said Demetri
Terzopoulos, a leading specialist in facial animation who is a professor
of computer science and mathematics at New York University. But "we are
on a collision course with ethics. If you can make people say things they
didn't say, then potentially all hell breaks loose."
-
- The researchers have already begun testing the technology
on video of Ted Koppel, anchor of ABC's "Nightline," with the aim of
dubbing a show in Spanish, according to Tony F. Ezzat, the graduate student
who heads the MIT team. Yet as this and similar technology makes its way
out of academic laboratories, even the scientists involved see ways it
could be misused: to discredit political dissidents on television, to embarrass
people with fabricated video posted on the Web, or to illegally use trusted
figures to endorse products.
-
- "There is a certain point at which you raise the level
of distrust to where it is hard to communicate through the medium," said
Kathleen Hall Jamieson, dean of the Annenberg School for Communication
at the University of Pennsylvania. "There are people who still believe
the moon landing was staged."
-
- Currently, the MIT method is limited: It works only on
video of a person facing a camera and not moving much, like a newscaster.
The technique generates only new video, not new audio.
-
- But it should not be difficult to extend the discovery
to work on a moving head at any angle, according to Tomaso Poggio, a neuroscientist
at the McGovern Institute for Brain Research, who is on the MIT team and
runs the lab where the work is being done. And while state-of-the-art audio
simulations are not as convincing as the MIT software, that barrier is
likely to fall soon, researchers say.
-
- "It is only a matter of time before somebody can get
enough good video of your face to have it do what they like," said Matthew
Brand, a research scientist at MERL, a Cambridge-based laboratory for Mitsubishi
Electric.
-
- For years, animators have used computer technology to
put words in people's mouths, as they do with the talking baby in CBS's
"Baby Bob" - creating effects believable enough for entertainment, but
still noticeably computer-generated. The MIT technology is the first that
is "video-realistic," the researchers say, meaning volunteers in a laboratory
test could not distinguish between real and synthesized clips. And while
current computer-animation techniques require an artist to smooth out trouble
spots by hand, the MIT method is almost entirely automated.
-
- Previous work has focused on creating a virtual model
of a person's mouth, then using a computer to render digital images of
it as it moves. But the new software relies on an ingenious application
of artificial intelligence to teach a machine what a person looks like
when talking.
-
- Starting with between two and four minutes of video -
the minimum needed for the effect to work - the computer captures images
that represent the full range of motion of the mouth and surrounding areas,
Ezzat said.
-
- The computer is able to express any face as a combination
of these faces (46 in one example), the same way that any color can be
represented by a combination of red, green, and blue. The computer then
goes through the video, learning how a person forms every sound, and
how the mouth moves from one sound to the next.
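-
- The color-mixing analogy can be made concrete with a little arithmetic. The sketch below is only an illustration, not the MIT team's software: it treats each stored mouth image as a short list of pixel brightness values and builds a new image as a weighted sum of them. The function name `blend_faces`, the three tiny four-pixel images, and the weights are all hypothetical, chosen to keep the numbers easy to follow.

```python
def blend_faces(prototypes, weights):
    """Return the pixel-wise weighted sum of equally sized prototype images."""
    if not prototypes:
        raise ValueError("need at least one prototype image")
    if len(prototypes) != len(weights):
        raise ValueError("need exactly one weight per prototype image")
    n_pixels = len(prototypes[0])
    if any(len(p) != n_pixels for p in prototypes):
        raise ValueError("all prototype images must have the same size")
    # Each output pixel is the weighted sum of that pixel across all prototypes.
    return [
        sum(w * p[i] for w, p in zip(weights, prototypes))
        for i in range(n_pixels)
    ]


# Hypothetical 4-pixel grayscale "mouth images" (0 = black, 255 = white).
closed_mouth = [0, 0, 0, 0]
open_mouth = [200, 100, 100, 200]
rounded_mouth = [40, 240, 240, 40]

# A half-open mouth: an equal mix of "closed" and "open", with no "rounded".
half_open = blend_faces(
    [closed_mouth, open_mouth, rounded_mouth],
    [0.5, 0.5, 0.0],
)
print(half_open)
```

The example the article mentions used 46 stored images rather than three, and real frames contain many thousands of pixels. The sketch shows only the mixing step, not how a system might learn which weights go with which sound.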
-
- Given a new sound, the computer can then generate an
accurate picture of the mouth area and virtually superimpose it on the
person's face, according to a paper describing the work. The researchers
are scheduled to present the paper in July at Siggraph, the world's top
computer graphics conference.
-
- The effect is significantly more convincing than a previous
effort, called Video Rewrite, which recorded a huge number of small snippets
of video and then recombined them. Still, the new method only seems lifelike
for a sentence or two at a time, because over longer stretches, the speaker
seems to lack emotion.
-
- MIT's Ezzat said that he would like to develop a more
complex model that would teach the computer to simulate basic emotions.
-
- A specialist can still detect the video forgeries, but
as the technology improves, scientists predict that video authentication
will become a growing field - in the courts and elsewhere - just like the
authentication of photographs. As video, too, becomes malleable, a society
increasingly reliant on live satellite feeds and fiber optics will have
to find even more direct ways to communicate.
-
- "We will probably have to revert to a method common
in the Middle Ages, which is eyewitness testimony," said the University
of Pennsylvania's Jamieson. "And there is probably something healthy in
that."
-
- Compare original and synthetic videos from MIT on www.boston.com/globe.
-
- Gareth Cook can be reached at cook@globe.com. This story
ran on page A1 of the Boston Globe on 5/15/2002. © Copyright 2002
Globe Newspaper Company.