Excellent post, Rob!
> So decision making, no matter how complex, has to
> collapse its outputs into one of the n predictable
> actions. How to express subtlety and wide range of
> behavior with such a limited palette?
You've hit the nail on the head. Animations and sounds are an enormous behavioral bottleneck for us.
> Solutions to the problem - procedural animation,
> text-to-speech synthesis, and so on - only work
> in limited contexts. They can’t replicate the
> depth of detail and emotion that a good animator
> or voice artist can bring out. And replicating
> those details procedurally seems like an
> AI-complete problem all by itself.
I mostly agree with this, but I've seen some indications that it's (kinda, sorta) becoming possible to bite off some aspects of the problem procedurally ... in some ways, sometimes, if you're lucky.
Between NaturalMotion's "endorphin" product and the work being done on Spore, I've seen a lot of kinds of procedural work done on the animation side that convince me that at least for *some* kinds of animations -- locomotion and physical reactions in particular -- it's possible to create procedural animations that sometimes match the quality of professional animators. If you want to see human characters getting knocked about by various physical forces and reacting in basic ways, endorphin can do a really good job of animating that sort of thing.
That's counterbalanced by the recognition that there are a lot of kinds of animations (particularly emotive animations, such as taunts) that can never be done procedurally -- even theoretically -- because they're all about broadcasting the designer's intentions.
And when I look at the animations for creatures in some of the recent games I've worked on, fewer than half of the animations for each creature are the kind of locomotion animations or physics reactions for which a tool like endorphin would be useful (never mind the fact that none of them were remotely humanoid, and endorphin only really works well with human characters).
So yeah, that's discouraging.
I've also looked into text-to-speech for a previous project, and some surprising advances have been made ... there's an AT&T product (see http://www.research.att.com/projects/tts/ and http://elvis.naturalvoices.com/demos/) that produced remarkably good but somewhat robotic voices. We considered using it for on-the-fly speech synthesis in a game that featured lots of robots, since the robotic aspect of the output would fit right in for those characters ... until we found out that the database needed to support the speech synthesis required something like 128 megs, which pretty much ruled it out for an Xbox title.
Still, though, it's a big step in the right direction. It will be great to see what this kind of technology can do 5-10 years from now.
The frustrating part of it all, though, is that so much of this requires a level of R&D that's way outside the reach of most game developers. These kinds of problems are really begging for much more attention from academia than they've received to date.
Posted by Paul T at December 17, 2005 04:38 PM
This research work into procedural facial animation might be of interest. I saw a demonstration of it in action, and it's a pretty good step towards emotive facial animations. The goal of it was to allow animators to work on crafting sets of facial movements that could be grouped by emotion, and independent of the actual face model. Then the actual animation could be done procedurally, mapping any response set to any face model with very little manual model setup. This included lip syncing, which was actually demonstrated working on-the-fly with some generic facial gestures triggered by speech emphasis. (It gets poorer results on-the-fly due to quick work on a smaller audio buffer, of course, but it was still pretty impressive.)
Anyway, here's the URL:
http://ivizlab.sfu.ca/research/iface/
I really like the idea (and the term) of "presentation layer", and I agree that to even talk about presentation as a separate layer on top of a more fundamental internal intelligence is already probably heading down the wrong path (and smacks of the homunculus, doesn't it? :). It makes me wonder what I would come up with if I dedicated myself wholeheartedly to the principle of an out-to-in intelligence (i.e. start with the "presentation" -- the animation and motor systems and all their glorious continuous output -- as the architectural foundations and work inwards to more complex internal representations and forms of reasoning) rather than in-to-out (in which the presentation layer, as you've said, is something of an afterthought). We'd at least probably avoid the mapping problem that we always end up with when we take the in-to-out road (I have all this internal complexity but I have to map to a massively impoverished presentation layer). Maybe it would even ensure (hallelujah, if so!) that the internals would only be as complicated as they had to be, and never more so.
On the other hand, maybe it just leads to AI as a glorified animation-playback system. It probably would, unless Design was particularly behavior-savvy (and therefore DEMANDED such things as internal world-models, and expectation-formations and the like).
(Maybe the other reason we don't generally take this approach now is because it sounds way more complicated structurally than the generally clean in-to-out approach, in which a modular decision-maker passes off nicely encapsulated instructions to a modular motor/animation system. Out-to-in is practically a renouncement of any structure at all.)
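For anyone who wants the in-to-out shape spelled out, here is a minimal sketch of it. Everything in it (the `Action` set, the `WorldState` fields, the thresholds, the clip names) is a made-up assumption for illustration, not anything from a shipped game: a decision-maker with continuous internal state hands one of a handful of encapsulated actions to a presentation step that just looks up a clip.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical fixed palette of presentable actions."""
    IDLE = "idle"
    TAUNT = "taunt"
    ATTACK = "attack"
    FLEE = "flee"


@dataclass
class WorldState:
    """Hypothetical internal state; all fields are illustrative."""
    health: float          # 0.0 .. 1.0
    enemy_distance: float  # arbitrary world units
    anger: float           # 0.0 .. 1.0


def decide(state: WorldState) -> Action:
    """In-to-out decision-maker: continuous state collapses to one action."""
    if state.health < 0.25:
        return Action.FLEE
    if state.enemy_distance < 2.0:
        return Action.ATTACK
    if state.anger > 0.5:
        return Action.TAUNT
    return Action.IDLE


# Hypothetical clip names; the presentation side knows nothing about
# why an action was chosen.
CLIPS = {
    Action.IDLE: "idle_01",
    Action.TAUNT: "taunt_01",
    Action.ATTACK: "attack_01",
    Action.FLEE: "run_01",
}


def present(action: Action) -> str:
    """Modular presentation step: map the action to an animation clip."""
    return CLIPS[action]


mildly_annoyed = WorldState(health=0.9, enemy_distance=10.0, anger=0.51)
furious = WorldState(health=0.9, enemy_distance=10.0, anger=0.99)
print(present(decide(mildly_annoyed)), present(decide(furious)))
```

Both states end up playing the same taunt clip, which is the mapping problem in miniature: the difference in `anger` never reaches the screen.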
It probably bears noting, in any case, that evolution is an out-to-in intelligence designer too -- as a species, we learned to move around and emote before we ever developed real language or symbolic reasoning ability. I guess in both games and nature, the presentation is the point, right?
[This is out there, but here's another cute note: there's an analog to this debate in stage/film acting (as in the artform, for movies and plays. "Plays" are the old thing where people get on a stage and perform in front of a live audience. The audience is also composed of people.) On one side of the spectrum we have the Stanislavski System of acting, which emphasizes the development of an internal truth which then drives physical action (in-to-out) and on the other, the Meyerhold System (among others), which emphasizes expression through purely physical gesture working back eventually to an internal truth (out-to-in). I've also heard these expressed as the American vs. the British schools of acting, although both Stanislavski and Meyerhold were Russian -- go figure.]
I'm sure it depends on the type of game. I bet the DOA4 AI is pretty close to a "glorified animation-playback system", and rightly so, but the AI for Civ 4 hardly needs to consider the presentation layer at all.
I'm not sure the presentation layer is all about technology, though. An animator that understands how to broadcast a coming melee attack, or can really sell an enemy reacting to a bullet impact, can add a _lot_ to the Player's experience.
So, I would say good presentation is about communicating the AI's goals to the content guys.
Posted by Jaime at December 19, 2005 12:06 PM
I like the distinction of "out-to-in" vs. "in-to-out" development. That's a very useful distinction.
Bit of a tangent:
I was on the treadmill in the gym last weekend when I picked up a copy of Newsweek. I was taken aback because all the letters seemed tiny, and for a few moments I couldn't figure out if there was something wrong with my eyes ... until I realized I'd just finished reading Gold's Gym's copy of Reader's Digest, which has enormous, grandma-can-read-it-without-her-glasses lettering. The huge typeface of Reader's Digest adjusted my mental frame of reference over several minutes to the point that when I picked up Newsweek, the "normal" fonts seemed much smaller.
That got me thinking about game AI, particularly with regard to the out-to-in approach to development. It's so incredibly difficult to maintain any kind of objectivity when you work with the same creatures executing the same behaviors day after day for months to years of development. And I think that makes the "out-to-in" approach challenging because it becomes incredibly difficult to be objective about what the "out" part should really look like.
It's particularly noticeable in games where creatures follow distinct patterns. When I'm playing World of WarCraft, one part of me is happily playing along, blithely ignoring any behavioral strangeness ... and the AI developer inside me thinks it looks ridiculous that the AIs react purely based on distance rather than line-of-sight, that mobs walk in continuous random patterns inside fixed areas, that creatures can walk right over the dead bodies of their comrades without reacting, that the same animations are played over and over again in totally predictable ways, or that I can pull the same mob 100 times in a row and run away again without him ever catching on.
Obviously there are good reasons for all of these features -- it's an MMORPG, after all, and requires a very patterned kind of AI, not a criticism of World of WarCraft, etc. -- but it bothers me that I get used to it.
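To make the distance-versus-line-of-sight point concrete, here is a rough sketch of the two checks on a 2D grid. It is not how World of WarCraft (or any particular game) does it; the function names, the `blocked` set of wall cells, and the sampling step are all assumptions chosen for illustration.

```python
import math


def within_aggro_range(mob, player, radius):
    """Distance-only check: the mob reacts to anything inside the radius."""
    return math.dist(mob, player) <= radius


def has_line_of_sight(mob, player, blocked, step=0.25):
    """Sample points along the mob-to-player segment.

    `blocked` is a hypothetical set of (x, y) integer grid cells that
    contain walls. Sight fails if any sample lands in a blocked cell.
    """
    distance = math.dist(mob, player)
    samples = max(1, int(distance / step))
    for i in range(samples + 1):
        t = i / samples
        x = mob[0] + t * (player[0] - mob[0])
        y = mob[1] + t * (player[1] - mob[1])
        if (math.floor(x), math.floor(y)) in blocked:
            return False
    return True


def notices_player(mob, player, radius, blocked):
    """Combined check: close enough and not hidden behind a wall."""
    return within_aggro_range(mob, player, radius) and has_line_of_sight(
        mob, player, blocked
    )


mob, player = (0.5, 0.5), (4.5, 0.5)
wall = {(2, 0)}  # a single wall cell sitting between the two
print(within_aggro_range(mob, player, 5.0))    # distance-only check
print(notices_player(mob, player, 5.0, wall))  # distance plus sight
```

With the wall cell in place the distance-only check passes and the combined check does not, which is exactly the case that looks odd to the AI developer in me. Point sampling like this can miss walls thinner than `step`; a real game would more likely use its physics engine's raycast.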
The best solution I have for now is to make sure you have lots of fresh eyes on the AI on a regular basis, and play plenty of other games to keep your expectations fresh. But I'd be interested to hear what ideas anyone else might have to mitigate this.
Posted by Paul T at December 21, 2005 08:37 PM
I imagine that those specific WoW AI examples are the results of Game Design decisions, not tunnel-vision, but it's definitely an interesting topic.
When we first playtested our AI, we started out trying to maximize the number of people that thought our AI was really smart. It turns out, people are very reluctant to say that the AI is "smart" (as in, "smarter than me") so we switched gears and tried to minimize the number of people that said the AI was not smart.
Most of the things that made our AI look not smart were part of the presentation layer: failing to react immediately to an attack, jerky pathfinding problems, repeating lines of dialog.
So, synthesizing points made in the discussion so far with my own experience, my theory is this:
You should make your AI smart by focusing on internal structures and behaviors and also make it not dumb by focusing on the external presentation.
Posted by Jaime at December 21, 2005 11:06 PM
I'm not really concerned with the "smart" vs. "not smart" distinction, only with how to ensure that my characters are thoroughly believable and interesting to the player.
When you're developing, say, human soldiers in a World War 2 shooter, you have this enormous library of films and books that you can use to get an excellent sense of how actual soldiers behaved in combat. With something like the Metroid games, on the other hand, you're dealing with all kinds of exotic science-fiction creatures, and you don't have any fixed frame of reference for how these kinds of creatures should behave, so it's very easy for your perception to drift imperceptibly over time.
Posted by Paul T at December 22, 2005 09:11 AM
Hmm, I agree that separating out presentation from the "engine" does seem like it's heading down the wrong path. But it's a path lit by two powerful beacons: first, architectural cleanliness, and second, conceptual agreement with long-standing Cartesian traditions in AI. And when the path is lit so brightly, who can resist, even if we know it won't lead us where we want to go? ;)
And by the way, I agree that Design needs to become more AI-savvy as well. We can build very complicated interactions between higher-level behaviors and lower-level behaviors and sensors and actuators, but if Design only sees this system as a set of numbers in a spreadsheet, then all of the system's expressive power will be for naught. If designers and scripters were to begin working directly with a more expressive system, maybe they'd develop a taste for more complexity, in both the behavior and the knowledge representation? :)
And re procedural animation: I agree that procedural stuff can be very useful in limited contexts. It's fine to do procedural animation in Spore, because nobody knows how a three-legged walking eyeball should walk. :) But when it comes to human behavior, we're all domain experts, unfortunately...
For text-to-speech and speech-to-text in particular, I've been interested in this tech for a while, but the whole situation just sucks. We hit the point of diminishing returns a while back. In STT, recognition rates have been stuck in the upper 90s for years, and I'm personally suspicious that's the limit of statistical methods that have no representation of what's being talked about. As for TTS, people try making the voice sound better and better by using increasingly complex voice sample libraries, but this doesn't seem to scale outside of non-emotive-personal-assistant kinds of contexts, either. Very discouraging for conversational AI - but good news for voice actors! ;)
But I love the idea of using something like endorphin (or any of a number of facial animation systems) as first-pass animation generators, to be further arranged manually. Have you guys used it for more than evaluation?
Posted by Rob at December 30, 2005 06:51 PM
> separating out presentation from the "engine" does seem like it's
> heading down the wrong path. But it's a path lit by two powerful
> beacons: first, architectural cleanliness, and second, conceptual
> agreement with long-standing Cartesian traditions in AI
Whoa there - hang on a second, Rob! What about the enactive shift in cognitive science, and related changes in AI over the last 15 years? Conceptual agreement with Cartesianism is a worrying sign in my book! I guess I'd always assumed that most media lab AI people were working in the enactive tradition, but come to think of it I suppose the synthetic characters group tries to balance itself between Good Old Fashioned AI desires for internal representations driving action, and Brooksian skepticism about said internal representations? Even if you come from a 'balanced approach' background (in-to-out vs. out-to-in / top-down vs. bottom-up / symbolic vs. non-symbolic / etc.) in the last 10-15 years, surely proclaiming Descartes as mentor is rhetorically suspect? ;)
Btw, my apologies to all for being awol from discussions for so long, and then only popping up to go off-topic and talk about philosophy - I promise to post something more useful sometime soon :)
Posted by Adam R at January 23, 2006 05:00 PM
> I cut my teeth on Heideggerian robotics [...] It's good to see so
> many continentalists in the community.
Rob, just noticed these words from you in a post last July! I guess I may have been misreading your rhetoric ;)