YAAY... I have something working. Now, I know you might think it ain't great animation, but in order to get to great, you need to get past this stage first. Keep in mind, this is all automated: in a few clicks, you can get a lip-sync. There are still some improvements I'm planning in two areas.

1) The poses. I need to clean up the poses for each phoneme a bit more.

2) Keyframing. I'm going to introduce a polish step to clean up the keyframes. This step will require some trial and error to figure out a bunch of lip-sync heuristics. For example, a pose doesn't have to be turned off completely if it's going to morph into another pose a few frames later (see the sketch after this list). Things like that will, I believe, make it look a bit better.
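To make that concrete, here's a minimal sketch of one such heuristic in Python. I'm assuming a pose's keyframes are plain (frame, weight) pairs; the function name and the numbers are made up for illustration, not the actual implementation.

GAP = 4            # frames: how soon the pose must come back to count
HOLD_WEIGHT = 0.3  # residual weight to hold instead of dropping to zero

def soften_dropouts(keys, gap=GAP, hold=HOLD_WEIGHT):
    """keys: (frame, weight) pairs for one pose, sorted by frame.
    Returns a copy where short 'off' gaps are held at a low weight."""
    out = list(keys)
    for i, (frame, weight) in enumerate(out):
        if weight == 0.0:
            # Look ahead: does this pose turn back on within `gap` frames?
            for later_frame, later_weight in out[i + 1:]:
                if later_frame - frame > gap:
                    break
                if later_weight > 0.0:
                    out[i] = (frame, hold)  # don't kill the pose completely
                    break
    return out

# e.g. soften_dropouts([(10, 1.0), (14, 0.0), (16, 0.8)])
# -> [(10, 1.0), (14, 0.3), (16, 0.8)]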

Of course, at the end it'll still require some manual cleanup to really polish it.

I'm also planning on adding some emotional metadata. This information will translate into corresponding facial expressions and head movements. I'm thinking the input will be a wave clip plus an XML file containing the transcript and the emotional metadata, sorta like this:

<transcript>
  <panic>We have to get out of here</panic>
  <sad>but with a broken leg he can't come with us</sad>
</transcript>
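Here's a rough sketch of how a file like that could be pulled apart, just using Python's standard xml.etree.ElementTree. The expression lookup at the end is a placeholder, since I haven't built that part yet.

import xml.etree.ElementTree as ET

xml_text = """<transcript>
  <panic>We have to get out of here</panic>
  <sad>but with a broken leg he can't come with us</sad>
</transcript>"""

root = ET.fromstring(xml_text)
for element in root:
    emotion = element.tag        # "panic", "sad", ...
    line = element.text.strip()  # the spoken text for that span
    print(emotion, "->", line)
    # ...look up the facial-expression/head-movement preset for `emotion`
    # and layer it on top of the phoneme keyframes for this stretch of audio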

Anyway, still thinking on it.