It's mostly obscure right now, because there's no face or finger animation yet, and so no expression. The action at the end is even more confusing: once the sound stops, it almost looks like a mistake. But if you compare it to the 2D blocking, you'll see what happens. Click to see (840k):
The 2D blocking, for comparison (720k):