I want to take a photo of an object, say a human face with hair, and separate it from the body and background, which don't matter. The face is then copied and reflected 180 degrees about a pole running from the middle top of the head to the middle bottom of the chin, giving a second image, so that a stereo pair (3D) is created. The best-matching back of the head, chosen by the look of the hair, is added as a third image for a better 3D model of the head. Fully rigged bodies (say, from TurboSquid) are added later. Then a story is provided. The algorithm treats each sentence as a scene and separates all the important word types (nouns, pronouns, verbs, adjectives, etc.). For each of these, the best "pre-rigged equations of motion" (including zoom, angle, depth of field, etc. for cameras) are "looked up". The human is asked to pick the best object and background matches from a library, and once these are assigned to a static scene, the best equations of motion are assigned to them. For example:
Mary took her bicycle to the store. Mary asked the clerk, "Do you have a loaf of bread?" The clerk sold her Wonder Bread. Mary discovered her friend Ted in the parking lot. Ted said, "Can I put your bike in the truck and take you home?" Mary said, "Sure". Mary and Ted drove back to Mary's house.
The algorithm must look at all words and their relationships: the best matches for the backgrounds, Mary, the clerk, and Ted; the best motion for the bicycle; and the best way to put Mary, Ted, and the bicycle in the truck and move them and the truck toward the house.
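To make the sentence-to-scene step concrete, here is a minimal sketch in Python. Everything named in it is an assumption for illustration only: the lookup tables MOTION_LIBRARY, CHARACTERS, and PROPS, the clip names such as "drive_vehicle", and the functions parse_scene and parse_story are hypothetical. A real system would need a proper part-of-speech tagger, pronoun resolution (who "her" is), and a far larger library.

```python
import re

# Hypothetical lookup tables (assumptions, not a real asset library).
# Verb -> name of a pre-rigged motion clip.
MOTION_LIBRARY = {
    "took": "travel_with_prop",
    "asked": "talk_gesture",
    "said": "talk_gesture",
    "sold": "hand_over_item",
    "discovered": "turn_and_look",
    "drove": "drive_vehicle",
}
CHARACTERS = {"mary", "ted", "clerk"}
PROPS = {"bicycle", "bike", "bread", "truck", "store", "house"}

# Break where whitespace follows . ? or ! (optionally plus a closing quote)
# and the next sentence starts with a capital letter or an opening quote.
# Limitation: a quotation containing several sentences would also be split.
SENTENCE_BREAK = re.compile(r'(?:(?<=[.?!])|(?<=[.?!]"))\s+(?=[A-Z"])')

STORY = (
    'Mary took her bicycle to the store. '
    'Mary asked the clerk, "Do you have a loaf of bread?" '
    'The clerk sold her Wonder Bread. '
    'Mary discovered her friend Ted in the parking lot. '
    'Ted said, "Can I put your bike in the truck and take you home?" '
    'Mary said, "Sure". '
    "Mary and Ted drove back to Mary's house."
)


def unique(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def parse_scene(sentence):
    """Turn one sentence into a scene record (cast, props, motions, speech)."""
    # Quoted speech is kept for the voice track, not searched for motions.
    dialogue = re.findall(r'"([^"]*)"', sentence)
    narration = re.sub(r'"[^"]*"', " ", sentence)
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", narration)]
    # Strip possessives so that "Mary's" matches "mary".
    words = [w[:-2] if w.endswith("'s") else w for w in words]
    return {
        "text": sentence,
        "characters": unique(w for w in words if w in CHARACTERS),
        "props": unique(w for w in words if w in PROPS),
        "motions": unique(MOTION_LIBRARY[w] for w in words if w in MOTION_LIBRARY),
        "dialogue": dialogue,
    }


def parse_story(story):
    """Split a story into sentences and build one scene record per sentence."""
    return [parse_scene(s) for s in SENTENCE_BREAK.split(story.strip()) if s]


scenes = parse_story(STORY)
for number, scene in enumerate(scenes, start=1):
    print(number, scene["characters"], scene["props"], scene["motions"])
```

Each scene record could then be shown to the human, who confirms or replaces the suggested characters, props, and motion clips before anything is animated.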
Movement is translational, vibrational, and rotational in three dimensions (x, y, z). Rigged objects must know all of their bone joints and compute the best "path plot" for each movement, with editable input from the human, such as making the truck go faster, making Mary smile, choosing the best simulated voice match, etc.
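As a rough illustration of what one "equation of motion" with an editable parameter might look like, here is a minimal Python sketch. The function pose_at and all of its parameters (speed, spin_rate, vib_amp, vib_freq) are hypothetical names chosen for this example. It is heavily simplified: translation is a straight line between two points, vibration is a sine wave along z only, and rotation is a single yaw angle about the z axis, whereas a full rig would need rotations about all three axes (for example with quaternions) for every bone joint.

```python
import math


def pose_at(t, start, end, duration, speed=1.0,
            spin_rate=0.0, vib_amp=0.0, vib_freq=0.0):
    """Return ((x, y, z), yaw) for an object at time t (seconds).

    speed     -- editable multiplier, e.g. 2.0 makes the truck go twice as fast
    spin_rate -- yaw rotation about the z axis, in radians per second
    vib_amp   -- vibration amplitude along z
    vib_freq  -- vibration frequency in Hz
    """
    # Translation: linear interpolation from start to end, clamped to [0, 1].
    u = min(max(t * speed / duration, 0.0), 1.0)
    x, y, z = (a + (b - a) * u for a, b in zip(start, end))
    # Vibration: a small sinusoidal offset added on top of the path.
    z += vib_amp * math.sin(2 * math.pi * vib_freq * t)
    # Rotation: yaw angle grows linearly with time.
    yaw = spin_rate * t
    return (x, y, z), yaw


# "Path plot" for the truck: one pose per second, with the speed doubled
# by the human, so a 10-second trip finishes after 5 seconds.
truck_path = [
    pose_at(t, start=(0, 0, 0), end=(10, 0, 0), duration=10.0, speed=2.0)
    for t in range(6)
]
```

The human edits only the parameters (here speed), and the path is recomputed from the same equation.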