AI storyboarding

When preparing a film for production, you create a storyboard. My hypothesis is that AI-enabled tools can partly automate this activity, and this simple PoC is a first attempt to verify it.

Background

A script has a very standard format based on scenes. For each scene, you have one or more shots. In each scene, you mix action with dialogue.

If we keep it very simple, we have one shot per scene, but the downside of this approach is that the film gets very static. Today's films have shots that average 4-12 seconds, and one script page usually corresponds to one minute of film and one scene. Dividing 60 seconds by 4-12 seconds per shot, we need roughly 5-15 shots per scene.

If we instead say that the action before the dialogue is one shot and the dialogue itself is one tighter shot, we are more in line with how films are actually made.

Preparations

To create a PoC, we need a film manuscript, an AI tool for image creation, and a tool for storyboarding. The next step would be to generate spoken dialogue from the text, but that is not part of the PoC.

I have a draft version of a film manuscript, Lost and Found, to which I hold the copyright, and this is the foundation for the storyboard.

I have access to Azure OpenAI services and I'm using DALL-E to create the images from the text in the script.
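As an illustration, here is a minimal sketch of what such an image call could look like with the Azure OpenAI Python SDK. The endpoint, API version, deployment name and file name are placeholders for my own setup and not part of the PoC itself.

# Minimal sketch: generate one storyboard frame from a piece of script text
# using the Azure OpenAI image API. Endpoint, API version and deployment name
# are placeholders for your own Azure resource.
import os
import httpx                       # installed together with the openai package
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",      # use the version your resource supports
)

def generate_frame(prompt_text: str, out_path: str) -> None:
    """Generate one image from script text and save it locally."""
    result = client.images.generate(
        model="dall-e-3",          # name of your DALL-E deployment in Azure
        prompt=prompt_text,
        n=1,
        size="1024x1024",
    )
    image = httpx.get(result.data[0].url)
    image.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(image.content)

generate_frame(
    "Wide shot: Angelica, blonde, enters the kitchen and picks up a letter.",
    "scene_01_shot_01.png",
)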

Finally, I use Yamdu (a cloud service) as the tool for storyboarding the manuscript.

Step-by-step approach for the MVP

Import the full manuscript into Yamdu and create a list of scenes.
For each scene in the manuscript until THE END
      For each paragraph 
            Read text until next dialogue or action in manuscript
            If action
                   Create image with text from action
                   Download the image
                   Create a shot in Yamdu
                   Add the image to the shot in Yamdu  
            If dialogue
                   Create image based on character
                   Download the image
                   Create a shot in Yamdu
                   Add the image to the shot in Yamdu
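
To give an idea of what an automated version of this loop could look like, here is a rough Python sketch. The scene and dialogue detection is deliberately naive and assumes a plain-text export of the manuscript with standard screenplay headings; the Yamdu steps (creating the shot and attaching the image) are left out, since in the PoC they are done by hand.

# Sketch of the per-scene loop above. Assumes the manuscript is exported as
# plain text with INT./EXT. scene headings; parsing is deliberately naive.
import re

SCENE_HEADING = re.compile(r"^(INT\.|EXT\.)", re.IGNORECASE)

def split_scenes(script_text: str) -> list[list[str]]:
    """Group paragraphs into scenes based on INT./EXT. headings."""
    scenes, current = [], []
    for paragraph in script_text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if SCENE_HEADING.match(paragraph) and current:
            scenes.append(current)
            current = []
        current.append(paragraph)
    if current:
        scenes.append(current)
    return scenes

def is_dialogue(paragraph: str) -> bool:
    """Naive heuristic: dialogue blocks start with a character name in caps."""
    first_line = paragraph.splitlines()[0]
    return first_line.isupper() and not SCENE_HEADING.match(first_line)

script_text = open("lost_and_found.txt", encoding="utf-8").read()
for scene_no, scene in enumerate(split_scenes(script_text), start=1):
    for shot_no, paragraph in enumerate(scene[1:], start=1):  # skip the heading
        if is_dialogue(paragraph):
            prompt = "Close-up of the speaking character: " + paragraph
        else:
            prompt = paragraph
        out_path = f"scene_{scene_no:02d}_shot_{shot_no:02d}.png"
        generate_frame(prompt, out_path)   # from the earlier sketch
        # Creating the shot in Yamdu and attaching the image is done manually.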

The script contains roughly 80 scenes, but for the first MVP you only need a few scenes to verify the concept, so this workflow is 100% manual for now. The result after importing the images into Yamdu is shown below.

Conclusions

Findings so far: the manuscript has to be much more explicit to produce correct images, and the AI tool doesn't have the context awareness a human reader has. Angelica is blonde in the first scene, so it isn't repeated in the manuscript, but in the image for scene 7 she has dark hair. The same problem shows up with how Jimmy is dressed.
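One possible workaround, sketched below, is to keep a small character sheet and prepend the relevant descriptions to every prompt, so the model never has to remember details between scenes. The Angelica entry comes from scene 1; the Jimmy entry is a placeholder to be filled in from the manuscript.

# Hypothetical mitigation: prepend fixed character descriptions to each prompt.
CHARACTER_SHEET = {
    "Angelica": "Angelica is blonde.",        # stated in scene 1 of the manuscript
    "Jimmy": "Jimmy is dressed in ...",       # fill in from the manuscript
}

def with_character_context(prompt: str) -> str:
    """Add the description of every character mentioned in the prompt."""
    notes = [desc for name, desc in CHARACTER_SHEET.items() if name in prompt]
    return " ".join(notes + [prompt])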

A minor annoyance is the content filters in Azure OpenAI. I couldn't use the words bruise, robe or irritated, as they were classified as violence, sex and hate respectively, and were blocked by the content filter.

There is already an app that reads a manuscript and creates a different voice for each character, so that part is not new functionality either. It should be fairly simple to incorporate into the final product.

Otherwise, I think some clever programming using the Azure OpenAI services, or other similar APIs, together with APIs towards Yamdu, would work.

What do you think?