Since early spring of 2023, artificial intelligence (AI) has experienced a surge in popularity, captivating both the tech industry and the creative sphere. Venture capitalists who were once fervent proponents of Web3 have redirected their attention toward AI, and developers are migrating to AI projects in turn, causing Web3 to lose its former allure. Alongside this transition, several noteworthy developments within the creative industry have raised eyebrows: platforms such as Midjourney and Stable Diffusion, which harness generative image models, as well as Adobe’s launch of its impressive Generative Fill tool in the Photoshop beta.
AI and its potential worst-case scenarios have provoked a certain degree of apprehension among creatives. However, the genie can’t be put back in the bottle, and we can’t afford to dismiss or ignore the potential of AI. I wanted to understand AI and explore its practical applications, so I embarked on a journey into its possibilities. I found Stable Diffusion more suitable than Midjourney for practical creative use: in my opinion, Midjourney is excessively constrained and lacks sufficient control over its output, and its randomness hinders its practicality. Stable Diffusion addresses these concerns. While I only scratched the surface of the vast array of techniques and tools available, I did explore AI in earnest, and my primary objectives were twofold: first, to explore the practical use cases of AI within my workflow, and second, to ensure that the workflow was reproducible and could withstand the pressure of art direction.
I gravitated toward employing AI in video production and used preprocessing techniques such as OpenPose, Depth, and Canny (edge detection) in ControlNet, essentially harnessing AI to generate imagery based on a delicate balance of prompts, textual inversions, embeddings, and LoRAs. Familiarizing oneself with the settings and embeddings requires some practice. I noticed that even a slight alteration in the output resolution can yield an entirely distinct image, which makes sense given the diffusion-based model. After numerous iterations, I began to appreciate the significance of temporal consistency (essentially frame-to-frame image fidelity, reducing the randomness inherent in AI). As a result, I have come remarkably close to achieving smooth AI-generated outputs (see: “Bounce” and “DJ Bob”). I also explored AI for sound production, training a model on 18 minutes of my voice samples so it could sing “The Secret to Life.” I believe AI has the potential to be an invaluable tool in the motion designer’s arsenal. However, we are still in the early days; everything is new and continually evolving, and processes and techniques that proved effective just a month ago become obsolete with the arrival of subsequent updates. I think we’re two to four years away from creatives being able to harness the true capabilities of AI in practical workflows.
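To make that reproducibility point concrete, here is a minimal sketch of this kind of pose-conditioned ControlNet setup, using the open-source diffusers and controlnet_aux Python libraries. The checkpoints, file names, and prompt are illustrative assumptions rather than my exact setup; the key idea is pinning the seed and resolution so a frame can be regenerated on demand.

```python
import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Preprocessing step: extract a pose skeleton from a source frame.
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
frame = load_image("frame_0001.png")  # hypothetical input frame
pose_map = openpose(frame)

# Load a pose-conditioned ControlNet alongside a base Stable Diffusion model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Pin the seed AND the resolution: changing either can yield an entirely
# different image, which is why this matters under art direction.
generator = torch.Generator(device="cuda").manual_seed(1234)
image = pipe(
    "a dancer in a synthwave city, neon lighting",  # illustrative prompt
    image=pose_map,
    width=512,
    height=512,
    num_inference_steps=25,
    generator=generator,
).images[0]
image.save("frame_0001_out.png")
```

Run per frame, the same seed, resolution, and pose conditioning keep successive outputs close enough that tools like Temporal Kit or TemporalNet can smooth out the remaining flicker.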
There is a prevailing sentiment that Adobe’s generative tooling or Midjourney’s existence heralds the end of designers as we know them. However, we must remember that not everyone began their journey as a concept artist or storyboard creator, and these tools primarily find practical applications within the preproduction phase of the creative pipeline. While there are some use cases in production, they are predominantly relevant only when the art direction explicitly demands an intentional “AI” aesthetic. Personally, I believe AI will shake up VFX pipelines in areas like rotoscoping and keying, which used to eat up a ton of time. Another noteworthy application of AI will definitely be music. As for the future, it’s still too early to tell where it’s all headed. I recall hearing someone say that it’s easy to identify the jobs AI threatens, but much harder to pinpoint the jobs it will create. Overall, I believe AI will be a positive force for creatives.
Based on my experiences diving into AI, I’ve come to the conclusion that AI can never truly replace creatives. We need creative minds to think outside the box, to conceive ideas, and to grasp the subtleties of human psychology. AI is simply a tool that can enhance and assist our creative endeavors.
PoseAI + Synthwave model exploration.
Early OpenPose/Sailor Moon model exploration.
‘Bounce.’ Fine-tuning OpenPose/Depth preprocessing and Temporal Kit for better image fidelity (good result).
‘DJ Bob.’ Exploration of TemporalNet + Stable Diffusion for better temporal consistency. Initial exploration in voice-trained AI (based on Jeremy Stiles’s voice).
‘The Secret to Life’ by James Taylor. Sampled around 18 minutes of my voice for vocal training via AI. Also trained a model on my own photos to generate AI versions of myself in various poses and settings.
You can see several passes that AI can generate alongside the ‘rendered’ image: a depth pass and a seg pass (essentially a cryptomatte, sort of). It isn’t yet precise enough for exacting workflows, but it’s impressive nonetheless; a rough sketch of generating similar passes follows the caption below. In a few years I predict we’ll be blown away by what can be done with AI.
Left: Modern Living Room generated ‘beauty’ pass; Middle: segmentation pass; Right: depth pass.
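As promised above, here is a rough sketch of how depth and segmentation passes can be pulled from a generated ‘beauty’ image using off-the-shelf models through Hugging Face transformers. The model names and file paths are assumptions for illustration, not the exact checkpoints behind these images.

```python
from PIL import Image
from transformers import pipeline

beauty = Image.open("living_room_beauty.png")  # hypothetical 'beauty' pass

# Depth pass: monocular depth estimation on the rendered image.
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
depth = depth_estimator(beauty)["depth"]  # a PIL depth map
depth.save("living_room_depth.png")

# Seg pass: semantic segmentation, a rough stand-in for a cryptomatte.
segmenter = pipeline(
    "image-segmentation", model="nvidia/segformer-b0-finetuned-ade-512-512"
)
for i, seg in enumerate(segmenter(beauty)):
    # Each entry carries a label (e.g. "sofa") and a binary mask image.
    seg["mask"].save(f"living_room_seg_{i}_{seg['label']}.png")
```

The masks land as separate per-label images rather than a true ID-encoded cryptomatte, which is part of why these passes aren’t yet a drop-in for exacting compositing workflows.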