Creating a Visually Stunning AI Video: A Step-by-Step Workflow

I break down my workflow for creating artistic, visually consistent music videos, from storyboarding and generating still images in Leonardo AI to animating with Runway Gen-3 AI and final editing.

Nelli Kite

2/10/2025 · 4 min read

Making a poetic music video is a deeply creative process that takes several steps to bring visuals and rhythm together into a consistent experience. Below, I’ll walk you through my workflow for crafting a music video, using the first three verses of my song as inspiration. The song is based on my poem in Russian, translated into English and made into a song using Suno AI.

Lyrics as the Foundation

Every music video starts with a strong visual concept inspired by the song’s lyrics. Here’s the section of my song "The Maze of Mind" we’ll be working with:

In a world so fragile and unsure,
Where realities like mirrors in a row
On steps like lines that blur,
Where my uncertain steps would go.

In life's maze without a guide,
How do I find the cherished way?
Rules change like in a fairy tale tide,
And forward I can't see or sway.

Heart knows the way,
But I pretend.
Mind builds the maze
Without end.

These lines create a dreamlike, surreal atmosphere, perfect for an abstract and mysterious AI-generated music video. Plus, I have always been fascinated by the illogical world of M.C. Escher's drawings, especially the endless stairs that close onto themselves. For me they are very symbolic, reflecting the complex nature of our mind's conditioning and the choices between logic and the heart's decisions. So I was very excited to try to make surreal landscapes of complex merging stairs, reminiscent of Escher's drawings but unique in their own way.

Step 1: Storyboarding the Vision

Before generating visuals, I create a simple storyboard—a sequence of rough sketches or written descriptions mapping out the video. Each key moment aligns with the rhythm and mood of the music. This is a rough sample of storyboard ideas:

Opening Scene: A fragile, surreal world unfolds, with floating staircases shifting.
In a world so fragile and unsure,
Where realities like mirrors in a row: A woman walks into her inner world to face her own mirrors of reality, that is, the constructions of the mind and her self-image built on previous conditioning, ideas and beliefs.
She sees her self-image as an intricate, decorated portrait of a woman's face, surreally merged with mazes.
On steps like lines that blur,
Where my uncertain steps would go: Surreal stairs unfold and the world dissolves into abstraction; endless shifting pathways symbolize the feeling of searching and longing.
In life's maze without a guide,
How do I find the cherished way?
Rules change like in a fairy tale tide,
And forward I can't see or sway: Visually stunning mazes inspired by Escher's drawings. A woman's face set into the pattern of an intricate, beautiful maze; an unreal, fairy-tale visual with something swaying.
The chorus part:
Heart knows the way,
But I pretend.
Mind builds the maze
Without end: A hint at breaking out of the maze by listening to the heart and questioning the constructions of the mind. The maze dissolves into flying colorful pieces, revealing the real face.
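
As an illustration, a storyboard like the one above could also be kept as structured data, so that every shot carries its lyric cue and its visual description. This is only a minimal sketch: the `Shot` class and the `shot_list` helper are hypothetical names, and the visual descriptions are shortened from the notes above.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One storyboard entry: the lyric lines it covers and what the viewer sees."""
    lyrics: list[str]
    visual: str


storyboard = [
    Shot([], "A fragile, surreal world unfolds; floating staircases shift."),
    Shot(
        ["In a world so fragile and unsure,", "Where realities like mirrors in a row"],
        "A woman walks into her inner world and faces her mirrors of reality.",
    ),
    Shot(
        ["On steps like lines that blur,", "Where my uncertain steps would go."],
        "Surreal stairs unfold; the world dissolves into shifting pathways.",
    ),
    Shot(
        ["In life's maze without a guide,", "How do I find the cherished way?",
         "Rules change like in a fairy tale tide,", "And forward I can't see or sway."],
        "Escher-inspired mazes; a woman's face set into an intricate maze pattern.",
    ),
    Shot(
        ["Heart knows the way,", "But I pretend.", "Mind builds the maze", "Without end."],
        "The maze dissolves into flying colorful pieces, revealing the real face.",
    ),
]


def shot_list(shots):
    """Return one numbered text line per shot: lyric cue, then the visual idea."""
    lines = []
    for number, shot in enumerate(shots, start=1):
        cue = " / ".join(shot.lyrics) if shot.lyrics else "(opening)"
        lines.append(f"{number:02d}. {cue} -> {shot.visual}")
    return lines


for line in shot_list(storyboard):
    print(line)
```

Keeping the shots in one list makes it easy to see which lyric lines still have no image or clip assigned.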

Step 2: Generating Still Images with Leonardo AI

To create a visually stunning and consistent style, I generate a set of high-quality still images in Leonardo AI, which will later be animated. Using text prompts, I refine each image until I achieve the right dreamlike aesthetic. Here’s an example prompt for Leonardo AI:

"A surreal dreamscape with endless floating staircases, shifting glass walls, and golden light reflections. A lone traveler walks hesitantly, their path blurring behind them. Cinematic lighting, mystical atmosphere, hyper-detailed, concept art style."

I generate multiple variations to ensure I have enough material for different video sequences. Currently, I use the "Flow" feature a lot, as it offers many variations. I also use Midjourney, but lately I have preferred Leonardo AI for its aesthetic feel and its Flow feature. I make a lot of images and keep only the ones I like. This step is quite time-consuming, as it is still difficult to get a consistent style, feel and character.
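
One simple habit that may help with consistency is to reuse the same style wording in every prompt and change only the scene description. The sketch below assumes this approach; it does not call Leonardo AI or any other service, it only builds the prompt text. `build_prompt` is a hypothetical helper, the style string is taken from the example prompt above, and the second and third scenes are paraphrased from the storyboard.

```python
# Shared style wording, copied from the end of the example prompt above.
STYLE = "Cinematic lighting, mystical atmosphere, hyper-detailed, concept art style."


def build_prompt(scene: str, style: str = STYLE) -> str:
    """Join a scene description with the shared style wording."""
    scene = scene.strip()
    if not scene.endswith("."):
        scene += "."
    return f"{scene} {style}"


scenes = [
    "A surreal dreamscape with endless floating staircases, shifting glass walls, "
    "and golden light reflections",
    "An intricate decorated portrait of a woman's face merged with mazes",
    "A maze dissolving into flying colorful pieces, revealing a face",
]

prompts = [build_prompt(scene) for scene in scenes]
for prompt in prompts:
    print(prompt)
```

The prompts still need manual refinement image by image; the shared suffix only keeps the style words from drifting between scenes.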

Important: To achieve a smooth, visually consistent animation, I only use image-to-video generation, avoiding direct text-to-video models that might introduce inconsistency.

Here are the still images I ended up using for the beginning of the song:

Step 3: Animating with Runway AI

Once I have a set of still images in the same artistic style, I use an AI animation tool (in this case Runway Gen-3, though currently I mostly use Kling AI) to generate short, smooth motion clips. This step transforms static visuals into organic, flowing movement, essential for bringing the world to life.

Animation involves importing a still image into the program and writing a prompt describing the movement, mood, speed and camera motion. This has to be done for every shot. Currently, it is only possible to create a 5 to 10 second clip in one go, and the clips often come out distorted. This process takes time: I spent hours trying to get good shots and scenes, and I often have to go back to creating new images, as some just do not work with motion.
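
To get a feel for the workload, the 5 to 10 second clip limit can be turned into a rough count of clips and generation attempts. The sketch below is plain arithmetic under stated assumptions: the 60-second section length and the three attempts per usable clip are hypothetical numbers chosen for illustration, not figures from this project.

```python
import math


def clips_needed(section_seconds: float, clip_seconds: float) -> int:
    """Minimum number of clips needed to cover a section of the song."""
    return math.ceil(section_seconds / clip_seconds)


def generations_needed(clips: int, attempts_per_keeper: int) -> int:
    """Total generations if each usable clip takes this many attempts."""
    return clips * attempts_per_keeper


SECTION_SECONDS = 60      # hypothetical length of the section being animated
ATTEMPTS_PER_KEEPER = 3   # hypothetical: one usable clip out of three tries

for clip_seconds in (5, 10):
    clips = clips_needed(SECTION_SECONDS, clip_seconds)
    total = generations_needed(clips, ATTEMPTS_PER_KEEPER)
    print(f"{clip_seconds} s clips: {clips} clips, about {total} generations")
```

In practice clips are trimmed in the edit, so the real number of clips is likely higher than this minimum.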

I ensure that each generated clip aligns with the mood of the music and follows the storyboard flow.

Step 4: Editing & Syncing to Music

Once I have enough animated clips, I import them into a video editor (such as DaVinci Resolve or Adobe Premiere). The editing process involves:

Cutting and arranging clips to match the song’s rhythm and emotion
Adding transitions to enhance the dreamlike feel
Color grading to unify the overall look
Blending layers & effects for a more immersive experience
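
When cutting to the rhythm, it can help to know where the bars fall on the timeline before arranging clips. The following is a minimal sketch of that calculation, not a feature of DaVinci Resolve or Adobe Premiere; `cut_times` is a hypothetical helper, and the tempo of 120 BPM and the cut every 4 beats are assumed values, since the song's actual tempo is not stated here.

```python
def cut_times(bpm: float, beats_per_cut: int, duration_seconds: float) -> list[float]:
    """Timestamps (in seconds) where a cut would land on the beat grid."""
    interval = beats_per_cut * 60.0 / bpm  # seconds between cuts
    times = []
    index = 0
    while index * interval < duration_seconds:
        times.append(round(index * interval, 2))
        index += 1
    return times


# Assumed values for illustration: 120 BPM, a cut every 4 beats, 10 seconds of music.
print(cut_times(bpm=120, beats_per_cut=4, duration_seconds=10))
```

The timestamps are only a starting grid; the final cut points still depend on the mood of each shot.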

Final Thoughts

This AI-driven workflow allows for highly artistic, cinematic music videos without the need for traditional filming. By combining storyboarding, AI image generation, animation, and careful editing, we can bring abstract, surreal concepts to life in a way that enhances the song’s message.

Still, the process is time-consuming and requires several AI models and multiple steps. It also requires a significant input of human creativity, thinking, reasoning and curation of the AI output.

Here is the final music video, "The Maze of Mind". Enjoy!