By billy white

Building an Automated "Style Cloner": Reverse Engineering Images with GPT-4o & Legnext MJ API

This guide shows you how to build an automated Python pipeline that combines GPT-4o's visual analysis with the Legnext MJ API to achieve precise reverse engineering and replication of any image style.

Every creator in the AI image generation space has experienced this moment of frustration: You stumble upon a stunning image on Pinterest or ArtStation—perfect lighting, unique composition, and just the right film grain. You want to replicate this style, but when you try to write the prompt, you find yourself playing a "guessing game."

While Midjourney's built-in /describe command is a good starting point, it often returns a "word salad" of disjointed tags because it lacks a deep understanding of visual logic.

To achieve true automated "Style Cloning," we need the perfect collaboration between two core roles:

  • GPT-4o (The Connoisseur): Unlike traditional taggers, GPT-4o reads images like a photographer. It understands Focal Length, Depth of Field (Bokeh), and specific art movements (e.g., Bauhaus, Cyberpunk).

  • Legnext MJ API (The Painter): Midjourney has no official API. Legnext bridges this gap, allowing us to programmatically execute the high-level instructions from GPT-4o with support for concurrency and webhooks.
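
At a glance, the data flows through three hops, which the rest of this tutorial implements one function at a time:

Python

# reference image -> GPT-4o (style analysis)    -> Midjourney prompt string
# prompt string   -> Legnext /imagine endpoint  -> taskId (asynchronous)
# taskId          -> polling loop               -> final image URL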


The Tutorial: Building the Pipeline

Let's build this pipeline step-by-step using Python. We will break down the code into logical blocks.

Step 1: Configuration & Setup

First, we import the necessary libraries and set up our API keys. We also need a utility function to encode local images into Base64 format, which is required by GPT-4 Vision.

Python

import base64
import time
import requests

# --- Configuration ---
OPENAI_API_KEY = "sk-..."  # Replace with your OpenAI Key
LEGNEXT_API_KEY = "your_legnext_key" # Replace with your Legnext Key
LEGNEXT_BASE_URL = "https://api.legnext.ai/v1" 

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
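
If your reference image is already hosted online, you can skip Base64 entirely: GPT-4o's vision input also accepts plain HTTPS URLs. A minimal sketch (the URL below is a placeholder, not a real host):

Python

# Alternative to Base64: reference a publicly hosted image directly.
# The URL here is hypothetical -- substitute your own hosted image.
image_content = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/reference_shoe.jpg"}
}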

Step 2: The "Eye" – Extracting Style with GPT-4o

This is the most critical part. We don't just ask GPT-4o to "describe" the image; we use a System Prompt to force it to act as a Prompt Engineer, and we strictly forbid conversational filler (e.g., "Here is the prompt") so the output is clean.

Python

def get_style_prompt(image_path, new_subject):
    base64_image = encode_image(image_path)
    
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {OPENAI_API_KEY}"
    }

    # System Prompt: Force GPT to output ONLY the raw prompt string.
    system_instruction = """
    You are an expert Midjourney Prompt Engineer.
    Task: Analyze the visual style (lighting, medium, camera angle, color palette) of the provided image.
    Output: A single, optimized Midjourney prompt that applies this exact style to the user's requested subject.
    Constraint: Do NOT output conversational text like "Here is the prompt". Output ONLY the raw prompt string.
    """

    payload = {
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": system_instruction},
            {"role": "user", "content": [
                # Inject the NEW subject into the analysis request
                {"type": "text", "text": f"Keep the style, but change the subject to: {new_subject}"},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
            ]}
        ],
        "max_tokens": 200
    }

    try:
        response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
        response.raise_for_status()
        prompt = response.json()['choices'][0]['message']['content'].strip()
        print(f"✅ GPT-4 Generated Prompt: {prompt}")
        return prompt
    except Exception as e:
        print(f"❌ OpenAI API Error: {e}")
        return None
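
Even with the strict system prompt, models occasionally wrap their answer in quotes or a Markdown code fence. A small, defensive cleanup helper (an optional addition, not part of any API contract) keeps stray formatting out of the Midjourney request:

Python

def sanitize_prompt(raw_prompt):
    """Strip Markdown fences and wrapping quotes that GPT-4o sometimes adds."""
    cleaned = raw_prompt.strip()
    # Remove a wrapping ``` fence, including an optional language tag line
    if cleaned.startswith("```") and cleaned.endswith("```"):
        cleaned = cleaned[3:-3].strip()
        if "\n" in cleaned:
            first_line, rest = cleaned.split("\n", 1)
            if first_line.strip().isalpha():
                cleaned = rest.strip()
    # Remove symmetric surrounding quotes
    if len(cleaned) >= 2 and cleaned[0] == cleaned[-1] and cleaned[0] in "\"'":
        cleaned = cleaned[1:-1].strip()
    return cleaned

# Usage: prompt = sanitize_prompt(get_style_prompt(ref_image, subject))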

Step 3: The "Hand" – Triggering Generation

Once we have the prompt, we send it to Legnext. Note that this is an asynchronous operation. The API returns a taskId immediately, confirming the job has been queued.

Python

def trigger_generation(prompt):
    url = f"{LEGNEXT_BASE_URL}/imagine"
    headers = {
        "Authorization": f"Bearer {LEGNEXT_API_KEY}",
        "Content-Type": "application/json"
    }
    
    # Append version parameter for best quality
    payload = {
        "prompt": prompt + " --v 6.0" 
    }
    
    response = requests.post(url, headers=headers, json=payload)
    if response.status_code == 200:
        task_id = response.json().get('taskId') 
        print(f"🚀 Task Submitted. ID: {task_id}")
        return task_id
    else:
        print(f"❌ Legnext Trigger Error: {response.text}")
        return None
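
Because Legnext supports concurrent jobs (as noted earlier), you can also fan out several prompts at once. A minimal sketch using Python's standard ThreadPoolExecutor on top of trigger_generation, assuming your plan's rate limits allow parallel submissions:

Python

from concurrent.futures import ThreadPoolExecutor

def trigger_many(prompts, max_workers=3):
    """Submit several prompts in parallel; returns task IDs (None for failures)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(trigger_generation, prompts))

# Example: task_ids = trigger_many(["a neon koi pond", "a brutalist cathedral"])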

Step 4: The "Patience" – Polling for Results

Since Midjourney takes 30-60 seconds to generate an image, we need a loop that checks the status periodically. This function keeps the script from exiting early or treating a still-pending job as a failure.

Python

def wait_for_result(task_id, max_attempts=30):
    url = f"{LEGNEXT_BASE_URL}/message/{task_id}" 
    headers = {"Authorization": f"Bearer {LEGNEXT_API_KEY}"}

    print("⏳ Waiting for Midjourney to generate...")
    
    for attempt in range(max_attempts):
        response = requests.get(url, headers=headers)
        data = response.json()
        status = data.get('status')
        
        if status in ('completed', 'succeeded'):
            image_url = data.get('imageUrl') 
            print(f"✨ Success! Image URL: {image_url}")
            return image_url
        elif status == 'failed':
            print("❌ Generation Failed.")
            return None
        
        # Wait 5 seconds before next check
        time.sleep(5)
    
    print("⚠️ Operation Timed Out.")
    return None
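
Once wait_for_result returns a URL, you will usually want the file on disk. A small helper using the requests library we already imported:

Python

def download_image(image_url, save_path="output.png"):
    """Download the generated image to a local file."""
    response = requests.get(image_url, timeout=60)
    response.raise_for_status()
    with open(save_path, "wb") as f:
        f.write(response.content)
    print(f"💾 Saved to {save_path}")
    return save_path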

Step 5: Putting It All Together

Now, we chain the functions together in the main block.

Python

if __name__ == "__main__":
    ref_image = "reference_shoe.jpg" # Local reference image
    target_subject = "A futuristic cyberpunk robotic cat" # New subject

    # 1. Analyze Style
    prompt = get_style_prompt(ref_image, target_subject)

    # 2. Submit & Wait
    if prompt:
        task_id = trigger_generation(prompt)
        if task_id:
            wait_for_result(task_id)
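
The same chain scales to batches. A sketch that clones one reference style onto several subjects sequentially (swap in trigger_many from Step 3 if you prefer parallel submission):

Python

def clone_style_batch(ref_image, subjects):
    """Apply one reference image's style to a list of new subjects."""
    results = {}
    for subject in subjects:
        prompt = get_style_prompt(ref_image, subject)
        if not prompt:
            continue
        task_id = trigger_generation(prompt)
        if task_id:
            results[subject] = wait_for_result(task_id)
    return results

# Example:
# clone_style_batch("reference_shoe.jpg",
#                   ["a robotic cat", "a vintage motorcycle", "a glass greenhouse"])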

Pro Tip: Achieving 100% Consistency

The code above relies on GPT-4o's ability to describe a style in words. However, if you need pixel-perfect style consistency (keeping the exact texture or brush strokes), you should use the Midjourney Style Reference (--sref) parameter.

You can modify the payload in trigger_generation to include it:

Python

payload = {
    # Combine the GPT-4 description with the Style Reference URL
    "prompt": f"{prompt} --sref https://your-hosted-image.com/ref.jpg --sw 100 --v 6.0"
}
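
For reference, --sw (style weight) accepts values from 0 to 1000, with a default of 100; higher values push the output closer to the reference. Also note that --sref needs a publicly accessible image URL, so a local file like reference_shoe.jpg must be uploaded to a host first.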

By combining GPT-4o (for subject understanding) with the Legnext MJ API's --sref support (for style locking), you create an unbeatable automated design workflow.


By bridging the visual understanding of LLMs with the generative power of the Legnext API, we are turning creativity into a scalable, reproducible, and programmable asset.

Whether you are an indie hacker building the next viral AI app, or an enterprise looking to streamline design workflows, the combination of Code + Vision + Generation is your unfair advantage.

Ready to build?

Get your API Key from Legnext and start creating your own AI tools.