AI Generation Feature Documentation

Overview

The AI Generation feature uses OpenAI's GPT-4o model to generate production-ready blog articles from:

  • Audio transcriptions from recorded clips
  • Selected images from the media library
  • User-provided AI prompts

Features

1. Intelligent Content Generation

  • Uses OpenAI's GPT-4o model
  • Generates semantic HTML5 content ready for Ghost CMS
  • Automatically inserts image placeholders where images should appear
  • Respects user instructions and context from audio/images

2. Image Placeholder System

  • AI generates placeholders in format: {{IMAGE:description_of_image}}
  • Placeholders use snake_case descriptive names
  • System tracks all placeholders for later replacement
  • Placeholders are displayed to the user for review
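The tracking step described above can be sketched as a small extraction function. This is a minimal illustration, not the project's actual code: the function name `extractImagePlaceholders`, the restriction to lowercase letters, digits, and underscores, and the de-duplication of repeated placeholders are all assumptions.

```typescript
// Hypothetical helper: collects the descriptions of all {{IMAGE:...}} placeholders.
// Assumes descriptions are lowercase snake_case (letters, digits, underscores).
function extractImagePlaceholders(html: string): string[] {
  const pattern = /\{\{IMAGE:([a-z0-9_]+)\}\}/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  // exec() with the g flag advances through the string one match at a time.
  while ((match = pattern.exec(html)) !== null) {
    const description = match[1];
    // Keep each description once, in order of first appearance (an assumption).
    if (found.indexOf(description) === -1) {
      found.push(description);
    }
  }
  return found;
}
```

Called on a draft that contains `{{IMAGE:architecture_diagram}}` twice, this sketch would report the description once, which may or may not match how the real implementation counts repeats.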

3. Draft Persistence

  • Generated drafts are automatically saved to the post
  • Drafts persist across sessions
  • Users can manually edit generated content
  • Re-generation is supported (overwrites previous draft)

4. System Prompt Configuration

  • Default system prompt defines output format and requirements
  • Can be overridden via API for custom generation rules
  • Ensures consistent, production-ready HTML output
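One way the override could be applied is sketched below. The document does not show how the backend assembles its OpenAI request, so `buildMessages`, the `ChatMessage` type, the layout of the user message, and the text of `DEFAULT_SYSTEM_PROMPT` are all illustrative assumptions.

```typescript
interface GenerateRequest {
  prompt: string;
  audioTranscriptions?: string[];
  selectedImageUrls?: string[];
  systemPromptOverride?: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Stand-in text only; the project's real default prompt is not reproduced here.
const DEFAULT_SYSTEM_PROMPT =
  "Return semantic HTML5 only. Mark image positions as {{IMAGE:description_of_image}}.";

// Hypothetical helper: picks the system prompt and folds the context into one user message.
function buildMessages(req: GenerateRequest): ChatMessage[] {
  // A blank or whitespace-only override falls back to the default.
  const override = (req.systemPromptOverride ?? "").trim();
  const parts: string[] = [req.prompt.trim()];

  const clips = req.audioTranscriptions ?? [];
  if (clips.length > 0) {
    parts.push("Audio transcriptions:\n" + clips.map((t, i) => `${i + 1}. ${t}`).join("\n"));
  }

  const images = req.selectedImageUrls ?? [];
  if (images.length > 0) {
    parts.push("Selected images:\n" + images.map((u) => `- ${u}`).join("\n"));
  }

  return [
    { role: "system", content: override || DEFAULT_SYSTEM_PROMPT },
    { role: "user", content: parts.join("\n\n") },
  ];
}
```

Keeping the override on the request rather than in server configuration means each call can use its own rules without touching the default.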

Architecture

Backend Components

1. AI Generation API (apps/api/src/ai-generate.ts)

Endpoints:

  • POST /api/ai/generate - Generate article content
  • GET /api/ai/system-prompt - Get default system prompt

Request Payload:

{
  prompt: string;                    // User's generation instructions
  audioTranscriptions?: string[];    // Transcribed audio clips
  selectedImageUrls?: string[];      // URLs of selected images
  systemPromptOverride?: string;     // Optional custom system prompt
}

Response:

{
  content: string;                   // Generated HTML content
  imagePlaceholders: string[];       // Array of placeholder descriptions
  tokensUsed: number;                // OpenAI tokens consumed
  model: string;                     // Model used (gpt-4o)
}
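
A caller may want to confirm that a parsed response has the shape listed above before using it. The following is a minimal sketch of such a runtime check; the `isGenerateResponse` guard is a hypothetical helper, not part of the documented API.

```typescript
interface GenerateResponse {
  content: string;
  imagePlaceholders: string[];
  tokensUsed: number;
  model: string;
}

// Hypothetical type guard: true only if every documented field is present with the right type.
function isGenerateResponse(value: unknown): value is GenerateResponse {
  if (typeof value !== "object" || value === null) return false;
  const fields = value as Record<string, unknown>;
  const placeholders = fields["imagePlaceholders"];
  return (
    typeof fields["content"] === "string" &&
    Array.isArray(placeholders) &&
    placeholders.every((p) => typeof p === "string") &&
    typeof fields["tokensUsed"] === "number" &&
    typeof fields["model"] === "string"
  );
}
```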

2. Database Schema Updates

New Fields in posts table:

  • generated_draft (TEXT) - Stores the AI-generated HTML content
  • image_placeholders (TEXT) - JSON array of image placeholder descriptions

Migration: apps/api/drizzle/0002_soft_star_brand.sql

3. Posts API Updates

  • GET /api/posts/:id now returns generatedDraft and imagePlaceholders
  • POST /api/posts accepts and saves these fields
  • JSON serialization/deserialization handled automatically
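
Because image_placeholders is a TEXT column holding a JSON array, the conversion in each direction might look like the sketch below. The function names and the choice to return an empty array for missing or malformed values are assumptions, not the project's confirmed behavior.

```typescript
// Hypothetical helper: encodes the placeholder list for the TEXT column.
function serializePlaceholders(placeholders: string[]): string {
  return JSON.stringify(placeholders);
}

// Hypothetical helper: decodes the TEXT column.
// Returns [] for null, empty, or malformed values, and drops non-string entries.
function parsePlaceholders(raw: string | null): string[] {
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return [];
    return parsed.filter((item): item is string => typeof item === "string");
  } catch {
    return [];
  }
}
```

Falling back to an empty array keeps older rows, which have no value in the new column, readable without a special case.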

Frontend Components

1. AI Service (apps/admin/src/services/ai.ts)

generateDraft(payload) => Promise<{
  content: string;
  imagePlaceholders: string[];
  tokensUsed: number;
  model: string;
}>

getSystemPrompt() => Promise<{ systemPrompt: string }>

2. StepGenerate Component

Features:

  • Display audio transcriptions in chronological order
  • Show selected images
  • AI prompt input field with placeholder example
  • "Generate Draft" button (becomes "Re-generate Draft" after first use)
  • Loading state with spinner during generation
  • Error handling and display
  • Generated content preview with HTML rendering
  • Image placeholder detection and display
  • Auto-save on generation

Props:

{
  postClips: Clip[];
  genImageKeys: string[];
  onToggleGenImage: (key: string) => void;
  promptText: string;
  onChangePrompt: (v: string) => void;
  generatedDraft: string;
  imagePlaceholders: string[];
  onGeneratedDraft: (content: string) => void;
  onImagePlaceholders: (placeholders: string[]) => void;
}

3. usePostEditor Hook Updates

New State:

  • generatedDraft - Current generated content
  • imagePlaceholders - Array of placeholder descriptions

New Setters:

  • setGeneratedDraft
  • setImagePlaceholders

Persistence:

  • Loads from backend on post open
  • Saves to backend when updated
  • Included in savePost payload

System Prompt

The default system prompt ensures:

  1. Production-ready HTML output
  2. Semantic HTML5 tags only
  3. Consistent image placeholder format
  4. No markdown or wrapper tags
  5. SEO-friendly structure
  6. Proper heading hierarchy
  7. Valid, properly closed HTML
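
Requirement 4 can be spot-checked after generation. The sketch below is a heuristic only, and it assumes that "wrapper tags" means `<html>`, `<head>`, and `<body>` and that "markdown" means code fences and `#` headings. The function name `findOutputViolations` is hypothetical.

```typescript
// Hypothetical post-generation check for requirement 4 (no markdown, no wrapper tags).
// Returns a list of human-readable problems; an empty list means none were found.
function findOutputViolations(content: string): string[] {
  const violations: string[] = [];
  // Three consecutive backticks suggest the model wrapped its answer in a markdown fence.
  if (/`{3}/.test(content)) {
    violations.push("contains a markdown code fence");
  }
  // Document-level tags are assumed to be unwanted in a fragment destined for the CMS.
  if (/<\/?(html|head|body)\b/i.test(content)) {
    violations.push("contains a document wrapper tag");
  }
  // A line starting with 1-6 '#' characters followed by a space looks like a markdown heading.
  if (/^ {0,3}#{1,6} /m.test(content)) {
    violations.push("contains a markdown heading");
  }
  return violations;
}
```

These patterns do not validate the HTML itself; they only catch the most common ways a model drifts from the requested format.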

Image Placeholder Format:

{{IMAGE:description_of_image}}

Examples:

  • {{IMAGE:screenshot_of_dashboard}}
  • {{IMAGE:team_photo_at_conference}}
  • {{IMAGE:architecture_diagram}}

Usage Flow

  1. User records audio in Step 1 (Assets)

    • Audio auto-uploads
    • User transcribes clips
  2. User selects images in Step 1 (Assets)

    • Images marked for generation
    • Selection persists with post
  3. User writes AI prompt in Step 2 (AI Prompt)

    • Describes article goals, audience, tone
    • References transcriptions and images
  4. User generates draft in Step 3 (Generate)

    • Clicks "Generate Draft"
    • System sends prompt + transcriptions + image info to OpenAI
    • AI generates HTML with image placeholders
    • Draft auto-saves to post
    • Placeholders displayed for review
  5. User reviews/edits in Step 4 (Edit)

    • Can manually edit generated content
    • Or return to Step 3 to re-generate
  6. Future: Image Replacement (Next Step)

    • Replace placeholders with actual images
    • Match placeholder descriptions to selected images
    • Or allow manual image selection per placeholder
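
Since the replacement step is not built yet, the following is only a sketch of one possible approach: a lookup from placeholder description to image URL, with unmatched placeholders left in place for manual selection. The function names, the mapping shape, and the generated `<img>` markup are all assumptions.

```typescript
// Escapes characters that would break out of a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical replacement step: swaps each {{IMAGE:description}} for an <img> tag
// when a URL is known for that description, and leaves it untouched otherwise.
function replaceImagePlaceholders(html: string, imageUrls: Record<string, string>): string {
  return html.replace(/\{\{IMAGE:([a-z0-9_]+)\}\}/g, (token: string, description: string) => {
    // hasOwnProperty avoids matching inherited keys such as "constructor".
    if (!Object.prototype.hasOwnProperty.call(imageUrls, description)) {
      return token;
    }
    const alt = description.replace(/_/g, " ");
    return `<img src="${escapeAttribute(imageUrls[description])}" alt="${escapeAttribute(alt)}">`;
  });
}
```

Reusing the description as alt text is a convenience in this sketch; a real implementation would likely let the user edit it.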

Environment Variables Required

# .env file
OPENAI_API_KEY=sk-...your-key-here...

Installation

1. Install OpenAI Package

cd apps/api
pnpm add openai

2. Run Database Migration

cd apps/api
pnpm drizzle:migrate

3. Set Environment Variable

Add OPENAI_API_KEY to your .env file in the project root.

4. Restart API Server

cd apps/api
pnpm run dev

API Rate Limits & Costs

  • Model: GPT-4o
  • Typical article: ~2000-4000 tokens
  • Cost: ~$0.01-0.04 per generation
  • Rate limits: Per OpenAI account tier
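
The cost range above can be reproduced with simple arithmetic. The sketch below assumes prices of $2.50 per million input tokens and $10 per million output tokens; these figures are an assumption and may be out of date, so check OpenAI's current pricing. It also assumes input and output token counts are available separately, whereas the API response described earlier reports a single tokensUsed total.

```typescript
interface TokenPrices {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Assumed prices for illustration only; verify against OpenAI's pricing page.
const ASSUMED_GPT4O_PRICES: TokenPrices = {
  inputPerMillionUsd: 2.5,
  outputPerMillionUsd: 10,
};

// Hypothetical helper: estimates the USD cost of one generation.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  prices: TokenPrices = ASSUMED_GPT4O_PRICES
): number {
  const totalMicroUsd =
    inputTokens * prices.inputPerMillionUsd + outputTokens * prices.outputPerMillionUsd;
  return totalMicroUsd / 1e6;
}
```

Under these assumed prices, a generation with 1,000 input tokens and 3,000 output tokens comes to roughly three cents, which falls inside the range quoted above.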

Error Handling

Frontend:

  • Validates prompt is not empty
  • Shows loading spinner during generation
  • Displays error messages in Alert component
  • Gracefully handles API failures

Backend:

  • Validates required fields
  • Checks for OpenAI API key
  • Catches and logs OpenAI errors
  • Returns descriptive error messages
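
The backend checks listed above could be combined into one validation step, sketched below. The function name, the result type, and the HTTP status codes are assumptions. Only the "OpenAI API key not configured" message comes from this document.

```typescript
type ValidationResult =
  | { ok: true }
  | { ok: false; status: number; error: string };

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hypothetical validator: run before calling OpenAI; status codes are illustrative.
function validateGenerateRequest(body: unknown, apiKey: string | undefined): ValidationResult {
  if (!apiKey) {
    return { ok: false, status: 500, error: "OpenAI API key not configured" };
  }
  if (typeof body !== "object" || body === null) {
    return { ok: false, status: 400, error: "Request body must be a JSON object" };
  }
  const fields = body as Record<string, unknown>;
  const prompt = fields["prompt"];
  if (typeof prompt !== "string" || prompt.trim() === "") {
    return { ok: false, status: 400, error: "prompt is required" };
  }
  // Optional fields are checked only when present.
  for (const key of ["audioTranscriptions", "selectedImageUrls"]) {
    const value = fields[key];
    if (value !== undefined && !isStringArray(value)) {
      return { ok: false, status: 400, error: `${key} must be an array of strings` };
    }
  }
  return { ok: true };
}
```

In a request handler, the second argument would typically be `process.env.OPENAI_API_KEY`, so a missing key is reported before any network call is made.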

Future Enhancements

  1. Image Placeholder Replacement Step

    • New step between Generate and Edit
    • Match placeholders to selected images
    • AI-assisted matching based on descriptions
    • Manual override capability
  2. System Prompt Customization

    • UI for editing system prompt
    • Save custom prompts per user
    • Prompt templates library
  3. Generation History

    • Save multiple generated versions
    • Compare versions side-by-side
    • Rollback to previous generation
  4. Advanced AI Features

    • SEO optimization suggestions
    • Readability scoring
    • Tone adjustment
    • Length targeting
  5. Multi-Model Support

    • Support for Claude, Gemini
    • Model selection in UI
    • Cost comparison

Testing

Manual Testing Checklist

  • Generate draft with audio transcriptions
  • Generate draft with selected images
  • Generate draft with both audio and images
  • Verify image placeholders are detected
  • Verify draft persists after save
  • Test re-generation (overwrites previous)
  • Test error handling (invalid API key, network error)
  • Verify HTML output is valid
  • Test with empty prompt (should show error)
  • Test with very long prompt

Example Prompts

Basic Article:

Write a 1000-word technical article about building a modern blog platform with React and Ghost CMS. Include sections on architecture, key features, and deployment. Target audience: developers with React experience.

With Context:

Based on the audio transcriptions provided, write a comprehensive guide about the topics discussed. Structure it as a tutorial with clear steps. Include code examples where mentioned in the transcriptions. Use the selected images to illustrate key concepts.

SEO-Focused:

Write an SEO-optimized article about [topic]. Include an engaging introduction, 5-7 main sections with H2 headings, bullet points for key takeaways, and a strong conclusion with a call-to-action. Target keyword: [keyword]. Word count: 1500-2000 words.

Troubleshooting

Issue: "OpenAI API key not configured"

  • Solution: Add OPENAI_API_KEY to .env file

Issue: Generation takes too long

  • Cause: Large prompts or transcriptions
  • Solution: Reduce input size or increase timeout

Issue: Invalid HTML output

  • Cause: The model did not follow the system prompt
  • Solution: Review and adjust system prompt

Issue: No image placeholders generated

  • Cause: Prompt doesn't mention images
  • Solution: Explicitly request image placement in prompt

Summary

The AI Generation feature creates production-ready blog content from audio transcriptions, selected images, and a user prompt. The system is designed to be:

  • Reliable: Robust error handling and validation
  • Flexible: Customizable prompts and system configuration
  • Persistent: All generated content is saved
  • User-Friendly: Clear UI with loading states and error messages
  • Production-Ready: Generates valid HTML for direct Ghost publishing

All components are implemented and ready for use. The next logical enhancement is the image placeholder replacement step.