voxblog/AI_GENERATION_FEATURE.md

# AI Generation Feature Documentation

## Overview

The AI Generation feature uses OpenAI's GPT-4o model to generate production-ready blog articles based on:
- Audio transcriptions from recorded clips
- Selected images from the media library
- User-provided AI prompts

## Features

### 1. **Intelligent Content Generation**
- Uses GPT-4o model with internet access capability
- Generates semantic HTML5 content ready for Ghost CMS
- Automatically inserts image placeholders where images should appear
- Respects user instructions and context from audio/images

### 2. **Image Placeholder System**
- AI generates placeholders in format: `{{IMAGE:description_of_image}}`
- Placeholders use snake_case descriptive names
- System tracks all placeholders for later replacement
- Placeholders are displayed to user for review
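
Placeholder tracking can be illustrated with a small extraction helper. This is a minimal sketch, not the project's actual code: the function name `extractImagePlaceholders`, the exact character set allowed in a description, and the choice to drop duplicates are all assumptions.

```typescript
// Hypothetical helper: collects the descriptions from {{IMAGE:...}} placeholders.
// Assumption: descriptions are lowercase snake_case (letters, digits, underscores).
function extractImagePlaceholders(html: string): string[] {
  const pattern = /\{\{IMAGE:([a-z0-9_]+)\}\}/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const description = match[1];
    // Keep first-seen order and skip repeated descriptions (an assumption).
    if (found.indexOf(description) === -1) {
      found.push(description);
    }
  }
  return found;
}
```

The returned array has the same shape as the `imagePlaceholders` list shown to the user for review.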

### 3. **Draft Persistence**
- Generated drafts are automatically saved to the post
- Drafts persist across sessions
- Users can manually edit generated content
- Re-generation is supported (overwrites previous draft)

### 4. **System Prompt Configuration**
- Default system prompt defines output format and requirements
- Can be overridden via API for custom generation rules
- Ensures consistent, production-ready HTML output
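
One way the override could interact with the default prompt is sketched here. The `buildMessages` name, the section labels in the user message, and the rule that a blank override falls back to the default are assumptions for illustration. Only the `system` and `user` roles come from OpenAI's chat message format.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface GenerateInput {
  prompt: string;
  audioTranscriptions?: string[];
  selectedImageUrls?: string[];
  systemPromptOverride?: string;
}

// Hypothetical helper: picks the system prompt and assembles the user message.
function buildMessages(defaultSystemPrompt: string, input: GenerateInput): ChatMessage[] {
  const override = input.systemPromptOverride;
  // Assumption: a missing or blank override falls back to the default prompt.
  const systemPrompt =
    override !== undefined && override.trim() !== "" ? override : defaultSystemPrompt;

  const sections: string[] = [input.prompt];
  const transcriptions = input.audioTranscriptions || [];
  if (transcriptions.length > 0) {
    const numbered = transcriptions.map((text, index) => `${index + 1}. ${text}`);
    sections.push("Audio transcriptions:\n" + numbered.join("\n"));
  }
  const imageUrls = input.selectedImageUrls || [];
  if (imageUrls.length > 0) {
    sections.push("Selected images:\n" + imageUrls.join("\n"));
  }

  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: sections.join("\n\n") },
  ];
}
```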

## Architecture

### Backend Components

#### 1. AI Generation API (`apps/api/src/ai-generate.ts`)
**Endpoints:**
- `POST /api/ai/generate` - Generate article content
- `GET /api/ai/system-prompt` - Get default system prompt

**Request Payload:**
```typescript
{
  prompt: string;                    // User's generation instructions
  audioTranscriptions?: string[];    // Transcribed audio clips
  selectedImageUrls?: string[];      // URLs of selected images
  systemPromptOverride?: string;     // Optional custom system prompt
}
```

**Response:**
```typescript
{
  content: string;                   // Generated HTML content
  imagePlaceholders: string[];       // Array of placeholder descriptions
  tokensUsed: number;                // OpenAI tokens consumed
  model: string;                     // Model used (gpt-4o)
}
```
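
As an illustration of the payload above, the following sketch builds the arguments for a `fetch` call against `POST /api/ai/generate`. The helper name `buildGenerateRequest` and the relative URL are assumptions. The real client lives in the AI service and may differ.

```typescript
interface GenerateRequest {
  prompt: string;
  audioTranscriptions?: string[];
  selectedImageUrls?: string[];
  systemPromptOverride?: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: validates the prompt and prepares the HTTP request.
function buildGenerateRequest(payload: GenerateRequest): PreparedRequest {
  if (payload.prompt.trim() === "") {
    throw new Error("prompt must not be empty");
  }
  return {
    url: "/api/ai/generate", // assumed to be served from the same origin
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}
```

The result can be passed straight to `fetch(request.url, request.init)`, and the JSON response then matches the response shape above.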

#### 2. Database Schema Updates
**New Fields in `posts` table:**
- `generated_draft` (TEXT) - Stores the AI-generated HTML content
- `image_placeholders` (TEXT) - JSON array of image placeholder descriptions

**Migration:** `apps/api/drizzle/0002_soft_star_brand.sql`

#### 3. Posts API Updates
- `GET /api/posts/:id` now returns `generatedDraft` and `imagePlaceholders`
- `POST /api/posts` accepts and saves these fields
- `imagePlaceholders` is serialized to JSON on save and parsed back into an array on load, because `image_placeholders` is a TEXT column
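
Because `image_placeholders` is a TEXT column holding a JSON array, the conversion in both directions might look like the sketch below. The function names are hypothetical, and returning an empty array for missing or malformed values is an assumption rather than documented behaviour.

```typescript
// Hypothetical helper: string[] -> value stored in the TEXT column.
function serializePlaceholders(placeholders: string[]): string {
  return JSON.stringify(placeholders);
}

// Hypothetical helper: TEXT column value -> string[].
// Assumption: null, empty, or malformed values are treated as "no placeholders".
function parsePlaceholders(raw: string | null | undefined): string[] {
  if (!raw) {
    return [];
  }
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) {
      return [];
    }
    // Drop any non-string entries so callers always get string[].
    return parsed.filter((item): item is string => typeof item === "string");
  } catch {
    return [];
  }
}
```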

### Frontend Components

#### 1. AI Service (`apps/admin/src/services/ai.ts`)
```typescript
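// Requests a draft from POST /api/ai/generate
declare function generateDraft(payload: {
  prompt: string;
  audioTranscriptions?: string[];
  selectedImageUrls?: string[];
  systemPromptOverride?: string;
}): Promise<{
  content: string;
  imagePlaceholders: string[];
  tokensUsed: number;
  model: string;
}>;

// Fetches the default system prompt from GET /api/ai/system-prompt
declare function getSystemPrompt(): Promise<{ systemPrompt: string }>;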
```

#### 2. StepGenerate Component
**Features:**
- Display audio transcriptions in chronological order
- Show selected images
- AI prompt input field with placeholder example
- "Generate Draft" button (becomes "Re-generate Draft" after first use)
- Loading state with spinner during generation
- Error handling and display
- Generated content preview with HTML rendering
- Image placeholder detection and display
- Auto-save on generation

**Props:**
```typescript
{
  postClips: Clip[];
  genImageKeys: string[];
  onToggleGenImage: (key: string) => void;
  promptText: string;
  onChangePrompt: (v: string) => void;
  generatedDraft: string;
  imagePlaceholders: string[];
  onGeneratedDraft: (content: string) => void;
  onImagePlaceholders: (placeholders: string[]) => void;
}
```

#### 3. usePostEditor Hook Updates
**New State:**
- `generatedDraft` - Current generated content
- `imagePlaceholders` - Array of placeholder descriptions

**New Setters:**
- `setGeneratedDraft`
- `setImagePlaceholders`

**Persistence:**
- Loads from backend on post open
- Saves to backend when updated
- Included in savePost payload

## System Prompt

The default system prompt ensures:
1. Production-ready HTML output
2. Semantic HTML5 tags only
3. Consistent image placeholder format
4. No markdown or wrapper tags
5. SEO-friendly structure
6. Proper heading hierarchy
7. Valid, properly closed HTML
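
Some of these requirements can be spot-checked on the generated output. The sketch below is a rough heuristic, not a validator, and is not part of the documented feature: `checkGeneratedHtml` is a hypothetical name, "wrapper tags" is assumed to mean `<html>`, `<head>`, and `<body>`, and "proper heading hierarchy" is assumed to mean that heading levels never skip a level on the way down.

```typescript
// Hypothetical heuristic check of generated content; returns a list of problems.
function checkGeneratedHtml(content: string): string[] {
  const problems: string[] = [];

  // Requirement 4: no markdown. Only code fences are detected here.
  const fence = "\u0060\u0060\u0060"; // three backticks, written as escapes
  if (content.indexOf(fence) !== -1) {
    problems.push("contains a markdown code fence");
  }

  // Requirement 4: no wrapper tags (assumed to mean html/head/body).
  if (/<\s*(html|head|body)\b/i.test(content)) {
    problems.push("contains a document wrapper tag");
  }

  // Requirement 6: heading levels should not skip (for example h2 -> h4).
  const headingPattern = /<h([1-6])\b/gi;
  let previousLevel = 0;
  let match: RegExpExecArray | null;
  while ((match = headingPattern.exec(content)) !== null) {
    const level = Number(match[1]);
    if (previousLevel !== 0 && level > previousLevel + 1) {
      problems.push(`heading jumps from h${previousLevel} to h${level}`);
    }
    previousLevel = level;
  }

  return problems;
}
```

An empty result only means none of these particular patterns were found. It does not prove the HTML is valid or properly closed.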

**Image Placeholder Format:**
```
{{IMAGE:description_of_image}}
```

Examples:
- `{{IMAGE:screenshot_of_dashboard}}`
- `{{IMAGE:team_photo_at_conference}}`
- `{{IMAGE:architecture_diagram}}`

## Usage Flow

1. **User records audio** in Step 1 (Assets)
   - Audio auto-uploads
   - User transcribes clips

2. **User selects images** in Step 1 (Assets)
   - Images marked for generation
   - Selection persists with post

3. **User writes AI prompt** in Step 2 (AI Prompt)
   - Describes article goals, audience, tone
   - References transcriptions and images

4. **User generates draft** in Step 3 (Generate)
   - Clicks "Generate Draft"
   - System sends prompt + transcriptions + image info to OpenAI
   - AI generates HTML with image placeholders
   - Draft auto-saves to post
   - Placeholders displayed for review

5. **User reviews/edits** in Step 4 (Edit)
   - Can manually edit generated content
   - Or return to Step 3 to re-generate

6. **Future: Image Replacement** (Next Step)
   - Replace placeholders with actual images
   - Match placeholder descriptions to selected images
   - Or allow manual image selection per placeholder
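
The planned replacement step could work roughly as follows. This is only a sketch of one possible approach, since the step is not implemented yet: `replaceImagePlaceholders`, the description-to-URL lookup, and the plain `<img>` markup are all assumptions, and the final markup expected by Ghost may differ.

```typescript
// Escapes characters that are unsafe inside an HTML attribute value.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: swaps {{IMAGE:description}} for an <img> tag when a URL is known.
function replaceImagePlaceholders(
  html: string,
  urlsByDescription: Record<string, string>
): { html: string; unresolved: string[] } {
  const unresolved: string[] = [];
  const replaced = html.replace(
    /\{\{IMAGE:([a-z0-9_]+)\}\}/g,
    (placeholder: string, description: string) => {
      if (!Object.prototype.hasOwnProperty.call(urlsByDescription, description)) {
        unresolved.push(description); // leave it in place for manual selection
        return placeholder;
      }
      const alt = description.replace(/_/g, " ");
      const url = urlsByDescription[description];
      return `<img src="${escapeAttribute(url)}" alt="${escapeAttribute(alt)}">`;
    }
  );
  return { html: replaced, unresolved };
}
```

Placeholders without a match are returned in `unresolved`, which leaves room for the manual per-placeholder selection mentioned above.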

## Environment Variables Required

```bash
# .env file
OPENAI_API_KEY=sk-...your-key-here...
```

## Installation

### 1. Install OpenAI Package
```bash
cd apps/api
pnpm add openai
```

### 2. Run Database Migration
```bash
cd apps/api
pnpm drizzle:migrate
```

### 3. Set Environment Variable
Add `OPENAI_API_KEY` to your `.env` file in the project root.

### 4. Restart API Server
```bash
cd apps/api
pnpm run dev
```

## API Rate Limits & Costs

- Model: GPT-4o
- Typical article: ~2000-4000 tokens
- Cost: ~$0.01-0.04 per generation (depends on current OpenAI pricing)
- Rate limits: Per OpenAI account tier
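
The cost figure follows from simple arithmetic on `tokensUsed`. The sketch below takes the price as a parameter, because actual rates change and input and output tokens are billed differently. The rates of 5 and 10 USD per million tokens in the usage note are placeholders chosen only to reproduce the range above, not OpenAI's published prices.

```typescript
// Hypothetical helper: blended cost estimate from a single total token count.
// usdPerMillionTokens is an assumed blended rate; check current pricing before relying on it.
function estimateCostUsd(tokensUsed: number, usdPerMillionTokens: number): number {
  if (tokensUsed < 0 || usdPerMillionTokens < 0) {
    throw new RangeError("tokensUsed and usdPerMillionTokens must be non-negative");
  }
  return (tokensUsed / 1000000) * usdPerMillionTokens;
}
```

For example, `estimateCostUsd(2000, 5)` gives about 0.01 and `estimateCostUsd(4000, 10)` gives about 0.04.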

## Error Handling

**Frontend:**
- Validates prompt is not empty
- Shows loading spinner during generation
- Displays error messages in Alert component
- Gracefully handles API failures

**Backend:**
- Validates required fields
- Checks for OpenAI API key
- Catches and logs OpenAI errors
- Returns descriptive error messages
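
The backend checks listed above might be arranged as in this sketch. It is illustrative only: the function name, the HTTP status codes, and every message other than "OpenAI API key not configured" are assumptions, and the API key is passed in as an argument rather than read from the environment so the example stays self-contained.

```typescript
interface RequestError {
  status: number;
  error: string;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hypothetical validator: returns null when the request may proceed.
function validateGenerateRequest(body: unknown, apiKey: string | undefined): RequestError | null {
  if (!apiKey) {
    return { status: 500, error: "OpenAI API key not configured" };
  }
  if (typeof body !== "object" || body === null) {
    return { status: 400, error: "Request body must be a JSON object" };
  }
  const fields = body as Record<string, unknown>;
  const prompt = fields["prompt"];
  if (typeof prompt !== "string" || prompt.trim() === "") {
    return { status: 400, error: "prompt is required" };
  }
  // Optional fields must be string arrays when present.
  for (const name of ["audioTranscriptions", "selectedImageUrls"]) {
    const value = fields[name];
    if (value !== undefined && !isStringArray(value)) {
      return { status: 400, error: `${name} must be an array of strings` };
    }
  }
  return null;
}
```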

## Future Enhancements

1. **Image Placeholder Replacement Step**
   - New step between Generate and Edit
   - Match placeholders to selected images
   - AI-assisted matching based on descriptions
   - Manual override capability

2. **System Prompt Customization**
   - UI for editing system prompt
   - Save custom prompts per user
   - Prompt templates library

3. **Generation History**
   - Save multiple generated versions
   - Compare versions side-by-side
   - Rollback to previous generation

4. **Advanced AI Features**
   - SEO optimization suggestions
   - Readability scoring
   - Tone adjustment
   - Length targeting

5. **Multi-Model Support**
   - Support for Claude, Gemini
   - Model selection in UI
   - Cost comparison

## Testing

### Manual Testing Checklist
- [ ] Generate draft with audio transcriptions
- [ ] Generate draft with selected images
- [ ] Generate draft with both audio and images
- [ ] Verify image placeholders are detected
- [ ] Verify draft persists after save
- [ ] Test re-generation (overwrites previous)
- [ ] Test error handling (invalid API key, network error)
- [ ] Verify HTML output is valid
- [ ] Test with empty prompt (should show error)
- [ ] Test with very long prompt

### Example Prompts

**Basic Article:**
```
Write a 1000-word technical article about building a modern blog platform with React and Ghost CMS. Include sections on architecture, key features, and deployment. Target audience: developers with React experience.
```

**With Context:**
```
Based on the audio transcriptions provided, write a comprehensive guide about the topics discussed. Structure it as a tutorial with clear steps. Include code examples where mentioned in the transcriptions. Use the selected images to illustrate key concepts.
```

**SEO-Focused:**
```
Write an SEO-optimized article about [topic]. Include an engaging introduction, 5-7 main sections with H2 headings, bullet points for key takeaways, and a strong conclusion with a call-to-action. Target keyword: [keyword]. Word count: 1500-2000 words.
```

## Troubleshooting

**Issue: "OpenAI API key not configured"**
- Solution: Add `OPENAI_API_KEY` to `.env` file

**Issue: Generation takes too long**
- Cause: Large prompts or transcriptions
- Solution: Reduce input size or increase timeout
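
One way to reduce input size is to cap the total length of the transcriptions before sending them. This is a sketch under assumptions: `limitTranscriptions` is a hypothetical helper, and a character budget is used as a crude stand-in for a token budget.

```typescript
// Hypothetical helper: keeps transcriptions in order until maxChars is used up,
// truncating the last one that only partly fits.
function limitTranscriptions(transcriptions: string[], maxChars: number): string[] {
  const kept: string[] = [];
  let remaining = maxChars;
  for (const text of transcriptions) {
    if (remaining <= 0) {
      break;
    }
    const slice = text.length > remaining ? text.slice(0, remaining) : text;
    kept.push(slice);
    remaining -= slice.length;
  }
  return kept;
}
```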

**Issue: Invalid HTML output**
- Cause: System prompt not followed
- Solution: Review and adjust system prompt

**Issue: No image placeholders generated**
- Cause: Prompt doesn't mention images
- Solution: Explicitly request image placement in prompt

## Summary

The AI Generation feature provides a powerful, automated way to create production-ready blog content from audio recordings and images. The system is designed to be:
- **Reliable**: Robust error handling and validation
- **Flexible**: Customizable prompts and system configuration
- **Persistent**: All generated content is saved
- **User-Friendly**: Clear UI with loading states and error messages
- **Production-Ready**: Generates valid HTML for direct Ghost publishing

All components are implemented and ready for use. The next logical enhancement is the image placeholder replacement step.