
# AI Generation Feature Documentation
## Overview
The AI Generation feature uses OpenAI's GPT-4o model to automatically generate production-ready blog articles from:
- Audio transcriptions from recorded clips
- Selected images from the media library
- User-provided AI prompts
## Features
### 1. **Intelligent Content Generation**
- Uses GPT-4o model with internet access capability
- Generates semantic HTML5 content ready for Ghost CMS
- Automatically inserts image placeholders where images should appear
- Respects user instructions and context from audio/images
### 2. **Image Placeholder System**
- The AI generates placeholders in the format `{{IMAGE:description_of_image}}`
- Placeholders use snake_case descriptive names
- System tracks all placeholders for later replacement
- Placeholders are displayed to user for review
### 3. **Draft Persistence**
- Generated drafts are automatically saved to the post
- Drafts persist across sessions
- Users can manually edit generated content
- Re-generation is supported (overwrites previous draft)
### 4. **System Prompt Configuration**
- Default system prompt defines output format and requirements
- Can be overridden via API for custom generation rules
- Ensures consistent, production-ready HTML output
## Architecture
### Backend Components
#### 1. AI Generation API (`apps/api/src/ai-generate.ts`)
**Endpoints:**
- `POST /api/ai/generate` - Generate article content
- `GET /api/ai/system-prompt` - Get default system prompt
**Request Payload:**
```typescript
{
  prompt: string;                 // User's generation instructions
  audioTranscriptions?: string[]; // Transcribed audio clips
  selectedImageUrls?: string[];   // URLs of selected images
  systemPromptOverride?: string;  // Optional custom system prompt
}
```
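As a rough illustration of how these fields might be combined before the OpenAI call, the sketch below assembles one system message and one user message. The names `GenerateRequest`, `ChatMessage`, and `buildMessages`, and the section labels inside the user message, are assumptions made for this example; they are not taken from `ai-generate.ts`.

```typescript
// Hypothetical helper: shows one way the request fields could be merged
// into chat messages. Not the actual implementation in ai-generate.ts.
interface GenerateRequest {
  prompt: string;
  audioTranscriptions?: string[];
  selectedImageUrls?: string[];
  systemPromptOverride?: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

function buildMessages(req: GenerateRequest, defaultSystemPrompt: string): ChatMessage[] {
  // Fall back to the default when the override is missing or blank.
  const override = req.systemPromptOverride ? req.systemPromptOverride.trim() : "";
  const systemPrompt = override !== "" ? override : defaultSystemPrompt;

  const parts: string[] = ["Instructions:\n" + req.prompt.trim()];

  const transcriptions = req.audioTranscriptions || [];
  if (transcriptions.length > 0) {
    // Number the clips so the model can refer to them in order.
    const numbered = transcriptions.map((t, i) => `${i + 1}. ${t}`).join("\n");
    parts.push("Audio transcriptions:\n" + numbered);
  }

  const images = req.selectedImageUrls || [];
  if (images.length > 0) {
    parts.push("Selected images:\n" + images.map((u) => `- ${u}`).join("\n"));
  }

  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: parts.join("\n\n") },
  ];
}
```

Keeping the system prompt in its own message means an override replaces the output rules without touching the user's instructions.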
**Response:**
```typescript
{
  content: string;             // Generated HTML content
  imagePlaceholders: string[]; // Array of placeholder descriptions
  tokensUsed: number;          // OpenAI tokens consumed
  model: string;               // Model used (gpt-4o)
}
```
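A caller may want to confirm that a response has this shape before using it. The type guard below is a minimal sketch; the names `GenerateResponse` and `isGenerateResponse` are illustrative and are not part of the documented code.

```typescript
// Hypothetical runtime check for the response shape documented above.
interface GenerateResponse {
  content: string;
  imagePlaceholders: string[];
  tokensUsed: number;
  model: string;
}

function isGenerateResponse(value: unknown): value is GenerateResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const placeholders = v["imagePlaceholders"];
  return (
    typeof v["content"] === "string" &&
    Array.isArray(placeholders) &&
    placeholders.every((p) => typeof p === "string") &&
    typeof v["tokensUsed"] === "number" &&
    typeof v["model"] === "string"
  );
}
```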
#### 2. Database Schema Updates
**New Fields in `posts` table:**
- `generated_draft` (TEXT) - Stores the AI-generated HTML content
- `image_placeholders` (TEXT) - JSON array of image placeholder descriptions
**Migration:** `apps/api/drizzle/0002_soft_star_brand.sql`
#### 3. Posts API Updates
- `GET /api/posts/:id` now returns `generatedDraft` and `imagePlaceholders`
- `POST /api/posts` accepts and saves these fields
- JSON serialization/deserialization handled automatically
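
Because `image_placeholders` is a TEXT column holding a JSON array, the conversion in both directions could look roughly like the sketch below. The helper names `serializePlaceholders` and `parsePlaceholders` are assumptions, and returning an empty array for missing or malformed values is one possible policy rather than the documented behavior.

```typescript
// Hypothetical helpers for the image_placeholders TEXT column.
function serializePlaceholders(placeholders: string[]): string {
  return JSON.stringify(placeholders);
}

function parsePlaceholders(raw: string | null | undefined): string[] {
  if (!raw) return []; // NULL or empty column: no placeholders stored
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return [];
    // Keep only string entries so callers always receive string[].
    return parsed.filter((p): p is string => typeof p === "string");
  } catch {
    return []; // Malformed JSON is treated as "no placeholders" in this sketch
  }
}
```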
### Frontend Components
#### 1. AI Service (`apps/admin/src/services/ai.ts`)
```typescript
generateDraft(payload) => Promise<{
  content: string;
  imagePlaceholders: string[];
  tokensUsed: number;
  model: string;
}>

getSystemPrompt() => Promise<{ systemPrompt: string }>
#### 2. StepGenerate Component
**Features:**
- Display audio transcriptions in chronological order
- Show selected images
- AI prompt input field with placeholder example
- "Generate Draft" button (becomes "Re-generate Draft" after first use)
- Loading state with spinner during generation
- Error handling and display
- Generated content preview with HTML rendering
- Image placeholder detection and display
- Auto-save on generation
**Props:**
```typescript
{
  postClips: Clip[];
  genImageKeys: string[];
  onToggleGenImage: (key: string) => void;
  promptText: string;
  onChangePrompt: (v: string) => void;
  generatedDraft: string;
  imagePlaceholders: string[];
  onGeneratedDraft: (content: string) => void;
  onImagePlaceholders: (placeholders: string[]) => void;
}
```
#### 3. usePostEditor Hook Updates
**New State:**
- `generatedDraft` - Current generated content
- `imagePlaceholders` - Array of placeholder descriptions
**New Setters:**
- `setGeneratedDraft`
- `setImagePlaceholders`
**Persistence:**
- Loads from backend on post open
- Saves to backend when updated
- Included in savePost payload
## System Prompt
The default system prompt ensures:
1. Production-ready HTML output
2. Semantic HTML5 tags only
3. Consistent image placeholder format
4. No markdown or wrapper tags
5. SEO-friendly structure
6. Proper heading hierarchy
7. Valid, properly closed HTML
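
A model does not always follow its system prompt, so a lightweight post-check can catch the most obvious violations of rule 4. The sketch below is a heuristic only: `findOutputViolations` is a hypothetical helper, it does not validate HTML, and it does not cover the other rules in the list.

```typescript
// Hypothetical heuristic check for "no markdown or wrapper tags".
// It flags obvious problems; it is not an HTML validator.
function findOutputViolations(html: string): string[] {
  const violations: string[] = [];
  if (/`{3}/.test(html)) {
    violations.push("contains a Markdown code fence");
  }
  if (/<!doctype/i.test(html)) {
    violations.push("contains a doctype declaration");
  }
  if (/<\/?(html|head|body)\b/i.test(html)) {
    violations.push("contains wrapper tags");
  }
  // A line that starts with 1-6 '#' characters followed by a space
  // looks like a Markdown heading.
  if (/^[ ]{0,3}#{1,6}[ \t]/m.test(html)) {
    violations.push("contains a Markdown heading");
  }
  return violations;
}
```

An empty result means none of these patterns were found, not that the output is valid.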
**Image Placeholder Format:**
```
{{IMAGE:description_of_image}}
```
Examples:
- `{{IMAGE:screenshot_of_dashboard}}`
- `{{IMAGE:team_photo_at_conference}}`
- `{{IMAGE:architecture_diagram}}`
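
Detecting placeholders in this format comes down to a single regular expression. The sketch below assumes descriptions contain only lowercase letters, digits, and underscores (consistent with the snake_case examples) and that repeated placeholders should be reported once; `extractImagePlaceholders` is an illustrative name, not the function used in the codebase.

```typescript
// Hypothetical extractor for {{IMAGE:description}} placeholders.
function extractImagePlaceholders(html: string): string[] {
  // A fresh regex per call avoids sharing lastIndex state between calls.
  const pattern = /\{\{IMAGE:([a-z0-9_]+)\}\}/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const description = match[1];
    // Report each description once, in order of first appearance.
    if (found.indexOf(description) === -1) {
      found.push(description);
    }
  }
  return found;
}
```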
## Usage Flow
1. **User records audio** in Step 1 (Assets)
   - Audio auto-uploads
   - User transcribes clips
2. **User selects images** in Step 1 (Assets)
   - Images marked for generation
   - Selection persists with post
3. **User writes AI prompt** in Step 2 (AI Prompt)
   - Describes article goals, audience, tone
   - References transcriptions and images
4. **User generates draft** in Step 3 (Generate)
   - Clicks "Generate Draft"
   - System sends prompt + transcriptions + image info to OpenAI
   - AI generates HTML with image placeholders
   - Draft auto-saves to post
   - Placeholders displayed for review
5. **User reviews/edits** in Step 4 (Edit)
   - Can manually edit generated content
   - Or return to Step 3 to re-generate
6. **Future: Image Replacement** (Next Step)
   - Replace placeholders with actual images
   - Match placeholder descriptions to selected images
   - Or allow manual image selection per placeholder
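
Since the replacement step in item 6 is not implemented yet, the following is only a sketch of one way it might work once each placeholder has been matched to an image. The names `ImageChoice` and `replaceImagePlaceholders`, the `<figure>` markup, and the decision to leave unmatched placeholders in place are all assumptions.

```typescript
// Hypothetical sketch of the future replacement step.
interface ImageChoice {
  url: string;
  alt: string;
}

// Escape characters that would break out of a double-quoted attribute.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function replaceImagePlaceholders(
  html: string,
  choices: Record<string, ImageChoice>
): { html: string; unresolved: string[] } {
  const unresolved: string[] = [];
  const result = html.replace(
    /\{\{IMAGE:([a-z0-9_]+)\}\}/g,
    (whole: string, description: string) => {
      if (!Object.prototype.hasOwnProperty.call(choices, description)) {
        // No image chosen yet: keep the placeholder and report it.
        unresolved.push(description);
        return whole;
      }
      const choice = choices[description];
      return `<figure><img src="${escapeAttribute(choice.url)}" alt="${escapeAttribute(choice.alt)}"></figure>`;
    }
  );
  return { html: result, unresolved };
}
```

Returning the unresolved descriptions lets the UI prompt for a manual selection instead of publishing raw placeholders.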
## Environment Variables Required
```bash
# .env file
OPENAI_API_KEY=sk-...your-key-here...
```
## Installation
### 1. Install OpenAI Package
```bash
cd apps/api
pnpm add openai
```
### 2. Run Database Migration
```bash
cd apps/api
pnpm drizzle:migrate
```
### 3. Set Environment Variable
Add `OPENAI_API_KEY` to your `.env` file in the project root.
### 4. Restart API Server
```bash
cd apps/api
pnpm run dev
```
## API Rate Limits & Costs
- Model: GPT-4o
- Typical article: ~2000-4000 tokens
- Cost: ~$0.01-0.04 per generation
- Rate limits: Per OpenAI account tier
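
The per-generation cost follows from token counts multiplied by per-token rates. The sketch below makes that arithmetic explicit. It assumes input and output tokens are known separately (the documented response exposes only a combined `tokensUsed`), and the rates are parameters supplied by the caller; the figures in the usage line are placeholders, not OpenAI's actual prices.

```typescript
// Hypothetical cost estimate. Rates must come from OpenAI's current
// pricing page; nothing here encodes real prices.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface TokenRates {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

function estimateCostUsd(usage: TokenUsage, rates: TokenRates): number {
  const inputCost = (usage.inputTokens / 1e6) * rates.inputPerMillionUsd;
  const outputCost = (usage.outputTokens / 1e6) * rates.outputPerMillionUsd;
  return inputCost + outputCost;
}

// Usage with placeholder rates (illustrative values only):
const exampleCost = estimateCostUsd(
  { inputTokens: 1000, outputTokens: 3000 },
  { inputPerMillionUsd: 2.5, outputPerMillionUsd: 10 }
);
```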
## Error Handling
**Frontend:**
- Validates prompt is not empty
- Shows loading spinner during generation
- Displays error messages in Alert component
- Gracefully handles API failures
**Backend:**
- Validates required fields
- Checks for OpenAI API key
- Catches and logs OpenAI errors
- Returns descriptive error messages
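
The first two backend checks can be expressed as a small pure function that runs before any OpenAI call. This is a sketch under assumptions: `validateGenerateRequest` and `ValidationResult` are illustrative names, and the HTTP status codes and all messages other than "OpenAI API key not configured" are guesses rather than the API's documented behavior.

```typescript
// Hypothetical pre-flight validation for POST /api/ai/generate.
type ValidationResult =
  | { ok: true }
  | { ok: false; status: number; error: string };

function validateGenerateRequest(body: unknown, apiKey: string | undefined): ValidationResult {
  if (!apiKey) {
    // Server misconfiguration rather than a client mistake.
    return { ok: false, status: 500, error: "OpenAI API key not configured" };
  }
  if (typeof body !== "object" || body === null) {
    return { ok: false, status: 400, error: "Request body must be a JSON object" };
  }
  const prompt = (body as Record<string, unknown>)["prompt"];
  if (typeof prompt !== "string" || prompt.trim() === "") {
    return { ok: false, status: 400, error: "prompt is required" };
  }
  return { ok: true };
}
```

A route handler could call this with the parsed body and `process.env.OPENAI_API_KEY`, returning the status and error directly when `ok` is false.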
## Future Enhancements
1. **Image Placeholder Replacement Step**
   - New step between Generate and Edit
   - Match placeholders to selected images
   - AI-assisted matching based on descriptions
   - Manual override capability
2. **System Prompt Customization**
   - UI for editing system prompt
   - Save custom prompts per user
   - Prompt templates library
3. **Generation History**
   - Save multiple generated versions
   - Compare versions side-by-side
   - Rollback to previous generation
4. **Advanced AI Features**
   - SEO optimization suggestions
   - Readability scoring
   - Tone adjustment
   - Length targeting
5. **Multi-Model Support**
   - Support for Claude, Gemini
   - Model selection in UI
   - Cost comparison
## Testing
### Manual Testing Checklist
- [ ] Generate draft with audio transcriptions
- [ ] Generate draft with selected images
- [ ] Generate draft with both audio and images
- [ ] Verify image placeholders are detected
- [ ] Verify draft persists after save
- [ ] Test re-generation (overwrites previous)
- [ ] Test error handling (invalid API key, network error)
- [ ] Verify HTML output is valid
- [ ] Test with empty prompt (should show error)
- [ ] Test with very long prompt
### Example Prompts
**Basic Article:**
```
Write a 1000-word technical article about building a modern blog platform with React and Ghost CMS. Include sections on architecture, key features, and deployment. Target audience: developers with React experience.
```
**With Context:**
```
Based on the audio transcriptions provided, write a comprehensive guide about the topics discussed. Structure it as a tutorial with clear steps. Include code examples where mentioned in the transcriptions. Use the selected images to illustrate key concepts.
```
**SEO-Focused:**
```
Write an SEO-optimized article about [topic]. Include an engaging introduction, 5-7 main sections with H2 headings, bullet points for key takeaways, and a strong conclusion with a call-to-action. Target keyword: [keyword]. Word count: 1500-2000 words.
```
## Troubleshooting
**Issue: "OpenAI API key not configured"**
- Solution: Add `OPENAI_API_KEY` to `.env` file
**Issue: Generation takes too long**
- Cause: Large prompts or transcriptions
- Solution: Reduce the input size (for example, shorten the prompt or send fewer transcriptions) or increase the request timeout
**Issue: Invalid HTML output**
- Cause: System prompt not followed
- Solution: Review and adjust system prompt
**Issue: No image placeholders generated**
- Cause: Prompt doesn't mention images
- Solution: Explicitly request image placement in prompt
## Summary
The AI Generation feature provides a powerful, automated way to create production-ready blog content from audio recordings and images. The system is designed to be:
- **Reliable**: Robust error handling and validation
- **Flexible**: Customizable prompts and system configuration
- **Persistent**: All generated content is saved
- **User-Friendly**: Clear UI with loading states and error messages
- **Production-Ready**: Generates valid HTML for direct Ghost publishing
All components are implemented and ready for use. The next logical enhancement is the image placeholder replacement step.