voxblog/CONTENT_STATISTICS_PLAN.md
Ender b593ca35d5
2025-10-26 22:52:14 +01:00


# Content Statistics Feature - Implementation Plan
## Overview
Add a comprehensive statistics display for generated articles to the `StepGenerate` component, showing metrics such as word count, paragraph count, token count, and reading time.
## Current State Analysis
### Existing Code Structure
- **Component**: `apps/admin/src/components/steps/StepGenerate.tsx`
- **Current Stats**: Only shows `tokenCount` during streaming (lines 236 and 249)
- **Content Display**: Two sections
  1. **Live Generation** (lines 256-284) - Shows streaming content
  2. **Generated Draft** (lines 288-336) - Shows final content
- **Data Available**:
  - `generatedDraft` - HTML string of generated content
  - `tokenCount` - Number of tokens generated (streaming only)
  - `streamingContent` - Real-time content during generation
  - `imagePlaceholders` - Array of image placeholder strings
  - `generationSources` - Array of web sources used
### Current Display Locations
1. **During streaming** (lines 248-250): Shows token count in caption
2. **After generation** (lines 291-301): Shows sources count
3. **After generation** (lines 303-314): Shows image placeholder count
## Proposed Statistics
### Core Metrics
1. **Word Count** - Total words in article (excluding HTML tags)
2. **Character Count** - Total characters (with/without spaces)
3. **Paragraph Count** - Number of `<p>` tags
4. **Heading Count** - Number of heading tags (`<h1>` to `<h6>`)
5. **List Item Count** - Number of `<li>` tags
6. **Token Count** - AI tokens generated (already available)
7. **Image Placeholder Count** - Already shown, enhance display
8. **Reading Time** - Estimated minutes (avg 200-250 words/min)
### Advanced Metrics (Optional)
9. **Sentence Count** - Approximate sentences
10. **Average Words per Paragraph** - Content density
11. **Average Words per Sentence** - Readability indicator
12. **Link Count** - Number of `<a>` tags in content
13. **Generation Time** - Time taken to generate (if available)
## Implementation Plan
### Phase 1: Create Statistics Utility Module ✅
**File**: `apps/admin/src/utils/contentStats.ts` (new file)
```typescript
export interface ContentStatistics {
  wordCount: number;
  characterCount: number;
  characterCountNoSpaces: number;
  paragraphCount: number;
  headingCount: number;
  listItemCount: number;
  sentenceCount: number;
  linkCount: number;
  readingTimeMinutes: number;
  avgWordsPerParagraph: number;
  avgWordsPerSentence: number;
}

export function calculateContentStats(htmlContent: string): ContentStatistics {
  // Implementation details below
}
```
**Functions to implement**:
- `stripHtmlTags(html: string): string` - Remove all HTML tags
- `countWords(text: string): number` - Count words
- `countParagraphs(html: string): number` - Count `<p>` tags
- `countHeadings(html: string): number` - Count `<h1>` to `<h6>` tags
- `countListItems(html: string): number` - Count `<li>` tags
- `countSentences(text: string): number` - Approximate sentence count
- `countLinks(html: string): number` - Count `<a>` tags
- `calculateReadingTime(wordCount: number): number` - Estimate reading time
- `calculateContentStats(htmlContent: string): ContentStatistics` - Main function
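
The functions listed above could be sketched as follows. This is a minimal illustration, not the shipped module: the regex-based tag matching, the `countTags` and `average` helpers, and the 250 words/min constant (the upper end of the plan's range, rounded up) are all assumptions. HTML entities are not decoded, and in a browser a `DOMParser`-based approach may be more robust on malformed input.

```typescript
// Sketch only: regex-based counting is an assumption, not the shipped code.
// `export` keywords are omitted so the snippet runs standalone.

interface ContentStatistics {
  wordCount: number;
  characterCount: number;
  characterCountNoSpaces: number;
  paragraphCount: number;
  headingCount: number;
  listItemCount: number;
  sentenceCount: number;
  linkCount: number;
  readingTimeMinutes: number;
  avgWordsPerParagraph: number;
  avgWordsPerSentence: number;
}

// Assumed reading speed: the upper end of the plan's 200-250 words/min range.
const WORDS_PER_MINUTE = 250;

/** Remove tags, leaving a space in their place so adjacent blocks stay separate words. */
function stripHtmlTags(html: string): string {
  return html
    .replace(/<[^>]*>/g, " ")        // tags -> spaces
    .replace(/\s+([.,;:!?])/g, "$1") // re-attach punctuation split off by inline tags
    .replace(/\s+/g, " ")            // collapse whitespace runs
    .trim();
}

function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

/** Count opening tags whose name matches `namePattern`, e.g. "p" or "h[1-6]". */
function countTags(html: string, namePattern: string): number {
  const openTag = new RegExp(`<(?:${namePattern})(?:\\s[^>]*)?>`, "gi");
  return (html.match(openTag) || []).length;
}

const countParagraphs = (html: string): number => countTags(html, "p");
const countHeadings = (html: string): number => countTags(html, "h[1-6]");
const countListItems = (html: string): number => countTags(html, "li");
const countLinks = (html: string): number => countTags(html, "a");

/** Approximate: counts runs of text that end in ".", "!" or "?". */
function countSentences(text: string): number {
  const terminated = (text.match(/[^.!?]+[.!?]+/g) || []).length;
  return terminated === 0 && text.trim() !== "" ? 1 : terminated;
}

/** Whole minutes, rounded up; 0 for empty content. */
function calculateReadingTime(wordCount: number): number {
  return wordCount === 0 ? 0 : Math.ceil(wordCount / WORDS_PER_MINUTE);
}

/** Average rounded to one decimal place; 0 when there is nothing to divide by. */
function average(total: number, count: number): number {
  return count === 0 ? 0 : Math.round((total / count) * 10) / 10;
}

function calculateContentStats(htmlContent: string): ContentStatistics {
  const text = stripHtmlTags(htmlContent);
  const wordCount = countWords(text);
  const paragraphCount = countParagraphs(htmlContent);
  const sentenceCount = countSentences(text);
  return {
    wordCount,
    characterCount: text.length,
    characterCountNoSpaces: text.replace(/\s/g, "").length,
    paragraphCount,
    headingCount: countHeadings(htmlContent),
    listItemCount: countListItems(htmlContent),
    sentenceCount,
    linkCount: countLinks(htmlContent),
    readingTimeMinutes: calculateReadingTime(wordCount),
    avgWordsPerParagraph: average(wordCount, paragraphCount),
    avgWordsPerSentence: average(wordCount, sentenceCount),
  };
}
```

Tags are replaced with spaces rather than removed so that `</h2><p>` does not fuse the last word of a heading with the first word of the paragraph. Sentence counting is deliberately approximate: decimals and abbreviations are counted as sentence ends.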
### Phase 2: Create Statistics Display Component ✅
**File**: `apps/admin/src/components/ContentStatistics.tsx` (new file)
```typescript
interface ContentStatisticsProps {
  htmlContent: string;
  tokenCount?: number;
  imagePlaceholderCount?: number;
  generationTimeMs?: number;
  variant?: 'compact' | 'detailed';
}

export default function ContentStatistics({
  htmlContent,
  tokenCount,
  imagePlaceholderCount,
  generationTimeMs,
  variant = 'detailed'
}: ContentStatisticsProps) {
  // Calculate stats using utility
  // Display in clean, organized format
}
```
**Display Design**:
- Use Material-UI `Paper` or `Alert` component
- Grid layout for metrics (2-3 columns on desktop, 1-2 on mobile)
- Icons for each metric (optional)
- Color-coded sections:
- **Primary metrics** (word count, reading time) - prominent
- **Structure metrics** (paragraphs, headings) - secondary
- **Technical metrics** (tokens, generation time) - tertiary
### Phase 3: Integrate into StepGenerate ✅
**File**: `apps/admin/src/components/steps/StepGenerate.tsx`
**Changes needed**:
1. **Import new components**:
```typescript
import ContentStatistics from '../ContentStatistics';
import { calculateContentStats } from '../../utils/contentStats';
```
2. **Add statistics to "Live Generation" section** (after line 280):
```typescript
{/* Live stats during streaming */}
<ContentStatistics
  htmlContent={streamingContent}
  tokenCount={tokenCount}
  variant="compact"
/>
```
3. **Add statistics to "Generated Draft" section** (after line 315, before content preview):
```typescript
{/* Final statistics */}
<ContentStatistics
  htmlContent={generatedDraft}
  tokenCount={tokenCount}
  imagePlaceholderCount={imagePlaceholders.length}
  variant="detailed"
/>
```
4. **Optional: Add generation time tracking**:
```typescript
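// useRef must be imported alongside useState
import { useRef, useState } from 'react';
// Keep the start time in a ref: a callback created inside the click handler
// would otherwise read the stale state value from before the update
const generationStartRef = useRef<number>(0);
const [generationTimeMs, setGenerationTimeMs] = useState<number>(0);
// In onClick handler (line 169)
generationStartRef.current = Date.now();
// In onDone callback (line 204)
setGenerationTimeMs(Date.now() - generationStartRef.current);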
```
### Phase 4: Mobile Optimization ✅
**Ensure responsive design**:
- Stack metrics vertically on mobile (xs breakpoint)
- Use smaller font sizes on mobile
- Collapse less important metrics on mobile
- Use `variant="compact"` for live streaming on mobile
### Phase 5: Testing & Polish ✅
1. Test with various content lengths (short, medium, long articles)
2. Test with different HTML structures (headings, lists, links)
3. Verify mobile responsiveness
4. Add loading states if needed
5. Add tooltips for metric explanations
## Code Structure
### File Organization
```
apps/admin/src/
├── components/
│   ├── ContentStatistics.tsx    # New component
│   └── steps/
│       └── StepGenerate.tsx     # Modified
└── utils/
    └── contentStats.ts          # New utility module
```
### Clean Code Principles
1. **Single Responsibility**: Each function does one thing
2. **Pure Functions**: Stats calculation has no side effects
3. **Reusable**: Stats component can be used elsewhere
4. **Type Safe**: Full TypeScript types
5. **Testable**: Utility functions are easy to unit test
6. **Readable**: Clear naming and documentation
## Implementation Steps
### Step 1: Create Utility Module
- [ ] Create `apps/admin/src/utils/contentStats.ts`
- [ ] Implement HTML parsing functions
- [ ] Implement text analysis functions
- [ ] Implement main `calculateContentStats` function
- [ ] Add TypeScript interfaces
- [ ] Add JSDoc comments
### Step 2: Create Display Component
- [ ] Create `apps/admin/src/components/ContentStatistics.tsx`
- [ ] Design layout (grid/flex)
- [ ] Add responsive breakpoints
- [ ] Implement compact vs detailed variants
- [ ] Add icons (optional)
- [ ] Style with Material-UI theme
### Step 3: Integrate into StepGenerate
- [ ] Import new modules
- [ ] Add to streaming section (compact variant)
- [ ] Add to generated draft section (detailed variant)
- [ ] Optional: Add generation time tracking
- [ ] Test all scenarios
### Step 4: Test & Refine
- [ ] Test with real content
- [ ] Verify mobile layout
- [ ] Check performance (stats calculation should be fast)
- [ ] Add error handling for edge cases
- [ ] Update documentation
## Example Output
### Compact Variant (During Streaming)
```
📊 Live Stats: 342 words • 2 min read • 1,234 tokens • 8 paragraphs
```
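
One way the compact line above might be assembled is sketched below. The `CompactStatsInput` shape and the `formatCount`, `plural`, and `formatCompactStats` helpers are hypothetical names introduced for illustration; the actual component may render these values as separate UI elements instead of a single string.

```typescript
// Hypothetical input shape: only the fields the compact line needs.
interface CompactStatsInput {
  wordCount: number;
  readingTimeMinutes: number;
  paragraphCount: number;
  tokenCount?: number; // optional, as in ContentStatisticsProps
}

/** Insert thousands separators, e.g. 1234 -> "1,234". */
function formatCount(n: number): string {
  return String(Math.round(n)).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

/** Format a count with its noun, adding "s" unless the count is exactly 1. */
function plural(n: number, noun: string): string {
  return `${formatCount(n)} ${noun}${n === 1 ? "" : "s"}`;
}

function formatCompactStats(stats: CompactStatsInput): string {
  const parts: string[] = [
    plural(stats.wordCount, "word"),
    `${stats.readingTimeMinutes} min read`,
  ];
  // Token count is only known while streaming, so it is skipped when absent.
  if (stats.tokenCount !== undefined) {
    parts.push(plural(stats.tokenCount, "token"));
  }
  parts.push(plural(stats.paragraphCount, "paragraph"));
  return `📊 Live Stats: ${parts.join(" • ")}`;
}

const line = formatCompactStats({
  wordCount: 342,
  readingTimeMinutes: 2,
  paragraphCount: 8,
  tokenCount: 1234,
});
console.log(line);
```

The separator regex is used instead of locale-dependent formatting so the output is the same in every runtime.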
### Detailed Variant (After Generation)
```
┌─────────────────────────────────────────────────────┐
│ Content Statistics                                  │
├─────────────────────────────────────────────────────┤
│ 📝 Words: 1,234           ⏱️ Reading Time: 5 min    │
│ 🔤 Characters: 6,789      📄 Paragraphs: 15         │
│ 📑 Headings: 8            📋 List Items: 12         │
│ 🤖 Tokens: 1,567          🖼️ Images: 3              │
│ 🔗 Links: 5               ⚡ Generated in: 12.3s    │
└─────────────────────────────────────────────────────┘
```
## Benefits
1. **User Insight**: Writers see content metrics at a glance
2. **Quality Control**: Identify too-short or too-long content
3. **SEO Awareness**: Word count and reading time matter for SEO
4. **Content Planning**: Helps plan article structure
5. **Performance Tracking**: Token usage helps manage API costs
6. **Professional Feel**: Adds polish to the editor
## Technical Considerations
### Performance
- Stats calculation should take under 50 ms for typical articles
- Use memoization if needed (`useMemo`)
- Don't recalculate on every render; recompute only when the content changes
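
The memoization idea can be illustrated without React. The sketch below caches the most recent result keyed on the input string, which is roughly what `useMemo(() => calculateContentStats(htmlContent), [htmlContent])` would do inside the component. `memoizeLast` and `countWordsExpensive` are hypothetical names, and the word counter is only a stand-in for the real stats function.

```typescript
/** Cache the most recent result; recompute only when the input string changes. */
function memoizeLast<T>(compute: (input: string) => T): (input: string) => T {
  let hasValue = false;
  let lastInput = "";
  let lastResult: T | undefined;
  return (input: string): T => {
    if (hasValue && input === lastInput) {
      return lastResult as T; // same content as last time: reuse the result
    }
    const result = compute(input);
    hasValue = true;
    lastInput = input;
    lastResult = result;
    return result;
  };
}

// Stand-in for calculateContentStats so the snippet is self-contained.
let calls = 0;
function countWordsExpensive(html: string): number {
  calls += 1; // track how often the real work runs
  const text = html.replace(/<[^>]*>/g, " ").trim();
  return text === "" ? 0 : text.split(/\s+/).length;
}

const memoized = memoizeLast(countWordsExpensive);
memoized("<p>one two three</p>");      // computed
memoized("<p>one two three</p>");      // served from the cache
memoized("<p>one two three four</p>"); // content changed: recomputed
console.log(`compute ran ${calls} times`);
```

During streaming the content changes on every chunk, so a cache like this mainly helps with re-renders triggered by unrelated state; if per-chunk cost becomes a concern, throttling the updates would be a separate measure.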
### Edge Cases
- Empty content
- Content with only HTML tags
- Very long content (10k+ words)
- Malformed HTML
- Content with inline styles/scripts
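
Several of these edge cases can be handled with small guards, sketched below under the assumption that stats are computed with regular expressions. `removeNonContent`, `safeWordCount`, and `safeAverage` are hypothetical helper names, not part of the existing code.

```typescript
/** Drop <script>/<style> blocks and HTML comments, whose text is not article content. */
function removeNonContent(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    .replace(/<!--[\s\S]*?-->/g, " ");
}

/** Word count that tolerates empty, missing, tag-only, and unclosed-tag input. */
function safeWordCount(html: string | null | undefined): number {
  if (!html) return 0; // empty or missing content
  const text = removeNonContent(html).replace(/<[^>]*>/g, " ").trim();
  return text === "" ? 0 : text.split(/\s+/).length; // tag-only content -> 0
}

/** Average that returns 0 instead of NaN or Infinity when the divisor is 0. */
function safeAverage(total: number, count: number): number {
  return count > 0 ? total / count : 0;
}

console.log(safeWordCount("<p></p><br>"));           // tag-only content
console.log(safeWordCount("<p>Unclosed <b>tag"));    // malformed HTML
console.log(safeAverage(10, 0));                     // no paragraphs yet
```

Inline `style` attributes need no special handling here because they live inside the tag and are removed with it; only the contents of `<script>` and `<style>` elements would otherwise leak into the word count.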
### Accessibility
- Use semantic HTML
- Add ARIA labels if needed
- Ensure color contrast
- Support keyboard navigation
## Future Enhancements
1. **Export Stats**: Download stats as JSON/CSV
2. **Historical Tracking**: Compare stats across generations
3. **Target Metrics**: Set word count goals
4. **SEO Score**: Basic SEO analysis
5. **Readability Score**: Flesch-Kincaid or similar
6. **Keyword Density**: Track keyword usage
7. **Content Comparison**: Compare before/after edits
## Success Criteria
- Stats display correctly for all content types
- Mobile-responsive layout
- Fast calculation (< 50ms)
- Clean, maintainable code
- No performance degradation
- Helpful for content creators
---
**Status**: IMPLEMENTED - All phases complete!
**Actual Time**: ~30 minutes
**Priority**: Medium
**Complexity**: Low-Medium
## Implementation Summary
### Files Created
1. `apps/admin/src/utils/contentStats.ts` - Statistics calculation utility
2. `apps/admin/src/components/ContentStatistics.tsx` - Display component
### Files Modified
1. `apps/admin/src/components/steps/StepGenerate.tsx` - Integrated statistics
### Features Implemented
- Word count, character count, reading time
- Paragraph, heading, list item counts
- Sentence count and averages
- Token count display
- Generation time tracking
- Image placeholder count
- Link count
- Compact variant for live streaming
- Detailed variant for final draft
- Mobile-responsive grid layout
- Performance optimized with useMemo