voxblog/CONTENT_STATISTICS_PLAN.md
Ender b593ca35d5
feat: update database credentials and add content statistics docs
- Changed database credentials in .env for improved security
- Added detailed implementation plan for content statistics feature (CONTENT_STATISTICS_PLAN.md)
- Created summary documentation for content statistics feature (CONTENT_STATISTICS_SUMMARY.md)
- Removed legacy MySQL root password and simplified database config variables
- Updated database name to use production naming convention (voxblog_prod)
2025-10-26 22:52:14 +01:00


Content Statistics Feature - Implementation Plan

Overview

Add comprehensive statistics display for generated articles in the StepGenerate component, showing metrics like word count, paragraph count, token count, reading time, and more.

Current State Analysis

Existing Code Structure

  • Component: apps/admin/src/components/steps/StepGenerate.tsx
  • Current Stats: Only shows tokenCount during streaming (lines 236 and 249)
  • Content Display: Two sections
    1. Live Generation (lines 256-284) - Shows streaming content
    2. Generated Draft (lines 288-336) - Shows final content
  • Data Available:
    • generatedDraft - HTML string of generated content
    • tokenCount - Number of tokens generated (streaming only)
    • streamingContent - Real-time content during generation
    • imagePlaceholders - Array of image placeholder strings
    • generationSources - Array of web sources used

Current Display Locations

  1. During streaming (lines 248-250): Shows token count in caption
  2. After generation (lines 291-301): Shows sources count
  3. After generation (lines 303-314): Shows image placeholders count

Proposed Statistics

Core Metrics

  1. Word Count - Total words in article (excluding HTML tags)
  2. Character Count - Total characters (with/without spaces)
  3. Paragraph Count - Number of <p> tags
  4. Heading Count - Number of <h1> to <h6> tags
  5. List Item Count - Number of <li> tags
  6. Token Count - AI tokens generated (already available)
  7. Image Placeholder Count - Already shown, enhance display
  8. Reading Time - Estimated minutes (avg 200-250 words/min)

Advanced Metrics (Optional)

  1. Sentence Count - Approximate sentences
  2. Average Words per Paragraph - Content density
  3. Average Words per Sentence - Readability indicator
  4. Link Count - Number of <a> tags in content
  5. Generation Time - Time taken to generate (if available)

Implementation Plan

Phase 1: Create Statistics Utility Module

File: apps/admin/src/utils/contentStats.ts (new file)

export interface ContentStatistics {
  wordCount: number;
  characterCount: number;
  characterCountNoSpaces: number;
  paragraphCount: number;
  headingCount: number;
  listItemCount: number;
  sentenceCount: number;
  linkCount: number;
  readingTimeMinutes: number;
  avgWordsPerParagraph: number;
  avgWordsPerSentence: number;
}

export function calculateContentStats(htmlContent: string): ContentStatistics {
  // Implementation details below
}

Functions to implement:

  • stripHtmlTags(html: string): string - Remove all HTML tags
  • countWords(text: string): number - Count words
  • countParagraphs(html: string): number - Count <p> tags
  • countHeadings(html: string): number - Count <h1> to <h6> tags
  • countListItems(html: string): number - Count <li> tags
  • countSentences(text: string): number - Approximate sentence count
  • countLinks(html: string): number - Count <a> tags
  • calculateReadingTime(wordCount: number): number - Estimate reading time
  • calculateContentStats(htmlContent: string): ContentStatistics - Main function
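
A minimal sketch of how the functions listed above could fit together. It assumes regex-based parsing for brevity, a fixed reading speed of 225 words/min (the midpoint of the 200-250 range), and an illustrative BLOCK_TAGS list; none of these choices are confirmed by the plan, and HTML entities such as &nbsp; are not decoded.

```typescript
// Sketch of apps/admin/src/utils/contentStats.ts. Regex-based parsing is an
// assumption made for brevity; the shipped module may parse HTML differently.
export interface ContentStatistics {
  wordCount: number;
  characterCount: number;
  characterCountNoSpaces: number;
  paragraphCount: number;
  headingCount: number;
  listItemCount: number;
  sentenceCount: number;
  linkCount: number;
  readingTimeMinutes: number;
  avgWordsPerParagraph: number;
  avgWordsPerSentence: number;
}

// Assumed midpoint of the plan's 200-250 words/min range.
const WORDS_PER_MINUTE = 225;
// Tags treated as word boundaries when stripped (illustrative, incomplete list).
const BLOCK_TAGS = 'p|div|h[1-6]|li|ul|ol|br|blockquote|pre|table|tr|td|th';

// Block-level tags become spaces; inline tags are removed without a gap so
// "<strong>world</strong>." stays one word.
export function stripHtmlTags(html: string): string {
  const blockTag = new RegExp(`</?(?:${BLOCK_TAGS})\\b[^>]*>`, 'gi');
  return html
    .replace(blockTag, ' ')
    .replace(/<[^>]*>/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

export function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Counts opening tags only; the lookahead keeps "<p" from matching "<pre".
function countTags(html: string, tagPattern: string): number {
  const matches = html.match(new RegExp(`<(?:${tagPattern})(?=[\\s>/])`, 'gi'));
  return matches ? matches.length : 0;
}

// Approximate: every run of ., ! or ? ends a sentence.
export function countSentences(text: string): number {
  return text.split(/[.!?]+/).filter((part) => part.trim().length > 0).length;
}

export function calculateReadingTime(wordCount: number): number {
  return wordCount === 0 ? 0 : Math.max(1, Math.ceil(wordCount / WORDS_PER_MINUTE));
}

export function calculateContentStats(htmlContent: string): ContentStatistics {
  const text = stripHtmlTags(htmlContent);
  const wordCount = countWords(text);
  const paragraphCount = countTags(htmlContent, 'p');
  const sentenceCount = countSentences(text);
  return {
    wordCount,
    characterCount: text.length,
    characterCountNoSpaces: text.replace(/\s/g, '').length,
    paragraphCount,
    headingCount: countTags(htmlContent, 'h[1-6]'),
    listItemCount: countTags(htmlContent, 'li'),
    sentenceCount,
    linkCount: countTags(htmlContent, 'a'),
    readingTimeMinutes: calculateReadingTime(wordCount),
    // Guard against division by zero for empty or tag-only content.
    avgWordsPerParagraph: paragraphCount > 0 ? wordCount / paragraphCount : 0,
    avgWordsPerSentence: sentenceCount > 0 ? wordCount / sentenceCount : 0,
  };
}
```

Keeping every helper pure means each one can be unit tested without rendering a component.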

Phase 2: Create Statistics Display Component

File: apps/admin/src/components/ContentStatistics.tsx (new file)

interface ContentStatisticsProps {
  htmlContent: string;
  tokenCount?: number;
  imagePlaceholderCount?: number;
  generationTimeMs?: number;
  variant?: 'compact' | 'detailed';
}

export default function ContentStatistics({
  htmlContent,
  tokenCount,
  imagePlaceholderCount,
  generationTimeMs,
  variant = 'detailed'
}: ContentStatisticsProps) {
  // Calculate stats using utility
  // Display in clean, organized format
}

Display Design:

  • Use Material-UI Paper or Alert component
  • Grid layout for metrics (2-3 columns on desktop, 1-2 on mobile)
  • Icons for each metric (optional)
  • Color-coded sections:
    • Primary metrics (word count, reading time) - prominent
    • Structure metrics (paragraphs, headings) - secondary
    • Technical metrics (tokens, generation time) - tertiary

Phase 3: Integrate into StepGenerate

File: apps/admin/src/components/steps/StepGenerate.tsx

Changes needed:

  1. Import new components:
import ContentStatistics from '../ContentStatistics';
import { calculateContentStats } from '../../utils/contentStats';
  2. Add statistics to "Live Generation" section (after line 280):
{/* Live stats during streaming */}
<ContentStatistics
  htmlContent={streamingContent}
  tokenCount={tokenCount}
  variant="compact"
/>
  3. Add statistics to "Generated Draft" section (after line 315, before content preview):
{/* Final statistics */}
<ContentStatistics
  htmlContent={generatedDraft}
  tokenCount={tokenCount}
  imagePlaceholderCount={imagePlaceholders.length}
  variant="detailed"
/>
  4. Optional: Add generation time tracking:
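// Add a ref for the start time and state for the result (import useRef from 'react').
// A ref is used because onDone would otherwise read a stale start time from its closure.
const generationStartTimeRef = useRef<number>(0);
const [generationTimeMs, setGenerationTimeMs] = useState<number>(0);

// In onClick handler (line 169)
generationStartTimeRef.current = Date.now();

// In onDone callback (line 204)
setGenerationTimeMs(Date.now() - generationStartTimeRef.current);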
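
The timing logic can also live outside React entirely, which makes it testable with a fake clock. The sketch below is one possible shape; createGenerationTimer and its injectable now parameter are hypothetical names, not part of the plan.

```typescript
// Hypothetical helper: measures elapsed generation time. The clock is
// injectable so tests do not depend on real time.
export function createGenerationTimer(now: () => number = Date.now) {
  let startedAt: number | null = null;
  return {
    // Call from the onClick handler when generation begins.
    start(): void {
      startedAt = now();
    },
    // Call from the onDone callback; returns elapsed milliseconds,
    // or 0 if start() was never called.
    stop(): number {
      if (startedAt === null) return 0;
      const elapsed = now() - startedAt;
      startedAt = null;
      return elapsed;
    },
  };
}
```

The value returned by stop() could then be passed to setGenerationTimeMs in the onDone callback.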

Phase 4: Mobile Optimization

Ensure responsive design:

  • Stack metrics vertically on mobile (xs breakpoint)
  • Use smaller font sizes on mobile
  • Collapse less important metrics on mobile
  • Use variant="compact" for live streaming on mobile
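
One way to collapse less important metrics is to tag each metric with the tier from the display design (primary, secondary, tertiary) and filter by tier on small screens. The sketch below assumes the tertiary tier is the one hidden on mobile; the METRICS list and selectVisibleMetrics are hypothetical names. In the component, isMobile could come from Material-UI's useMediaQuery with theme.breakpoints.down('sm'), for example.

```typescript
type MetricTier = 'primary' | 'secondary' | 'tertiary';

interface MetricDefinition {
  key: string;
  label: string;
  tier: MetricTier;
}

// Tiers follow the display design: primary (word count, reading time),
// secondary (structure), tertiary (technical).
export const METRICS: MetricDefinition[] = [
  { key: 'wordCount', label: 'Words', tier: 'primary' },
  { key: 'readingTimeMinutes', label: 'Reading Time', tier: 'primary' },
  { key: 'paragraphCount', label: 'Paragraphs', tier: 'secondary' },
  { key: 'headingCount', label: 'Headings', tier: 'secondary' },
  { key: 'tokenCount', label: 'Tokens', tier: 'tertiary' },
  { key: 'generationTimeMs', label: 'Generated in', tier: 'tertiary' },
];

// Assumption: on mobile, only the tertiary tier is hidden.
export function selectVisibleMetrics(isMobile: boolean): MetricDefinition[] {
  return isMobile ? METRICS.filter((metric) => metric.tier !== 'tertiary') : METRICS;
}
```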

Phase 5: Testing & Polish

  1. Test with various content lengths (short, medium, long articles)
  2. Test with different HTML structures (headings, lists, links)
  3. Verify mobile responsiveness
  4. Add loading states if needed
  5. Add tooltips for metric explanations

Code Structure

File Organization

apps/admin/src/
├── components/
│   ├── ContentStatistics.tsx          # New component
│   └── steps/
│       └── StepGenerate.tsx            # Modified
└── utils/
    └── contentStats.ts                 # New utility module

Clean Code Principles

  1. Single Responsibility: Each function does one thing
  2. Pure Functions: Stats calculation has no side effects
  3. Reusable: Stats component can be used elsewhere
  4. Type Safe: Full TypeScript types
  5. Testable: Utility functions are easy to unit test
  6. Readable: Clear naming and documentation

Implementation Steps

Step 1: Create Utility Module

  • Create apps/admin/src/utils/contentStats.ts
  • Implement HTML parsing functions
  • Implement text analysis functions
  • Implement main calculateContentStats function
  • Add TypeScript interfaces
  • Add JSDoc comments

Step 2: Create Display Component

  • Create apps/admin/src/components/ContentStatistics.tsx
  • Design layout (grid/flex)
  • Add responsive breakpoints
  • Implement compact vs detailed variants
  • Add icons (optional)
  • Style with Material-UI theme

Step 3: Integrate into StepGenerate

  • Import new modules
  • Add to streaming section (compact variant)
  • Add to generated draft section (detailed variant)
  • Optional: Add generation time tracking
  • Test all scenarios

Step 4: Test & Refine

  • Test with real content
  • Verify mobile layout
  • Check performance (stats calculation should be fast)
  • Add error handling for edge cases
  • Update documentation

Example Output

Compact Variant (During Streaming)

📊 Live Stats: 342 words • 2 min read • 1,234 tokens • 8 paragraphs
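
The compact line above could be produced by a small pure formatter. This is a sketch only: formatCompactStats and CompactStatsInput are hypothetical names, and the en-US number format is an assumption.

```typescript
interface CompactStatsInput {
  wordCount: number;
  readingTimeMinutes: number;
  paragraphCount: number;
  tokenCount?: number; // omitted from the output when undefined
}

export function formatCompactStats(stats: CompactStatsInput): string {
  // Assumed locale: en-US, which renders 1234 as "1,234".
  const formatNumber = (value: number): string => value.toLocaleString('en-US');
  const parts = [
    `${formatNumber(stats.wordCount)} words`,
    `${stats.readingTimeMinutes} min read`,
  ];
  if (stats.tokenCount !== undefined) {
    parts.push(`${formatNumber(stats.tokenCount)} tokens`);
  }
  parts.push(`${formatNumber(stats.paragraphCount)} paragraphs`);
  return `📊 Live Stats: ${parts.join(' • ')}`;
}
```

Singular forms ("1 paragraph") are not handled in this sketch.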

Detailed Variant (After Generation)

┌─────────────────────────────────────────────────────┐
│ Content Statistics                                   │
├─────────────────────────────────────────────────────┤
│ 📝 Words: 1,234          ⏱️  Reading Time: 5 min    │
│ 🔤 Characters: 6,789     📄 Paragraphs: 15          │
│ 📑 Headings: 8           📋 List Items: 12          │
│ 🤖 Tokens: 1,567         🖼️  Images: 3              │
│ 🔗 Links: 5              ⚡ Generated in: 12.3s     │
└─────────────────────────────────────────────────────┘
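
The values in the detailed box need two small formatting rules: thousands separators for counts and seconds with one decimal for generation time. A possible sketch follows; formatCount and formatGenerationTime are hypothetical helpers, and showing sub-second durations in milliseconds is an assumption.

```typescript
// Counts such as 6789 are shown as "6,789" (assumed en-US formatting).
export function formatCount(value: number): string {
  return value.toLocaleString('en-US');
}

// Durations of a second or more are shown as seconds with one decimal
// ("12.3s"); shorter ones as whole milliseconds ("850ms").
export function formatGenerationTime(ms: number): string {
  const rounded = Math.round(ms);
  if (rounded < 1000) {
    return `${rounded}ms`;
  }
  return `${(ms / 1000).toFixed(1)}s`;
}
```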

Benefits

  1. User Insight: Writers see content metrics at a glance
  2. Quality Control: Identify too-short or too-long content
  3. SEO Awareness: Word count and reading time are common inputs to SEO content guidelines
  4. Content Planning: Helps plan article structure
  5. Performance Tracking: Token usage helps manage API costs
  6. Professional Feel: Adds polish to the editor

Technical Considerations

Performance

  • Stats calculation should be < 50ms for typical articles
  • Memoize the calculation with useMemo, keyed on the content string
  • Don't recalculate on every render; recompute only when the content changes
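
The idea behind the memoization points above, shown without React: cache the last input and result, and skip the work when the input is unchanged. memoizeLast is a hypothetical helper written for illustration; inside the component the same effect would typically come from useMemo(() => calculateContentStats(htmlContent), [htmlContent]).

```typescript
// Remembers only the most recent argument and result, which is enough when
// the same content string is passed on consecutive renders.
export function memoizeLast<A, R>(fn: (arg: A) => R): (arg: A) => R {
  let cache: { arg: A; result: R } | null = null;
  return (arg: A): R => {
    const hit = cache;
    if (hit !== null && hit.arg === arg) {
      return hit.result; // same input as last time: reuse the result
    }
    const result = fn(arg);
    cache = { arg, result };
    return result;
  };
}
```

During streaming the content changes with every chunk, so memoization mainly helps the final draft, where the string is stable across renders.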

Edge Cases

  • Empty content
  • Content with only HTML tags
  • Very long content (10k+ words)
  • Malformed HTML
  • Content with inline styles/scripts
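
Several of these cases can be handled by cleaning the HTML before any counting happens. The sketch below is an assumption about how that might look: sanitizeHtmlForStats and safeAverage are hypothetical names, and the regexes do not cover every malformed input (an unclosed <script> block, for instance, is left as-is).

```typescript
// Removes content that should not count as article text before stats run.
export function sanitizeHtmlForStats(html: string | null | undefined): string {
  if (!html) return ''; // empty, null, or undefined content
  return html
    // Drop <script> and <style> blocks together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, '')
    // Drop a trailing partial tag, which can appear mid-stream.
    .replace(/<[^<>]*$/, '');
}

// Averages fall back to 0 instead of NaN or Infinity when the count is 0.
export function safeAverage(total: number, count: number): number {
  return count > 0 ? total / count : 0;
}
```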

Accessibility

  • Use semantic HTML
  • Add ARIA labels if needed
  • Ensure color contrast
  • Support keyboard navigation

Future Enhancements

  1. Export Stats: Download stats as JSON/CSV
  2. Historical Tracking: Compare stats across generations
  3. Target Metrics: Set word count goals
  4. SEO Score: Basic SEO analysis
  5. Readability Score: Flesch-Kincaid or similar
  6. Keyword Density: Track keyword usage
  7. Content Comparison: Compare before/after edits
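
As an illustration of the readability item, the Flesch Reading Ease score is 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The sketch below applies that formula to plain text. The syllable counter is a rough English-only heuristic chosen for this example, not a linguistic algorithm, and fleschReadingEase is a hypothetical function name.

```typescript
// Rough heuristic: count vowel groups after dropping a trailing silent "e".
// Miscounts some words (for example "table"); English only.
function countSyllables(word: string): number {
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return 0;
  const base = letters.length > 2 ? letters.replace(/e$/, '') : letters;
  const vowelGroups = base.match(/[aeiouy]+/g);
  return Math.max(1, vowelGroups ? vowelGroups.length : 0);
}

// Flesch Reading Ease: higher scores mean easier text.
// Returns 0 for text with no words or no sentences.
export function fleschReadingEase(text: string): number {
  const words = text.split(/\s+/).filter((token) => /[a-zA-Z]/.test(token));
  const sentences = text.split(/[.!?]+/).filter((part) => part.trim().length > 0);
  if (words.length === 0 || sentences.length === 0) return 0;
  const syllables = words.reduce((sum, word) => sum + countSyllables(word), 0);
  return (
    206.835 -
    1.015 * (words.length / sentences.length) -
    84.6 * (syllables / words.length)
  );
}
```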

Success Criteria

  • Stats display correctly for all content types
  • Mobile-responsive layout
  • Fast calculation (< 50ms)
  • Clean, maintainable code
  • No performance degradation
  • Helpful for content creators

Status: IMPLEMENTED - All phases complete!
Actual Time: ~30 minutes
Priority: Medium
Complexity: Low-Medium

Implementation Summary

Files Created

  1. apps/admin/src/utils/contentStats.ts - Statistics calculation utility
  2. apps/admin/src/components/ContentStatistics.tsx - Display component

Files Modified

  1. apps/admin/src/components/steps/StepGenerate.tsx - Integrated statistics

Features Implemented

  • Word count, character count, reading time
  • Paragraph, heading, list item counts
  • Sentence count and averages
  • Token count display
  • Generation time tracking
  • Image placeholder count
  • Link count
  • Compact variant for live streaming
  • Detailed variant for final draft
  • Mobile-responsive grid layout
  • Performance optimized with useMemo