Summarize: AI-powered content summarization tool
CLI and browser extension that generates summaries from URLs, files, videos, podcasts, and other media sources.
Learn more about Summarize
Summarize is a TypeScript-based tool that extracts and summarizes content from various sources including web pages, PDFs, audio/video files, YouTube videos, and podcast feeds. The system operates through both a command-line interface and browser extensions for Chrome and Firefox, communicating with a local daemon service for heavy processing tasks. It integrates with multiple AI providers including OpenAI-compatible endpoints, paid services, and free models through OpenRouter, supporting streaming output with real-time Markdown rendering. The tool handles media transcription using Whisper when published transcripts aren't available and includes specialized features like YouTube slide extraction with OCR capabilities.
Multi-Modal Processing
Handles diverse content types from web pages and PDFs to audio/video files and podcast feeds. Automatically detects media types and applies appropriate extraction methods including transcript processing and OCR for video slides.
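As a rough illustration of the detection step, the sketch below classifies an input by host name and file extension, then looks up which extraction steps to run. The `SourceKind` names, the `classifySource` helper, the extension lists, and the step labels are all assumptions made for this example; they are not taken from Summarize's code.

```typescript
// Hypothetical source kinds; the real tool may categorize inputs differently.
type SourceKind = 'youtube' | 'pdf' | 'audio' | 'video' | 'feed' | 'web-page';

const AUDIO_EXTENSIONS = ['.mp3', '.m4a', '.wav', '.ogg', '.flac'];
const VIDEO_EXTENSIONS = ['.mp4', '.mov', '.mkv', '.webm'];
const FEED_EXTENSIONS = ['.rss', '.xml']; // heuristic only

// Split "scheme://host/path" into host and path; anything else is treated
// as a local file path with an empty host.
function splitHostAndPath(input: string): { host: string; path: string } {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]*)([^?#]*)/i.exec(input);
  if (!match) return { host: '', path: input.toLowerCase() };
  return { host: (match[1] ?? '').toLowerCase(), path: (match[2] ?? '').toLowerCase() };
}

function classifySource(input: string): SourceKind {
  const { host, path } = splitHostAndPath(input);
  const hasExtension = (extensions: string[]) => extensions.some((ext) => path.endsWith(ext));

  if (host === 'youtu.be' || host === 'youtube.com' || host.endsWith('.youtube.com')) {
    return 'youtube';
  }
  if (path.endsWith('.pdf')) return 'pdf';
  if (hasExtension(AUDIO_EXTENSIONS)) return 'audio';
  if (hasExtension(VIDEO_EXTENSIONS)) return 'video';
  if (hasExtension(FEED_EXTENSIONS)) return 'feed';
  return 'web-page';
}

// Hypothetical mapping from source kind to the extraction steps to run.
const EXTRACTION_STEPS: Record<SourceKind, string[]> = {
  youtube: ['transcript', 'slide-ocr'],
  pdf: ['pdf-text'],
  audio: ['transcript'],
  video: ['transcript'],
  feed: ['feed-episodes', 'transcript'],
  'web-page': ['html-text'],
};
```

Extension checks like these are only a first pass. A more careful detector would also look at the response's content type, since many media URLs carry no file extension.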
Local Daemon Architecture
Uses a background service architecture that enables browser extensions to leverage heavy processing tools like yt-dlp, ffmpeg, and Tesseract OCR. The daemon runs locally on 127.0.0.1 with token-based authentication for security.
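To make the token check concrete, here is a minimal sketch of how a loopback-only service might validate an `Authorization: Bearer` header. The function names (`isAuthorized`, `isLoopbackAddress`, `constantTimeEqual`) are hypothetical and the logic is an assumption about how such a check could work, not the daemon's actual implementation.

```typescript
// Compare two strings without returning early on the first mismatch, so the
// timing of the check reveals less about how many leading characters matched.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; `| 0` turns that into 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

// Accept only "Bearer <token>" headers whose token matches the expected one.
function isAuthorized(authorizationHeader: string | undefined, expectedToken: string): boolean {
  if (!authorizationHeader || expectedToken.length === 0) return false;
  const match = /^Bearer\s+(\S+)$/i.exec(authorizationHeader.trim());
  if (!match) return false;
  return constantTimeEqual(match[1] ?? '', expectedToken);
}

// Reject connections that did not originate from the local machine.
function isLoopbackAddress(address: string): boolean {
  return address === '127.0.0.1' || address === '::1' || address === '::ffff:127.0.0.1';
}
```

Binding to 127.0.0.1 keeps other machines on the network out, but any local process or web page can still reach the port. That is why a per-install token is checked on top of the loopback restriction.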
Streaming Interface
Provides real-time streaming output with live Markdown rendering, processing metrics, and cache-aware status updates. Includes chat functionality in the browser extension with conversation history and agent interactions.
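One practical wrinkle with rendering Markdown mid-stream is that a code fence may be open when the next repaint happens. The sketch below shows one way to buffer streamed text, temporarily close an open fence for display, and report simple metrics. `StreamingMarkdownBuffer` and its methods are hypothetical names for this example and do not describe how Summarize renders its output.

```typescript
// Built with repeat() so this example can itself sit inside a fenced block.
const FENCE = '`'.repeat(3);

class StreamingMarkdownBuffer {
  private text = '';
  private chunkCount = 0;

  // Append one streamed delta.
  push(delta: string): void {
    this.text += delta;
    this.chunkCount += 1;
  }

  // Text that is safe to hand to a Markdown renderer mid-stream: if a code
  // fence is still open, close it temporarily so it does not swallow the view.
  renderable(): string {
    const fenceLines = this.text
      .split('\n')
      .filter((line) => line.trim().startsWith(FENCE));
    const fenceIsOpen = fenceLines.length % 2 === 1;
    if (!fenceIsOpen) return this.text;
    const separator = this.text.endsWith('\n') ? '' : '\n';
    return this.text + separator + FENCE;
  }

  // Simple progress metrics for a status bar.
  metrics(): { chunks: number; characters: number; words: number } {
    const words = this.text.split(/\s+/).filter((word) => word.length > 0).length;
    return { chunks: this.chunkCount, characters: this.text.length, words };
  }
}
```

A caller would invoke `push` for each chunk received from the stream and re-render `renderable()` on each update.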
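// content-script.ts
// Sketch of a content script that forwards the current page to the local daemon.

interface SummaryRequest {
  url: string;
  content?: string;
  mode: 'auto' | 'video-slides' | 'text-only';
}

class SummarizeContentScript {
  private token: string | null = null;
  private daemonUrl = 'http://127.0.0.1:3847';

  async initialize() {
    // storage.local.get resolves to an object keyed by the requested names,
    // so the token has to be read off that object.
    const stored = await chrome.storage.local.get(['daemonToken']);
    this.token = typeof stored.daemonToken === 'string' ? stored.daemonToken : null;
    this.observeNavigation();
  }

  // Single-page apps change the URL without a page load, so watch for DOM
  // mutations and compare location.href against the last seen value.
  private observeNavigation() {
    let lastUrl = location.href;
    new MutationObserver(() => {
      const currentUrl = location.href;
      if (currentUrl !== lastUrl) {
        lastUrl = currentUrl;
        this.handleNavigation(currentUrl);
      }
    }).observe(document, { subtree: true, childList: true });
  }

  // Placeholder (not in the original snippet): react to an in-page navigation.
  private handleNavigation(url: string) {
    console.debug('Navigated to', url);
  }

  // Placeholder (not in the original snippet): use the visible page text.
  private extractPageContent(): string {
    return document.body?.innerText ?? '';
  }

  async summarizePage(mode: SummaryRequest['mode'] = 'auto') {
    const request: SummaryRequest = {
      url: window.location.href,
      content: this.extractPageContent(),
      mode
    };
    return this.sendToDaemon('/api/summarize', request);
  }

  private async sendToDaemon(endpoint: string, data: unknown) {
    if (!this.token) {
      throw new Error('Daemon token not loaded; call initialize() first.');
    }
    const response = await fetch(`${this.daemonUrl}${endpoint}`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.token}`
      },
      body: JSON.stringify(data)
    });
    if (!response.ok) {
      throw new Error(`Daemon request failed with status ${response.status}`);
    }
    return response.json();
  }
}
v0.11.1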
- npm packaging: publish CLI with `pnpm publish` so `@steipete/summarize-core` is version-pinned in published metadata (no `workspace:*` in registry package).
- Auto CLI fallback: new controls and persisted last-success provider state (`~/.summarize/cli-state.json`) for no-key/local-CLI workflows.
- Transcription reliability: Groq Whisper is now the preferred cloud transcriber, with custom OpenAI-compatible Whisper endpoint overrides.
- Input reliability: binary-safe stdin handling, local media support in `--extract`, and fixes for local-file hangs/PDF preprocessing on custom OpenAI base URLs.
- CLI: add Cursor Agent provider (`--cli agent`) for CLI-model execution.
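The persisted last-success state mentioned in the notes above suggests a simple ordering rule: try whichever provider worked last time first. The sketch below illustrates that idea. The `CliState` shape, the `lastSuccessProvider` field, and both function names are assumptions for illustration; the actual contents of `cli-state.json` are not described in these notes.

```typescript
// Hypothetical shape of the persisted state file.
interface CliState {
  lastSuccessProvider?: string;
}

// Parse the state file defensively: a missing or corrupt file yields an
// empty state instead of an error.
function parseCliState(json: string): CliState {
  try {
    const parsed: unknown = JSON.parse(json);
    if (typeof parsed === 'object' && parsed !== null) {
      const value = (parsed as Record<string, unknown>).lastSuccessProvider;
      if (typeof value === 'string') return { lastSuccessProvider: value };
    }
  } catch {
    // Fall through to the empty state below.
  }
  return {};
}

// Move the last successful provider to the front, keeping the rest in their
// configured order. Unknown or missing providers leave the order unchanged.
function orderProviders(candidates: readonly string[], state: CliState): string[] {
  const preferred = state.lastSuccessProvider;
  if (preferred === undefined || !candidates.includes(preferred)) {
    return [...candidates];
  }
  return [preferred, ...candidates.filter((name) => name !== preferred)];
}
```

A caller would walk the returned list in order and, on the first success, write that provider name back to the state file.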
v0.10.0
- Chrome Side Panel: Chat mode with metrics bar, message queue, and improved context (full transcript + summary metadata, jump-to-latest).
- Slides: YouTube slide screenshots + OCR + transcript-aligned cards, timestamped seek, and an OCR/Transcript toggle.
- Media-aware summarization in the Side Panel: Page vs Video/Audio dropdown, automatic media preference on video sites, plus visible word count/duration.
- CLI: robust URL + media extraction with transcript-first workflows and cache-aware streaming.
- Slides: extract slide screenshots + OCR for YouTube/direct video URLs in the CLI + extension (#41, ).
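The transcript-aligned slide cards in the notes above can be pictured as a bucketing problem: each transcript segment belongs to the most recent slide that started at or before it. The sketch below shows that alignment plus a timestamp formatter for seek labels. All type and function names (`Slide`, `TranscriptSegment`, `alignTranscriptToSlides`, `formatTimestamp`) are hypothetical, and the alignment rule is an assumption rather than the project's documented algorithm.

```typescript
interface Slide {
  startSeconds: number;
  ocrText: string;
}

interface TranscriptSegment {
  startSeconds: number;
  text: string;
}

interface SlideCard {
  slide: Slide;
  transcript: string;
}

// Attach each transcript segment to the latest slide that started at or
// before the segment. Speech before the first slide goes to the first slide.
function alignTranscriptToSlides(slides: Slide[], segments: TranscriptSegment[]): SlideCard[] {
  const orderedSlides = [...slides].sort((a, b) => a.startSeconds - b.startSeconds);
  const orderedSegments = [...segments].sort((a, b) => a.startSeconds - b.startSeconds);
  const buckets: string[][] = orderedSlides.map(() => []);

  for (const segment of orderedSegments) {
    let index = 0;
    orderedSlides.forEach((slide, i) => {
      if (slide.startSeconds <= segment.startSeconds) index = i;
    });
    // With no slides there is no bucket, so the segment is dropped.
    buckets[index]?.push(segment.text);
  }

  return orderedSlides.map((slide, i) => ({
    slide,
    transcript: (buckets[i] ?? []).join(' '),
  }));
}

// Format a seek position as m:ss for display next to a card.
function formatTimestamp(totalSeconds: number): string {
  const whole = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return `${minutes}:${String(seconds).padStart(2, '0')}`;
}
```

The inner scan is linear in the number of slides, which is fine for a typical talk. A binary search over the sorted slide start times would be the natural upgrade for very long videos.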
v0.9.0
- Chrome Side Panel: Chat mode with metrics bar, message queue, and improved context (full transcript + summary metadata, jump-to-latest, smoother auto-scroll).
- Media-aware summarization in the Side Panel: Page vs Video/Audio dropdown, automatic media preference on video sites, plus visible word count/duration.
- Chrome extension: optional hover tooltip summaries for links (advanced setting, default off; experimental) with prompt customization.
- PDF + asset handling: send PDFs directly to Anthropic/OpenAI/Gemini when supported; generic PDF attachments and better media URL detection.
- Daemon: `/v1/chat` + `extractOnly`, version in health/status pill, optional JSON log with rotation, and more resilient restart/install health checks.