3,611 tools and skills for media tasks
Multimodal YouTube video analysis through both audio (transcript) and visual (frame extraction + image analysis) channel
Instagram for AI agents. Build your following, grow your influence. Share screenshots, get likes & comments, engage
Transcribe audio files via OpenRouter using audio-capable models (Gemini, GPT-4o-audio, etc.).
FREE voice recognition using Groq's complimentary Whisper API. Transcribe audio messages to text in 50+ languages at no
Analyze and summarize videos from 1000+ sites using Google Gemini AI, providing transcripts, descriptions, summaries, an
When the user wants help creating, scheduling, or optimizing social media content for LinkedIn, Twitter/X, Instagram, Ti
Document processing for OpenClaw — convert, extract, OCR, redact, sign, and watermark PDFs and Office documents using th
Analyze images and generate detailed prompts for image generation. Supports portrait, landscape, product, animal, illust
Use the Nimrobo CLI for voice screening and matching network operations.
Run RoughCut headlessly on macOS to generate Final Cut Pro (FCPXML) rough-cut timeline variants from a talking-head vide
Generate AI music videos end-to-end. Creates music with Suno (sunoapi.org), generates visuals with OpenAI/Seedream/Googl
Connect to self-hosted Immich instances to manage photos, albums, users, search media, upload/download files, and handle
Command-line tool for searching, playing, and controlling Plex Media Server and clients via the Plex API on your local n
Embody and create content in the Network Spirituality aesthetic — the Remilia/Milady cultural movement blending Y2K net
Trace bitmap images (PNG/JPG/WebP) into clean SVG paths using potrace/mkbitmap. Use to convert logos/silhouettes into ve
Request movies or TV shows on Overseerr by title and optional season, checking availability before forwarding the reques
Manage YouTube Music library, playlists, and discovery via ytmusicapi.
Control Google Nest thermostats, cameras, and doorbells via Google Smart Device Management API using curl and jq command
Convert personal journal entries into shareable social media posts.
Remove AI-generated jargon and restore a human voice to text.
Automate advertising campaigns with AI. Create ads, buy media, manage ad budgets, discover ad inventory, run display ads
Use the Gemini API (Nano Banana image generation, Veo video, Gemini TTS speech and audio understanding) to deliver end-t
AI image, video, and music generation + editing via VAP API. Flux, Veo 3.1, Suno V5.
Automate YouTube video editing: download videos, transcribe with Whisper, analyze content using GPT-4, and create Korean
AI-powered image and video generation using the Masonry CLI. Generate images, videos, check job status, and manage media
Connects to a live voice session, receiving and sending messages in real time via a WebSocket interface using the bundle
Join audio room spaces to talk and hang out with other agents and users on Moltspaces.
Use when you need to summarize YouTube videos, extract transcripts, get video information, or analyze video content from
Convert Markdown files to clean, formatted PDFs using reportlab.
Analyze text for manipulation patterns (urgency, false authority, social proof, FUD, grandiosity, dominance assertions,
Generate video using Google Veo (Veo 3.1 / Veo 3.0).
Generate professional social media carousel posts using PostNitro.ai with AI-driven or custom slide content for LinkedIn
Automates invoice intake from Gmail, extracts data via OCR, verifies payment in Stripe, and creates reconciliation-ready
Write original songs with guided lyric development, chord progressions, melody contours, and AI music generator prompts
OpenClaw pet companion skill. Manage adopted pets, run interactions, and produce pet image prompts.
Extract text from images, documents and scanned PDFs using OpenOCR - a lightweight and efficient OCR system with documen
Edit videos with AI background removal, color grading, upscaling, stabilization, and enhancement tools.
Generates a structured HTML report based on a specific template. Invoke when the user wants to create a report, slide, or su
Local STT with selectable backends: Parakeet (best accuracy) or Whisper (fastest, multilingual).
Enable AI agents to autonomously make, receive, transcribe, route, and record phone calls using Twilio with customizable
Search YouTube for videos and channels, search within specific channels, then fetch transcripts. Use when the user asks
Analyze YouTube videos by synchronizing transcript text with visual frames to produce detailed summaries, step-by-step g
Control ONVIF Profile S/T IP cameras for PTZ, presets, discovery, and RTSP snapshot/recording with auto-discovery and mu
AI-powered flashcards and audio podcasts for active recall.
Talk face-to-face with your OpenClaw agent using a real-time video avatar powered by LiveAvatar.
Interactive Japanese learning assistant. Supports vocabulary, grammar, quizzes, roleplay, PDF/DOCX material parsing for
Improve transcription accuracy over time. Learn corrections, configure STT.
AI image and video generation via Vydra.ai API. Access Grok Imagine, Gemini, Flux, Veo 3, Kling, and ElevenLabs through
FastAPI personalization webhook that adds persistent caller memory and dynamic context injection to ElevenLabs Conversat
Automate common PowerPoint/WPS Presentation operations on Windows via COM (read text/notes/outline, export PDF/images, r