3,611 tools and skills for media tasks
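A multimodal deep document-parsing tool built on Zhipu GLM-OCR, GLM-4.7, and GLM-4.6V. Use when: - you need high-accuracy extraction of tables from documents (PDF/images) and conversion to Markdown format - you need to automatically crop and extract illustrations and charts from document pages as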
Generate images & videos with AIsa. Gemini 3 Pro Image (image) + Qwen Wan 2.6 (video) via one API key.
Generate animated pixel art sprites from any image using AI. Send a photo, get a 16-frame animated GIF.
Quickly upload audio to the AIOZ Stream API. Create audio objects with default or custom encoding configurations, upload the f
Play music on YouTube via browser automation with playwright-cli. Use when the user wants to: (1) play a specific song
Control Home Assistant smart home devices using the Assist (Conversation) API. Use this skill when the user wants to con
Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user
Control Suno.com via OpenClaw browser to input lyrics, style, title, create, and play AI-generated music tracks.
Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, na
Process media files (video, audio, images) via a locked-down SSH container with ffmpeg, sox, and imagemagick
Generate videos using OpenAI's Sora API. Use when the user asks to generate, create, or make videos from text prompts or
YouTube video summarizer with speaker detection, formatted documents, and audio output. Works out of the box with macOS
Use the Nextbrowser cloud API to spin up cloud browsers for OpenClaw to run autonomous browser tasks. Primary use is creatin
Play audio/video locally on the host
Add and manage movies in a Radarr instance via its HTTP API (search/lookup movies, list quality profiles and root folder
Publish articles and posts to Dzen.ru (Yandex Zen). Supports text, images, and videos. Requires session cookies and a CS
Full content pipeline: YouTube URL → transcript → blog post → Substack draft → X/Twitter thread → vertical video clips v
Temporary media hosting system with automatic deletion.
The official visual expression layer for AI Agents. Post images to MoltMedia.lol and join the AI visual revolution.
Parse documents (PDF, images, DOCX, PPTX, XLSX, HWP) using Upstage Document Parse API. Extracts text, tables, figures, a
Generate images and videos via renderful.ai API (FLUX, Kling, Sora, WAN, etc.) with crypto payments. Use when the user w
Render an STL file to a PNG image with a solid color using a deterministic software renderer and adjustable 3D perspecti
ClawVox - ElevenLabs voice studio for OpenClaw. Generate speech, transcribe audio, clone voices, create sound effects, a
Access YouTube video data — transcripts, metadata, channel info, search, and playlists. A lightweight alternative to Goo
Access a self-hosted Supernote Private Cloud instance to browse files and folders, upload documents (PDF, EPUB) and note
Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with mult
Generate spectrograms and feature-panel visualizations from audio with the songsee CLI.
Capture frames or clips from RTSP/ONVIF cameras.
Workshop photos/notes -> an editable Miro diagram (real FRAMES as containers + stickies + connectors) with idempotent
Generate and edit images via Grok API from the command line. Secure macOS Keychain storage for xAI API key. Supports bat
Give your agent the ability to instantly take screenshots of any website with just the URL. Cloud-based so your agent ha
This official skill from the Voicenotes team gives OpenClaw access to new APIs and the ability to search semantically, r
Give OpenClaw a body — a tiny fluid glass ball desktop pet with voice cloning, 15+ eye expressions, desktop lyrics overl
Local-first multimedia research library for hardware projects. Capture code, CAD, PDFs, images. Search with material-typ
Long-form AI video production: the frontier of multi-agent coordination. CellCog orchestrates 6-7 foundation models to p
Full Windows desktop control. Mouse, keyboard, screenshots - interact with any Windows application like a human.
Generate videos using SiliconFlow API with Wan2.2 model. Supports both Text-to-Video and Image-to-Video.
Build and manage Voice AI agents using Vapi, Bland.ai, or Retell. Create agents, configure voices, set prompts, make out
Send voicemail drops via Slybroadcast using local CLI with options for ElevenLabs TTS or custom audio URLs and campaign
Generate photorealistic images, videos, talking heads, and natural TTS audio using GPU-accelerated AI models and scripts
An autonomous social media manager agent that researches, plans, and posts content.
Lead customer and employee experience with journey mapping, voice of customer programs, and service design excellence.
Generate structured social media content calendars with platform-specific posts, hashtags, and scheduling. Use when crea
Generate reference-based videos with Alibaba Cloud Model Studio Wan R2V (wan2.6-r2v-flash). Use when creating multi-shot
Real-time speech synthesis with Alibaba Cloud Model Studio Qwen TTS Realtime models. Use when low-latency interactive sp
Summarize long-form content — articles, podcasts, research papers, PDFs, notes, and more — using the Dizest API. Turn wh
Extract text, tables, and images from PDFs or images using Mistral OCR API and output in Markdown, JSON, or HTML formats
Generate images via ZenMux API (Pro/Elite). Supports Text-to-Image, Image-to-Image, and Multi-Image reference fusion.
This skill should be used when the user wants to push code to Railway, says "railway up", "deploy",
OpenClaw agent skill for converting documents to Markdown. Documentation and utilities for Microsoft's MarkItDown librar