3,611 tools and skills for media tasks
Forensic media triage with chain of custody. Use when receiving images, videos, audio, PDFs, or documents that need evid
Sends voice messages (audio) to Feishu chats using Duby TTS.
Seedance × CellCog. ByteDance's #1 video model meets the frontier of multi-agent coordination — CellCog orchestrates See
Extract PDF content to Markdown using MinerU API. Supports formulas, tables, OCR. Provides both local file and online UR
Transform chaotic voice memos into a searchable knowledge base with automatic organization, linking, and tag-based retri
Remove image backgrounds using the remove.bg API with API-key auth and transparent PNG output. Use when high-quality cut
Phone-capable AI voice agent for OpenClaw — Twilio + OpenAI Realtime SIP bridge with call log dashboard
Analyze images using the Kimi K2.5 vision model via the NVIDIA NIM API. Supports PNG, JPG, JPEG, and WebP.
Ton namespace for Netsnek e.U. audio and media processing tools. Handles audio transcription, format conversion, wavefor
Static 3D visualization utilities wrapping Rerun SDK for adding point clouds, trajectories, cameras, planes, and chessbo
3D visualization toolkit wrapping Pangolin viewer for real-time display of point clouds, trajectories, cameras, planes,
SE(3) rigid body transformation library for 3D rotation and translation operations. Use when working with robot poses, c
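Unified quote and tax invoice generator. Korean-style quotes (business registration number, VAT) plus freelancer invoices (multilingual, VAT). Client and item database, PDF output, automatic calculation.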
Control a real iPhone through macOS iPhone Mirroring — screenshot, tap, swipe, type, launch apps, record video, OCR, and
Automated appointment management for beauty salons, clinics, studios, and photo booths. Handles booking requests, calend
Video ad creation with exact platform-specific specs for TikTok, Instagram, YouTube, Facebook, LinkedIn. Covers dimensio
Open Graph and social sharing image design with platform specs, text placement, and branding. Covers OG meta tags, Twitt
YouTube thumbnail design with specific dimensions, contrast rules, and mobile preview optimization. Covers safe zones, t
Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capa
Film and video storyboarding with shot vocabulary, continuity rules, and panel layout. Covers shot types, camera angles,
Create AI-powered podcasts with text-to-speech, music, and audio editing. Tools: Kokoro TTS, DIA TTS, Chatterbox, AI mus
Generate images, videos, icons, audio, and more using Freepik's AI API. Supports Mystic, Flux, Kling, Hailuo, Seedream,
Master prompt engineering for AI models: LLMs, image generators, video models. Techniques: chain-of-thought, few-shot, s
Create AI avatar and talking head videos with OmniHuman, Fabric, PixVerse via inference.sh CLI. Models: OmniHuman 1.5, O
AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI. Models: Kokoro TTS, DIA, Chatterbox, Higg
Still-to-video conversion guide: model selection, motion prompting, and camera movement. Covers Wan 2.5 i2v, Seedance, F
Launch a smart teleprompter with mobile remote control for video recording. Use when the user wants to read scripts whil
Control a Linux X11 desktop by taking screenshots and moving/clicking/typing via xdotool + scrot.
Generate images using GLM-Image API. Use when the user wants to generate, create, or draw an image from a text prompt. T
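Control the 小播鼠 broadcast system for audio playback and broadcast announcements. Use when the user needs to play audio on broadcast devices, set the volume, manage scheduled broadcast tasks, or check device status. Supports audio file playback, URL playback, volume adjustment, device management, scheduled task management, and text-to-speech (TTS) broadcasting.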
Product Hunt launch optimization with specific specs, timing, and gallery strategy. Covers taglines, gallery images, mak
Talking head video production with AI avatars, lipsync, and voiceover. Covers portrait requirements, audio quality, Omni
Twitter/X thread writing with hook tweets, thread structure, and engagement optimization. Covers tweet formatting, chara
Best practices and techniques for writing effective AI video generation prompts. Covers: Veo, Seedance, Wan, Grok, Kling
Convert text to natural speech with DIA TTS, Kokoro, Chatterbox, and more via inference.sh CLI. Models: DIA TTS (convers
Character consistency across AI-generated images with reference sheets and LoRA techniques. Covers turnaround views, exp
Build multi-step AI content creation pipelines combining image, video, audio, and text. Workflow examples: generate imag
App Store and Google Play screenshot creation with exact platform specs. Covers iOS/Android dimensions, gallery ordering
Book cover design with genre-specific conventions, typography rules, and AI image generation. Covers fiction and non-fic
Multi-speaker dialogue audio creation with Dia TTS. Covers speaker tags, emotion control, pacing, conversation flow, and
Explainer video production guide: scripting, voiceover, visuals, and assembly. Covers script formulas, pacing rules, sce
Logo design principles and AI image generation best practices for creating logos. Covers logo types, prompting technique
Upscale and enhance images with Real-ESRGAN, Thera, Topaz, FLUX Upscaler via inference.sh CLI. Models: Real-ESRGAN, Ther
Generate videos with Google Veo models via inference.sh CLI. Models: Veo 3.1, Veo 3.1 Fast, Veo 3, Veo 3 Fast, Veo 2. Ca
Remove backgrounds from images with BiRefNet via inference.sh CLI. Model: BiRefNet (high accuracy background removal). U
Generate professional AI product photography and commercial images. Models: FLUX, Imagen 3, Grok, Seedream for product s
Generate AI music and songs with Diffrythm, Tencent Song Generation via inference.sh CLI. Models: Diffrythm (fast song g
Generate AI images with FLUX, Gemini, Grok, Seedream, Reve and 50+ models via inference.sh CLI. Models: FLUX Dev LoRA, F
Access Fathom AI meeting recordings, transcripts, summaries, and action items via the Fathom API. Use when the user asks
The default web content reader for OpenClaw. Reads X (Twitter), Reddit, YouTube, and any webpage into clean Markdown — z