Skip to content

t1k:extended-multimodal

FieldValue
Modulet1k-extended
Version2.14.3
Effortunknown
Tools
/t1k:extended-multimodal
[file-path] [prompt]

Process audio, images, videos, documents using Gemini. Generate images, videos, speech, music via Gemini + MiniMax.

Terminal window
# 1. Install Python dependencies
pip install -r .claude/skills/t1k-extended-multimodal/scripts/requirements.txt
# 2. Set required API key
export GEMINI_API_KEY="your-key" # https://aistudio.google.com/apikey
# 3. Set optional MiniMax key (for image/video/speech/music generation)
export MINIMAX_API_KEY="your-key" # https://platform.minimax.io/user-center/basic-information/interface-key
# 4. (Optional) Install human-mcp for in-loop interactive use
claude mcp add human-mcp -- npx -y github:The1Studio/human-mcp#v2.15.1

Optional — OpenAI-compatible backend (LiteLLM proxy / model-router)

Section titled “Optional — OpenAI-compatible backend (LiteLLM proxy / model-router)”

Set GEMINI_OPENAI_BASE_URL to route analyze (vision) and generate (image) through an OpenAI-compatible gateway instead of the native Gemini API. GEMINI_API_KEY is reused as the bearer token. Only analyze+generate are supported this way — audio/video/transcribe stay on native Gemini.

Terminal window
# LiteLLM proxy — vision + image generation
export GEMINI_OPENAI_BASE_URL="https://litellm.athena.tools"
export GEMINI_API_KEY="sk-..." # proxy key
python scripts/gemini_batch_process.py --task generate --prompt "a green square" \
--model gemini-3.1-flash-image-preview --output out.png
python scripts/gemini_batch_process.py --files out.png --task analyze \
--prompt "what color?" --model gemini-3.1-pro-preview
# TheOneKit model-router (CCS) — vision ONLY (no image-gen route)
export GEMINI_OPENAI_BASE_URL="https://ccs.the1studio.org/api/provider/kimi"
export GEMINI_API_KEY="$(gh auth token)" # The1Studio org membership
python scripts/gemini_batch_process.py --files img.png --task analyze \
--prompt "describe this" --model gpt-5.4-mini # or kimi-k2.6

Verify setup: python scripts/check_setup.py Analyze media: python scripts/gemini_batch_process.py --files <file> --task <analyze|transcribe|extract> --help Generate (Gemini): python scripts/gemini_batch_process.py --task <generate|generate-video> --prompt "desc" Generate (MiniMax): python scripts/minimax_cli.py --task <generate|generate-video|generate-speech|generate-music> --prompt "desc" --help

ProviderTypeModelNotes
GeminiImage gengemini-3.1-flash-image-previewNano Banana 2 — DEFAULT
GeminiImage gengemini-3-pro-image-previewNano Banana Pro — production / 4K text
GeminiVideo genveo-3.1-generate-preview8s clips with audio
GeminiAnalysisgemini-2.5-flashRecommended
MiniMaxImage genimage-01$0.03/image
MiniMaxVideo genMiniMax-Hailuo-2.31080p
MiniMaxSpeech/TTSspeech-2.8-hd300+ voices, 40+ languages
MiniMaxMusicmusic-2.54-min songs with lyrics
  • gemini_batch_process.py: Gemini CLI — transcribe, analyze, extract, generate, generate-video
  • minimax_cli.py: MiniMax CLI — generate, generate-video, generate-speech, generate-music
  • minimax_generate.py: MiniMax generation library for programmatic use
  • minimax_api_client.py: MiniMax HTTP client, auth, async polling, file download
  • media_optimizer.py: ffmpeg/Pillow preflight — compress/resize/convert media before API calls
  • document_converter.py: Gemini-powered PDF/image/Office to markdown converter
  • check_setup.py: Setup checker for API keys and dependencies
TopicFile
Vision/OCR/Imagesreferences/vision-understanding.md
Image Generationreferences/image-generation.md
Video Analysisreferences/video-analysis.md
Video Generationreferences/video-generation.md
Audio/TTSreferences/audio-processing.md
Music Generationreferences/music-generation.md
MiniMax APIreferences/minimax-generation.md

Formats: Audio (WAV/MP3/AAC, 9.5h), Images (PNG/JPEG/WEBP, 3.6k), Video (MP4/MOV, 6h), PDF (1k pages) Size: 20MB inline, 2GB File API Transcription: Audio/video >15 min must be chunked (15 min max per chunk) to avoid truncation. Transcription format: [HH:MM:SS -> HH:MM:SS] content per segment, with metadata header.

Save outputs to plans/multimodal-outputs/<YYYYMMDD>/.