Uploading a Recording
How to hand a meeting recording to your AI employee: by direct upload, by URL, or by pointing at a file that already lives on the server.
Supported Recording Sources
Sarudo accepts recordings from anywhere yt-dlp can reach: Zoom, Google Meet, Microsoft Teams, YouTube, Loom, direct HTTPS links, and most public video hosts. Common audio and video formats are all supported (MP3, M4A, WAV, OGG, Opus, WebM, MP4, and the rest of what yt-dlp extracts). Transcription runs on the audio track, so video files have their audio extracted first. If a recording is behind authentication (for example, a private Zoom cloud recording), download it yourself and send the file rather than the URL, because yt-dlp does not log into your accounts.
Audio downloads time out after 5 minutes. If a remote recording is unusually large or slow, download it to your own machine and upload the file directly instead.
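For a recording that risks hitting the timeout, a local download might look like the following minimal sketch. It assumes the yt-dlp command-line tool and ffmpeg are installed on your own machine. The helper names, the output template, and the choice of MP3 are illustrative and are not part of Sarudo.

```python
import subprocess
from typing import List, Optional

SARUDO_TIMEOUT_SECONDS = 5 * 60  # the 5-minute cap described above, for comparison


def build_audio_command(url: str, out_template: str = "recording.%(ext)s") -> List[str]:
    """Build a yt-dlp command that keeps only the audio track."""
    return [
        "yt-dlp",
        "--extract-audio",         # audio only (yt-dlp needs ffmpeg for this)
        "--audio-format", "mp3",   # illustrative output format
        "--output", out_template,  # yt-dlp output filename template
        url,
    ]


def download_audio(url: str, timeout: Optional[float] = None) -> bool:
    """Run the download locally; timeout=None means no time limit.

    Returns False if yt-dlp is missing, exits with an error, or times out.
    """
    try:
        result = subprocess.run(build_audio_command(url), timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0
```

Running the download locally with no time limit and then uploading the resulting file sidesteps the cap entirely.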
How to Hand It Over
The easiest way is to forward a recording link in Telegram and tell your AI employee what it is, for example: "Here is the kickoff with ClientCo. Please process this: https://zoom.us/rec/share/…". The AI downloads the audio, transcribes it, extracts action items, and stores the result. You can also attach an audio or video file directly to the Telegram message, or reference a file that has already been saved to your server (useful if your AI employee has just recorded or downloaded something in an earlier step).
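The difference between a link and a file already on the server can be illustrated with a small sketch. This is not Sarudo's own logic: the function name and the returned labels are hypothetical. Telegram attachments are not covered here, since they arrive as files rather than as text.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return 'url', 'server_file', or 'unknown' for a recording reference."""
    parsed = urlparse(source)
    # A usable link needs an http(s) scheme and a host name.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Otherwise, treat the text as a path and check that the file exists.
    if Path(source).is_file():
        return "server_file"
    return "unknown"
```

A reference that is neither a reachable-looking link nor an existing file comes back as "unknown", which is the point at which it makes sense to ask for the recording again.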
Processing a Zoom recording
Hand a recording URL to Sarudo and get back a full analysis.
Titling and Linking
When you hand over a recording, you can provide a meeting title and link it to a calendar event or CRM contacts in the same message. A title makes the meeting easier to find later ("show me the ClientCo kickoff transcript"). A calendar event ID ties the meeting record back to the original scheduled event, which is useful when you want to see the meeting output alongside the invite. Contact IDs determine who gets a "Meeting" activity logged in their CRM record, so the transcript and action items show up in that person's timeline.
If you do not provide contact IDs, no CRM activities are created — the meeting is stored in the meetings table but is not linked to anyone. You can always go back later and link attendees manually.
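The relationship between contact IDs and CRM activities can be pictured with a short sketch. The class, field, and function names below are hypothetical and do not reflect Sarudo's actual schema. The only behavior taken from the text above is that each linked contact gets one "Meeting" activity and that no contacts means no activities.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MeetingUpload:
    """Hypothetical metadata that accompanies a recording."""
    source: str
    title: Optional[str] = None
    calendar_event_id: Optional[str] = None
    contact_ids: List[str] = field(default_factory=list)


def crm_activities(upload: MeetingUpload) -> List[dict]:
    """One 'Meeting' activity per linked contact; an empty list if none are given."""
    return [
        {"contact_id": cid, "type": "Meeting", "title": upload.title}
        for cid in upload.contact_ids
    ]
```

An upload with two contact IDs yields two activities, while an upload with none yields an empty list: the meeting exists but is linked to nobody.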
Language and Speaker Detection
Language is auto-detected by default, so you rarely need to specify it. If the recording is short or the audio is noisy, detection can pick the wrong language; in that case, tell the AI explicitly ("this recording is in Spanish").
Speaker detection is on by default and works heuristically: long silence gaps and question-answer patterns are used to label up to eight speakers (Speaker 1, Speaker 2, etc.). The heuristic is not biometric, so two people with similar speaking patterns or short back-and-forth turns may be merged into one speaker. You can disable speaker detection for solo recordings (a personal voice memo, a monologue, a webinar) by saying so when you upload.
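To give a feel for how a gap-and-question heuristic behaves, here is a toy sketch. It is not Sarudo's implementation: the 2-second threshold, the rotation between speakers, and all names are assumptions made for illustration. Only the two signals (long silences and questions) and the cap of eight speakers come from the description above.

```python
from dataclasses import dataclass
from typing import List

MAX_SPEAKERS = 8           # the cap mentioned above
SILENCE_GAP_SECONDS = 2.0  # assumed threshold for a "long" silence


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def label_speakers(segments: List[Segment], speaker_count: int = 2) -> List[str]:
    """Label each segment, rotating to the next speaker at each turn boundary."""
    speaker_count = max(1, min(speaker_count, MAX_SPEAKERS))
    labels: List[str] = []
    current = 0
    for i, seg in enumerate(segments):
        if i > 0:
            prev = segments[i - 1]
            long_gap = seg.start - prev.end >= SILENCE_GAP_SECONDS
            after_question = prev.text.rstrip().endswith("?")
            # Either signal is treated as a change of speaker.
            if long_gap or after_question:
                current = (current + 1) % speaker_count
        labels.append(f"Speaker {current + 1}")
    return labels
```

Because the sketch only sees timing and punctuation, a quick exchange with no pause and no question stays under one label, which is the same kind of merging described above. Passing speaker_count=1 labels everything as Speaker 1, comparable to turning detection off for a solo recording.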