Commit cb75db4

Qizot and Copilot authored
FCE-2755 Agent vision (#224)
## Description Describe your changes in detail --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 8a7c577 commit cb75db4

2 files changed

Lines changed: 73 additions & 1 deletion

File tree

api/fishjam-server

docs/tutorials/agents.mdx

Lines changed: 72 additions & 0 deletions
@@ -206,6 +206,78 @@ You can interrupt the currently played audio chunk. See the example below.

</Tabs>

### Making the Agent see

Agents can also request video frames (JPEG images) from peers' video tracks.
Unlike audio, which streams continuously, video frames must be explicitly requested and arrive asynchronously.

:::important
Video frame capture is rate-limited to one frame per second per track.
:::
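
Because the limit applies per track, code that requests frames from several tracks may want to space out its own requests before calling the SDK. The following is a minimal sketch, not part of the Fishjam SDK: `FrameRequestThrottle` and `try_acquire` are hypothetical names, and the one-second default is taken from the note above. How the server handles requests that exceed the limit is not covered here.

```python
import time


class FrameRequestThrottle:
    """Hypothetical helper (not part of the Fishjam SDK).

    Remembers when a frame was last requested for each track ID.
    """

    def __init__(self, min_interval_s: float = 1.0, clock=time.monotonic):
        self._min_interval_s = min_interval_s
        self._clock = clock  # injectable so the helper can be tested without sleeping
        self._last_request: dict[str, float] = {}

    def try_acquire(self, track_id: str) -> bool:
        """Return True and record the request time if at least
        min_interval_s has elapsed since the last request for this track."""
        now = self._clock()
        last = self._last_request.get(track_id)
        if last is not None and now - last < self._min_interval_s:
            return False
        self._last_request[track_id] = now
        return True
```

A caller would check `throttle.try_acquire(track_id)` and only request a frame when it returns `True`, so each track is tracked independently.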

<Tabs groupId="language">
<TabItem value="ts" label="TypeScript">

```ts
// @noErrors
import { RoomId, FishjamClient, TrackId } from '@fishjam-cloud/js-server-sdk';

const fishjamId = '';
const managementToken = '';
const fishjamClient = new FishjamClient({ fishjamId, managementToken });
const room = await fishjamClient.createRoom();
const { agent } = await fishjamClient.createAgent(room.id, {});
const trackId: TrackId = '' as TrackId;

// ---cut---
import type { IncomingTrackImage } from '@fishjam-cloud/js-server-sdk';

// Listen for incoming video frames
agent.on('trackImage', (message: IncomingTrackImage) => {
  const { contentType, data } = message;
  // process the image data
});

// Request a frame periodically
setInterval(() => {
  // [!code highlight:1]
  agent.captureImage(trackId);
}, 1000);

```

</TabItem>

<TabItem value="python" label="Python">

```python
import asyncio

from fishjam import FishjamClient
from fishjam.agent import IncomingTrackImage

fishjam_client = FishjamClient(fishjam_id, management_token)

agent = fishjam_client.create_agent(room_id)

async with agent.connect() as session:
    # Request a frame
    # [!code highlight:1]
    await session.capture_image(track_id)

    # Captured frames arrive as IncomingTrackImage messages
    async for message in session.receive():
        match message:
            case IncomingTrackImage() as msg if msg.track_id == track_id:
                data = msg.data
                # process the image data
                pass
```

</TabItem>

</Tabs>
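
As one possible way to fill in the "process the image data" step, the sketch below checks that a payload looks like a JPEG and writes it to disk. It assumes the frame's `data` field holds the raw JPEG bytes, which the examples above imply but do not state. `looks_like_jpeg` and `save_frame` are hypothetical helpers, not SDK functions.

```python
from pathlib import Path

# Every JPEG stream begins with the two-byte Start Of Image (SOI) marker.
JPEG_SOI = b"\xff\xd8"


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap sanity check: does the payload start with the JPEG SOI marker?"""
    return data[:2] == JPEG_SOI


def save_frame(data: bytes, directory: Path, name: str) -> Path:
    """Write a captured frame to `directory/name.jpg` and return the path.

    Assumes `data` is the raw JPEG payload of a received frame.
    """
    if not looks_like_jpeg(data):
        raise ValueError("payload does not start with a JPEG SOI marker")
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.jpg"
    path.write_bytes(data)
    return path
```

Inside the message handler, this could be called as `save_frame(data, Path("frames"), "latest")`. Note that the file write is blocking, so an async handler may prefer to hand it off to a worker thread.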

### Disconnecting

After you're done using an agent, you can disconnect it from the room.