Version: v1.2

Video Caption

Use this API to analyze video content and automatically generate captions or summaries. It also supports human re-identification (ReID) by matching people in the video against reference images you provide.

Prerequisites

  • You’re familiar with the concepts described on the Platform overview page.
  • You have created a memories.ai API key.
  • Your videos meet the following requirements:
    • File size: 20 MB maximum
    • Duration: 20 to 300 seconds
    • Reference images (optional, for ReID): up to 5 person images
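
The limits above can be checked locally before uploading. The following is a minimal sketch, not part of the API: the function name check_video_limits is hypothetical, "20 MB" is assumed to mean 20 × 1024 × 1024 bytes, and the duration bounds are assumed to be inclusive. The caller supplies the duration (for example, from a tool such as ffprobe).

```python
# Hypothetical pre-flight check for the documented upload limits.
MAX_VIDEO_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" interpreted as 20 MiB
MIN_DURATION_S = 20                 # assumption: bounds are inclusive
MAX_DURATION_S = 300
MAX_PERSON_IMAGES = 5


def check_video_limits(size_bytes, duration_s, person_image_count=0):
    """Return a list of problems; an empty list means the input looks acceptable."""
    problems = []
    if size_bytes > MAX_VIDEO_BYTES:
        problems.append(f"file size {size_bytes} bytes exceeds {MAX_VIDEO_BYTES}")
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration {duration_s}s is outside {MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    if person_image_count > MAX_PERSON_IMAGES:
        problems.append(
            f"{person_image_count} reference images exceeds {MAX_PERSON_IMAGES}"
        )
    return problems
```

For a local file, os.path.getsize("video.mp4") gives the size in bytes to pass as size_bytes.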

Host URL

  • https://security.memories.ai

Endpoints

  • POST /v1/understand/upload – Upload video by URL
  • POST /v1/understand/uploadFile – Upload video by local file (multipart form)

Request Example (Upload by URL)

import requests

url = "https://security.memories.ai/v1/understand/upload"
headers = {"Authorization": "<API_KEY>"}

# Request body: the video to analyze, the prompts, and optional ReID references
json_body = {
    "video_url": "https://example.com/test_video.mp4",
    "user_prompt": "Summarize the video content and identify key persons",
    "system_prompt": "You are a video understanding system that analyzes content and detects persons.",
    "callback": "https://yourserver.com/callback",
    "persons": [
        {"name": "Alice", "url": "https://example.com/alice.jpg"},
        {"name": "Bob", "url": "https://example.com/bob.jpg"},
    ],
    "thinking": False,
}

# The response contains the task ID; the analysis result is sent to the callback URL
response = requests.post(url, headers=headers, json=json_body)
print(response.json())

Replace the following values with your own:

  • API_KEY: Your actual memories.ai API key.
  • video_url: Publicly accessible video URL (if using URL method).
  • callback: A publicly accessible endpoint to receive analysis results.
  • persons: Optional list of people to re-identify in the video.

Request Example (Upload by Local File)

import json
import requests

url = "https://security.memories.ai/v1/understand/uploadFile"
headers = {"Authorization": "<API_KEY>"}

# Request metadata, sent as the JSON "req" part of the multipart form
data = {
    "user_prompt": "Summarize the video content and identify key persons",
    "system_prompt": "You are a video understanding system that summarizes video content.",
    "callback": "https://yourserver.com/callback",
    "persons": [
        {"name": "Alice"},
        {"name": "Bob"},
    ],
}

# Open the video and reference images; the files are closed once the upload finishes
with open("video.mp4", "rb") as video, \
        open("alice.png", "rb") as alice, \
        open("bob.png", "rb") as bob:
    files = [
        ("req", ("req.json", json.dumps(data), "application/json")),
        ("files", ("video.mp4", video, "video/mp4")),
        ("files", ("alice.png", alice, "image/png")),
        ("files", ("bob.png", bob, "image/png")),
    ]
    response = requests.post(url, files=files, headers=headers)

print(response.json())

Callback Notification Payload

When processing is complete, results will be sent asynchronously to the callback URL you provide.

{
  "status": 0,
  "task_id": "8e03075a-2230-4e67-98d4-ba53f37c807a",
  "data": {
    "text": "A man enters the room and greets two colleagues. Alice is identified by the reference image.",
    "token": {
      "input": 123,
      "output": 456,
      "total": 579
    }
  }
}

The callback request body includes:

  • status: 0 = success, -1 = failure
  • task_id: Unique task identifier
  • data.text: Generated caption or summary text
  • data.token: Token usage statistics (input/output/total)
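
A callback receiver needs to decode this body and branch on status. The following is a minimal sketch under stated assumptions: parse_callback is a hypothetical helper, any non-zero status is treated as a failure, and the shape of a failure payload (beyond status and task_id) is not documented here, so the code does not rely on it.

```python
import json


def parse_callback(body):
    """Decode a callback body (str or bytes) and return the fields listed above.

    Hypothetical helper: raises RuntimeError when status is not 0.
    """
    payload = json.loads(body)
    if payload.get("status") != 0:
        # Assumption: failure payloads still carry task_id; use .get() to be safe
        raise RuntimeError(f"task {payload.get('task_id')} failed")
    data = payload["data"]
    return {
        "task_id": payload["task_id"],
        "text": data["text"],
        "total_tokens": data["token"]["total"],
    }
```

In practice this function would be called from whatever web framework serves the callback URL, with the raw request body passed in.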

Request Body (URL Method)

video_url: "<VIDEO_URL>"
user_prompt: "<USER_PROMPT>"
system_prompt: "<SYSTEM_PROMPT>"
callback: "<CALLBACK_URL>"
persons:
  - name: "<NAME>"
    url: "<IMAGE_URL>"
thinking: false

Request Parameters

Name           Location  Type     Required  Description
video_url      body      string   No        Public video URL (use local file method if not provided)
user_prompt    body      string   Yes       Instruction for video analysis
system_prompt  body      string   Yes       Role/context for the analysis system
callback       body      string   Yes       Callback URL for asynchronous results
persons        body      array    No        Person descriptors for human re-identification
thinking       body      boolean  No        Enable/disable reasoning mode
Authorization  header    string   Yes       Your API key
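
The required and optional fields in the table can be enforced on the client before a request is sent. This is a minimal sketch for the URL method only: build_url_request is a hypothetical helper, it treats video_url as required because this method has no file to fall back on, and it applies the 5-image limit from the prerequisites to persons.

```python
def build_url_request(video_url, user_prompt, system_prompt, callback,
                      persons=None, thinking=False):
    """Build the JSON body for POST /v1/understand/upload (hypothetical helper)."""
    required = {
        "video_url": video_url,  # assumption: required when uploading by URL
        "user_prompt": user_prompt,
        "system_prompt": system_prompt,
        "callback": callback,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    body = dict(required)
    body["thinking"] = bool(thinking)
    if persons:
        if len(persons) > 5:
            raise ValueError("at most 5 reference persons are supported")
        # Keep only the documented keys for each person descriptor
        body["persons"] = [{"name": p["name"], "url": p["url"]} for p in persons]
    return body
```

The returned dictionary can be passed as the json argument of requests.post.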

Response Example

Status code 200

{
  "code": 0,
  "msg": "success",
  "data": {
    "task_id": "xxx"
  }
}

Response Structure

Name          Type    Required  Description
code          int     Yes       Status code (0 for success, -1 for failure)
msg           string  Yes       Message text
data          object  Yes       Response data
data.task_id  string  Yes       Unique task ID for tracking
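
Because the result arrives later through the callback, the immediate response is mainly useful for its task ID. The sketch below shows one way to extract it; extract_task_id is a hypothetical helper, and it assumes that any code other than 0 means the submission was rejected.

```python
def extract_task_id(response_json):
    """Return data.task_id from a submission response, or raise on failure."""
    if response_json.get("code") != 0:
        # Surface the server's message text to make the failure easier to debug
        raise RuntimeError(f"submission failed: {response_json.get('msg')}")
    return response_json["data"]["task_id"]
```

Storing the returned ID lets you match it against the task_id field of the callback payload when the result is delivered.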

Note: The callback field must be provided and reachable. The final captioning or analysis result is only delivered asynchronously.