Generate

This method provides the API's main functionality: generating AI avatar videos.

This endpoint comes in two flavors: a JSON API and a multipart form API. Use the JSON API when you have no audio and want a Text-to-Speech (TTS) engine to generate it. Use the multipart form API when you already have an audio file to generate an AI talking avatar video from.

Request time varies widely with the length of the audio and the limitations of the rendering engines, so user requests are added to a queue and processed in turn as resources become free. Under heavy load, the time to generate a video can therefore increase dramatically.

Because the API is asynchronous, we provide webhooks that fire when a request leaves the queue and starts being processed, when it fails to be processed, and when it finishes processing successfully. Each webhook request also contains a job ID, a UUID that Emotech employees can use to find logs and diagnose any issues you report. Advanced users who can write and deploy a service to handle these webhook requests may want to use them to optimize their own content generation pipelines. Other users can ignore these details.

The JSON and multipart APIs both require the same JSON object, referred to as the "Job Specification JSON Object". See the section after the API tables for a description of this object.

Generate avatar videos.

POST https://lipsync-ai.api.emotechlab.com/lipsync/generate

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| token* | String | Your user token tied to your account. This is used to validate your identity. |

Headers

| Name | Type | Description |
| --- | --- | --- |
| content-type* | String | application/json |

Request Body

| Name | Type | Description |
| --- | --- | --- |
| job* | String | Job Specification JSON Object |

Returns a UUID for the rendering job so that the client can check its status later. It also returns the job status, which is currently always "queued".

{
    "id":"314fb342-e7c0-474b-aab2-994d155f2062",
    "status":"queued"
}
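As a minimal sketch of calling the JSON flavor from Python with only the standard library: the URL, the `token` query parameter and the `content-type` header are taken from the tables above, while the helper names `build_generate_request` and `submit` and the `YOUR_TOKEN` placeholder are assumptions made for illustration. The sketch also assumes that the Job Specification JSON Object is sent directly as the request body.

```python
import json
import urllib.parse
import urllib.request

GENERATE_URL = "https://lipsync-ai.api.emotechlab.com/lipsync/generate"


def build_generate_request(token: str, job: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the JSON flavor."""
    query = urllib.parse.urlencode({"token": token})
    # Assumption: the job specification object itself is the JSON body.
    body = json.dumps(job).encode("utf-8")
    return urllib.request.Request(
        url=f"{GENERATE_URL}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (job id and status)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    job = {
        "target_rig": "metahumans",
        "text": "what the character is saying",
        "language": "en-US",
        "actor": "lewis",
        "output": {"type": "video"},
    }
    request = build_generate_request("YOUR_TOKEN", job)
    # Call submit(request) with a real token to queue the job.
    print(request.full_url)
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers and body before any network call is made.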

Generate avatar videos (Multipart).

POST https://lipsync-ai.emotechlab.com/lipsync/generate

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| token* | String | Your user token tied to your account. This is used to validate your identity. |

Headers

| Name | Type | Description |
| --- | --- | --- |
| content-type* | String | multipart/form-data |

Request Body

| Name | Type | Description |
| --- | --- | --- |
| job* | Object | The Job Specification JSON Object |
| audio* | Bytes | An audio file containing the speech for your avatar video. |

Returns a UUID for the rendering job so that the client can check its status later. It also returns the job status, which is currently always "queued".

{
    "id":"314fb342-e7c0-474b-aab2-994d155f2062",
    "status":"queued"
}
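The multipart body can be sketched in Python without third-party libraries. The part names `job` and `audio` come from the Request Body table above. The helper name `build_multipart_body`, the `speech.wav` filename and the per-part content types are assumptions for illustration, since this page does not specify them.

```python
import json
import uuid
from typing import Optional, Tuple


def build_multipart_body(
    job: dict,
    audio: bytes,
    filename: str = "speech.wav",  # hypothetical filename
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Return (body, content_type) with a "job" part and an "audio" part."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="job"',
        b"Content-Type: application/json",  # assumed part type
        b"",
        json.dumps(job).encode("utf-8"),
        delimiter,
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",  # assumed part type
        b"",
        audio,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    body = b"\r\n".join(lines)
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned content type, including its boundary, is the value to send in the `content-type` header alongside the body.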

Job Specification JSON

| Field | Type | Description | Required? |
| --- | --- | --- | --- |
| target_rig | String | Specifies the target rig; the dealer then picks a compatible renderer. For video generation, only "metahumans" is accepted. | Yes |
| audio_url | String | An HTTP, S3 or OBS URL that points to an audio file to be downloaded by the engine. | |
| tts_params | TTS Param Object | If no audio is provided, specifies which TTS engine and parameters to use. If omitted, the default settings for the specified actor are used. | |
| text | String | A transcript of what the character will say. SSML tags can be included to let the avatar display emotion. This service does not offer a translation API, so the text should be in the language selected with the language parameter. | Yes |
| language | String | Language code. See the Language Codes section for supported codes and their format. | Yes |
| actor | String | Name of the avatar for the video. See the Actors section for a list of available avatars. | Yes |
| camera | Integer | Numeric ID of the camera in the scene. See the Camera IDs section for available options. | |
| emotion | Object | Emotion object. Controls the emotional expression on the actor's face. | |
| wait_time | Float | Optional timeout in seconds. If the job cannot begin within that time, it is canceled. | |
| output | Output Object | Output object. Set to a video. | |
| web_hooks | Object | Object with keys of the form "on_*" whose values are webhook URLs. POST requests carrying the relevant information are made to these URLs. Currently only used for offline/video rendering. | |
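The fields marked "Yes" in the Required? column can be checked on the client before a job is submitted. The following is only a sketch: the helper name `missing_required_fields` is hypothetical, and the service's own validation may be stricter than this.

```python
# Fields marked "Yes" in the Required? column of the table above.
REQUIRED_FIELDS = ("target_rig", "text", "language", "actor")


def missing_required_fields(job: dict) -> list:
    """Return the required field names that are absent or empty in `job`."""
    return [
        name
        for name in REQUIRED_FIELDS
        if name not in job or job[name] in ("", None)
    ]


if __name__ == "__main__":
    job = {"target_rig": "metahumans", "text": "hello", "language": "en-US"}
    print(missing_required_fields(job))
```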

Objects

TTS Param Object

| Field | Type | Description | Required? |
| --- | --- | --- | --- |
| engine | String | Name of the engine to use. Currently the only option is "Google". | Yes |
| speed | Number | Speed of the speech. 1.0 is 100%, the normal speed. | |
| voice | String | Name of the voice to use with Google's TTS. Consult Google's API documentation for the speakers available in the requested language. If omitted, an appropriate voice is picked. | |

Emotion Object

| Field | Type | Description | Required? |
| --- | --- | --- | --- |
| expression | String | Can be one of: "neutral", "happy", "sad", "surprise", "fear", "disappointed". | |
| level | Float | 0.0 for no expression on the actor's face, 1.0 for maximum expression. | |
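An Emotion Object can be built with a small helper that rejects values outside the table above. This is a sketch only: the function name `make_emotion` is hypothetical, and it assumes that `level` must lie within the 0.0 to 1.0 range described.

```python
# Expression names listed in the Emotion Object table.
EXPRESSIONS = {"neutral", "happy", "sad", "surprise", "fear", "disappointed"}


def make_emotion(expression: str, level: float) -> dict:
    """Build an Emotion Object, validating both fields first."""
    if expression not in EXPRESSIONS:
        raise ValueError(f"unknown expression: {expression!r}")
    # Assumption: levels outside [0.0, 1.0] are invalid.
    if not 0.0 <= level <= 1.0:
        raise ValueError(f"level must be between 0.0 and 1.0, got {level}")
    return {"expression": expression, "level": float(level)}
```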

Output Object

| Field | Type | Description | Required? |
| --- | --- | --- | --- |
| type | String | "csv" for a CSV file or "fbx" for an FBX file. The example request on this page uses "video". | Yes |
| width | Integer | The horizontal resolution of the video in pixels. 1280 by default. | |
| height | Integer | The vertical resolution of the video in pixels. 720 by default. | |
| background_color | Object | Color object. The background color behind the avatar in videos. | |

Color Object

| Field | Type | Description | Required? |
| --- | --- | --- | --- |
| red | Integer | Color channel, ranges from 0-255. | Yes |
| green | Integer | Color channel, ranges from 0-255. | Yes |
| blue | Integer | Color channel, ranges from 0-255. | Yes |
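The Output and Color objects can be combined as in the following sketch. The helper names `make_color` and `make_output` are hypothetical. The default width and height are the documented 1280 and 720, and the "video" type used in the usage lines is taken from the example request on this page.

```python
from typing import Optional


def make_color(red: int, green: int, blue: int) -> dict:
    """Build a Color Object, checking each channel is an integer in 0-255."""
    for name, value in (("red", red), ("green", green), ("blue", blue)):
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not 0 <= value <= 255:
            raise ValueError(f"{name} must be an integer in 0-255, got {value!r}")
    return {"red": red, "green": green, "blue": blue}


def make_output(
    output_type: str,
    width: int = 1280,
    height: int = 720,
    background_color: Optional[dict] = None,
) -> dict:
    """Build an Output Object; background_color is left out when not given."""
    output = {"type": output_type, "width": width, "height": height}
    if background_color is not None:
        output["background_color"] = background_color
    return output


if __name__ == "__main__":
    print(make_output("video", background_color=make_color(0, 128, 255)))
```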

Webhooks Object

| Field | Type | Description | Required? |
| --- | --- | --- | --- |
| on_success | String | A request is made to this URL when an offline job result is successfully uploaded to the provided URL. | |
| on_fail | String | A request is made to this URL when an offline job fails. | |
| on_running | String | A request is made to this URL when an offline job leaves the job queue and starts being processed. | |

See the Webhooks section for specifications.

Example JSON Request

{
    "target_rig": "metahumans",
    "text": "what the character is saying",
    "language": "en-US",
    "tts_params": { "engine": "Google"},
    "output": {"type": "video"},
    "camera": 0,
    "actor": "lewis",
    "emotion": {"expression": "happy", "level": 1.0}
}
