AITRE API Documents(English)

Transcriptions

POST
{{BASE_URL}}/v1/audio/transcriptions
Transcribes audio into the language of the input audio.

Request

Header Params
Authorization
string 
required
Provide your bearer token in the Authorization header when making requests to protected resources.
Example:
Authorization: Bearer {{YOUR_API_KEY}}
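As a minimal sketch of the header format described above, the helper below builds the Authorization header from an API key. The function name `auth_headers` is hypothetical and not part of the AITRE API; only the `Bearer <token>` format comes from this page.

```python
def auth_headers(api_key: str) -> dict:
    """Build the Authorization header for a bearer token.

    `auth_headers` is an illustrative helper, not an AITRE function.
    """
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    # The documented format is "Bearer " followed by the API key.
    return {"Authorization": f"Bearer {api_key.strip()}"}
```

The returned dictionary can be merged into the headers of any request made to a protected endpoint.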
Body Params multipart/form-data
file
file 
required
The audio file object to transcribe (not the file name). Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
model
string 
required
The ID of the model to use. Currently, only whisper-1 is available.
language
string 
optional
The language of the input audio. Supplying it in ISO-639-1 format (for example, en) can improve accuracy and latency.
prompt
string 
optional
Optional text to guide the model's style or to continue a previous audio segment. The prompt should be in the same language as the audio.
response_format
string 
optional
Default is json
The format of the transcription output: json, text, srt, verbose_json, or vtt.
temperature
number 
optional
Default is 0
Sampling temperature, between 0 and 1. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are reached.
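The constraints on the body parameters above can be checked on the client before a request is sent. The sketch below is one way to do that; `build_form_fields` is a hypothetical helper, the lowercase format names are an assumption about the values the server expects, and the two-letter check is only a rough approximation of ISO-639-1.

```python
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_form_fields(model, language=None, prompt=None,
                      response_format="json", temperature=0):
    """Validate the documented parameters and return them as form fields.

    Hypothetical helper: optional parameters left as None are omitted.
    """
    if not model:
        raise ValueError("model is required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    # Rough ISO-639-1 check: two alphabetic characters (assumption).
    if language is not None and not (len(language) == 2 and language.isalpha()):
        raise ValueError("language should be a two-letter ISO-639-1 code")

    fields = {"model": model}
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    fields["response_format"] = response_format
    fields["temperature"] = str(temperature)
    return fields
```

Rejecting bad values locally avoids a round trip for requests that would likely fail anyway; the server remains the authority on what it accepts.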

Request samples

Shell
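# The form values below are illustrative placeholders; replace them with your own.
# Optional fields (language, prompt, response_format, temperature) can be omitted.
curl --location -g --request POST '{{BASE_URL}}/v1/audio/transcriptions' \
--header 'Authorization: Bearer {{YOUR_API_KEY}}' \
--form 'file=@"/path/to/audio.mp3"' \
--form 'model="whisper-1"' \
--form 'language="en"' \
--form 'prompt="Optional context for the model"' \
--form 'response_format="json"' \
--form 'temperature="0"'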
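For readers working in Python, the same multipart/form-data request can be assembled with the standard library alone. This is a sketch under stated assumptions, not an official client: `build_transcription_request` is a hypothetical name, the base URL in the usage note is a placeholder, and the file part is labeled `application/octet-stream` for simplicity. The function only builds the request object and does not send it.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, filename, audio_bytes, fields):
    """Build (but do not send) a POST request to /v1/audio/transcriptions.

    Hypothetical helper; `fields` maps form names such as "model" to strings.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # One part per plain text field.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio file itself goes in the "file" part as raw bytes.
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    ).encode("utf-8")
    parts.append(file_header + audio_bytes + b"\r\n")
    # Closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

A request built this way could be sent with `urllib.request.urlopen(req)`, for example after calling `build_transcription_request("https://api.example.com", key, "audio.mp3", data, {"model": "whisper-1"})`, where the host name is a stand-in for your actual `{{BASE_URL}}`.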

Responses

🟢 200 Success
application/json
Body
text
string 
required
Example
{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}
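The response body shown above can be reduced to the transcript string with a few lines of Python. In this sketch, `extract_transcript` is a hypothetical helper; reading the `text` field follows the documented JSON response, while returning the body unchanged for text, srt, and vtt is an assumption, since this page only documents the JSON shape.

```python
import json


def extract_transcript(body: str, response_format: str = "json") -> str:
    """Return the transcript from a response body (illustrative helper)."""
    if response_format in ("json", "verbose_json"):
        payload = json.loads(body)
        # "text" is the required field in the documented response.
        if "text" not in payload:
            raise ValueError("response JSON has no 'text' field")
        return payload["text"]
    # Assumption: non-JSON formats are returned as plain text.
    return body
```

Raising on a missing `text` field makes an unexpected payload, such as an error object, visible immediately instead of surfacing later as a `KeyError`.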
Modified at 2025-03-30 03:28:00