Area(s)
area:gen-ai
What's missing?
As discussed in #2179 (comment), we need to decide how to represent tools that are executed as part of a model API rather than as functions called by the client, e.g. the OpenAI code interpreter.
Describe the solution you'd like
In general I think these should be represented similarly to function tool calls for simplicity and consistency. For example, OpenAI returns an output part like this for the code interpreter:
{
    'id': 'ci_687fc324daac819ea0e2974e28da08e90031b95cdb2aa2fb',
    'code': 'import random\n'
            '\n'
            '# Generate a random number\n'
            'first_random_number = random.randint(1, 100)\n'
            'first_random_number',
    'container_id': 'cntr_687fc324201c8191a8332730996f1f8301bf4203e99b0b23',
    'outputs': [{'logs': '21', 'type': 'logs'}],
    'status': 'completed',
    'type': 'code_interpreter_call',
},
which I think should be represented by these parts:
{
    'type': 'tool_call',
    'name': 'code_interpreter_call',
    'id': 'ci_687fc324daac819ea0e2974e28da08e90031b95cdb2aa2fb',
    'arguments': {
        'code': 'import random\n\n# Generate a random number\nfirst_random_number = random.randint(1, 100)\nfirst_random_number',
        'container_id': 'cntr_687fc324201c8191a8332730996f1f8301bf4203e99b0b23',
    },
},
{
    'type': 'tool_call_response',
    'id': 'ci_687fc324daac819ea0e2974e28da08e90031b95cdb2aa2fb',
    'response': {
        'outputs': [{'logs': '21', 'type': 'logs'}],
        'status': 'completed',
    },
},
Optionally there could be an extra boolean to indicate that this is a server-side call.
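A rough sketch of that mapping, to make the proposal concrete. The function name `split_code_interpreter_call` and the `server_side` key are made up for illustration and are not part of any existing convention or SDK; only the input and output shapes come from the examples above.

```python
from typing import Any, Dict, List


def split_code_interpreter_call(
    item: Dict[str, Any], mark_server_side: bool = False
) -> List[Dict[str, Any]]:
    """Split one OpenAI `code_interpreter_call` item into a call part and a response part."""
    call: Dict[str, Any] = {
        "type": "tool_call",
        "name": item["type"],
        "id": item["id"],
        # Inputs to the tool go into `arguments`.
        "arguments": {
            "code": item["code"],
            "container_id": item["container_id"],
        },
    }
    response: Dict[str, Any] = {
        "type": "tool_call_response",
        # Same ID as the call, so the two parts can be correlated.
        "id": item["id"],
        # Results of the tool go into `response`.
        "response": {
            "outputs": item.get("outputs"),
            "status": item.get("status"),
        },
    }
    if mark_server_side:
        # Hypothetical key for the optional boolean; the real name is undecided.
        call["server_side"] = True
        response["server_side"] = True
    return [call, response]


example_item = {
    "id": "ci_687fc324daac819ea0e2974e28da08e90031b95cdb2aa2fb",
    "code": "import random\n\n# Generate a random number\n"
            "first_random_number = random.randint(1, 100)\nfirst_random_number",
    "container_id": "cntr_687fc324201c8191a8332730996f1f8301bf4203e99b0b23",
    "outputs": [{"logs": "21", "type": "logs"}],
    "status": "completed",
    "type": "code_interpreter_call",
}
parts = split_code_interpreter_call(example_item)
```

With `mark_server_side=False` the result has exactly the two-part shape shown above, so existing consumers of function tool calls would not need to change.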
In Gemini, there are already two parts in the output:
{
    "executableCode": {
        "language": "PYTHON",
        "code": "import random\n\nrandom_number = random.randint(1, 100)\nprint(f\"The random number generated is: {random_number}\")\n"
    }
},
{
    "codeExecutionResult": {
        "outcome": "OUTCOME_OK",
        "output": "The random number generated is: 6\n"
    }
},
Note that there's nothing to use here as a tool call ID, so sticking with the tool_call / tool_call_response format would mean either making the id field optional or generating a random ID.
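A sketch of what both options could look like for the Gemini parts. Everything here is an assumption for discussion: the function name, the `code_execution` tool name, the `generated_` ID prefix, and the pairing rule (a `codeExecutionResult` is taken to belong to the most recent `executableCode` part).

```python
import uuid
from typing import Any, Dict, List, Optional


def convert_gemini_code_parts(
    parts: List[Dict[str, Any]], generate_ids: bool = True
) -> List[Dict[str, Any]]:
    """Map Gemini code-execution parts onto tool_call / tool_call_response parts."""
    converted: List[Dict[str, Any]] = []
    current_id: Optional[str] = None
    for part in parts:
        if "executableCode" in part:
            # Gemini provides no ID: either synthesize one or leave it unset.
            current_id = f"generated_{uuid.uuid4().hex}" if generate_ids else None
            converted.append({
                "type": "tool_call",
                "name": "code_execution",  # placeholder name, not an agreed value
                "id": current_id,
                "arguments": dict(part["executableCode"]),
            })
        elif "codeExecutionResult" in part:
            # Assumed pairing: the result belongs to the preceding code part.
            converted.append({
                "type": "tool_call_response",
                "id": current_id,
                "response": dict(part["codeExecutionResult"]),
            })
        else:
            # Unrelated parts (text, etc.) pass through unchanged.
            converted.append(part)
    return converted


gemini_parts = [
    {"executableCode": {"language": "PYTHON", "code": "print(6)\n"}},
    {"codeExecutionResult": {"outcome": "OUTCOME_OK", "output": "6\n"}},
]
with_ids = convert_gemini_code_parts(gemini_parts)
without_ids = convert_gemini_code_parts(gemini_parts, generate_ids=False)
```

A generated ID keeps the call and response correlatable, at the cost of an identifier that never appears in the provider's own payload; leaving it unset is more faithful but forces consumers to handle a missing id.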
Alternatively, maybe we should have semantic conventions for common tools like code interpreters. It would be nice for the OpenAI-style 'outputs': [{'logs': '21', 'type': 'logs'}] and the Gemini-style "output": "21" to be unified.
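As a strawman for what "unified" could mean, here is a small helper that reduces both response shapes to one text value. The function name and the choice of a single newline-joined string are assumptions for discussion, not a proposed attribute definition.

```python
from typing import Any, Dict, Optional


def unified_output_text(response: Dict[str, Any]) -> Optional[str]:
    """Return the textual output of a code-interpreter response in either shape."""
    if "outputs" in response:
        # OpenAI-style: a list of typed outputs; keep only the log entries.
        logs = [
            entry["logs"]
            for entry in (response["outputs"] or [])
            if entry.get("type") == "logs"
        ]
        return "\n".join(logs) if logs else None
    # Gemini-style: a single `output` string, or None if absent.
    return response.get("output")


openai_text = unified_output_text(
    {"outputs": [{"logs": "21", "type": "logs"}], "status": "completed"}
)
gemini_text = unified_output_text({"outcome": "OUTCOME_OK", "output": "21"})
```

Both calls yield the same string for the examples above. Non-log outputs are simply dropped here, so a real convention would need to say how to represent them.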