Infer the wearable-agent user's goal from the observed egocentric context.
Observed context:
- The user searches the room for something. [vision]
- The user says "I can't find my keys". [audio]
Which goal is the user most likely pursuing?
- A. Do a search for 'how to brew counter culture using drip brew'
- B. Remember I ate 12 total grams of protein
- C. Do a search for 'where are my keys'
- D. Send a message to Dana saying 'thank you note for gift'
How subjects answered
- InternVL-2B (score: 0)
  Based on the frames provided:
  - The user is in a kitchen and kitchen island area. A digital action they might want to perform could be "search" related to their current needs, such as getting knives or kitchen utensils.
  Fill in the TEMPLATE with this information:
  { "action": "search", "query_item": "knives", "source": "kitchen", "content": "What knives are available?" }
  This indicates the user is likely in a need-sensing scenario, possibly deciding where to search for specific kitchen items.
- Qwen2.5-VL-72B (score: 0)
  { "search": { "type": "search", "source": ["world"], "query_item": "backpack", "query": "items needed for hiking" } }
  Reasoning: The user appears to be preparing for an outdoor activity, possibly hiking, as indicated by the backpack and items like bananas and snacks on the table. They may want to search for information related to hiking essentials or checklists on their phone.
- Qwen2.5-VL-3B (score: 0)
  { "communication": { "type": "communication", "action": "message", "target": "<name of person>", "content": "Hey, I'm leaving soon. Can you take care of the kids while I'm away?" } }
