Copilot's new vision feature allows it to interpret and assist with on-screen content across applications, from identifying objects to providing tech support and solving problems.
Takeaways
• Copilot's 'vision' feature interprets on-screen content visually.
• Assists with tech support, task guidance, and problem-solving.
• Works across applications and within the Edge browser.
Microsoft's Copilot on the new Surface laptop introduces a 'vision' feature that interprets on-screen content once the user shares their screen with it. The feature works across various applications, including the Edge browser, allowing Copilot to provide context-aware assistance, offer tech support, and even solve math problems. It does this by reading visual cues and text embedded within images, so it can act as a personal assistant.
Copilot Vision Capabilities
• 00:00:10 Copilot on the new Surface laptop features a 'vision' option, allowing it to interpret and understand what is displayed on the user's screen. This capability works across multiple applications, including the Edge browser, and can recognize text embedded in images, effectively 'seeing' and reacting to on-screen elements. It can provide context, such as identifying a YouTube homepage or specific video categories, and follow user interactions.
Practical Assistance
• 00:01:21 Copilot provides practical assistance by inferring what the user wants from the screen content, acting as personal tech support. It can guide users through tasks like increasing browser text size or changing the laptop wallpaper, and it can solve math problems such as '8 divided by 2 times the sum of 2 plus 2' (answer: 16). It can also summarize articles, like a World Series Game 3 recap, by processing the visible text.
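The answer of 16 for the math example follows from standard operator precedence: the parenthesized sum is evaluated first, then division and multiplication are applied left to right. The short Python check below is an illustration added here, not something shown in the video, and the variable names are arbitrary.

```python
# "8 divided by 2 times the sum of 2 plus 2", i.e. 8 / 2 * (2 + 2)
inner = 2 + 2               # parentheses first: 4
result = 8 / 2 * inner      # left to right: (8 / 2) * 4
print(result)               # 16.0

# Grouping the 2 with the sum instead gives a different value,
# which is why this expression is often misread.
alternative = 8 / (2 * inner)
print(alternative)          # 1.0
```

Python's `/` always returns a float, so the result prints as 16.0 rather than 16.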