Inspiration
We grew up watching Iron Man, not for the explosions, but for the seamless connection between Tony Stark and his technology. He doesn’t click buttons or drag windows; he commands his data. He looks at a 3D hologram, reaches out, and instantly breaks down the design, rotating pieces with a flick of his hand.
That’s the feeling we wanted to recreate.
With Mesh, 3D models aren’t static objects trapped behind a screen. They become interactive, intelligent, and responsive.
Like JARVIS, our system can:
- Identify components automatically
- Explain what each part does
- Help you dive deeper into the structure of any model
And like Stark’s holographic controls, our motion-controlled M5StickC Plus 2 device lets you:
- Rotate the model by simply moving your hand
- Press a button to extract parts instantly
- Zoom and manipulate geometry effortlessly, no mouse required
What it does
Mesh processes 3D models at high speed, automatically segments the geometry into individual components, and uses AI to explain what each part is and how it fits into the overall structure. Users can explore the model in real time through our interactive viewer, or take control physically with the M5StickC Plus 2 motion controller, rotating and splitting the mesh with simple gestures and button presses. In seconds, complex geospatial or mechanical models become understandable, interactive, and ready for deeper analysis or export.
How we built it
Mesh is powered by a GPU-accelerated 3D processing pipeline that supports both optimized Sketchfab imports and a custom SAM3D segmentation model fine-tuned from Meta’s Segment Anything architecture. Gemini Pro handles real-time mesh component recognition and generates the captions and explanations users see when inspecting individual parts, while Nanabanana produces automated annotated overlays that visualize those components clearly.
For interaction, we built a dual-device control system using two M5StickC Plus 2 units. The Camera Stick streams high-frequency IMU quaternion data over BLE for smooth, 2 ms latency rotation of the 3D viewer. The Object Stick sends special quaternion command patterns to trigger actions like AI identification, zoom controls, and mesh splitting, turning physical movement and button presses into direct manipulation of digital models.
The entire experience is delivered through a React + Three.js front-end, a Node.js back-end, and WebSocket streaming, making complex geospatial data feel as intuitive and futuristic as controlling JARVIS holograms.
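To make the two-stick scheme concrete, here is a minimal sketch of how a receiver might tell ordinary rotation packets apart from quaternion command patterns. Everything specific in it is an assumption for illustration, not the project's actual protocol: the 16-byte little-endian float32 layout (w, x, y, z), the sentinel value `w = 2` (chosen because no component of a unit quaternion can exceed 1), the command codes, and the names `decodePacket`, `encodePacket`, and `COMMANDS`.

```typescript
// Hypothetical wire format: four little-endian float32 values (w, x, y, z), 16 bytes total.
type Quaternion = { w: number; x: number; y: number; z: number };

type StickMessage =
  | { kind: "rotation"; q: Quaternion }
  | { kind: "command"; name: string }
  | { kind: "invalid" };

// Assumed command table; the codes and names are illustrative only.
const COMMANDS: Record<number, string> = {
  1: "identify",
  2: "zoomIn",
  3: "zoomOut",
  4: "splitMesh",
};

const COMMAND_MARKER = 2;    // assumed sentinel: w = 2 cannot occur in a unit quaternion
const UNIT_TOLERANCE = 0.05; // how far from norm 1 a rotation packet may drift

function decodePacket(bytes: Uint8Array): StickMessage {
  if (bytes.byteLength !== 16) return { kind: "invalid" };
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const w = view.getFloat32(0, true);
  const x = view.getFloat32(4, true);
  const y = view.getFloat32(8, true);
  const z = view.getFloat32(12, true);

  // Command pattern: the marker sits in w, the command code in x.
  if (w === COMMAND_MARKER) {
    const name = COMMANDS[Math.round(x)];
    return name ? { kind: "command", name } : { kind: "invalid" };
  }

  // Rotation: accept only near-unit quaternions, then renormalize to remove sensor drift.
  const norm = Math.hypot(w, x, y, z);
  if (!Number.isFinite(norm) || Math.abs(norm - 1) > UNIT_TOLERANCE) {
    return { kind: "invalid" };
  }
  return { kind: "rotation", q: { w: w / norm, x: x / norm, y: y / norm, z: z / norm } };
}

// Helper for tests and demos: packs four numbers into the assumed 16-byte layout.
function encodePacket(w: number, x: number, y: number, z: number): Uint8Array {
  const bytes = new Uint8Array(16);
  const view = new DataView(bytes.buffer);
  [w, x, y, z].forEach((value, i) => view.setFloat32(i * 4, value, true));
  return bytes;
}
```

In a Three.js viewer, a `rotation` result could be copied into the camera or model orientation (for example with `Quaternion.set(x, y, z, w)`, noting that Three.js puts `w` last), while a `command` result would be dispatched to the matching action. Rejecting non-unit packets keeps a corrupted BLE frame from snapping the view to a nonsensical orientation.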
What's next for Mesh
Looking ahead, we want to unlock the full potential of 3D understanding. With access to more computing power, we aim to deploy the SAM3D model at production scale, enabling Mesh to generate detailed 3D meshes directly from any 2D image and instantly segment the components for learning and analysis. We also plan to expand our AI capabilities to support deeper engineering insight, multi-object scene parsing, and full geospatial digital twin workflows. On the interaction side, we want to advance our dual-controller system into gesture-based holographic controls, multiplayer collaboration, and AR or VR overlays so users can explore models as if they were physically standing inside them. Ultimately, our goal is to turn any real-world object, from a car to a skyscraper, into an explorable digital model where anyone can analyze structure and relationships with JARVIS-level intuitiveness.


