Vision-Language Query AI
MoonDream 2 Visual Query
Understand Images Using Text Prompts
MoonDream 2 Visual Query is a lightweight open‑source vision-language model. It accepts an image and a natural-language query. The model returns structured text answers; such as object counts, descriptions, or pointing coordinates. It supports captioning, zero‑shot object detection, pointing (x,y locations), and visual question‑answering in a unified interface. MoonDream 2 runs on both CPU and GPU, sized under 2 billion parameters and optimized for edge devices.

Q: Can you describe the image?
A: The image is a painting of the famous Mona Lisa, a renowned artwork by Leonardo da Vinci. The painting features a woman with long hair, wearing a dress, and is set against a scenic landscape. The woman is the central focus of the painting, with her face prominently displayed. The background showcases a beautiful view of a lake and mountains, providing a serene and picturesque setting for the artwork.

Q: How many people are in this image?
A: There are seven people in this image, including a man and a group of young boys.

Q: What might happen next?
A: In the image, a man is holding a pink gift box with a bow, and a woman is covering her eyes with her hands. It is likely that the man is about to present the gift to the woman, and she is anticipating the surprise or excitement that comes with receiving a gift. The man's gesture of holding the gift box and the woman's covering her eyes suggest that this is a special occasion, such as a birthday or an anniversary, and the man is about to share a moment of joy and happiness with the woman.

Q: What art style is this
A: Pop
Frequently Asked Questions
Everything you need to know about our product