Update README.md
README.md
CHANGED
@@ -10,12 +10,12 @@ pinned: false
 license: openrail
 ---
 
-#
+# Using Landing AI's Vision Agent to architect an app for brain tumor detection
 
 - a quick overview of the inner workings of LandingAI's Vision Agent, how it breaks down an initial user requirement to identify candidate components in the application architecture.
-- the diagram below captures what I had in mind for a multi-agent system but LandingAI's vision agent starts this much earlier, taking a fresh approach on old school architecture trade-off analysis.
-- 
-- the flow
+- the diagram below captures what I had in mind for a multi-agent system implementation -- but LandingAI's vision agent starts this much earlier, taking a fresh approach on old school architecture trade-off analysis.
+- the design-time flow in the most recent version of the official [Vision Agent](https://va.landing.ai/agent) app has shifted somewhat, but the number of concepts it helped bring together for me was amazing.
+- if you want a deeper understanding of the run-time flow of the application I encourage you to instrument it with Weave. Additional information on how to instrument the app can be found in [this GitHub repo](https://github.com/donbr/vision-agent).
 
 
 
@@ -271,6 +271,8 @@ overlay_segmentation_masks(image: numpy.ndarray, masks: List[Dict[str, Any]]) ->
 
 ## Vision Agent Tools - model summary
 
+- any mistakes in the following table are mine. my efforts to do some QUICK reverse engineering to identify target models.
+
 | Model Name | Hugging Face Model | Primary Function | Use Cases |
 |---------------------|-------------------------------------|-------------------------------|--------------------------------------------------------------|
 | OWL-ViT v2 | google/owlv2-base-patch16-ensemble | Object detection and localization | - Open-world object detection<br>- Locating specific objects based on text prompts |
@@ -279,7 +281,7 @@ overlay_segmentation_masks(image: numpy.ndarray, masks: List[Dict[str, Any]]) ->
 | CLIP | openai/clip-vit-base-patch32 | Image-text similarity | - Zero-shot image classification<br>- Image-text matching |
 | BLIP | Salesforce/blip-image-captioning-base | Image captioning | - Generating text descriptions of images |
 | LOCA | Custom implementation | Object counting | - Zero-shot object counting<br>- Object counting with visual prompts |
-| GIT v2 | microsoft/git-base-
+| GIT v2 | microsoft/git-base-vqav2 | Visual question answering and image captioning | - Answering questions about image content<br>- Generating text descriptions of images |
 | Grounding DINO | groundingdino/groundingdino-swint-ogc | Object detection and localization | - Detecting objects based on text prompts |
 | SAM | facebook/sam-vit-huge | Instance segmentation | - Text-prompted instance segmentation |
 | DETR | facebook/detr-resnet-50 | Object detection | - General object detection |
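The model summary table in this diff pairs each tool with a Hugging Face model ID. As a minimal sketch of how a few of those rows might be loaded for experimentation, assuming the `transformers` library is installed: the `MODEL_TABLE` and `load_pipeline` names and the pipeline task strings are my own illustration, not something taken from the README or from Vision Agent itself, and the model IDs are copied from the table without verification.

```python
from typing import Dict, NamedTuple


class ModelEntry(NamedTuple):
    hub_id: str  # Hugging Face model ID as listed in the table (unverified)
    task: str    # transformers pipeline task name (assumption, not from the README)


# Hypothetical lookup covering only rows that map onto a standard pipeline task.
MODEL_TABLE: Dict[str, ModelEntry] = {
    "OWL-ViT v2": ModelEntry(
        "google/owlv2-base-patch16-ensemble", "zero-shot-object-detection"
    ),
    "CLIP": ModelEntry(
        "openai/clip-vit-base-patch32", "zero-shot-image-classification"
    ),
    "DETR": ModelEntry("facebook/detr-resnet-50", "object-detection"),
}


def load_pipeline(model_name: str):
    """Build a transformers pipeline for one table row.

    Weights are downloaded on first use, so nothing is fetched until
    this function is actually called.
    """
    # Imported lazily so the lookup table is usable without transformers.
    from transformers import pipeline

    entry = MODEL_TABLE[model_name]
    return pipeline(task=entry.task, model=entry.hub_id)
```

With an image on hand, something like `load_pipeline("OWL-ViT v2")(image, candidate_labels=["tumor"])` should return text-prompted detections. Treat that as a starting point for checking the table, not as a description of how Vision Agent wires these models together.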