omniscience

dwb2023 committed on Aug 8, 2024

Commit 7d59523 (verified)

Parent: 97ec49f

Update README.md

Files changed (1)

README.md +7 -5

@@ -10,12 +10,12 @@ pinned: false
 license: openrail
 ---
-# Use of Landing AI for brain tumor detection
+# Using Landing AI's Vision Agent to architect an app for brain tumor detection
 - a quick overview of the inner workings of LandingAI's Vision Agent, how it breaks down an initial user requirement to identify candidate components in the application architecture.
-- the diagram below captures what I had in mind for a multi-agent system but LandingAI's vision agent starts this much earlier, taking a fresh approach on old school architecture trade-off analysis.
-- if you want a deeper understanding of the run-time flow of the application I encourage you to instrument it with Weave.  Additional information in [this GitHub repo](https://github.com/donbr/vision-agent).
-- the flow in the most recent version of the official [Vision Agent](https://va.landing.ai/agent) app has shifted somewhat, but the number of concepts it helped bring together for me was amazing.
+- the diagram below captures what I had in mind for a multi-agent system implementation -- but LandingAI's vision agent starts this much earlier, taking a fresh approach on old school architecture trade-off analysis.
+- the design-time flow in the most recent version of the official [Vision Agent](https://va.landing.ai/agent) app has shifted somewhat, but the number of concepts it helped bring together for me was amazing.
+- if you want a deeper understanding of the run-time flow of the application I encourage you to instrument it with Weave.  Additional information on how to instrument the app can be found in [this GitHub repo](https://github.com/donbr/vision-agent).
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/653d62fab16f657d28ce2cf2/KPV1Szj6IkY457n3Hqjl6.png)
@@ -271,6 +271,8 @@ overlay_segmentation_masks(image: numpy.ndarray, masks: List[Dict[str, Any]]) ->
 ## Vision Agent Tools - model summary
+- any mistakes in the following table are mine.  my efforts to do some QUICK reverse engineering to identify target models.
 | Model Name          | Hugging Face Model                  | Primary Function               | Use Cases                                                    |
 |---------------------|-------------------------------------|-------------------------------|--------------------------------------------------------------|
 | OWL-ViT v2          | google/owlv2-base-patch16-ensemble  | Object detection and localization | - Open-world object detection<br>- Locating specific objects based on text prompts |
@@ -279,7 +281,7 @@ overlay_segmentation_masks(image: numpy.ndarray, masks: List[Dict[str, Any]]) ->
 | CLIP                | openai/clip-vit-base-patch32        | Image-text similarity           | - Zero-shot image classification<br>- Image-text matching    |
 | BLIP                | Salesforce/blip-image-captioning-base | Image captioning                | - Generating text descriptions of images                    |
 | LOCA                | Custom implementation               | Object counting                 | - Zero-shot object counting<br>- Object counting with visual prompts |
-| GIT v2              | microsoft/git-base-textcaps         | Visual question answering and image captioning | - Answering questions about image content<br>- Generating text descriptions of images |
+| GIT v2              | microsoft/git-base-vqav2            | Visual question answering and image captioning | - Answering questions about image content<br>- Generating text descriptions of images |
 | Grounding DINO      | groundingdino/groundingdino-swint-ogc | Object detection and localization | - Detecting objects based on text prompts                   |
 | SAM                 | facebook/sam-vit-huge               | Instance segmentation           | - Text-prompted instance segmentation                       |
 | DETR                | facebook/detr-resnet-50             | Object detection                | - General object detection                                  |
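The OWL-ViT v2 row in the model table pairs a Hugging Face checkpoint with text-prompted detection. The following is a minimal sketch, assuming the `transformers` and `Pillow` packages are installed, of how that checkpoint could be called through the `zero-shot-object-detection` pipeline. It is not how Vision Agent invokes its tools internally. The helper names `detect_objects` and `keep_confident` and the 0.1 score cutoff are hypothetical.

```python
from typing import Any, Dict, List

# Checkpoint ID taken from the OWL-ViT v2 row of the model table.
OWLV2_CHECKPOINT = "google/owlv2-base-patch16-ensemble"


def keep_confident(detections: List[Dict[str, Any]], min_score: float = 0.1) -> List[Dict[str, Any]]:
    """Hypothetical helper: drop low-score detections, highest score first.

    Each detection is expected in the shape returned by the transformers
    zero-shot-object-detection pipeline: {"score", "label", "box"}.
    """
    kept = [d for d in detections if d["score"] >= min_score]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


def detect_objects(image_path: str, prompts: List[str], min_score: float = 0.1) -> List[Dict[str, Any]]:
    """Hypothetical helper: text-prompted detection with OWL-ViT v2."""
    # Imported lazily so keep_confident() works without transformers installed.
    from PIL import Image
    from transformers import pipeline

    detector = pipeline("zero-shot-object-detection", model=OWLV2_CHECKPOINT)
    image = Image.open(image_path)
    # Each text prompt is passed as a candidate label to search for.
    detections = detector(image, candidate_labels=prompts)
    return keep_confident(detections, min_score)


if __name__ == "__main__":
    # Synthetic detections in the pipeline's output format, for an offline check.
    sample = [
        {"score": 0.05, "label": "tumor", "box": {"xmin": 1, "ymin": 1, "xmax": 5, "ymax": 5}},
        {"score": 0.42, "label": "tumor", "box": {"xmin": 10, "ymin": 12, "xmax": 40, "ymax": 44}},
        {"score": 0.18, "label": "tumor", "box": {"xmin": 50, "ymin": 60, "xmax": 70, "ymax": 80}},
    ]
    for d in keep_confident(sample):
        print(d["label"], d["score"], d["box"])
```

Keeping the filtering step separate from the model call means the post-processing can be checked without downloading the checkpoint. Swapping `OWLV2_CHECKPOINT` for another detector from the table would also require confirming that the model is supported by that pipeline task.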