title: Named Entity Recognition Tool
emoji: 🌍
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
tags:
  - tool

Advanced Named Entity Recognition (NER) Tool for smolagents

This repository contains a Named Entity Recognition (NER) tool built for Hugging Face's smolagents library. With it you can:

  • Identify named entities (people, organizations, locations, dates, etc.) in text
  • Choose from multiple NER models for different languages and use cases
  • Configure different output formats and confidence thresholds
  • Give smolagents agents a tool for extracting entities from text

Installation

pip install smolagents transformers torch gradio

For faster inference on GPU:

pip install smolagents transformers torch gradio accelerate

Basic Usage

from ner_tool import NamedEntityRecognitionTool

# Initialize the NER tool
ner_tool = NamedEntityRecognitionTool()

# Analyze text with default settings
result = ner_tool("Apple Inc. is planning to open a new store in Paris, France next year.")
print(result)

# Analyze with custom settings
detailed_result = ner_tool(
    text="Apple Inc. is planning to open a new store in Paris, France next year.",
    model="Babelscape/wikineural-multilingual-ner",  # Different model
    aggregation="detailed",  # More detailed output format
    min_score=0.7  # Lower confidence threshold
)
print(detailed_result)

Available Models

The tool includes several pre-configured models:

| Model ID | Description |
|---|---|
| dslim/bert-base-NER | Standard NER (English) - Default |
| jean-baptiste/camembert-ner | French NER |
| Davlan/bert-base-multilingual-cased-ner-hrl | Multilingual NER |
| Babelscape/wikineural-multilingual-ner | WikiNeural Multilingual NER |
| flair/ner-english-ontonotes-large | OntoNotes English (fine-grained) |
| elastic/distilbert-base-cased-finetuned-conll03-english | CoNLL (fast) |

Output Formats

The tool supports three output formats, selected with the aggregation parameter (for example, aggregation="detailed"):

  1. Simple - A flat list of the entities found, with their types and confidence scores
  2. Grouped - Entities grouped by their category (default)
  3. Detailed - A detailed analysis including the original text with entity markers
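
To make the difference between the three formats concrete, here is a minimal sketch in plain Python. It is not the tool's actual implementation: the function names, the `[text|LABEL]` marker style, and the sample entity dictionaries (modeled on the output shape of a transformers token-classification pipeline) are all assumptions made for illustration.

```python
from collections import defaultdict

TEXT = "Apple Inc. is planning to open a new store in Paris, France next year."

# Hypothetical entities; scores are invented for the example.
ENTITIES = [
    {"entity_group": "ORG", "word": "Apple Inc.", "score": 0.99, "start": 0, "end": 10},
    {"entity_group": "LOC", "word": "Paris", "score": 0.98, "start": 46, "end": 51},
    {"entity_group": "LOC", "word": "France", "score": 0.97, "start": 53, "end": 59},
]


def format_simple(entities):
    """Flat list: one line per entity with its type and score."""
    return [f"{e['word']} ({e['entity_group']}, {e['score']:.2f})" for e in entities]


def format_grouped(entities):
    """Entities collected under their category label."""
    groups = defaultdict(list)
    for e in entities:
        groups[e["entity_group"]].append(e["word"])
    return dict(groups)


def format_detailed(text, entities):
    """Original text with inline markers around each entity span."""
    marked = text
    # Work from the end of the text so earlier offsets stay valid.
    for e in sorted(entities, key=lambda e: e["start"], reverse=True):
        span = text[e["start"]:e["end"]]
        marked = marked[:e["start"]] + f"[{span}|{e['entity_group']}]" + marked[e["end"]:]
    return marked


print(format_simple(ENTITIES))
print(format_grouped(ENTITIES))
print(format_detailed(TEXT, ENTITIES))
```

The real tool's output layout may differ; the sketch only shows what "flat", "grouped", and "annotated" views of the same entities look like.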

Using with an Agent

from smolagents import CodeAgent, InferenceClientModel
from ner_tool import NamedEntityRecognitionTool

# Initialize the NER tool
ner_tool = NamedEntityRecognitionTool()

# Create an agent model
model = InferenceClientModel(
    model_id="mistralai/Mistral-7B-Instruct-v0.2",
    token="your_huggingface_token"
)

# Create the agent with our NER tool
agent = CodeAgent(tools=[ner_tool], model=model)

# Run the agent
result = agent.run(
    "Analyze this text and identify all entities: 'The European Union and United Kingdom finalized a trade deal on Tuesday.'"
)
print(result)

Interactive Gradio Interface

For an interactive experience, run the Gradio app:

python gradio_app.py

This provides a web interface where you can:

  • Enter custom text or select from samples
  • Choose different NER models
  • Configure display formats and confidence thresholds
  • See immediate results

Customization Options

Entity Confidence Score

  • Use the min_score parameter to filter out entities below a given confidence
  • Range: 0.0 (include all) to 1.0 (only highest confidence)
  • Default: 0.8
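
The effect of the threshold can be sketched as a simple filter over scored entities. This is an illustration only: the helper name `filter_by_score`, the sample scores, and the choice to keep entities whose score equals the threshold are assumptions, not details confirmed by the tool's source.

```python
def filter_by_score(entities, min_score=0.8):
    """Keep entities whose confidence is at least min_score (assumed inclusive)."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    return [e for e in entities if e["score"] >= min_score]


# Hypothetical entities with invented scores.
entities = [
    {"word": "Apple Inc.", "entity_group": "ORG", "score": 0.99},
    {"word": "Paris", "entity_group": "LOC", "score": 0.85},
    {"word": "next year", "entity_group": "DATE", "score": 0.72},
]

default_kept = filter_by_score(entities)                # threshold 0.8
relaxed_kept = filter_by_score(entities, min_score=0.7)  # lower threshold
all_kept = filter_by_score(entities, min_score=0.0)      # include everything

print(len(default_kept), len(relaxed_kept), len(all_kept))
```

Lowering the threshold trades precision for recall: more entities are returned, but more of them are likely to be wrong.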

Entity Types

The tool can identify various entity types including:

  • People (PER, PERSON)
  • Organizations (ORG, ORGANIZATION)
  • Locations (LOC, LOCATION, GPE)
  • Dates and Times (DATE, TIME)
  • Money and Percentages (MONEY, PERCENT)
  • Products (PRODUCT)
  • Events (EVENT)
  • Works of Art (WORK_OF_ART)
  • Laws (LAW)
  • Languages (LANGUAGE)
  • Facilities (FAC)
  • Miscellaneous (MISC)

The exact entity types available depend on the chosen model.
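
Because label names differ between models (PER versus PERSON, LOC versus GPE), downstream code often maps raw labels onto a common set of categories. The sketch below is one way to do that using the categories listed above. The mapping table, the function name `normalize_label`, the handling of `B-`/`I-` prefixes, and the fallback to "Miscellaneous" for unknown labels are assumptions for illustration, not the tool's documented behavior.

```python
# Raw model labels mapped to the human-readable categories from the list above.
LABEL_TO_CATEGORY = {
    "PER": "People", "PERSON": "People",
    "ORG": "Organizations", "ORGANIZATION": "Organizations",
    "LOC": "Locations", "LOCATION": "Locations", "GPE": "Locations",
    "DATE": "Dates and Times", "TIME": "Dates and Times",
    "MONEY": "Money and Percentages", "PERCENT": "Money and Percentages",
    "PRODUCT": "Products",
    "EVENT": "Events",
    "WORK_OF_ART": "Works of Art",
    "LAW": "Laws",
    "LANGUAGE": "Languages",
    "FAC": "Facilities",
    "MISC": "Miscellaneous",
}


def normalize_label(label):
    """Map a raw label (optionally with a B-/I- prefix) to a category name."""
    bare = label[2:] if label[:2] in ("B-", "I-") else label
    # Unknown labels fall back to Miscellaneous (a design choice of this sketch).
    return LABEL_TO_CATEGORY.get(bare.upper(), "Miscellaneous")


print(normalize_label("B-PER"))
print(normalize_label("GPE"))
print(normalize_label("WORK_OF_ART"))
```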

Sharing Your Tool

You can share your tool on the Hugging Face Hub:

ner_tool.push_to_hub("your-username/advanced-ner-tool", token="your_huggingface_token")

Limitations

  • The first load of each model may take some time while its weights are downloaded and initialized
  • Some models may require significant memory (especially larger ones)
  • Entity recognition accuracy varies by model and language

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

License

MIT