SentenceTransformer based on answerdotai/ModernBERT-large

This is a sentence-transformers model finetuned from answerdotai/ModernBERT-large. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: answerdotai/ModernBERT-large
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
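
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence embedding is the average of its token vectors. The following is a minimal sketch of that idea in plain Python, not the library's implementation; the helper name mean_pool and the toy 4-dimensional vectors are assumptions made for illustration (the real model works with 1024 dimensions).

```python
from typing import List

Vector = List[float]


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the token vectors of one sequence, skipping padded positions."""
    # Keep only the vectors whose mask entry is non-zero (real tokens).
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy sequence: three token slots, the last one is padding (hypothetical values).
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0, 4.0, 5.0]
```

The padded slot is excluded from both the sum and the count, which is why the 9.0 values do not affect the result.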

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BlackBeenie/ModernBERT-large-biencoder-msmarco")
# Run inference
sentences = [
    'what is a tensilon universal testing instrument',
    "Universal Material Testing Instrument. The TENSILON RTF is our newest universal testing machine offering innovative measuring possibilities, based on A&D's newly-developed and extensive technological knowledge.The RTF Series is a world-class Class 0.5 testing machine.Having improved the overall design and structure of the machine, we achieved a very strong load frame stiffness enabling super-high accuracy in measurement.he RTF Series is a world-class Class 0.5 testing machine. Having improved the overall design and structure of the machine, we achieved a very strong load frame stiffness enabling super-high accuracy in measurement.",
    "The McDonald Patent Universal String Tension Calculator (MPUSTC) is a handy calculator to figure string tensions in steel-string instruments. If you plug in your scale length, string gauges and tuning, it will give you a readout of the tension on each of the strings. This is useful when you're trying to fine-tune a set of custom gauges, or when you're working out how far you can push a drop tuning before it becomes unmanageable.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
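
For semantic search, the similarity scores are typically used to rank passages against a query. The sketch below shows that ranking step with cosine similarity in plain Python. It is a hedged illustration only: the 3-dimensional vectors are made-up stand-ins for the 1024-dimensional output of model.encode, and the helper names cos_sim and rank_passages are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(
    query_vec: Sequence[float], passage_vecs: List[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs, best match first."""
    scores = [(idx, cos_sim(query_vec, vec)) for idx, vec in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumptions); in practice these would come from model.encode(...).
query = [1.0, 0.0, 1.0]
passages = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
ranking = rank_passages(query, passages)
print([idx for idx, _ in ranking])  # [1, 2, 0]
```

With the example above, the first sentence would play the role of the query and the two passages would be the candidates to rank.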

Training Details

Training Dataset

Unnamed Dataset

  • Size: 499,184 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 4 tokens, mean 9.07 tokens, max 21 tokens
    • sentence_1 (string): min 17 tokens, mean 80.89 tokens, max 254 tokens
    • sentence_2 (string): min 20 tokens, mean 79.05 tokens, max 226 tokens
  • Samples:
    • Sample 1
      sentence_0: what is a dependent person
      sentence_1: 1. depending on a person or thing for aid, support, life, etc. 2. (postpositive; foll by on or upon) influenced or conditioned (by); contingent (on) 3. subordinate; subject: a dependent prince. 4. obsolete hanging down.
      sentence_2: Dependent personality disorder (DPD) is one of the most frequently diagnosed personality disorders. It occurs equally in men and women, usually becoming apparent in young adulthood or later as important adult relationships form. People with DPD become emotionally dependent on other people and spend great effort trying to please others. People with DPD tend to display needy, passive, and clinging behavior, and have a fear of separation. Other common characteristics of this personality disorder include:
    • Sample 2
      sentence_0: what is the hat trick in hockey
      sentence_1: Definition of hat trick. 1 1 : the retiring of three batsmen with three consecutive balls by a bowler in cricket. 2 2 : the scoring of three goals in one game (as of hockey or soccer) by a single player. 3 3 : a series of three victories, successes, or related accomplishments scored a hat trick when her three best steers corralled top honors — People.
      sentence_2: Hat trick was first recorded in print in the 1870s, but has since been widened to apply to any sport in which the person competing carries off some feat three times in quick succession, such as scoring three goals in one game of soccer.
    • Sample 3
      sentence_0: what is an egalitarian
      sentence_1: An egalitarian is defined as a person who believes all people were created equal and should be treated equal. An example of an egalitarian is a person who fights for civil rights, like Martin Luther King Jr.
      sentence_2: About Egalitarian Companies. In the tradition hierarchical corporate structure, each employee operates under a specific job description. Each employee also reports to a superior who monitors his progress and issues instructions. Egalitarian-style companies eliminate most of this structure. Employees in an egalitarian company have general job descriptions, rather than specific ones. Instead of reporting to a superior, all employees in an egalitarian company work collaboratively on tasks and behave as equals.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
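
To make the loss configuration above concrete, here is a minimal plain-Python sketch of a multiple-negatives ranking objective with scale 20.0 and cosine similarity. It is an illustration under stated assumptions, not the library's code: each anchor (sentence_0) is scored against every positive (sentence_1) and every hard negative (sentence_2) in the batch, and cross-entropy pushes the anchor's own positive to the top. The function name mnr_loss and the 2-dimensional toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(
    anchors: List[Sequence[float]],
    positives: List[Sequence[float]],
    hard_negatives: List[Sequence[float]],
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy where the i-th positive is the correct class for the i-th anchor."""
    # Candidate pool: all positives first, then all hard negatives.
    candidates = list(positives) + list(hard_negatives)
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, cand) for cand in candidates]
        # Numerically stable log-sum-exp.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two triplets (made-up vectors).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
hard_negatives = [[0.6, 0.8], [0.8, 0.6]]

aligned = mnr_loss(anchors, positives, hard_negatives)
shuffled = mnr_loss(anchors, positives[::-1], hard_negatives)
print(aligned < shuffled)  # True
```

The loss is small when each anchor is closest to its own positive and grows when the pairing is broken, which is the behaviour the scale and similarity_fct parameters control.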
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 10
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
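
As a sanity check, the batch size and epoch count above can be related to the step counts in the training logs. The arithmetic below is a sketch that assumes a single training device (the card does not state the device count); gradient_accumulation_steps is 1 and dataloader_drop_last is False according to the full hyperparameter list.

```python
import math

train_samples = 499_184  # from "Size" under Training Dataset
batch_size = 32          # per_device_train_batch_size
epochs = 10              # num_train_epochs
num_devices = 1          # assumption: not stated in the card

# The last partial batch is kept, so round up.
steps_per_epoch = math.ceil(train_samples / (batch_size * num_devices))
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)     # 15600 156000

# Fractional epoch reported at the first logged step.
print(round(500 / steps_per_epoch, 4))  # 0.0321
```

Both values agree with the training logs, where step 500 is reported at epoch 0.0321 and the final step 156000 at epoch 10.0.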

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
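
With lr_scheduler_type linear, learning_rate 5e-05 and no warmup, the learning rate is expected to decay in a straight line from its peak to zero over the run. The function below is a small sketch of such a schedule, not the trainer's own code; linear_lr is a hypothetical helper, and the default of 156,000 total steps is taken from the final step in the training logs.

```python
def linear_lr(
    step: int,
    base_lr: float = 5e-5,
    total_steps: int = 156_000,
    warmup_steps: int = 0,
) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 78_000, 156_000):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 5e-05, reaches half of that at step 78,000, and is zero at the final step.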

Training Logs

Epoch Step Training Loss
0.0321 500 1.1178
0.0641 1000 0.293
0.0962 1500 0.2542
0.1282 2000 0.2357
0.1603 2500 0.2187
0.1923 3000 0.2107
0.2244 3500 0.1959
0.2564 4000 0.2049
0.2885 4500 0.1945
0.3205 5000 0.1848
0.3526 5500 0.1846
0.3846 6000 0.1736
0.4167 6500 0.1795
0.4487 7000 0.1767
0.4808 7500 0.1727
0.5128 8000 0.1688
0.5449 8500 0.1708
0.5769 9000 0.1663
0.6090 9500 0.1654
0.6410 10000 0.1637
0.6731 10500 0.1651
0.7051 11000 0.1625
0.7372 11500 0.1584
0.7692 12000 0.1607
0.8013 12500 0.156
0.8333 13000 0.1548
0.8654 13500 0.1484
0.8974 14000 0.1527
0.9295 14500 0.1555
0.9615 15000 0.1528
0.9936 15500 0.1533
1.0256 16000 0.0827
1.0577 16500 0.0597
1.0897 17000 0.0599
1.1218 17500 0.0592
1.1538 18000 0.0592
1.1859 18500 0.0584
1.2179 19000 0.0615
1.25 19500 0.0589
1.2821 20000 0.0612
1.3141 20500 0.0618
1.3462 21000 0.0606
1.3782 21500 0.0587
1.4103 22000 0.0611
1.4423 22500 0.0616
1.4744 23000 0.0623
1.5064 23500 0.0615
1.5385 24000 0.0602
1.5705 24500 0.0658
1.6026 25000 0.068
1.6346 25500 0.0649
1.6667 26000 0.0645
1.6987 26500 0.0652
1.7308 27000 0.0632
1.7628 27500 0.0631
1.7949 28000 0.0655
1.8269 28500 0.0633
1.8590 29000 0.0607
1.8910 29500 0.0633
1.9231 30000 0.0612
1.9551 30500 0.0631
1.9872 31000 0.0616
2.0192 31500 0.0382
2.0513 32000 0.0178
2.0833 32500 0.0177
2.1154 33000 0.0178
2.1474 33500 0.0171
2.1795 34000 0.0188
2.2115 34500 0.0186
2.2436 35000 0.0177
2.2756 35500 0.0183
2.3077 36000 0.0195
2.3397 36500 0.0202
2.3718 37000 0.0199
2.4038 37500 0.0197
2.4359 38000 0.019
2.4679 38500 0.021
2.5 39000 0.0195
2.5321 39500 0.0211
2.5641 40000 0.0205
2.5962 40500 0.0207
2.6282 41000 0.0222
2.6603 41500 0.0204
2.6923 42000 0.0205
2.7244 42500 0.0211
2.7564 43000 0.0232
2.7885 43500 0.0202
2.8205 44000 0.0207
2.8526 44500 0.0225
2.8846 45000 0.0224
2.9167 45500 0.0203
2.9487 46000 0.0215
2.9808 46500 0.0218
3.0128 47000 0.0159
3.0449 47500 0.0064
3.0769 48000 0.0069
3.1090 48500 0.0074
3.1410 49000 0.0075
3.1731 49500 0.0066
3.2051 50000 0.0076
3.2372 50500 0.0073
3.2692 51000 0.0077
3.3013 51500 0.0075
3.3333 52000 0.0079
3.3654 52500 0.008
3.3974 53000 0.0087
3.4295 53500 0.0077
3.4615 54000 0.0084
3.4936 54500 0.0086
3.5256 55000 0.009
3.5577 55500 0.0082
3.5897 56000 0.0084
3.6218 56500 0.0084
3.6538 57000 0.008
3.6859 57500 0.0079
3.7179 58000 0.0085
3.75 58500 0.0096
3.7821 59000 0.0087
3.8141 59500 0.0086
3.8462 60000 0.0089
3.8782 60500 0.0081
3.9103 61000 0.0087
3.9423 61500 0.0085
3.9744 62000 0.0082
4.0064 62500 0.0076
4.0385 63000 0.0037
4.0705 63500 0.0035
4.1026 64000 0.0037
4.1346 64500 0.004
4.1667 65000 0.0037
4.1987 65500 0.0036
4.2308 66000 0.0042
4.2628 66500 0.0044
4.2949 67000 0.0041
4.3269 67500 0.004
4.3590 68000 0.0037
4.3910 68500 0.0043
4.4231 69000 0.0035
4.4551 69500 0.0045
4.4872 70000 0.0042
4.5192 70500 0.0043
4.5513 71000 0.0042
4.5833 71500 0.0049
4.6154 72000 0.0041
4.6474 72500 0.0041
4.6795 73000 0.0044
4.7115 73500 0.0038
4.7436 74000 0.0039
4.7756 74500 0.0049
4.8077 75000 0.0041
4.8397 75500 0.0044
4.8718 76000 0.0043
4.9038 76500 0.0053
4.9359 77000 0.0043
4.9679 77500 0.0049
5.0 78000 0.0042
5.0321 78500 0.0022
5.0641 79000 0.0023
5.0962 79500 0.0021
5.1282 80000 0.003
5.1603 80500 0.0024
5.1923 81000 0.0022
5.2244 81500 0.0023
5.2564 82000 0.0022
5.2885 82500 0.0027
5.3205 83000 0.0023
5.3526 83500 0.0029
5.3846 84000 0.0027
5.4167 84500 0.0025
5.4487 85000 0.0029
5.4808 85500 0.0029
5.5128 86000 0.0024
5.5449 86500 0.0026
5.5769 87000 0.0026
5.6090 87500 0.0028
5.6410 88000 0.0025
5.6731 88500 0.0026
5.7051 89000 0.0023
5.7372 89500 0.0029
5.7692 90000 0.0027
5.8013 90500 0.0019
5.8333 91000 0.0023
5.8654 91500 0.0022
5.8974 92000 0.003
5.9295 92500 0.0023
5.9615 93000 0.0026
5.9936 93500 0.0027
6.0256 94000 0.0015
6.0577 94500 0.0012
6.0897 95000 0.0016
6.1218 95500 0.0018
6.1538 96000 0.0017
6.1859 96500 0.0014
6.2179 97000 0.0013
6.25 97500 0.0022
6.2821 98000 0.0015
6.3141 98500 0.002
6.3462 99000 0.0021
6.3782 99500 0.0016
6.4103 100000 0.0024
6.4423 100500 0.002
6.4744 101000 0.0014
6.5064 101500 0.0019
6.5385 102000 0.0017
6.5705 102500 0.0019
6.6026 103000 0.0016
6.6346 103500 0.0013
6.6667 104000 0.0012
6.6987 104500 0.0015
6.7308 105000 0.0015
6.7628 105500 0.0018
6.7949 106000 0.0018
6.8269 106500 0.0016
6.8590 107000 0.0018
6.8910 107500 0.0026
6.9231 108000 0.0013
6.9551 108500 0.0019
6.9872 109000 0.0015
7.0192 109500 0.0014
7.0513 110000 0.0009
7.0833 110500 0.0012
7.1154 111000 0.0016
7.1474 111500 0.0014
7.1795 112000 0.0013
7.2115 112500 0.0009
7.2436 113000 0.0015
7.2756 113500 0.0011
7.3077 114000 0.0011
7.3397 114500 0.0011
7.3718 115000 0.0013
7.4038 115500 0.001
7.4359 116000 0.0012
7.4679 116500 0.0012
7.5 117000 0.0013
7.5321 117500 0.0014
7.5641 118000 0.0013
7.5962 118500 0.0013
7.6282 119000 0.0014
7.6603 119500 0.001
7.6923 120000 0.0012
7.7244 120500 0.0018
7.7564 121000 0.001
7.7885 121500 0.0014
7.8205 122000 0.0011
7.8526 122500 0.0012
7.8846 123000 0.0012
7.9167 123500 0.0008
7.9487 124000 0.0013
7.9808 124500 0.0014
8.0128 125000 0.001
8.0449 125500 0.0007
8.0769 126000 0.001
8.1090 126500 0.0009
8.1410 127000 0.0007
8.1731 127500 0.0007
8.2051 128000 0.001
8.2372 128500 0.0011
8.2692 129000 0.0008
8.3013 129500 0.0007
8.3333 130000 0.0013
8.3654 130500 0.0012
8.3974 131000 0.001
8.4295 131500 0.001
8.4615 132000 0.0007
8.4936 132500 0.001
8.5256 133000 0.001
8.5577 133500 0.001
8.5897 134000 0.0011
8.6218 134500 0.0013
8.6538 135000 0.0007
8.6859 135500 0.001
8.7179 136000 0.0008
8.75 136500 0.001
8.7821 137000 0.0008
8.8141 137500 0.0006
8.8462 138000 0.0006
8.8782 138500 0.0009
8.9103 139000 0.0007
8.9423 139500 0.0009
8.9744 140000 0.0006
9.0064 140500 0.0018
9.0385 141000 0.0008
9.0705 141500 0.0008
9.1026 142000 0.0009
9.1346 142500 0.0006
9.1667 143000 0.0009
9.1987 143500 0.0007
9.2308 144000 0.0007
9.2628 144500 0.0006
9.2949 145000 0.0008
9.3269 145500 0.0009
9.3590 146000 0.0005
9.3910 146500 0.001
9.4231 147000 0.001
9.4551 147500 0.0011
9.4872 148000 0.0011
9.5192 148500 0.0012
9.5513 149000 0.0011
9.5833 149500 0.0007
9.6154 150000 0.0008
9.6474 150500 0.0005
9.6795 151000 0.0007
9.7115 151500 0.0008
9.7436 152000 0.0007
9.7756 152500 0.0009
9.8077 153000 0.0007
9.8397 153500 0.0012
9.8718 154000 0.0005
9.9038 154500 0.0008
9.9359 155000 0.0007
9.9679 155500 0.0007
10.0 156000 0.0011
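
The log rows above are whitespace-separated, so they are easy to summarise per epoch. The following sketch groups rows by epoch and averages the logged loss; the helper name mean_loss_per_epoch is hypothetical, only four rows copied from the table are used, and the 15,600 steps per epoch follow from 156,000 logged steps over 10 epochs.

```python
from collections import defaultdict
from typing import Dict, Iterable

STEPS_PER_EPOCH = 15_600  # 156000 logged steps / 10 epochs


def mean_loss_per_epoch(rows: Iterable[str]) -> Dict[int, float]:
    """Group 'epoch step loss' rows by zero-based epoch index and average the loss."""
    buckets = defaultdict(list)
    for row in rows:
        _epoch, step, loss = row.split()
        # Step 15600 is the last step of epoch index 0, hence the "- 1".
        buckets[(int(step) - 1) // STEPS_PER_EPOCH].append(float(loss))
    return {epoch: sum(vals) / len(vals) for epoch, vals in sorted(buckets.items())}


rows = [
    "0.0321 500 1.1178",
    "0.0641 1000 0.293",
    "1.0256 16000 0.0827",
    "1.0577 16500 0.0597",
]
summary = mean_loss_per_epoch(rows)
print(summary)
```

Feeding the full table to the same function would give one average per epoch index from 0 to 9.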

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Safetensors

  • Model size: 395M params
  • Tensor type: F32

Model tree for BlackBeenie/ModernBERT-large-biencoder-msmarco