metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:6000
  - loss:CoSENTLoss
base_model: avsolatorio/GIST-small-Embedding-v0
widget:
  - source_sentence: are paris metro tickets one way?
    sentences:
      - >-
        The two big differences between the 2.4 GHz and 5 GHz frequencies are
        speed and range. A wireless transmission at 2.4 GHz provides internet to
        a larger area but sacrifices speed, while 5 GHz provides faster speeds
        to a smaller area.
      - >-
        The State of Rhode Island has adopted the income shares model to
        determine the weekly child support order. It is based upon the
        philosophy that children are entitled to the standard of living based
        upon both parents monthly income. ... Weekly gross income of both
        parents before taxes and before any other deductions.
      - >-
        Insulin NPH may be administered in 2 divided doses daily (either as
        equally divided doses, or as ~2/3 of the dose before the morning meal
        and ~1/3 of the dose before the evening meal or at bedtime).
  - source_sentence: how to pxe boot surface pro?
    sentences:
      - >-
        The UKTV Play app, with shows from Dave, Drama, Yesterday and Really, is
        available on smart TVs powered by Freeview Play and newer Samsung TVs.
        ... You can watch catch up and box sets from W, Alibi, Gold, Eden, Dave,
        Drama and Yesterday on Sky+HD, Sky Q and Sky Go.
      - >-
        In a branch For cash that was deposited over the counter at another
        bank, the processing and clearance time is 5 business days (not
        including public holidays).
      - >-
        ['Click "account" in the upper right corner of your Facebook page.',
        'Select "privacy settings."', 'Under "block lists" at the bottom center
        of the page, click "edit your lists."', 'At the top, under "block
        users," add the name or e-mail address of the person you\'d like to
        block.', 'Click "block."']
  - source_sentence: what is long-term capital gains rate?
    sentences:
      - >-
        You can get Social Security retirement or survivors benefits and work at
        the same time. But, if you're younger than full retirement age, and earn
        more than certain amounts, your benefits will be reduced. The amount
        that your benefits are reduced, however, isn't truly lost.
      - >-
        Dreams that involve shouting can warn of impending trouble. When you are
        the one shouting, this can mean you are going through a tough time in
        your waking life. You may be only feeling only negative emotions. ...
        Hearing someone else shouting signifies a warning of fright or anger.
      - >-
        A regular polygon is a flat shape whose sides are all equal and whose
        angles are all equal. The formula for finding the sum of the measure of
        the interior angles is (n - 2) * 180. To find the measure of one
        interior angle, we take that formula and divide by the number of sides
        n: (n - 2) * 180 / n.
  - source_sentence: can a girl get pregnant two days after her menstruation?
    sentences:
      - >-
        Newborn usually refers to a baby from birth to about 2 months of age.
        Infants can be considered children anywhere from birth to 1 year old.
        Baby can be used to refer to any child from birth to age 4 years old,
        thus encompassing newborns, infants, and toddlers.
      - >-
        According to professional numerologists, there are three ultimately
        lucky numbers for Capricorn-born people: they are 5, 6, and 8. In case
        they want to increase the chance of success for anything, simply make
        use of these numbers.
      - >-
        He's a professional dancer and model. J.C. Before entering the Big
        Brother house, J.C. was a dancer who traveled the world to perform
        professionally. “I do professional dancing. Not really break dancing, I
        do more choreography dancing,” he said in an interview with
        Entertainment Tonight Canada.
  - source_sentence: how long does it take to transfer money between anz and westpac?
    sentences:
      - >-
        This service is currently offered free of charge by the bank. You can
        get the last 'Available' balance of your account (by an SMS) by giving a
        Missed Call to 18008431122. You can get the Mini Statement (by an SMS)
        for last 5 transactions in your account by giving a Missed Call to
        18008431133. 1.
      - >-
        Simply put, 1 ply toilet paper is made of a single layer of paper, while
        2 ply has two layers. ... In the 1950's, a manufacturer created a method
        to roll and attach one-ply paper together to make a thicker “two-ply”.
        For years, 2-ply toilet tissue was always thicker and usually assumed to
        be better.
      - >-
        The main difference between unique and distinct is that UNIQUE is a
        constraint that is used on the input of data and ensures data integrity.
        While DISTINCT keyword is used when we want to query our results or in
        other words, output the data.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on avsolatorio/GIST-small-Embedding-v0

This is a sentence-transformers model fine-tuned from avsolatorio/GIST-small-Embedding-v0. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: avsolatorio/GIST-small-Embedding-v0
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
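
The three modules above can be read as a pipeline: the transformer produces one vector per token, the pooling layer keeps only the first ([CLS]) token's vector (pooling_mode_cls_token is True), and Normalize() rescales that vector to unit length. The following is a minimal NumPy sketch of the last two steps only. The function name cls_pool_and_normalize and the random array standing in for real token embeddings are illustrative assumptions, not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token vectors to (batch, hidden) unit vectors."""
    # CLS pooling: keep the vector at position 0 of every sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # L2 normalization; the clip guards against division by zero.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Hypothetical stand-in for transformer output: 2 sentences, 5 tokens, 384 dims.
rng = np.random.default_rng(0)
fake_tokens = rng.normal(size=(2, 5, 384))
sentence_vectors = cls_pool_and_normalize(fake_tokens)
print(sentence_vectors.shape)  # (2, 384)
```

Because every output vector has unit length, the dot product of two sentence vectors equals their cosine similarity, which matches the similarity function listed in the model description.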

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("moshew/gist_small_ft_gooaq_v1")
# Run inference
sentences = [
    'how long does it take to transfer money between anz and westpac?',
    "This service is currently offered free of charge by the bank. You can get the last 'Available' balance of your account (by an SMS) by giving a Missed Call to 18008431122. You can get the Mini Statement (by an SMS) for last 5 transactions in your account by giving a Missed Call to 18008431133. 1.",
    "Simply put, 1 ply toilet paper is made of a single layer of paper, while 2 ply has two layers. ... In the 1950's, a manufacturer created a method to roll and attach one-ply paper together to make a thicker “two-ply”. For years, 2-ply toilet tissue was always thicker and usually assumed to be better.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
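
In a search setting, the first sentence is typically the query and the remaining sentences are candidates to be ranked by their similarity to it. The sketch below shows that ranking step in plain NumPy. The helper rank_candidates and the toy 3-dimensional vectors are hypothetical; in practice the inputs would be rows of the array returned by model.encode.

```python
import numpy as np

def rank_candidates(query_vec: np.ndarray, candidate_vecs: np.ndarray):
    """Return (candidate_index, cosine_similarity) pairs, best match first."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per candidate
    order = np.argsort(-scores)         # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for real 384-dimensional embeddings.
query = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [2.0, 0.1, 0.0],   # nearly parallel to the query
    [1.0, 1.0, 0.0],   # 45 degrees from the query
])
ranking = rank_candidates(query, candidates)
print([i for i, _ in ranking])  # [1, 2, 0]
```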

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,000 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:

    |      | sentence1    | sentence2    | label |
    |------|--------------|--------------|-------|
    | type | string       | string       | float |
    | min  | 8 tokens     | 14 tokens    | 0.0   |
    | mean | 11.97 tokens | 58.86 tokens | 0.17  |
    | max  | 23 tokens    | 126 tokens   | 1.0   |

  • Samples:

    | sentence1 | sentence2 | label |
    |-----------|-----------|-------|
    | what is the difference between rapid rise yeast and bread machine yeast? | Though there are some minor differences in shape and nutrients, Rapid-Rise Yeast is (pretty much) the same as Instant Yeast and Bread Machine Yeast. ... Also, Rapid-Rise Yeast is a little more potent than Active Dry Yeast and can be mixed in with your dry ingredients directly. | 1.0 |
    | what is the difference between rapid rise yeast and bread machine yeast? | Omeprazole and esomeprazole therapy are both associated with a low rate of transient and asymptomatic serum aminotransferase elevations and are rare causes of clinically apparent liver injury. | 0.0 |
    | what is the difference between rapid rise yeast and bread machine yeast? | Benefits of choosing a soft starter A variable frequency drive (VFD) is a motor control device that protects and controls the speed of an AC induction motor. A VFD can control the speed of the motor during the start and stop cycle, as well as throughout the run cycle. | 0.0 |

  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
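
As a rough illustration of what these parameters do: CoSENT is a ranking loss over pairs. For any two training pairs where one has a higher label than the other, it penalizes the model when the lower-labelled pair receives the higher cosine similarity, using log(1 + sum of exp(scale * (cos_low - cos_high))). The NumPy sketch below follows that description under the scale of 20.0 listed above. The function cosent_loss is a hypothetical name written for this example and is not the library's implementation.

```python
import numpy as np

def cosent_loss(cos_scores, labels, scale: float = 20.0) -> float:
    """CoSENT-style ranking loss over a batch of (cosine score, label) pairs."""
    scaled = scale * np.asarray(cos_scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    # diff[i, j] = scaled[i] - scaled[j]
    diff = scaled[:, None] - scaled[None, :]
    # Keep only entries where pair i has a LOWER label than pair j;
    # a positive diff there means the two pairs are ranked the wrong way round.
    mask = labels[:, None] < labels[None, :]
    # The leading 0.0 supplies the "1 +" inside the logarithm.
    terms = np.concatenate(([0.0], diff[mask]))
    # Numerically stable log-sum-exp.
    m = terms.max()
    return float(m + np.log(np.exp(terms - m).sum()))

# Correct ordering: the label-1.0 pair scores higher than the label-0.0 pair.
well_ordered = cosent_loss([0.9, 0.1], [1.0, 0.0])
# Wrong ordering: the label-0.0 pair scores higher.
mis_ordered = cosent_loss([0.1, 0.9], [1.0, 0.0])
print(well_ordered < mis_ordered)  # True
```

Only the relative order of the labels matters in this formulation, so float labels such as the 0.0 and 1.0 values in this dataset are used purely for comparison between pairs.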
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • dataloader_num_workers: 4
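
These values imply a short training schedule. The arithmetic below is a sketch that assumes a single training device (the card does not state the device count) and assumes that the warmup length is the total step count times warmup_ratio, rounded up. Under those assumptions it reproduces the epoch value of 0.0027 reported at step 1 in the Training Logs.

```python
import math

# Values taken from the card.
num_samples = 6000
per_device_train_batch_size = 16
num_train_epochs = 1
warmup_ratio = 0.1

# Assumption: training ran on one device; the card does not say.
num_devices = 1

steps_per_epoch = math.ceil(num_samples / (per_device_train_batch_size * num_devices))
total_steps = steps_per_epoch * num_train_epochs
# Assumption: warmup steps are rounded up from total_steps * warmup_ratio.
warmup_steps = math.ceil(total_steps * warmup_ratio)
# Fraction of an epoch completed after the first optimizer step.
epoch_at_step_1 = 1 / steps_per_epoch

print(steps_per_epoch, total_steps, warmup_steps)  # 375 375 38
print(round(epoch_at_step_1, 4))                   # 0.0027
```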

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch  | Step | Training Loss |
|--------|------|---------------|
| 0.0027 | 1    | 0.3104        |

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}