---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
widget:
- text: "<schema>CREATE TABLE radio(age VARCHAR, radio_id VARCHAR, frequency VARCHAR, wavelength VARCHAR); CREATE TABLE radio_faults(radio_id VARCHAR, fault_description VARCHAR)</schema><question>Get the radio id and defect descriptions of radios that have wavelength greater than 30 ?</question><sql>"
  example_title: "example1"
- text: "<schema>CREATE TABLE system(JobID: String,GID: String, UID: String, Start:Time(yyyy/mm/dd), End: Time,ElapsedRaw: Time, CPUTimeRAW: Time,NCPUS: Number,NNodes: Number, NodeList: List, State:String, Timelimit: Time);</schema><question>Get UID and job id for Jobs that started on Jan 20 , 2023</question><sql>"
  example_title: "example2"
- text: "<schema>CREATE TABLE department (Department_ID number, Name text, Creation text, Ranking number, Budget_in_Billions number, Num_Employees number) which has Department_ID as primary key and CREATE TABLE head (head_ID number, name text, born_state text, age number) which has head_ID as primary key and CREATE TABLE management (department_ID number, head_ID number, temporary_acting text) which has department_ID as primary key</schema><question>"
  example_title: "example3"
tags:
- code
- sql
- text2sql
- instruction_tuned
- jax
- pytorch
- 1b
- expert
datasets:
- PipableAI/spider-bird
---

# Pipable’s pipSQL

Please refer to https://huggingface.co./PipableAI/pipSQL-1.3b for our state-of-the-art model, which outperforms ChatGPT and Claude on SQL tasks across many benchmarks.

Pipable’s pipSQL is a model distilled from Llama 1B to generate SQL queries given a prompt and a schema.

We used a unique pipeline in which the model alternated between two training objectives (see the loss sketch below):

1. Maximizing the log probability of all tokens in the sequence, including the prompt tokens.
2. Minimizing the difference between the true value and the predicted maximum value of the output tokens, i.e. the generated tokens for the SQL-query slice of the sequence.
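
A minimal PyTorch sketch of how these two objectives might be combined is shown below. The exact training loss is not published in this card, so the `sql_slice_margin` term, the alternation schedule, and the `sql_mask` input are illustrative assumptions only.

```python
import torch.nn.functional as F

def full_sequence_nll(logits, input_ids):
    """Objective 1: maximize the log probability of every token in the
    sequence, prompt tokens included (next-token cross-entropy)."""
    shift_logits = logits[:, :-1, :]   # predictions for positions 1..T-1
    shift_labels = input_ids[:, 1:]    # the tokens those positions should produce
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )

def sql_slice_margin(logits, input_ids, sql_mask):
    """Objective 2 (one plausible reading): shrink the gap between the
    highest predicted logit and the logit of the true token, restricted
    to the generated SQL slice of the sequence (sql_mask == 1)."""
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    shift_mask = sql_mask[:, 1:].bool()
    true_logit = shift_logits.gather(-1, shift_labels.unsqueeze(-1)).squeeze(-1)
    max_logit = shift_logits.max(dim=-1).values
    return (max_logit - true_logit)[shift_mask].mean()

# Hypothetical alternation schedule: switch objective every other step.
# for step, batch in enumerate(loader):
#     logits = model(batch["input_ids"]).logits
#     loss = full_sequence_nll(logits, batch["input_ids"]) if step % 2 == 0 \
#         else sql_slice_margin(logits, batch["input_ids"], batch["sql_mask"])
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```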

## License

The model's weights and all other associated assets are open-sourced under the MIT license.

## How to Use

```python
text = """<schema>{schema}</schema>
<question>{question}</question>
<sql>"""
```
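
As an illustration (not part of the original card), the placeholders can be filled with `str.format` before tokenization, here using the schema and question from the first widget example above:

```python
# Fill the template with a concrete schema and question (taken from the
# first widget example on this card) before passing it to the tokenizer.
text = text.format(
    schema="CREATE TABLE radio(age VARCHAR, radio_id VARCHAR, frequency VARCHAR, wavelength VARCHAR); CREATE TABLE radio_faults(radio_id VARCHAR, fault_description VARCHAR)",
    question="Get the radio id and defect descriptions of radios that have wavelength greater than 30 ?",
)
```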

### PyTorch

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

model = AutoModelForCausalLM.from_pretrained("PipableAI/pipSQL1b").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pipSQL1b")

# Tokenize the filled-in prompt and generate the SQL query.
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=200)

# The model writes the query between <sql> and </sql>; extract that slice.
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])
```

### Flax

```python
from transformers import FlaxAutoModelForCausalLM, AutoTokenizer

model = FlaxAutoModelForCausalLM.from_pretrained("PipableAI/pipSQL1b", from_pt=True)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pipSQL1b")
```
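
Generation with the Flax model follows the same pattern as the PyTorch example. A minimal sketch, assuming a `transformers` version whose Flax `generate` accepts `max_new_tokens` and returns an output with a `sequences` field:

```python
inputs = tokenizer(text, return_tensors="np")
outputs = model.generate(inputs["input_ids"], max_new_tokens=200)
decoded = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
print(decoded.split('<sql>')[1].split('</sql>')[0])
```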

## The PipableAI team

Avi Kothari, Pratham Gupta, Ritvik Aryan Kalra, Rohan Bhatial, Soham Acharya |