# tldts - Blazing Fast URL Parsing

`tldts` is a JavaScript library to extract hostnames, domains, public suffixes, top-level domains and subdomains from URLs.

**Features**:

1. Tuned for **performance** (order of 0.1 to 1 μs per input)
2. Handles both URLs and hostnames
3. Full Unicode/IDNA support
4. Supports parsing email addresses
5. Detects IPv4 and IPv6 addresses
6. Continuously updated version of the public suffix list
7. **TypeScript**, ships with `umd`, `esm`, `cjs` bundles and _type definitions_
8. Small bundles and small memory footprint
9. Battle tested: full test coverage and production use

# Install

```bash
npm install --save tldts
```

# Usage

Using the command-line interface:

```bash
$ npx tldts 'http://www.writethedocs.org/conf/eu/2017/'
{
  "domain": "writethedocs.org",
  "domainWithoutSuffix": "writethedocs",
  "hostname": "www.writethedocs.org",
  "isIcann": true,
  "isIp": false,
  "isPrivate": false,
  "publicSuffix": "org",
  "subdomain": "www"
}
```

Programmatically:

```js
const { parse } = require('tldts');

// Retrieve hostname-related information for a given URL
parse('http://www.writethedocs.org/conf/eu/2017/');
// { domain: 'writethedocs.org',
//   domainWithoutSuffix: 'writethedocs',
//   hostname: 'www.writethedocs.org',
//   isIcann: true,
//   isIp: false,
//   isPrivate: false,
//   publicSuffix: 'org',
//   subdomain: 'www' }
```

Modern _ES6 module imports_ are also supported:

```js
import { parse } from 'tldts';
```

Alternatively, you can try it _directly in your browser_ here: https://npm.runkit.com/tldts

# API

- `tldts.parse(url | hostname, options)`
- `tldts.getHostname(url | hostname, options)`
- `tldts.getDomain(url | hostname, options)`
- `tldts.getPublicSuffix(url | hostname, options)`
- `tldts.getSubdomain(url | hostname, options)`
- `tldts.getDomainWithoutSuffix(url | hostname, options)`

The behavior of `tldts` can be customized using an `options` argument, which is
accepted by all the functions exposed as part of the public API. This is useful
both to change the behavior of the library and to fine-tune its performance for
your inputs.

```js
{
  // Use suffixes from ICANN section (default: true)
  allowIcannDomains: boolean;
  // Use suffixes from Private section (default: false)
  allowPrivateDomains: boolean;
  // Extract and validate hostname (default: true)
  // When set to `false`, inputs will be considered valid hostnames.
  extractHostname: boolean;
  // Validate hostnames after parsing (default: true)
  // If a hostname is not valid, no further processing is performed. When set
  // to `false`, inputs to the library will be considered valid and parsing will
  // proceed regardless.
  validateHostname: boolean;
  // Perform IP address detection (default: true).
  detectIp: boolean;
  // Assume that both URLs and hostnames can be given as input (default: true)
  // If set to `false` we assume only URLs will be given as input, which
  // speeds up processing.
  mixedInputs: boolean;
  // Specifies extra valid suffixes (default: null)
  validHosts: string[] | null;
}
```
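
Because a misspelled option name is easy to miss, it can help to build the options object in one place. The following is a minimal sketch, not part of `tldts`: `DOCUMENTED_DEFAULTS` and `buildOptions` are hypothetical names, and the default values are simply copied from the listing above.

```javascript
// Defaults as documented in the options listing above.
const DOCUMENTED_DEFAULTS = Object.freeze({
  allowIcannDomains: true,
  allowPrivateDomains: false,
  extractHostname: true,
  validateHostname: true,
  detectIp: true,
  mixedInputs: true,
  validHosts: null,
});

// Hypothetical helper (not part of tldts): merge overrides into the
// documented defaults and reject unknown option names early.
function buildOptions(overrides = {}) {
  for (const key of Object.keys(overrides)) {
    if (!(key in DOCUMENTED_DEFAULTS)) {
      throw new TypeError(`Unknown tldts option: ${key}`);
    }
  }
  return { ...DOCUMENTED_DEFAULTS, ...overrides };
}

// Example: inputs are assumed to already be hostnames.
const trustedHostnames = buildOptions({ extractHostname: false });
console.log(trustedHostnames.extractHostname); // false
console.log(trustedHostnames.detectIp); // true
```

The resulting object can then be passed as the second argument to any of the functions listed above.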

The `parse` method returns handy **properties about a URL or a hostname**.

```js
const tldts = require('tldts');

tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv');
// { domain: 'amazonaws.com',
//   domainWithoutSuffix: 'amazonaws',
//   hostname: 'spark-public.s3.amazonaws.com',
//   isIcann: true,
//   isIp: false,
//   isPrivate: false,
//   publicSuffix: 'com',
//   subdomain: 'spark-public.s3' }

tldts.parse(
  'https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv',
  { allowPrivateDomains: true },
);
// { domain: 'spark-public.s3.amazonaws.com',
//   domainWithoutSuffix: 'spark-public',
//   hostname: 'spark-public.s3.amazonaws.com',
//   isIcann: false,
//   isIp: false,
//   isPrivate: true,
//   publicSuffix: 's3.amazonaws.com',
//   subdomain: '' }

tldts.parse('gopher://domain.unknown/');
// { domain: 'domain.unknown',
//   domainWithoutSuffix: 'domain',
//   hostname: 'domain.unknown',
//   isIcann: false,
//   isIp: false,
//   isPrivate: true,
//   publicSuffix: 'unknown',
//   subdomain: '' }

tldts.parse('https://192.168.0.0'); // IPv4
// { domain: null,
//   domainWithoutSuffix: null,
//   hostname: '192.168.0.0',
//   isIcann: null,
//   isIp: true,
//   isPrivate: null,
//   publicSuffix: null,
//   subdomain: null }

tldts.parse('https://[::1]'); // IPv6
// { domain: null,
//   domainWithoutSuffix: null,
//   hostname: '::1',
//   isIcann: null,
//   isIp: true,
//   isPrivate: null,
//   publicSuffix: null,
//   subdomain: null }

tldts.parse('[email protected]'); // email
// { domain: 'emailprovider.co.uk',
//   domainWithoutSuffix: 'emailprovider',
//   hostname: 'emailprovider.co.uk',
//   isIcann: true,
//   isIp: false,
//   isPrivate: false,
//   publicSuffix: 'co.uk',
//   subdomain: '' }
```

| Property Name         | Type   | Description                                     |
| :-------------------- | :----- | :---------------------------------------------- |
| `hostname`            | `str`  | `hostname` of the input extracted automatically |
| `domain`              | `str`  | Domain (tld + sld)                              |
| `domainWithoutSuffix` | `str`  | Domain without public suffix                    |
| `subdomain`           | `str`  | Subdomain (what comes before `domain`)          |
| `publicSuffix`        | `str`  | Public Suffix (tld) of `hostname`               |
| `isIcann`             | `bool` | Does TLD come from ICANN part of the list       |
| `isPrivate`           | `bool` | Does TLD come from Private part of the list     |
| `isIp`                | `bool` | Is `hostname` an IP address?                    |
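
To illustrate how these properties fit together, here is a small sketch that turns a result object into a one-line summary. `describeResult` is a hypothetical helper, not part of `tldts`; it only reads the properties listed in the table and works on any object of that shape.

```javascript
// Hypothetical helper (not part of tldts): summarize an object shaped
// like the one `parse` returns.
function describeResult(result) {
  if (result.isIp) {
    return `IP address ${result.hostname}`;
  }
  if (result.domain === null) {
    return 'no registrable domain';
  }
  const section = result.isIcann ? 'ICANN' : result.isPrivate ? 'private' : 'unlisted';
  const sub = result.subdomain ? `subdomain "${result.subdomain}" of ` : '';
  return `${sub}${result.domain} (${section} suffix "${result.publicSuffix}")`;
}

// Result object copied from the first `parse` example in this README.
const sample = {
  domain: 'writethedocs.org',
  domainWithoutSuffix: 'writethedocs',
  hostname: 'www.writethedocs.org',
  isIcann: true,
  isIp: false,
  isPrivate: false,
  publicSuffix: 'org',
  subdomain: 'www',
};

console.log(describeResult(sample));
// subdomain "www" of writethedocs.org (ICANN suffix "org")
```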

## Single purpose methods

These methods are shorthands for retrieving a single value. They perform better
than `parse` because they do less work.

### getHostname(url | hostname, options?)

Returns the hostname from a given string.

```javascript
const { getHostname } = require('tldts');

getHostname('google.com'); // returns `google.com`
getHostname('fr.google.com'); // returns `fr.google.com`
getHostname('fr.google.google'); // returns `fr.google.google`
getHostname('foo.google.co.uk'); // returns `foo.google.co.uk`
getHostname('t.co'); // returns `t.co`
getHostname('fr.t.co'); // returns `fr.t.co`
getHostname(
  'https://user:[email protected]:8080/some/path?and&query#hash',
); // returns `example.co.uk`
```

### getDomain(url | hostname, options?)

Returns the fully qualified domain from a given string.

```javascript
const { getDomain } = require('tldts');

getDomain('google.com'); // returns `google.com`
getDomain('fr.google.com'); // returns `google.com`
getDomain('fr.google.google'); // returns `google.google`
getDomain('foo.google.co.uk'); // returns `google.co.uk`
getDomain('t.co'); // returns `t.co`
getDomain('fr.t.co'); // returns `t.co`
getDomain('https://user:[email protected]:8080/some/path?and&query#hash'); // returns `example.co.uk`
```
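
One common use of `getDomain` is checking whether two URLs belong to the same site. The sketch below assumes that "same site" means "same value returned by `getDomain`"; `isSameSite` is a hypothetical helper, and the `getDomain` function is passed in as an argument so the helper does not depend on how `tldts` is loaded.

```javascript
// Hypothetical helper (not part of tldts): two inputs are treated as the
// same site when `getDomain` returns the same non-null value for both.
function isSameSite(a, b, getDomain) {
  const domainA = getDomain(a);
  const domainB = getDomain(b);
  // A null domain (for example an IP address) never matches anything.
  return domainA !== null && domainA === domainB;
}

// Possible usage, assuming tldts is installed:
//   const { getDomain } = require('tldts');
//   isSameSite('fr.google.com', 'google.com', getDomain);
```

Given the results listed above, that call should compare `google.com` with `google.com`.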

### getDomainWithoutSuffix(url | hostname, options?)

Returns the domain (as returned by `getDomain(...)`) without the public suffix part.

```javascript
const { getDomainWithoutSuffix } = require('tldts');

getDomainWithoutSuffix('google.com'); // returns `google`
getDomainWithoutSuffix('fr.google.com'); // returns `google`
getDomainWithoutSuffix('fr.google.google'); // returns `google`
getDomainWithoutSuffix('foo.google.co.uk'); // returns `google`
getDomainWithoutSuffix('t.co'); // returns `t`
getDomainWithoutSuffix('fr.t.co'); // returns `t`
getDomainWithoutSuffix(
  'https://user:[email protected]:8080/some/path?and&query#hash',
); // returns `example`
```

### getSubdomain(url | hostname, options?)

Returns the complete subdomain for a given string.

```javascript
const { getSubdomain } = require('tldts');

getSubdomain('google.com'); // returns ``
getSubdomain('fr.google.com'); // returns `fr`
getSubdomain('google.co.uk'); // returns ``
getSubdomain('foo.google.co.uk'); // returns `foo`
getSubdomain('moar.foo.google.co.uk'); // returns `moar.foo`
getSubdomain('t.co'); // returns ``
getSubdomain('fr.t.co'); // returns `fr`
getSubdomain(
  'https://user:[email protected]:443/some/path?and&query#hash',
); // returns `secure`
```

### getPublicSuffix(url | hostname, options?)

Returns the [public suffix][] for a given string.

```javascript
const { getPublicSuffix } = require('tldts');

getPublicSuffix('google.com'); // returns `com`
getPublicSuffix('fr.google.com'); // returns `com`
getPublicSuffix('google.co.uk'); // returns `co.uk`
getPublicSuffix('s3.amazonaws.com'); // returns `com`
getPublicSuffix('s3.amazonaws.com', { allowPrivateDomains: true }); // returns `s3.amazonaws.com`
getPublicSuffix('tld.is.unknown'); // returns `unknown`
```

# Troubleshooting

## Retrieving subdomain of `localhost` and custom hostnames

`tldts` methods `getDomain` and `getSubdomain` are designed to **work only with _known and valid_ TLDs**.
This way, you can trust what a domain is.

`localhost` is a valid hostname but not a TLD. To handle it (or any other custom hostname), pass the `validHosts` option, which every method exposed by `tldts` accepts:

```js
const tldts = require('tldts');

tldts.getDomain('localhost'); // returns null
tldts.getSubdomain('vhost.localhost'); // returns null

tldts.getDomain('localhost', { validHosts: ['localhost'] }); // returns 'localhost'
tldts.getSubdomain('vhost.localhost', { validHosts: ['localhost'] }); // returns 'vhost'
```
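
If the same options are needed at many call sites, they can be bound once. This is a minimal sketch: `withOptions` is a hypothetical helper, not something `tldts` provides, and it works with any function that takes `(input, options)`.

```javascript
// Hypothetical helper (not part of tldts): pre-bind an options object to
// a function with the signature (input, options).
function withOptions(fn, options) {
  return (input) => fn(input, options);
}

// Possible usage, assuming tldts is installed:
//   const tldts = require('tldts');
//   const getLocalDomain = withOptions(tldts.getDomain, { validHosts: ['localhost'] });
//   getLocalDomain('localhost');
```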

## Updating the TLDs List

`tldts` made the opinionated choice of shipping with a list of suffixes directly
in its bundle. There is currently no mechanism to update the lists yourself, but
we make sure that the version shipped is always up-to-date.

If you keep `tldts` updated, the lists should be up-to-date as well!

# Performance

`tldts` is the _fastest JavaScript library_ available for parsing hostnames. It can parse _millions of inputs per second_ (typically 2-3M, depending on your hardware and inputs). It also offers granular options to fine-tune the behavior and performance of the library for the kind of inputs you are dealing with. For example, if you know you only manipulate valid hostnames, you can disable the hostname extraction step with `{ extractHostname: false }`.
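
To check throughput on your own inputs, a rough timing loop is enough to get an order of magnitude. The sketch below is not the benchmark used for the comparison; `measureThroughput` is a hypothetical helper that times any single-argument function using Node's `process.hrtime.bigint()`.

```javascript
// Hypothetical micro-benchmark (not part of tldts): call `fn` on every
// input for `rounds` passes and report calls per second.
function measureThroughput(fn, inputs, rounds = 10) {
  const start = process.hrtime.bigint();
  let calls = 0;
  for (let r = 0; r < rounds; r += 1) {
    for (const input of inputs) {
      fn(input);
      calls += 1;
    }
  }
  // Guard against a zero elapsed time on very short runs.
  const elapsedNs = Math.max(Number(process.hrtime.bigint() - start), 1);
  return { calls, callsPerSecond: calls / (elapsedNs / 1e9) };
}

// Possible usage, assuming tldts is installed:
//   const { parse } = require('tldts');
//   measureThroughput(parse, ['http://www.writethedocs.org/conf/eu/2017/'], 100000);
```

Results from a loop like this vary with hardware, input mix, and JIT warm-up, so treat them as indicative only.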

Please see [this detailed comparison](./comparison/comparison.md) with other available libraries.

# Contributors

`tldts` is based upon the excellent `tld.js` library and would not exist without
the many contributors who worked on the project:
<a href="graphs/contributors"><img src="https://opencollective.com/tldjs/contributors.svg?width=890" /></a>

This project would not be possible without Mozilla's amazing
[public suffix list][]. Thank you for your hard work!

# License

[MIT License](LICENSE).

[badge-ci]: https://secure.travis-ci.org/remusao/tldts.svg?branch=master
[badge-downloads]: https://img.shields.io/npm/dm/tldts.svg
[public suffix list]: https://publicsuffix.org/list/
[list the recent changes]: https://github.com/publicsuffix/list/commits/master
[changes Atom Feed]: https://github.com/publicsuffix/list/commits/master.atom
[public suffix]: https://publicsuffix.org/learn/