The classifier module is the entry point for every string that flows into GitResolve. It answers two questions: what kind of thing is this input? and what structured data can be extracted from it? All five functions are pure and synchronous — they never perform network requests, making them safe to call in hot paths or batch loops without any async overhead.

classifyInput

Examines a raw string and returns an InputType indicating what kind of candidate input it represents. This is the quickest way to route an unknown string before deciding which heavier pipeline function to invoke.
function classifyInput(input: string): InputType

Parameters

input
string
required
Any string — a URL, file path, or arbitrary text. The value is trimmed before classification.

Returns

An InputType string literal. The classification logic runs in this order:
| Returned value | Condition |
| --- | --- |
| "resume_file" | Input ends with .pdf, .doc, .docx, or .rtf (case-insensitive) |
| "linkedin" | Parsed hostname contains linkedin.com |
| "repo_url" | parseRepoUrl returns valid: true for the input |
| "git_profile" | Host is a known git provider but parseRepoUrl is invalid (e.g. profile-only path) |
| "portfolio" | Any other syntactically valid URL |
| "unknown" | Not a valid URL and not a recognised file extension |
"resume_url" is a valid InputType value but classifyInput never returns it — the classifier has no way to distinguish a remote PDF URL from any other portfolio URL without fetching the content. Downstream logic may promote "portfolio" to "resume_url" after inspecting Content-Type.
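
The promotion step mentioned in the note above could look roughly like the following. This is only a sketch: promoteByContentType and the RESUME_MIME_TYPES list are hypothetical names chosen for illustration, and the InputType union is redeclared locally from the values named on this page rather than imported from the library.

```typescript
// Local mirror of the InputType values named on this page (assumed shape).
type InputType =
  | 'resume_file' | 'resume_url' | 'linkedin' | 'repo_url'
  | 'git_profile' | 'portfolio' | 'unknown';

// Assumption: MIME types matching the .pdf/.doc/.docx/.rtf extensions.
const RESUME_MIME_TYPES = [
  'application/pdf',
  'application/msword',
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  'application/rtf',
];

// Hypothetical helper: only 'portfolio' is ever promoted, and only when the
// Content-Type header (fetched elsewhere) names a document MIME type.
function promoteByContentType(type: InputType, contentType: string | null): InputType {
  if (type !== 'portfolio' || contentType === null) return type;
  // Drop parameters such as "; charset=utf-8" before comparing.
  const mime = contentType.split(';')[0].trim().toLowerCase();
  return RESUME_MIME_TYPES.indexOf(mime) !== -1 ? 'resume_url' : type;
}
```

Keeping the helper pure (the header value is passed in, not fetched) means the network call stays in the caller, in line with the classifier's no-network guarantee.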

Examples

import { classifyInput } from '@clyrisai/gitresolve';

classifyInput('https://github.com/torvalds/linux');    // 'repo_url'
classifyInput('https://github.com/torvalds');          // 'git_profile'
classifyInput('https://janedoe.dev');                  // 'portfolio'
classifyInput('./resumes/janedoe.pdf');                // 'resume_file'
classifyInput('janedoe.docx');                         // 'resume_file'
classifyInput('https://linkedin.com/in/janedoe');      // 'linkedin'
classifyInput('not a url at all');                     // 'unknown'

parseRepoUrl

Fully parses a GitHub, GitLab, or Bitbucket repository URL into structured fields. Strips .git suffixes and trailing slashes, handles GitLab sub-group paths, and detects pull request / issue contribution links.
function parseRepoUrl(repoUrl: string): {
  valid: boolean;
  data?: ParsedRepo;
  error?: string;
}

Parameters

repoUrl
string
required
An absolute URL. Must be parseable by the WHATWG URL constructor and hosted on github.com, gitlab.com, or bitbucket.org. The www. subdomain variant is not accepted by parseRepoUrl (use parseGitLink for that).

Returns

valid
boolean
true when parsing succeeded and data is populated; false otherwise.
data
ParsedRepo
Present only when valid is true.
error
string
Present only when valid is false. Possible values:
| Error string | Cause |
| --- | --- |
| "Unsupported provider" | Hostname is not github.com, gitlab.com, or bitbucket.org |
| "Invalid repo path" | Fewer than two path segments after stripping .git and slashes |
| "Reserved path" | First path segment is a provider-specific reserved word (e.g. explore, settings) |
| "Invalid GitLab repo path" | GitLab path found a stop marker (-, tree, blob) before reaching owner/repo |
| "Invalid URL" | Input is not a valid URL at all |

Behavior notes

  • .git stripping: https://github.com/owner/repo.git is treated identically to https://github.com/owner/repo.
  • GitLab sub-groups: the parser walks path segments until it hits a stop marker (-, tree, or blob), so https://gitlab.com/myorg/backend/api-service correctly produces owner: "myorg", repo: "api-service", fullPath: "myorg/backend/api-service".
  • Contribution detection differs by provider:
    • GitHub: /pull/{n} → pull_request; /issues/{n} → issue
    • GitLab: /-/merge_requests/{n} → pull_request; /-/issues/{n} → issue
    • Bitbucket: /pull-requests/{n} → pull_request; /issues/{n} → issue
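
The GitLab sub-group walk described above can be sketched as follows. This is an illustrative re-implementation and not the library's source: splitGitLabPath, GitLabPath and STOP_MARKERS are hypothetical names, and stripping .git from the last kept segment is an assumption about where that step happens.

```typescript
// Stop markers named in the behavior notes.
const STOP_MARKERS = ['-', 'tree', 'blob'];

interface GitLabPath {
  owner: string;
  repo: string;
  fullPath: string;
}

// Hypothetical helper: collect segments until a stop marker is reached,
// then treat the first kept segment as owner and the last as repo.
function splitGitLabPath(pathname: string): GitLabPath | null {
  const segments = pathname.split('/').filter((s) => s.length > 0);
  const kept: string[] = [];
  for (const segment of segments) {
    if (STOP_MARKERS.indexOf(segment) !== -1) break;
    kept.push(segment);
  }
  // A stop marker before owner/repo leaves fewer than two segments.
  if (kept.length < 2) return null;
  const repo = kept[kept.length - 1].replace(/\.git$/, '');
  kept[kept.length - 1] = repo;
  return { owner: kept[0], repo, fullPath: kept.join('/') };
}
```

Under these assumptions, '/myorg/backend/api-service' and '/myorg/backend/api-service/-/merge_requests/7' yield the same owner, repo and fullPath, while '/myorg/-/settings' yields null.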

Examples

import { parseRepoUrl } from '@clyrisai/gitresolve';

const result = parseRepoUrl('https://github.com/owner/my-repo.git');
// {
//   valid: true,
//   data: {
//     provider: 'github',
//     host: 'github.com',
//     owner: 'owner',
//     repo: 'my-repo',
//     fullPath: 'owner/my-repo',
//     normalized: 'https://github.com/owner/my-repo',
//   }
// }
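
The per-provider contribution patterns listed in the behavior notes could be matched along these lines. This is a sketch under assumptions: detectContribution and CONTRIBUTION_PATTERNS are hypothetical names, {n} is assumed to be a run of digits, and the GitHub and Bitbucket patterns are assumed to follow exactly two leading path segments.

```typescript
type Provider = 'github' | 'gitlab' | 'bitbucket';
type ContributionType = 'pull_request' | 'issue';

// One ordered pattern list per provider, mirroring the behavior notes.
const CONTRIBUTION_PATTERNS: Record<Provider, Array<{ type: ContributionType; re: RegExp }>> = {
  github: [
    { type: 'pull_request', re: /^\/[^/]+\/[^/]+\/pull\/(\d+)/ },
    { type: 'issue', re: /^\/[^/]+\/[^/]+\/issues\/(\d+)/ },
  ],
  gitlab: [
    { type: 'pull_request', re: /\/-\/merge_requests\/(\d+)/ },
    { type: 'issue', re: /\/-\/issues\/(\d+)/ },
  ],
  bitbucket: [
    { type: 'pull_request', re: /^\/[^/]+\/[^/]+\/pull-requests\/(\d+)/ },
    { type: 'issue', re: /^\/[^/]+\/[^/]+\/issues\/(\d+)/ },
  ],
};

// Hypothetical helper: returns the contribution type and its number as a
// string, or null when the path is not a PR / issue link.
function detectContribution(
  provider: Provider,
  pathname: string,
): { type: ContributionType; number: string } | null {
  for (const { type, re } of CONTRIBUTION_PATTERNS[provider]) {
    const match = re.exec(pathname);
    if (match) return { type, number: match[1] };
  }
  return null;
}
```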

parseGitLink

Classifies a raw URL from any of the six recognised git provider hostnames into a typed ExtractedGitLink. Unlike parseRepoUrl, this function also handles profile pages, PRs, issues, and the www. subdomain variants. Note that gist.github.com is not in GIT_HOSTS, so gist subdomain URLs return null.
function parseGitLink(rawUrl: string): ExtractedGitLink | null

Parameters

rawUrl
string
required
An absolute URL string. Must be parseable by the WHATWG URL constructor. Relative URLs return null.

Returns

An ExtractedGitLink object, or null when the URL should be discarded.
url
string
For repo links this is the normalized canonical URL from parseRepoUrl. For PR/issue links this is the original rawUrl to preserve the exact contribution reference. For profile links this is rawUrl as-is.
provider
'github' | 'gitlab' | 'bitbucket'
Resolved from the hostname via GIT_HOSTS.
type
GitLinkType
The classification result. See the table below.
username
string
The extracted owner username. For most link types this is the first path segment. For "gist" type links (matched when the first path segment is literally gist on github.com), the second path segment is used.
repo
string | undefined
Present when type is 'repo', 'pull_request', or 'issue'.
number
string | undefined
Present when type is 'pull_request' or 'issue'.

Classification logic

| type result | Condition |
| --- | --- |
| "gist" | First path segment is literally gist (note: gist.github.com is not in GIT_HOSTS, so URLs on that subdomain fail the isGitProviderUrl check and return null) |
| "profile" | Only one path segment present; or second segment is a GitHub profile tab (repositories, stars, followers, following); or GitLab second segment is - |
| "pull_request" | parseRepoUrl returns a contribution of type pull_request |
| "issue" | parseRepoUrl returns a contribution of type issue |
| "repo" | Two or more valid path segments that pass parseRepoUrl |
| "other" | Two or more path segments that did not match any of the above |
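
As an illustration of the "profile" condition above, the check might be written roughly as below. looksLikeProfile is a hypothetical helper and not part of the library, and it assumes the profile-tab rule applies only to GitHub and the "-" rule only to GitLab, which is one reading of the condition text.

```typescript
type Provider = 'github' | 'gitlab' | 'bitbucket';

// Profile tabs named in the classification conditions.
const GITHUB_PROFILE_TABS = ['repositories', 'stars', 'followers', 'following'];

// Hypothetical helper: `segments` are the non-empty path segments of the URL.
function looksLikeProfile(provider: Provider, segments: string[]): boolean {
  // A single segment such as /torvalds is a profile on any provider.
  if (segments.length === 1) return true;
  if (segments.length < 2) return false;
  const second = segments[1];
  if (provider === 'github' && GITHUB_PROFILE_TABS.indexOf(second) !== -1) return true;
  if (provider === 'gitlab' && second === '-') return true;
  return false;
}
```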

Null cases

parseGitLink returns null (silently discards the URL) when:
  • The URL cannot be parsed, or the hostname is not in GIT_HOSTS
  • The last path segment ends with a static asset extension: .png, .svg, .xml, .json, .ico, .txt, .woff, .woff2, .ttf, .css, .js, .map
  • The first path segment matches a reserved system path for that provider (e.g. features, settings, explore, admin)
  • No path segments at all (bare host URL)

Examples

import { parseGitLink } from '@clyrisai/gitresolve';

// Profile
parseGitLink('https://github.com/torvalds');
// { url: 'https://github.com/torvalds', provider: 'github', type: 'profile', username: 'torvalds' }

// Repo
parseGitLink('https://github.com/vercel/next.js');
// { url: 'https://github.com/vercel/next.js', provider: 'github', type: 'repo', username: 'vercel', repo: 'next.js' }

// gist.github.com is NOT in GIT_HOSTS — returns null
parseGitLink('https://gist.github.com/sindresorhus/abc123'); // null

// Pull request
parseGitLink('https://github.com/facebook/react/pull/28987');
// { url: '...', provider: 'github', type: 'pull_request', username: 'facebook', repo: 'react', number: '28987' }

// Static asset — silently discarded
parseGitLink('https://github.com/some/repo/logo.png'); // null

// Non-git host — silently discarded
parseGitLink('https://example.com/user/repo');          // null

// Reserved path — silently discarded
parseGitLink('https://github.com/settings/profile');    // null

extractGitUrlsFromText

Scans a block of plain text (or raw HTML) with a regex and returns every unique git provider URL it finds. This is used internally by parseResume and extractLinksFromHtml.
function extractGitUrlsFromText(text: string): string[]

Parameters

text
string
required
Any arbitrary string — resume text, raw HTML, a markdown document, etc.

Returns

An array of unique, normalised URL strings. Each returned value:
  • Has an https:// prefix (bare github.com/... fragments are promoted)
  • Has trailing slashes stripped
  • Appears only once (the array is deduplicated before returning)
  • Matches only github.com, gitlab.com, or bitbucket.org hostnames
The regex captures at most two path segments beyond the hostname, so https://github.com/owner/repo is captured but https://github.com/owner/repo/blob/main/README.md is truncated to https://github.com/owner/repo. This is intentional — deeper paths are not useful for profile resolution and create noise.
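
A regex with the properties listed above might look like the following. The library's actual pattern is not shown on this page, so GIT_URL_RE and extractGitUrlsSketch are hypothetical, and stripping trailing dots (so that sentence-ending periods are not kept) is an assumption made for this sketch.

```typescript
// Optional scheme, one of three hosts, then one or two path segments.
// The leading group stops the host from matching inside a longer hostname.
const GIT_URL_RE =
  /(?:^|[^\w.-])(?:https?:\/\/)?(github\.com|gitlab\.com|bitbucket\.org)\/([\w.-]+)(?:\/([\w.-]+))?/gi;

function extractGitUrlsSketch(text: string): string[] {
  const found: string[] = [];
  GIT_URL_RE.lastIndex = 0; // the regex is global, so reset its state per call
  let match: RegExpExecArray | null;
  while ((match = GIT_URL_RE.exec(text)) !== null) {
    const host = match[1].toLowerCase();
    // Drop trailing dots picked up from sentence punctuation.
    const first = match[2].replace(/\.+$/, '');
    const second = (match[3] || '').replace(/\.+$/, '');
    if (first.length === 0) continue;
    // Always emit an https:// URL, with no trailing slash.
    const url = 'https://' + host + '/' + first + (second ? '/' + second : '');
    if (found.indexOf(url) === -1) found.push(url); // deduplicate, keep order
  }
  return found;
}
```

Because the pattern stops after the second segment, anything deeper (such as /pull/42 or /blob/main/README.md) is never captured, which gives the truncation described above.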

Examples

import { extractGitUrlsFromText } from '@clyrisai/gitresolve';

const text = `
  Check out my projects at github.com/janedoe and
  https://gitlab.com/janedoe/api-service.
  I also contributed to https://github.com/facebook/react/pull/42.
`;

extractGitUrlsFromText(text);
// [
//   'https://github.com/janedoe',
//   'https://gitlab.com/janedoe/api-service',
//   'https://github.com/facebook/react',  // truncated at 2 segments
// ]

isGitProviderUrl

Returns true if and only if the URL’s hostname is one of the six recognised git provider hostnames. Useful as a fast filter before calling heavier parsing functions.
function isGitProviderUrl(link: string): boolean

Parameters

link
string
required
An absolute URL string. Invalid URLs return false rather than throwing.

Returns

true if the hostname is in GIT_HOSTS, false otherwise. The six recognised hostnames are:
| Hostname | Provider |
| --- | --- |
| github.com | github |
| www.github.com | github |
| gitlab.com | gitlab |
| www.gitlab.com | gitlab |
| bitbucket.org | bitbucket |
| www.bitbucket.org | bitbucket |

Examples

import { isGitProviderUrl } from '@clyrisai/gitresolve';

isGitProviderUrl('https://github.com/torvalds/linux');  // true
isGitProviderUrl('https://www.github.com/torvalds');    // true
isGitProviderUrl('https://bitbucket.org/owner/repo');   // true
isGitProviderUrl('https://example.com/owner/repo');     // false
isGitProviderUrl('not-a-url');                          // false
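
The "return false rather than throw" behaviour can be approximated with a try/catch around the WHATWG URL constructor. The sketch below is not the library's code: isGitProviderUrlSketch and the local HOSTS map are stand-ins that mirror the six documented hostnames.

```typescript
type GitProvider = 'github' | 'gitlab' | 'bitbucket';

// Local copy of the six documented hostnames (stand-in for GIT_HOSTS).
const HOSTS: Record<string, GitProvider> = {
  'github.com': 'github',
  'www.github.com': 'github',
  'gitlab.com': 'gitlab',
  'www.gitlab.com': 'gitlab',
  'bitbucket.org': 'bitbucket',
  'www.bitbucket.org': 'bitbucket',
};

function isGitProviderUrlSketch(link: string): boolean {
  try {
    const hostname = new URL(link).hostname.toLowerCase();
    // hasOwnProperty avoids false positives for inherited keys
    // such as "constructor".
    return Object.prototype.hasOwnProperty.call(HOSTS, hostname);
  } catch {
    // new URL() throws a TypeError for relative or malformed input.
    return false;
  }
}
```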

GIT_HOSTS

A constant lookup map from hostname string to GitProvider. Exported for cases where you need to check or iterate the recognised hosts without importing isGitProviderUrl.
import { GIT_HOSTS } from '@clyrisai/gitresolve';

// Type: Record<string, GitProvider>
// Value:
{
  'github.com':         'github',
  'www.github.com':     'github',
  'gitlab.com':         'gitlab',
  'www.gitlab.com':     'gitlab',
  'bitbucket.org':      'bitbucket',
  'www.bitbucket.org':  'bitbucket',
}

Usage

import { GIT_HOSTS } from '@clyrisai/gitresolve';

const hostname = new URL(someUrl).hostname.toLowerCase();
const provider = GIT_HOSTS[hostname]; // 'github' | 'gitlab' | 'bitbucket' | undefined

if (provider) {
  console.log(`This is a ${provider} URL`);
}
