The classifier module is the entry point for every string that flows into GitResolve. It answers two questions: what kind of thing is this input, and what structured data can be extracted from it? All five functions are pure and synchronous. They never perform network requests, which makes them safe to call in hot paths or batch loops without any async overhead.
classifyInput
Examines a raw string and returns an InputType indicating what kind of candidate input it represents. This is the quickest way to route an unknown string before deciding which heavier pipeline function to invoke.
Parameters
Any string — a URL, file path, or arbitrary text. The value is trimmed before classification.
Returns
An InputType string literal. The classification logic runs in this order:

| Returned value | Condition |
|---|---|
| "resume_file" | Input ends with .pdf, .doc, .docx, or .rtf (case-insensitive) |
| "linkedin" | Parsed hostname contains linkedin.com |
| "repo_url" | parseRepoUrl returns valid: true for the input |
| "git_profile" | Host is a known git provider but parseRepoUrl is invalid (e.g. profile-only path) |
| "portfolio" | Any other syntactically valid URL |
| "unknown" | Not a valid URL and not a recognised file extension |

"resume_url" is a valid InputType value but classifyInput never returns it: the classifier has no way to distinguish a remote PDF URL from any other portfolio URL without fetching the content. Downstream logic may promote "portfolio" to "resume_url" after inspecting Content-Type.
Examples
parseRepoUrl
Fully parses a GitHub, GitLab, or Bitbucket repository URL into structured fields. Strips .git suffixes and trailing slashes, handles GitLab sub-group paths, and detects pull request / issue contribution links.
Parameters
An absolute URL. Must be parseable by the WHATWG URL constructor and hosted on github.com, gitlab.com, or bitbucket.org. The www. subdomain variant is not accepted by parseRepoUrl (use parseGitLink for that).
Returns
A result object:
- valid: true when parsing succeeded and data is populated; false otherwise.
- data: present only when valid is true.
- An error string: present only when valid is false.

Possible error values:

| Error string | Cause |
|---|---|
| "Unsupported provider" | Hostname is not github.com, gitlab.com, or bitbucket.org |
| "Invalid repo path" | Fewer than two path segments after stripping .git and slashes |
| "Reserved path" | First path segment is a provider-specific reserved word (e.g. explore, settings) |
| "Invalid GitLab repo path" | GitLab path found a stop marker (-, tree, blob) before reaching owner/repo |
| "Invalid URL" | Input is not a valid URL at all |

Behavior notes
- `.git` stripping: `https://github.com/owner/repo.git` is treated identically to `https://github.com/owner/repo`.
- GitLab sub-groups: The parser walks path segments until it hits a stop marker (`-`, `tree`, or `blob`), so `https://gitlab.com/myorg/backend/api-service` correctly produces `owner: "myorg"`, `repo: "api-service"`, `fullPath: "myorg/backend/api-service"`.
- Contribution detection differs by provider:
  - GitHub: `/pull/{n}` → `pull_request`; `/issues/{n}` → `issue`
  - GitLab: `/-/merge_requests/{n}` → `pull_request`; `/-/issues/{n}` → `issue`
  - Bitbucket: `/pull-requests/{n}` → `pull_request`; `/issues/{n}` → `issue`
Examples
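The behavior notes above can be approximated with a short sketch. It is an illustration under stated assumptions, not the implementation of `parseRepoUrl`: the helper names (`stripGitSuffix`, `parseGitLabPath`, `detectContribution`) and the `Contribution` shape with a numeric `number` field are hypothetical.

```typescript
// Illustrative only: helper names and the Contribution shape are assumptions.
type ContributionType = "pull_request" | "issue";
interface Contribution { type: ContributionType; number: number; }

// Strip trailing slashes first, then a trailing ".git".
function stripGitSuffix(pathname: string): string {
  return pathname.replace(/\/+$/, "").replace(/\.git$/, "");
}

const GITLAB_STOP_MARKERS = ["-", "tree", "blob"];

// Walk GitLab path segments until a stop marker. The first kept segment is
// the owner, the last kept segment is the repo, and all of them form fullPath.
function parseGitLabPath(pathname: string) {
  const kept: string[] = [];
  for (const segment of stripGitSuffix(pathname).split("/").filter(Boolean)) {
    if (GITLAB_STOP_MARKERS.indexOf(segment) !== -1) break;
    kept.push(segment);
  }
  if (kept.length < 2) return null;
  return { owner: kept[0], repo: kept[kept.length - 1], fullPath: kept.join("/") };
}

// Per-provider path patterns taken from the list above.
const CONTRIBUTION_PATTERNS: { [host: string]: Array<[RegExp, ContributionType]> } = {
  "github.com": [[/\/pull\/(\d+)/, "pull_request"], [/\/issues\/(\d+)/, "issue"]],
  "gitlab.com": [[/\/-\/merge_requests\/(\d+)/, "pull_request"], [/\/-\/issues\/(\d+)/, "issue"]],
  "bitbucket.org": [[/\/pull-requests\/(\d+)/, "pull_request"], [/\/issues\/(\d+)/, "issue"]],
};

function tryParse(rawUrl: string): URL | null {
  try {
    return new URL(rawUrl);
  } catch {
    return null;
  }
}

function detectContribution(rawUrl: string): Contribution | null {
  const url = tryParse(rawUrl);
  if (url === null) return null;
  const patterns = CONTRIBUTION_PATTERNS[url.hostname] || [];
  for (const [pattern, type] of patterns) {
    const match = pattern.exec(url.pathname);
    if (match) return { type, number: Number(match[1]) };
  }
  return null;
}

console.log(parseGitLabPath("/myorg/backend/api-service"));
console.log(detectContribution("https://github.com/owner/repo/pull/42"));
```

Keeping the patterns in a per-host table makes the provider differences explicit: the same `/issues/{n}` suffix is valid on GitHub and Bitbucket, while GitLab requires the `/-/` marker in front of it.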
parseGitLink
Classifies a raw URL from any of the six recognised git provider hostnames into a typed ExtractedGitLink. Unlike parseRepoUrl, this function also handles profile pages, PRs, issues, and the www. subdomain variants. Note that gist.github.com is not in GIT_HOSTS, so gist subdomain URLs return null.
Parameters
An absolute URL string. Must be parseable by the WHATWG URL constructor. Relative URLs return null.
Returns
An ExtractedGitLink object, or null when the URL should be discarded. Its fields:
- The link URL. For repo links this is the normalized canonical URL from parseRepoUrl. For PR/issue links this is the original rawUrl to preserve the exact contribution reference. For profile links this is rawUrl as-is.
- The provider, resolved from the hostname via GIT_HOSTS.
- type: the classification result. See the table below.
- The extracted owner username. For most link types this is the first path segment. For "gist" type links (matched when the first path segment is literally gist on github.com), the second path segment is used.
- An optional field, present when type is 'repo', 'pull_request', or 'issue'.
- An optional field, present when type is 'pull_request' or 'issue'.

Classification logic
| type result | Condition |
|---|---|
| "gist" | First path segment is literally gist (note: gist.github.com is not in GIT_HOSTS; URLs on that subdomain fail the isGitProviderUrl check and return null) |
| "profile" | Only one path segment present; or second segment is a GitHub profile tab (repositories, stars, followers, following); or GitLab second segment is - |
| "pull_request" | parseRepoUrl returns a contribution of type pull_request |
| "issue" | parseRepoUrl returns a contribution of type issue |
| "repo" | Two or more valid path segments that pass parseRepoUrl |
| "other" | Two or more path segments that did not match any of the above |

Null cases
parseGitLink returns null (silently discards the URL) when:
- The URL cannot be parsed, or the hostname is not in `GIT_HOSTS`
- The last path segment ends with a static asset extension: `.png`, `.svg`, `.xml`, `.json`, `.ico`, `.txt`, `.woff`, `.woff2`, `.ttf`, `.css`, `.js`, `.map`
- The first path segment matches a reserved system path for that provider (e.g. `features`, `settings`, `explore`, `admin`)
- No path segments at all (bare host URL)
Examples
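As a rough sketch of the null cases and the gist/profile rules described above, the early checks might look like the following. Everything here is an assumption made for illustration: `earlyClassify` and its result shape are hypothetical, the reserved-word list holds only the four examples named in the text (the real lists are per provider), the checks are assumed to run in the order shown, and extension matching is assumed to be case-sensitive.

```typescript
// Illustrative sketch; names, result shape, and check order are assumptions.
const KNOWN_HOSTS = [
  "github.com", "www.github.com",
  "gitlab.com", "www.gitlab.com",
  "bitbucket.org", "www.bitbucket.org",
];
const STATIC_ASSET = /\.(png|svg|xml|json|ico|txt|woff2?|ttf|css|js|map)$/;
// Only the examples named in the text; a single shared list is a simplification.
const RESERVED = ["features", "settings", "explore", "admin"];
const GITHUB_PROFILE_TABS = ["repositories", "stars", "followers", "following"];

type EarlyResult =
  | { kind: "discard" }
  | { kind: "gist" | "profile"; owner: string }
  | { kind: "needs_repo_parse" };

function parseOrNull(rawUrl: string): URL | null {
  try {
    return new URL(rawUrl);
  } catch {
    return null;
  }
}

function earlyClassify(rawUrl: string): EarlyResult {
  const url = parseOrNull(rawUrl);
  if (url === null || KNOWN_HOSTS.indexOf(url.hostname) === -1) return { kind: "discard" };

  const host = url.hostname.replace(/^www\./, "");
  const segments = url.pathname.split("/").filter(Boolean);

  // Null cases: bare host, static asset, reserved first segment.
  if (segments.length === 0) return { kind: "discard" };
  if (STATIC_ASSET.test(segments[segments.length - 1])) return { kind: "discard" };
  if (RESERVED.indexOf(segments[0]) !== -1) return { kind: "discard" };

  // Gist: first segment is literally "gist"; the owner is the second segment.
  // A bare /gist path is not covered by the text, so this sketch lets it fall through.
  if (host === "github.com" && segments[0] === "gist" && segments.length >= 2) {
    return { kind: "gist", owner: segments[1] };
  }

  // Profile: a single segment, a GitHub profile tab, or GitLab's "-" marker.
  const isProfile =
    segments.length === 1 ||
    (host === "github.com" && GITHUB_PROFILE_TABS.indexOf(segments[1]) !== -1) ||
    (host === "gitlab.com" && segments[1] === "-");
  if (isProfile) return { kind: "profile", owner: segments[0] };

  // Everything else would be handed to parseRepoUrl for repo/PR/issue detection.
  return { kind: "needs_repo_parse" };
}

console.log(earlyClassify("https://www.github.com/alice"));
console.log(earlyClassify("https://github.com/favicon.ico"));
```

Running the cheap discard checks before any repo parsing keeps asset and system URLs out of the heavier path entirely.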
extractGitUrlsFromText
Scans a block of plain text (or raw HTML) with a regex and returns every unique git provider URL it finds. This is used internally by parseResume and extractLinksFromHtml.
Parameters
Any arbitrary string — resume text, raw HTML, a markdown document, etc.
Returns
An array of unique, normalised URL strings. Each returned value:
- Has an `https://` prefix (bare `github.com/...` fragments are promoted)
- Has trailing slashes stripped
- Appears only once (the array is deduplicated before returning)
- Matches only `github.com`, `gitlab.com`, or `bitbucket.org` hostnames

The regex captures at most two path segments beyond the hostname, so `https://github.com/owner/repo` is captured but `https://github.com/owner/repo/blob/main/README.md` is truncated to `https://github.com/owner/repo`. This is intentional: deeper paths are not useful for profile resolution and create noise.
Examples
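A minimal sketch of this kind of extraction is shown below. The regular expression and the name `extractSketch` are assumptions chosen to reproduce the documented behaviour (optional scheme, three hostnames, at most two path segments); the library's actual pattern may differ.

```typescript
// Illustrative extractor; the pattern is an assumption, not the library's regex.
// Optional scheme, one of three hosts, then one or two path segments.
const GIT_URL_PATTERN =
  /(?:https?:\/\/)?\b(?:github\.com|gitlab\.com|bitbucket\.org)\/[\w.-]+(?:\/[\w.-]+)?/g;

function extractSketch(text: string): string[] {
  const matches: string[] = text.match(GIT_URL_PATTERN) || [];
  const normalised = matches.map((m) =>
    // Drop any existing scheme and re-add https://. The trailing-slash strip
    // mirrors the documented rule, although this pattern never captures one.
    "https://" + m.replace(/^https?:\/\//, "").replace(/\/+$/, "")
  );
  // A Set preserves insertion order, so the first occurrence wins.
  return Array.from(new Set(normalised));
}

const sample =
  "See github.com/owner/repo and https://github.com/owner/repo/blob/main/README.md, " +
  "plus https://gitlab.com/someone/";

console.log(extractSketch(sample));
// [ 'https://github.com/owner/repo', 'https://gitlab.com/someone' ]
```

The bare fragment and the deep `blob` link both collapse to the same two-segment URL, which is why deduplication happens after normalisation rather than before.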
isGitProviderUrl
Returns true if and only if the URL’s hostname is one of the six recognised git provider hostnames. Useful as a fast filter before calling heavier parsing functions.
Parameters
An absolute URL string. Invalid URLs return false rather than throwing.
Returns
true if the hostname is in GIT_HOSTS, false otherwise.
The six recognised hostnames are:
| Hostname | Provider |
|---|---|
| github.com | github |
| www.github.com | github |
| gitlab.com | gitlab |
| www.gitlab.com | gitlab |
| bitbucket.org | bitbucket |
| www.bitbucket.org | bitbucket |

Examples
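The hostname check can be sketched as a lookup against a local copy of the table above. `HOSTS` and `isGitProviderUrlSketch` are hypothetical names used so the block is self-contained; in real code you would use the exported `GIT_HOSTS` and `isGitProviderUrl` instead.

```typescript
// Illustrative stand-in for the documented behaviour; names are hypothetical.
type GitProvider = "github" | "gitlab" | "bitbucket";

// Local copy of the six hostnames listed in the table above.
const HOSTS: { [hostname: string]: GitProvider } = {
  "github.com": "github",
  "www.github.com": "github",
  "gitlab.com": "gitlab",
  "www.gitlab.com": "gitlab",
  "bitbucket.org": "bitbucket",
  "www.bitbucket.org": "bitbucket",
};

function isGitProviderUrlSketch(rawUrl: string): boolean {
  try {
    // new URL throws on relative or malformed input, which maps to false here.
    const hostname = new URL(rawUrl).hostname;
    return Object.prototype.hasOwnProperty.call(HOSTS, hostname);
  } catch {
    return false;
  }
}

console.log(isGitProviderUrlSketch("https://www.gitlab.com/group/project")); // true
console.log(isGitProviderUrlSketch("https://gist.github.com/alice/abc"));    // false
console.log(isGitProviderUrlSketch("/owner/repo"));                          // false
```

An exact-match lookup, rather than a substring or suffix test, is what keeps subdomains such as `gist.github.com` out of the recognised set.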
GIT_HOSTS
A constant lookup map from hostname string to GitProvider. Exported for cases where you need to check or iterate the recognised hosts without importing isGitProviderUrl.