Before GitResolve fetches a single URL or opens a single file, it runs every input through a classification step. This step determines which processing pipeline to invoke (portfolio scraping, PDF parsing, direct profile resolution, or a deliberate skip) so that downstream logic always knows exactly what kind of source it is working with. Getting classification right is essential: sending a resume file path into the portfolio scraper, or treating a bare GitHub profile as a repository URL, would produce empty or incorrect results.
classifyInput resolves that ambiguity with a fast, deterministic decision before any network or file I/O takes place.
InputType reference
Every input resolves to one of seven InputType values. The table below lists each value, what it represents, and a concrete example.
| InputType | Meaning | Example input |
|---|---|---|
| repo_url | A direct link to a specific repository on GitHub, GitLab, or Bitbucket | https://github.com/torvalds/linux |
| git_profile | A profile page on a supported git host (no repo path) | https://github.com/torvalds |
| portfolio | Any other fully-qualified URL (personal sites, project pages, etc.) | https://janedoe.dev |
| resume_file | A local file path ending in .pdf, .doc, .docx, or .rtf | ./resumes/jane_doe.pdf |
| resume_url | A URL that points to a hosted resume document | https://cdn.example.com/resume.pdf |
| linkedin | Any URL whose hostname contains linkedin.com | https://www.linkedin.com/in/janedoe |
| unknown | Anything that cannot be parsed as a URL and is not a recognised file extension | github.com/janedoe (no scheme) |
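The seven values in the table can be written down as a string-literal union. The following is a minimal sketch, not GitResolve's actual source: the INPUT_TYPES array and the isInputType helper are hypothetical names introduced here for illustration, and only the seven literals themselves come from the table.

```typescript
// Sketch only: the seven literals come from the table above.
// INPUT_TYPES and isInputType are hypothetical names, not GitResolve exports.
const INPUT_TYPES = [
  'repo_url',
  'git_profile',
  'portfolio',
  'resume_file',
  'resume_url',
  'linkedin',
  'unknown',
] as const;

// Union of the literal values: 'repo_url' | 'git_profile' | ... | 'unknown'.
type InputType = (typeof INPUT_TYPES)[number];

// Narrows an arbitrary string (for example, one read from a config file)
// to InputType so the compiler can check later switch statements.
function isInputType(value: string): value is InputType {
  return (INPUT_TYPES as readonly string[]).includes(value);
}
```

Deriving the union from a constant array keeps the runtime list and the compile-time type in one place, which is one common way to avoid the two drifting apart.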
resume_url is defined in the InputType union but is never returned by classifyInput. A URL ending in .pdf hosted on a non-git domain resolves to 'portfolio' through the classification algorithm. The CLI assigns 'resume_url' as the sourceType after it has downloaded a remote PDF and is about to hand it off for parsing; this assignment happens outside classifyInput entirely.
Classification decision flow
classifyInput applies rules in strict order and returns as soon as a match is found. Step 7 (portfolio) acts as a catch-all for any valid URL that did not match an earlier rule. Only two paths return unknown: step 3, when the URL constructor throws, and the bare-root case under step 6, when a known git host is given with no path segments.
1. Trim whitespace
   The raw input string is trimmed of leading and trailing whitespace. All subsequent checks operate on this cleaned value.
2. File extension check → resume_file
   If the trimmed string ends with .pdf, .doc, .docx, or .rtf (case-insensitive), the input is classified as resume_file immediately. No URL parsing is attempted.
3. URL parse attempt (failure → unknown)
   The string is passed to the URL constructor. If parsing throws, the input cannot be a valid web address and is classified as unknown.
   classifyInput does not prepend https:// automatically. A bare hostname like github.com/janedoe fails URL parsing and returns unknown. Always pass fully-qualified URLs with a scheme.
4. LinkedIn hostname check → linkedin
   If the parsed URL’s hostname contains linkedin.com, the input is classified as linkedin. Processing stops here: GitResolve does not attempt to scrape or resolve LinkedIn URLs (see LinkedIn handling below).
5. Repo URL validation → repo_url
   The URL is passed to parseRepoUrl(). If it returns valid: true (meaning the URL has a recognised git hostname and a valid owner/repo path structure), the input is classified as repo_url.
6. Known git host with path segments → git_profile
   If the hostname is github.com, www.github.com, gitlab.com, www.gitlab.com, bitbucket.org, or www.bitbucket.org, and the URL path contains at least one segment, the input is a profile page.
   A bare root URL with no path segments (https://github.com) returns unknown because there is no username to extract.
7. Fallback → portfolio
   Any valid URL that did not match an earlier rule is classified as portfolio.
What GitResolve does for each type
Classification determines which pipeline runs next:
| InputType | Processing strategy |
|---|---|
| repo_url | Owner is extracted directly from the URL. knownOwnerProfile is passed to the disambiguator, yielding confidence: 'high' with no scraping needed. |
| git_profile | Owner is extracted directly from the URL path. Same high-confidence bypass as repo_url. |
| portfolio | Page HTML is fetched via scrapePortfolio(), all href attributes and inline git URLs are extracted, then disambiguation runs on the full link set. |
| resume_file | The file is read with parseResume(), which runs two extraction passes: plain-text extraction via unpdf and hyperlink annotation extraction from PDF metadata. Disambiguation then runs on the combined link set. |
| resume_url | Set by the CLI after downloading a remote PDF; not produced by classifyInput. The downloaded file is then processed the same way as resume_file. |
| linkedin | Flagged and skipped; no request is made (see below). |
| unknown | An error is attached to the result and processing is skipped. |
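The routing in this table maps naturally onto an exhaustive switch. The sketch below is illustrative only: the Strategy labels and the strategyFor function are hypothetical names introduced here, not GitResolve identifiers, and a real pipeline would call functions such as scrapePortfolio() or parseResume() rather than return a label.

```typescript
type InputType =
  | 'repo_url'
  | 'git_profile'
  | 'portfolio'
  | 'resume_file'
  | 'resume_url'
  | 'linkedin'
  | 'unknown';

// Hypothetical labels summarising the "Processing strategy" column.
type Strategy =
  | 'direct_owner'      // owner read straight from the URL, no scraping
  | 'scrape_portfolio'  // fetch HTML, extract links, then disambiguate
  | 'parse_resume'      // extract text and hyperlinks, then disambiguate
  | 'skip_with_warning' // recorded but never requested
  | 'skip_with_error';  // could not be classified

function strategyFor(type: InputType): Strategy {
  switch (type) {
    case 'repo_url':
    case 'git_profile':
      return 'direct_owner';
    case 'portfolio':
      return 'scrape_portfolio';
    case 'resume_file':
    case 'resume_url':
      return 'parse_resume';
    case 'linkedin':
      return 'skip_with_warning';
    case 'unknown':
      return 'skip_with_error';
    default: {
      // Compile-time exhaustiveness check: adding an eighth InputType
      // without handling it here becomes a type error.
      const unreachable: never = type;
      throw new Error(`Unhandled input type: ${String(unreachable)}`);
    }
  }
}
```

The never assignment in the default branch is a standard TypeScript idiom for making the compiler flag any InputType value that the switch forgets to handle.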
LinkedIn: intentionally not resolved
When classifyInput returns linkedin, GitResolve records the type and moves on without issuing any request. LinkedIn’s terms of service prohibit automated scraping, and their login walls make reliable extraction impractical. If a candidate’s LinkedIn URL is the only input available, the resolver returns a result with confidence: 'none' and a warning indicating the source was skipped.
Code example
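The decision flow described on this page can be approximated in a few lines. This is a sketch under stated assumptions rather than GitResolve's implementation: classifyInputSketch, pathSegments, and looksLikeRepoUrl are hypothetical names, and looksLikeRepoUrl is only a rough stand-in for parseRepoUrl(), whose real validation rules are not documented here and may be stricter.

```typescript
type InputType =
  | 'repo_url' | 'git_profile' | 'portfolio' | 'resume_file'
  | 'resume_url' | 'linkedin' | 'unknown';

// Hostnames listed in the git_profile step.
const GIT_HOSTS = new Set([
  'github.com', 'www.github.com',
  'gitlab.com', 'www.gitlab.com',
  'bitbucket.org', 'www.bitbucket.org',
]);

// .pdf, .doc, .docx, or .rtf at the end of the string, case-insensitive.
const RESUME_EXTENSION = /\.(pdf|docx?|rtf)$/i;

// Non-empty path segments, e.g. "/torvalds/linux/" gives ["torvalds", "linux"].
function pathSegments(url: URL): string[] {
  return url.pathname.split('/').filter((segment) => segment.length > 0);
}

// Hypothetical stand-in for parseRepoUrl(): a known git host plus at least
// an owner segment and a repo segment.
function looksLikeRepoUrl(url: URL): boolean {
  return GIT_HOSTS.has(url.hostname) && pathSegments(url).length >= 2;
}

function classifyInputSketch(raw: string): InputType {
  const input = raw.trim();                                // step 1
  if (RESUME_EXTENSION.test(input)) return 'resume_file';  // step 2

  let url: URL;
  try {
    url = new URL(input);                                  // step 3
  } catch {
    return 'unknown'; // no scheme, or otherwise not a valid URL
  }

  if (url.hostname.includes('linkedin.com')) return 'linkedin'; // step 4
  if (looksLikeRepoUrl(url)) return 'repo_url';                  // step 5
  if (GIT_HOSTS.has(url.hostname)) {                             // step 6
    // A bare root such as https://github.com has no username to extract.
    return pathSegments(url).length > 0 ? 'git_profile' : 'unknown';
  }
  return 'portfolio';                                            // step 7
}

const samples = [
  'https://github.com/torvalds/linux', // repo_url
  'https://github.com/torvalds',       // git_profile
  'https://janedoe.dev',               // portfolio
  './resumes/jane_doe.pdf',            // resume_file
  'github.com/janedoe',                // unknown (no scheme)
];
for (const sample of samples) {
  console.log(`${sample} -> ${classifyInputSketch(sample)}`);
}
```

Because the sketch follows the listed steps literally, it classifies any string ending in a resume extension as resume_file in step 2, including strings that are also URLs; the real function may guard or order that check differently. It also never returns resume_url, matching the note that the CLI assigns that value outside classifyInput.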
Disambiguation
How GitResolve determines which GitHub identity owns the resolved links
Result Structure
The full shape of ResolverResult and AggregatedResult