Before GitResolve fetches a single URL or opens a single file, it runs every input through a classification step. This step determines which processing pipeline to invoke — portfolio scraping, PDF parsing, direct profile resolution, or a deliberate skip — so that downstream logic always knows exactly what kind of source it is working with. Getting classification right is essential: sending a resume file path into the portfolio scraper, or treating a bare GitHub profile as a repository URL, would produce empty or incorrect results. classifyInput resolves that ambiguity with a fast, deterministic decision before any network or file I/O takes place.

InputType reference

Every input resolves to one of seven InputType values. The table below lists each value, what it represents, and a concrete example.
| InputType | Meaning | Example input |
| --- | --- | --- |
| repo_url | A direct link to a specific repository on GitHub, GitLab, or Bitbucket | https://github.com/torvalds/linux |
| git_profile | A profile page on a supported git host (no repo path) | https://github.com/torvalds |
| portfolio | Any other fully-qualified URL: personal sites, project pages, etc. | https://janedoe.dev |
| resume_file | A local file path ending in .pdf, .doc, .docx, or .rtf | ./resumes/jane_doe.pdf |
| resume_url | A URL that points to a hosted resume document | https://cdn.example.com/resume.pdf |
| linkedin | Any URL whose hostname contains linkedin.com | https://www.linkedin.com/in/janedoe |
| unknown | Anything that cannot be parsed as a URL and is not a recognised file extension | github.com/janedoe (no scheme) |
resume_url is defined in the InputType union but is never returned by classifyInput. A URL ending in .pdf hosted on a non-git domain resolves to 'portfolio' through the classification algorithm. The CLI assigns 'resume_url' as the sourceType after it has downloaded a remote PDF and is about to hand it off for parsing — this assignment happens outside classifyInput entirely.
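
For reference, the seven values can be written as a TypeScript union. This is a sketch based on the table above, not the library's own declaration; ClassifierOutput, isClassifierOutput, and ALL_INPUT_TYPES are hypothetical names introduced only for illustration.

```typescript
// Sketch of the InputType union described in the table above.
// The library's actual declaration may be written differently.
type InputType =
  | "repo_url"
  | "git_profile"
  | "portfolio"
  | "resume_file"
  | "resume_url"
  | "linkedin"
  | "unknown";

// Hypothetical alias: the values classifyInput can return, which is
// every member of the union except 'resume_url'.
type ClassifierOutput = Exclude<InputType, "resume_url">;

// Hypothetical type guard: true when a source type could have come
// straight from the classifier rather than from the CLI download step.
function isClassifierOutput(t: InputType): t is ClassifierOutput {
  return t !== "resume_url";
}

// Hypothetical constant listing every member of the union.
const ALL_INPUT_TYPES: readonly InputType[] = [
  "repo_url",
  "git_profile",
  "portfolio",
  "resume_file",
  "resume_url",
  "linkedin",
  "unknown",
];
```

Separating the two types this way lets calling code state in its signatures whether it expects a raw classification or a source type that the CLI may have reassigned.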

Classification decision flow

classifyInput applies rules in strict order and returns as soon as a match is found. Step 7 (portfolio) acts as a catch-all for valid URLs that did not match an earlier rule. Two paths return unknown: step 3, when the URL constructor throws, and step 6, when a URL on a known git host has no path segments.
1. Trim whitespace

The raw input string is trimmed of leading and trailing whitespace. All subsequent checks operate on this cleaned value.
2. File extension check → resume_file

If the trimmed string ends with .pdf, .doc, .docx, or .rtf (case-insensitive), the input is classified as resume_file immediately. No URL parsing is attempted.
// All of these → 'resume_file'
classifyInput("./cv/jane_doe.pdf")     // → 'resume_file'
classifyInput("/tmp/Resume.DOCX")      // → 'resume_file'
classifyInput("C:\\Users\\jane.rtf")   // → 'resume_file'
3. URL parse attempt: failure → unknown

The string is passed to the URL constructor. If parsing throws, the input cannot be a valid web address and is classified as unknown.
classifyInput("github.com/janedoe")    // → 'unknown'  (no https:// scheme)
classifyInput("not a url at all")      // → 'unknown'
classifyInput does not prepend https:// automatically. A bare hostname like github.com/janedoe fails URL parsing and returns unknown. Always pass fully-qualified URLs with a scheme.
4. LinkedIn hostname check → linkedin

If the parsed URL’s hostname contains linkedin.com, the input is classified as linkedin. Processing stops here: GitResolve does not attempt to scrape or resolve LinkedIn URLs (see "LinkedIn: intentionally not resolved" below).
classifyInput("https://www.linkedin.com/in/janedoe") // → 'linkedin'
classifyInput("https://linkedin.com/company/acme")   // → 'linkedin'
5. Repo URL validation → repo_url

The URL is passed to parseRepoUrl(). If it returns valid: true — meaning it has a recognised git hostname and a valid owner/repo path structure — the input is classified as repo_url.
classifyInput("https://github.com/torvalds/linux")           // → 'repo_url'
classifyInput("https://gitlab.com/inkscape/inkscape")        // → 'repo_url'
classifyInput("https://bitbucket.org/atlassian/localstack")  // → 'repo_url'
// URLs that point inside a repository, such as a pull request, also match:
classifyInput("https://github.com/torvalds/linux/pull/42")   // → 'repo_url'
6. Known git host with path segments → git_profile

If the hostname is github.com, www.github.com, gitlab.com, www.gitlab.com, bitbucket.org, or www.bitbucket.org, and the URL path contains at least one segment, the input is classified as git_profile.
classifyInput("https://github.com/torvalds")       // → 'git_profile'
classifyInput("https://gitlab.com/gitlab-org")     // → 'git_profile'
classifyInput("https://bitbucket.org/atlassian")   // → 'git_profile'
A bare root URL with no path segments (https://github.com) returns unknown because there is no username to extract.
7. Any other valid URL → portfolio

If the URL passed all previous checks without matching, it is classified as portfolio. This covers personal websites, project homepages, hosted slides, and any other web page that might contain git links.
classifyInput("https://janedoe.dev")                      // → 'portfolio'
classifyInput("https://janesmith.io/projects")            // → 'portfolio'
classifyInput("https://cdn.example.com/resume.pdf")       // → 'portfolio'
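
The seven steps above can be sketched as a single function. This is an illustrative reimplementation, not GitResolve's source: classifySketch, looksLikeRepoUrl, and pathSegments are hypothetical names, and looksLikeRepoUrl is a simplified stand-in for parseRepoUrl() that only checks for a known git host with at least two path segments. One further assumption: step 2 is described as matching any string with a resume extension, yet the examples classify https://cdn.example.com/resume.pdf as portfolio, so this sketch applies the extension check only to inputs that do not start with http:// or https://.

```typescript
type InputType =
  | "repo_url" | "git_profile" | "portfolio"
  | "resume_file" | "resume_url" | "linkedin" | "unknown";

const RESUME_EXT = /\.(pdf|docx?|rtf)$/i;
const HTTP_SCHEME = /^https?:\/\//i;
const GIT_HOSTS = new Set([
  "github.com", "www.github.com",
  "gitlab.com", "www.gitlab.com",
  "bitbucket.org", "www.bitbucket.org",
]);

// Non-empty path segments, e.g. "/torvalds/linux" -> ["torvalds", "linux"].
function pathSegments(url: URL): string[] {
  return url.pathname.split("/").filter(Boolean);
}

// Simplified stand-in for parseRepoUrl() (assumption): a known git host
// plus at least owner/repo in the path. The real validator may be stricter.
function looksLikeRepoUrl(url: URL): boolean {
  return GIT_HOSTS.has(url.hostname) && pathSegments(url).length >= 2;
}

function classifySketch(raw: string): InputType {
  // Step 1: trim whitespace.
  const input = raw.trim();

  // Step 2: resume extension (assumed to apply to non-http(s) inputs only).
  if (!HTTP_SCHEME.test(input) && RESUME_EXT.test(input)) return "resume_file";

  // Step 3: anything the URL constructor rejects is unknown.
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return "unknown";
  }

  // Step 4: LinkedIn is flagged and never fetched.
  if (url.hostname.includes("linkedin.com")) return "linkedin";

  // Step 5: repository URL.
  if (looksLikeRepoUrl(url)) return "repo_url";

  // Step 6: known git host; a bare root has no username to extract.
  if (GIT_HOSTS.has(url.hostname)) {
    return pathSegments(url).length > 0 ? "git_profile" : "unknown";
  }

  // Step 7: catch-all for every other valid URL.
  return "portfolio";
}
```

Because every rule returns immediately, the order of the checks is the whole algorithm: moving the catch-all or the LinkedIn check would change results for inputs that satisfy more than one rule.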

What GitResolve does for each type

Classification determines which pipeline runs next:
| InputType | Processing strategy |
| --- | --- |
| repo_url | Owner is extracted directly from the URL. knownOwnerProfile is passed to the disambiguator, yielding confidence: 'high' with no scraping needed. |
| git_profile | Owner is extracted directly from the URL path. Same high-confidence bypass as repo_url. |
| portfolio | Page HTML is fetched via scrapePortfolio(), all href attributes and inline git URLs are extracted, then disambiguation runs on the full link set. |
| resume_file | The file is read with parseResume(), which runs two extraction passes: plain-text extraction via unpdf and hyperlink annotation extraction from PDF metadata. Disambiguation then runs on the combined link set. |
| resume_url | Set by the CLI after downloading a remote PDF; not produced by classifyInput. The downloaded file is then processed the same way as resume_file. |
| linkedin | Flagged and skipped; no request is made (see below). |
| unknown | An error is attached to the result and processing is skipped. |
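
Calling code that branches on the classification may benefit from an exhaustive switch, so that a newly added InputType cannot be silently ignored. The sketch below is illustrative only: describeStrategy is a hypothetical function, the InputType union is re-declared locally from the reference table, and the returned strings summarise the strategies listed above rather than invoking any real pipeline.

```typescript
type InputType =
  | "repo_url" | "git_profile" | "portfolio"
  | "resume_file" | "resume_url" | "linkedin" | "unknown";

// Hypothetical helper: maps each type to a short description of the
// documented strategy. A real caller would invoke a pipeline here instead.
function describeStrategy(type: InputType): string {
  switch (type) {
    case "repo_url":
    case "git_profile":
      return "extract owner from URL (high confidence, no scraping)";
    case "portfolio":
      return "fetch page, extract git links, disambiguate";
    case "resume_file":
    case "resume_url":
      return "parse document, extract git links, disambiguate";
    case "linkedin":
      return "skip: no request is made";
    case "unknown":
      return "skip: attach an error to the result";
    default: {
      // Compile-time exhaustiveness check: this assignment stops
      // type-checking if a new member is added to InputType.
      const unreachable: never = type;
      throw new Error(`Unhandled input type: ${String(unreachable)}`);
    }
  }
}
```

The never assignment in the default branch turns a forgotten case into a compiler error instead of a runtime surprise.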

LinkedIn: intentionally not resolved

When classifyInput returns linkedin, GitResolve records the type and moves on without issuing any request. LinkedIn’s terms of service prohibit automated scraping, and their login walls make reliable extraction impractical. If a candidate’s LinkedIn URL is the only input available, the resolver returns a result with confidence: 'none' and a warning indicating the source was skipped.
If you need to connect a LinkedIn profile to a GitHub identity, ask candidates to include their GitHub URL directly on their portfolio or resume. GitResolve will pick it up automatically during scraping or PDF parsing.
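
If you collect several inputs per candidate, a pre-flight check can tell you when LinkedIn is the only source supplied, so you can request a portfolio or resume before running the resolver. The sketch below is a caller-side illustration and not part of GitResolve: isLinkedInUrl and splitCandidateInputs are hypothetical names, and the hostname test simply mirrors the rule described in the decision flow.

```typescript
// Hypothetical helper: applies the documented "hostname contains
// linkedin.com" rule. Strings that are not valid URLs are not LinkedIn URLs.
function isLinkedInUrl(raw: string): boolean {
  try {
    return new URL(raw.trim()).hostname.includes("linkedin.com");
  } catch {
    return false;
  }
}

interface SplitInputs {
  resolvable: string[];        // inputs worth passing to the resolver
  skipped: string[];           // LinkedIn URLs that would be skipped
  needsAnotherSource: boolean; // true when only LinkedIn URLs were given
}

// Hypothetical helper: separates LinkedIn URLs from everything else.
function splitCandidateInputs(inputs: string[]): SplitInputs {
  const skipped = inputs.filter(isLinkedInUrl);
  const resolvable = inputs.filter((input) => !isLinkedInUrl(input));
  return {
    resolvable,
    skipped,
    needsAnotherSource: resolvable.length === 0 && skipped.length > 0,
  };
}
```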

Code example

import { classifyInput } from "@clyrisai/gitresolve";

const inputs = [
  "./jane_doe_resume.pdf",
  "https://github.com/janedoe",
  "https://github.com/janedoe/my-project",
  "https://janedoe.dev",
  "https://www.linkedin.com/in/janedoe",
  "github.com/janedoe",                    // missing scheme
];

for (const input of inputs) {
  console.log(input, "→", classifyInput(input));
}

// ./jane_doe_resume.pdf                 → resume_file
// https://github.com/janedoe            → git_profile
// https://github.com/janedoe/my-project → repo_url
// https://janedoe.dev                   → portfolio
// https://www.linkedin.com/in/janedoe   → linkedin
// github.com/janedoe                    → unknown
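
The last input above comes back as unknown only because it lacks a scheme. If your inputs come from free-text form fields, one option is to normalise them before classification. The helper below is a caller-side sketch, not part of GitResolve: ensureScheme is a hypothetical name, and its heuristics (skip resume extensions, require a dot in the first segment, reject whitespace) are assumptions you may want to tighten.

```typescript
// Hypothetical caller-side helper: prepends https:// to strings that look
// like host[/path] but have no scheme. Everything else is returned trimmed.
function ensureScheme(raw: string): string {
  const trimmed = raw.trim();

  // Already has a scheme such as http:// or https://.
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed)) return trimmed;

  // Leave resume-like file names alone so local files are not turned into URLs.
  if (/\.(pdf|docx?|rtf)$/i.test(trimmed)) return trimmed;

  // Looks like "host.tld" or "host.tld/path": no whitespace, and a dot
  // inside the first segment.
  if (/^[^\s/]+\.[^\s/]+(\/\S*)?$/.test(trimmed)) return `https://${trimmed}`;

  return trimmed;
}
```

With this in place, ensureScheme("github.com/janedoe") yields https://github.com/janedoe, which matches the documented git_profile pattern, while free text and file paths pass through unchanged.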

Disambiguation

How GitResolve determines which GitHub identity owns the resolved links

Result Structure

The full shape of ResolverResult and AggregatedResult
