Recruiters were losing 4–6 hours per role just screening. That’s nearly half the week spent chasing names instead of actually talking to people. It was a business model that didn’t scale: winning more clients meant hiring more people, because there was no way to grow through efficiency alone.
Here’s how GoGloby broke that cycle by building a semantic ML pipeline—and the reality of what it took to get it right.
Achievements After Partnering With GoGloby
By automating the heaviest lifting of the recruitment funnel, the GoGloby partnership turned a manual, high-friction screening process into a precision operation. The shift slashed costs and timelines, standardized candidate evaluation, and freed the hiring team to focus on top-tier talent, all while delivering a 340% return on investment within 6 months.
| Metric | Result |
|---|---|
| Screening Time Per Role | ↓ 85% |
| Candidate Evaluation Consistency | 60% → 91% agreement |
| False Positive Rate on Shortlists | 35% → 12% |
| Time to Shortlist | ↓ 80% |
| Cost Per Candidate Scored | $3.20 → $0.02 |
| Job Postings Handled Simultaneously | 3x baseline |
| ROI | 340% within 6 months |
The Situation at a Glance
By replacing labor-intensive manual screening with a high-precision ML pipeline, the organization reclaimed nearly half of its recruiting capacity while maintaining a lean infrastructure footprint. This shift let the team take on three times its baseline job-posting volume without adding headcount, delivering a 340% return on investment in just 6 months.
| | |
|---|---|
| Industry | Talent Acquisition |
| Timeline | 6 months |
| Infrastructure cost | $400/month |
| Core problem | Manual candidate screening consuming 40% of recruiter time, with no path to scale |
| Solution | End-to-end semantic ML pipeline with configurable weighted scoring |
| ROI | 340% within 6 months |
The Problem
GoGloby’s traditional recruiting process relied entirely on human effort for candidate evaluation. Every new role required recruiters to spend 4–6 hours searching databases, reviewing profiles, and forming subjective assessments before a single meaningful conversation took place.
The process had four compounding failure modes.
1. Time Drain
Screening consumed 40% of total recruiter time. With multiple active roles running simultaneously, this became a constant bottleneck — the majority of productive hours spent on work that produced no direct signal.
2. Inconsistent Evaluation
Without a standardized scoring method, two recruiters evaluating the same candidate would agree only 60% of the time. Subjective judgment meant different clients received different quality levels depending on who handled their role.
3. No Path to Scale
Every additional job posting required proportional headcount. Growth was linear — and so was cost. Process automation sat below 5%. The business had a hard ceiling that no amount of hiring could move.
4. Missed Matches
Keyword searches couldn’t surface candidates whose profiles were semantically aligned but lexically different. A candidate experienced in “distributed systems” was invisible to a search for “high-availability backend architecture” — despite being exactly the right person for the role.
Engineering the Solution
The core insight was that candidate-job matching is fundamentally a semantic similarity problem. Two phrases can describe identical expertise in entirely different words, and no keyword filter catches that.
The GoGloby Applied AI Engineering team built an end-to-end ML pipeline that converts both candidate profiles and job requirements into dense vector representations, then ranks candidates by their geometric proximity in embedding space.
The pipeline runs on PostgreSQL with pgvector — no exotic infrastructure, no six-figure ML platform. The scoring engine processes hundreds of candidates per role in seconds, with configurable weights for technical skills, experience level, cultural fit, and location alignment depending on client priorities.
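To make that concrete, here is a minimal sketch of the weighted-similarity idea in plain Python. It assumes one embedding per criterion per profile; the criterion names, function names, and example weights are illustrative, not GoGloby’s production code.

```python
import numpy as np

# Illustrative criteria matching the four configurable weights described above.
CRITERIA = ("technical_skills", "experience_level", "cultural_fit", "location")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def weighted_score(candidate: dict, job: dict, weights: dict) -> float:
    """Weighted sum of per-criterion cosine similarities, scaled by total weight."""
    total = sum(weights[c] for c in CRITERIA)
    return sum(weights[c] * cosine(candidate[c], job[c]) for c in CRITERIA) / total

# Example: a client that prioritizes technical skills over location.
weights = {"technical_skills": 0.5, "experience_level": 0.2,
           "cultural_fit": 0.2, "location": 0.1}
rng = np.random.default_rng(0)
candidate = {c: rng.standard_normal(3072) for c in CRITERIA}  # stand-in vectors
job = {c: rng.standard_normal(3072) for c in CRITERIA}
print(round(weighted_score(candidate, job, weights), 4))
```

Ranking is then just sorting candidates by this score; changing a client’s weights reorders the shortlist without regenerating a single embedding.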
The system was designed deliberately to augment recruiters, not replace their judgment. It surfaces ranked shortlists with score breakdowns so recruiters can immediately focus on relationship building and client context — the work that actually requires human intelligence.
The Processing Pipeline
Raw data from CRM systems, job boards, and document repositories flows through six sequential stages (an orchestration skeleton follows the list):
1. Raw Data Ingestion
PDFs, Word docs, plain text, HTML — multiple formats unified into a single processing queue via Apache Airflow.
2. Text Extraction & Cleaning
Mistral-7B normalizes and cleans unstructured text at 1,000 documents per hour with 94% extraction accuracy.
3. PII Obfuscation
Tokenization, masking, and differential privacy applied before any embedding generation. Compliance is built in, not retrofitted.
4. Embedding Generation
OpenAI text-embedding-3-large produces 3,072-dimensional vectors per candidate profile and job requirement.
5. Vector Storage
PostgreSQL with pgvector and IVFFlat indexing enables sub-linear nearest-neighbor search across 1M+ candidate profiles.
6. Weighted Scoring
Multi-dimensional cosine similarity with client-configurable weights across four criteria. Top candidates returned in 200ms.
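As a rough illustration of how these stages hang together, here is a skeleton DAG using Airflow’s TaskFlow API. The task bodies are placeholders (the real cleaning, masking, and embedding logic is only hinted at in comments), and all names are assumptions rather than the production pipeline.

```python
import re
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def candidate_pipeline():
    @task
    def ingest() -> list[str]:
        # Stage 1: unify PDFs, Word docs, plain text, and HTML into one queue.
        return ["Jane Doe, jane@example.com, distributed systems engineer ..."]

    @task
    def clean(docs: list[str]) -> list[str]:
        # Stage 2: in production, Mistral-7B normalizes the raw text here.
        return [" ".join(d.split()) for d in docs]

    @task
    def mask_pii(docs: list[str]) -> list[str]:
        # Stage 3: obfuscate PII before anything reaches the embedding step.
        email = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
        return [email.sub("[EMAIL]", d) for d in docs]

    @task
    def embed_and_store(docs: list[str]) -> int:
        # Stages 4-5: generate embeddings and upsert into pgvector (sketched
        # in the next section). Stage 6 happens at query time, not in batch.
        return len(docs)

    embed_and_store(mask_pii(clean(ingest())))

candidate_pipeline()
```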
Key Engineering Decisions
Three decisions shaped the architecture’s performance and cost profile. Each involved a real trade-off.
1. Text Processing Model
The team evaluated three options for handling unstructured resume parsing at 1,000 documents per hour.
| Model | Parameters | Monthly Cost | Accuracy | Decision |
|---|---|---|---|---|
| Mistral-7B | 7B | $873/mo | 94% | ✓ Selected |
| Llama-3B | 3B | $678/mo | 81% | Rejected |
| DistilBERT | 66M | $333/mo | 73% | Rejected |
The cheaper options introduced unacceptable accuracy losses on domain-specific language — technical job titles, framework names, seniority signals. The additional $195/month over Llama-3B delivered a 13-point accuracy improvement that paid for itself many times over in reduced false positives downstream.
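As a sketch of what that cleaning step can look like, the snippet below runs a self-hosted Mistral-7B instruct checkpoint through Hugging Face transformers. The model revision and prompt wording are assumptions; the team’s actual serving setup and prompts aren’t public.

```python
from transformers import pipeline

# Assumed checkpoint; any Mistral-7B instruct variant fits the same pattern.
cleaner = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.3",
    device_map="auto",
)

PROMPT = (
    "Normalize the following resume into plain-text sections "
    "(skills, experience, education), preserving job titles and "
    "framework names exactly as written.\n\nResume:\n{resume}"
)

def clean_resume(raw_text: str) -> str:
    out = cleaner(
        PROMPT.format(resume=raw_text),
        max_new_tokens=1024,
        return_full_text=False,  # return only the generated cleanup
    )
    return out[0]["generated_text"]
```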
2. Embedding Model
| Model | Dimensions | Cost / 1M tokens | Decision |
|---|---|---|---|
| OpenAI text-embedding-3-large | 3,072 | $0.13 | ✓ Selected |
| Cohere embed-english-v3 | 1,024 | $0.10 | Rejected |
| Cohere embed-multilingual-v3 | 1,024 | $0.15 | Rejected |
| Google textembedding-gecko | 768 | $0.025 | Rejected |
At 3,072 dimensions, OpenAI’s model substantially outperformed the alternatives on professional terminology benchmarks: the nuance between “Staff Engineer” and “Principal Engineer”, or “React” versus “Next.js”, matters deeply in talent matching. The model’s 99.9% API uptime was also non-negotiable for production operations.
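In practice, stage 4 is a thin wrapper around the embeddings endpoint. A minimal sketch with the official openai Python client, with batching, retries, and rate-limit handling omitted:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return [item.embedding for item in resp.data]

vectors = embed(["Senior backend engineer: distributed systems, Go, Kafka"])
assert len(vectors[0]) == 3072  # the dimensionality used throughout the pipeline
```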
3. Database Architecture
The team evaluated purpose-built vector databases including Pinecone, Weaviate, and Qdrant. The operational overhead wasn’t justified at the scale required. PostgreSQL with pgvector provided a single system for both relational data and vector search — with IVFFlat indexing delivering sub-linear search performance across 1M+ candidate profiles. No additional infrastructure to secure, monitor, or maintain. This single decision saved an estimated $400–600/month in infrastructure cost and significant ongoing engineering complexity.
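Concretely, the whole vector layer reduces to ordinary SQL. The sketch below is illustrative rather than the production schema. One caveat worth flagging: pgvector’s indexes cap plain `vector` columns at 2,000 dimensions, so one workable pattern for 3,072-dimensional embeddings is the half-precision `halfvec` type added in pgvector 0.7.

```python
import psycopg  # psycopg 3; pgvector must be installed in the database

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS candidates (
    id        bigserial PRIMARY KEY,
    profile   text NOT NULL,
    embedding halfvec(3072) NOT NULL  -- half precision keeps 3,072 dims indexable
);
CREATE INDEX IF NOT EXISTS candidates_embedding_idx
    ON candidates USING ivfflat (embedding halfvec_cosine_ops)
    WITH (lists = 1000);  -- IVFFlat: sub-linear search, small recall trade-off
"""

SHORTLIST = """
SELECT id, profile, 1 - (embedding <=> %s::halfvec) AS similarity
FROM candidates
ORDER BY embedding <=> %s::halfvec  -- <=> is cosine distance
LIMIT 20;
"""

def shortlist(conn: psycopg.Connection, job_vec: list[float]) -> list[tuple]:
    vec = "[" + ",".join(f"{x:.6f}" for x in job_vec) + "]"
    with conn.cursor() as cur:
        return cur.execute(SHORTLIST, (vec, vec)).fetchall()
```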
Results
| Metric | Before | After | Change |
|---|---|---|---|
| Screening time per role | 4–6 hours | 45–60 minutes | ↓ 85% |
| Candidate evaluation consistency | 60% agreement | 91% agreement | ↑ 52% |
| Time to shortlist | 2–3 days | 4–6 hours | ↓ 80% |
| False positive rate on shortlists | 35% | 12% | ↓ 66% |
| Cost per candidate scored | ~$3.20 (manual) | $0.02 | ↓ 99.4% |
| Monthly infrastructure spend | — | $400 | ROI: 340% in 6 mo |
| Job postings handled simultaneously | Baseline | 3× baseline | ↑ 200% |
Beyond the numbers: recruiters shifted focus from repetitive screening to high-value client relationship work. Non-obvious candidate matches — invisible to keyword search — surfaced consistently. And the clustering layer produced talent market reports as a new product line, with zero additional ML infrastructure required.
What We’d Tell Engineers Starting This
- Don’t over-engineer your vector store: Purpose-built vector databases add operational overhead that isn’t justified until you’re well past 10 million vectors. PostgreSQL + pgvector handles millions of candidate profiles with IVFFlat indexing — and keeps your stack simple and your security surface small.
- Accuracy beats cost on foundation models: The temptation to use the cheapest text processing model is strong. Don’t. A 13-point accuracy drop on a $195/month saving creates far more expensive downstream problems — bad shortlists, recruiter rework, client trust damage. Measure total cost of ownership, not just API cost.
- PII handling is an architectural decision, not an afterthought: Bake obfuscation into the pipeline before embeddings — not after. Sensitive data never touches the vector store. Compliance is far easier to maintain than to retrofit, and far cheaper than a breach.
- Configurable weights unlock product flexibility: Different clients weight criteria differently. Making weights client-configurable turned the scoring engine from an internal tool into a product surface. Every new client configuration is a data point on what the market values.
- Clustering is a second product hiding in your embeddings: Once you have embeddings for thousands of candidates, K-means clustering surfaces talent pool intelligence at near-zero marginal cost. The ML infrastructure was already there — the team productized it as client-facing market reports (a minimal sketch follows this list).
- Design for augmentation, not automation: The system ranks candidates, but it doesn’t hire them. Recruiters still bring irreplaceable context about client culture, role nuance, and fit signals no embedding captures. Designing for augmentation rather than replacement drove adoption and trust.
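To illustrate that clustering point, here is a minimal sketch: K-means over stored candidate embeddings, with the cluster count and the random stand-in data as assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def talent_pools(embeddings: np.ndarray, k: int = 12):
    """Group candidate embeddings into k talent pools."""
    km = KMeans(n_clusters=k, n_init="auto", random_state=42).fit(embeddings)
    return km.labels_, km.cluster_centers_

# Stand-in data: 5,000 candidates with 3,072-dimensional embeddings.
labels, centers = talent_pools(np.random.rand(5000, 3072).astype(np.float32))
print(np.bincount(labels))  # pool sizes feed straight into market reports
```

Each centroid can itself be compared against job embeddings, which is why the market reports cost almost nothing: the vectors and the similarity machinery already exist.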