2M+ record search workflow
Large-Scale Search Platform
Search infrastructure that made 2M+ records usable through a WordPress front end.
Project summary
Built and maintained the backend pipeline that let a large client dataset stay searchable through WordPress without forcing the CMS to store or query everything directly.
A WordPress-facing search experience backed by Python, PostgreSQL, Elasticsearch, and RabbitMQ queues.
Dataset size
2M+ records
Frontend shell
WordPress
Buyer-facing summary
Client problem
The product needed fast, reliable search and filtering across a large dataset.
What I delivered
I worked on backend search logic, API-driven data access, and production-ready implementation patterns for large record sets and user-facing queries.
Business result
The platform could support high-volume search workflows with a cleaner experience for users.
Problem
The client needed a very large dataset, well beyond two million records, to be searchable by end users through a WordPress front end.
Keeping that data inside WordPress would have made the CMS do the wrong job. The real problem was building a proper data pipeline and search stack while preserving WordPress as the public interface.
What I built
Separated data and presentation layers
Kept the dataset in PostgreSQL, indexed search in Elasticsearch, and let WordPress focus on presenting search results rather than storing or querying millions of records directly.
Python ingestion pipeline
Built batching, normalization, resume-after-failure behavior, and indexing flows that could handle ongoing imports without collapsing under volume.
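As a rough illustration of the batching, normalization, and resume-after-failure behavior, here is a minimal Python sketch. It is not the production pipeline: the checkpoint file, the `normalize` rules, and the `index_batch` callback are all hypothetical stand-ins, and it assumes the source can be replayed in a stable order so that a saved offset stays meaningful.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Callable, Iterable, Iterator


def normalize(record: dict) -> dict:
    """Illustrative rules only: lowercase/trim keys and trim string values."""
    return {
        key.strip().lower(): value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


def batched(records: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` records until the source is exhausted."""
    iterator = iter(records)
    while batch := list(islice(iterator, size)):
        yield batch


def load_offset(checkpoint: Path) -> int:
    """Return how many records were already indexed, or 0 on a fresh run."""
    if checkpoint.exists():
        return json.loads(checkpoint.read_text())["offset"]
    return 0


def ingest(
    records: Iterable[dict],
    index_batch: Callable[[list[dict]], None],
    checkpoint: Path,
    batch_size: int = 1000,
) -> int:
    """Index records in batches, saving progress after each successful batch.

    If `index_batch` raises, the checkpoint still points at the last batch
    that succeeded, so the next run skips everything before it.
    """
    offset = load_offset(checkpoint)
    remaining = islice(records, offset, None)  # skip already-indexed records
    for batch in batched(remaining, batch_size):
        index_batch([normalize(record) for record in batch])
        offset += len(batch)
        checkpoint.write_text(json.dumps({"offset": offset}))
    return offset
```

Writing the checkpoint only after a batch succeeds means a crash can repeat at most one batch, which is harmless as long as indexing a record twice is idempotent (for example, when documents are indexed under a stable ID).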
RabbitMQ-based job orchestration
Moved large processing steps into queued workflows so re-indexing and heavy operations did not block the rest of the platform.
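To show the shape of the queued approach, here is a small Python sketch that splits a re-index into chunked jobs and publishes them to a queue. The queue name, the job fields, and the chunk size are hypothetical. The `channel` argument is assumed to be a pika-style channel exposing `basic_publish(exchange, routing_key, body)`; any object with that method works, which keeps the sketch runnable without a broker.

```python
import json
import uuid
from typing import Any, Protocol


class Channel(Protocol):
    """The one method of a pika-style channel this sketch relies on."""

    def basic_publish(
        self, exchange: str, routing_key: str, body: bytes, properties: Any = None
    ) -> None: ...


def build_reindex_job(source: str, start_id: int, end_id: int) -> dict:
    """Describe one chunk of re-index work (field names are hypothetical)."""
    return {
        "job_id": str(uuid.uuid4()),
        "type": "reindex",
        "source": source,
        "start_id": start_id,
        "end_id": end_id,
    }


def enqueue_reindex(
    channel: Channel,
    source: str,
    first_id: int,
    last_id: int,
    chunk_size: int = 50_000,
    queue: str = "reindex_jobs",
) -> list[str]:
    """Publish one message per ID range so workers can process chunks independently."""
    job_ids = []
    for start in range(first_id, last_id + 1, chunk_size):
        end = min(start + chunk_size - 1, last_id)
        job = build_reindex_job(source, start, end)
        # Publishing to the default exchange ("") routes by queue name.
        channel.basic_publish(
            exchange="",
            routing_key=queue,
            body=json.dumps(job).encode("utf-8"),
        )
        job_ids.append(job["job_id"])
    return job_ids
```

The request that triggers a re-index only has to publish a handful of small messages and return; the heavy work happens in separate worker processes that consume the queue at their own pace.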
AWS-backed export handling
Implemented export flows that wrote large outputs to AWS rather than trying to generate everything synchronously inside a user request.
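The export idea can be sketched in Python as follows. This assumes S3 as the AWS destination and a boto3-style client exposing `upload_file(Filename, Bucket, Key)`; the original only says the outputs went to AWS, so the bucket, key, and CSV format here are illustrative assumptions, not the actual implementation.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Iterable, Sequence


def export_to_s3(
    rows: Iterable[Sequence[Any]],
    header: Sequence[str],
    s3_client: Any,
    bucket: str,
    key: str,
) -> int:
    """Stream rows into a temporary CSV file, upload it, and return the row count.

    Rows are written one at a time, so memory use stays flat even when the
    export covers a very large result set.
    """
    count = 0
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "export.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            for row in rows:
                writer.writerow(row)
                count += 1
        # Assumed boto3-style call: upload_file(Filename, Bucket, Key).
        s3_client.upload_file(str(path), bucket, key)
    return count
```

Run from a background worker, a function like this lets the user-facing request simply record that an export was requested and later hand back a link to the finished file.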
Technical decisions
Elasticsearch defaults stop being enough once real query patterns and production data volumes show up. Index mappings and query structure needed deliberate tuning around how people actually searched the data.
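To make that kind of tuning concrete, here is a hedged Python sketch of an explicit mapping and a query builder. The field names (`title`, `category`, `region`, `created_at`) are hypothetical and not taken from the client's schema. The point is the pattern: exact-match fields declared as `keyword`, free text as `text`, and filters kept in the `filter` clause so they are not scored.

```python
from typing import Optional

# Explicit mapping instead of dynamic defaults (field names are hypothetical).
RECORD_MAPPING = {
    "mappings": {
        "dynamic": "strict",  # reject fields that are not declared below
        "properties": {
            "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "category": {"type": "keyword"},
            "region": {"type": "keyword"},
            "created_at": {"type": "date"},
        },
    }
}


def build_search_body(
    text: Optional[str],
    filters: dict,
    page: int = 1,
    page_size: int = 20,
) -> dict:
    """Build a bool query: full-text match in `must`, exact filters in `filter`."""
    bool_query: dict = {
        "filter": [
            {"term": {field: value}} for field, value in sorted(filters.items())
        ]
    }
    if text:
        bool_query["must"] = [
            {"match": {"title": {"query": text, "operator": "and"}}}
        ]
    return {
        "query": {"bool": bool_query},
        "from": (page - 1) * page_size,
        "size": page_size,
    }
```

For example, `build_search_body("steel pipe", {"region": "eu"}, page=2)` produces a body that matches both words in `title`, filters on `region`, and starts at offset 20.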
A meaningful share of the work was about reliability: batching, recovery after partial failure, and keeping the public search experience insulated from backend processing jobs.
Outcome
What I would improve
I would front-load more of the index-mapping and search-pattern analysis before the first rounds of reactive optimization.
The system worked well, but some re-indexing work could have been avoided with earlier analysis of real usage patterns.
Tech stack
Next step
If you need similar work, let’s talk through the constraints first.
The useful part of a project like this usually starts before code: understanding what the CMS should own, what should live in a backend service, and where integrations or automation can stay maintainable.
Related work
Client platform
AudioMazes
Custom WordPress audio platform with persistent playback, gated access, and secure AWS-backed delivery.
Client plugin
Business Directory & CRM-Integrated Plugin
Custom WordPress directory plugin with CRM sync, workflow automation, and controlled data sourcing.
Employer product team
Fayvo Social Platform
API and search engineering for a social product serving around 10,000 users.