About GeneE

An AI-native gene search platform — “Google for genes.”

What is GeneE?

GeneE combines authoritative gene data from public databases with AI-generated plain-language summaries, semantic search, and a modern interface. The goal is to make gene information accessible to everyone — from students encountering genetics for the first time to researchers looking for a quick, cited overview.

The platform currently covers approximately 20,000 human protein-coding genes. Each gene page aggregates identifiers, function annotations, disease associations, drug targets (with clinical phase and FDA status), tissue expression, related genes, pathogenic ClinVar variants, gnomAD constraint metrics, an embedded interactive 3D protein structure (experimental PDB when available, AlphaFold prediction otherwise), pre-computed functional and clinical badges, and an AI-written summary backed by PubMed citations.

How it works

1. Data aggregation
We pull structured gene data from seven authoritative public databases (NCBI Gene, UniProt, Gene Ontology, PubTator 3.0, Open Targets, ClinVar, and gnomAD) and unify the records under a single canonical identifier.
2. AI summarization with mandatory citations
For each gene, relevant PubMed abstracts are retrieved and fed to a large language model. Every factual claim in the resulting summary must cite a specific PMID. Summaries that fail automated citation validation are withheld.
3. Hybrid search
Queries are matched using both keyword search (Typesense) for exact gene symbols and semantic search (vector embeddings) for natural-language questions like "genes involved in DNA repair." The two result lists are merged via Reciprocal Rank Fusion, which scores each gene by summing 1/(k + rank) over the lists it appears in, so genes ranked highly by both methods rise to the top.
4. Weekly updates
A scheduled pipeline refreshes ClinVar pathogenic variants every Sunday and re-pulls Open Targets drug and disease associations monthly. AI summaries are regenerated on demand when a new corpus snapshot warrants it.
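
As a rough illustration of the merge in step 3, the sketch below fuses a keyword ranking and a semantic ranking with Reciprocal Rank Fusion. The function name, the sample gene lists, and the constant k = 60 (a value commonly used with RRF) are assumptions for illustration, not GeneE's actual implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first.

    Each item scores 1 / (k + rank) in every list it appears in
    (rank starts at 1); the per-list scores are summed.
    k=60 is an assumed default, not a documented GeneE setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result lists for a query about DNA repair.
keyword_hits = ["BRCA1", "BRCA2", "BARD1"]
semantic_hits = ["BRCA2", "RAD51", "BRCA1", "ATM"]

merged = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(merged)
```

Genes that appear in both lists accumulate two contributions, so they tend to outrank genes found by only one method, even when a single method placed those genes slightly higher.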

Trust & accuracy

Every AI-generated summary undergoes automated validation before it is published. The validation pipeline checks that all cited PMIDs exist, cross-references disease and chromosome claims against structured data, and flags any unsupported statements. Summaries that do not pass validation are not displayed — users see the structured data instead.
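
A minimal sketch of the PMID existence check might look like the following. It assumes citations are written inline as `[PMID:12345]` and that the set of valid PMIDs is already known; both the citation format and the `validate_summary` helper are hypothetical, and the PMIDs in the example are placeholders rather than real articles. The cross-referencing of disease and chromosome claims is not shown.

```python
import re

# Assumed inline citation format; the real format is not documented here.
PMID_PATTERN = re.compile(r"\[PMID:(\d+)\]")


def validate_summary(summary, known_pmids):
    """Return a list of problems; an empty list means the summary passes."""
    problems = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", summary) if s.strip()]
    for sentence in sentences:
        cited = PMID_PATTERN.findall(sentence)
        if not cited:
            # Every sentence is treated as a factual claim needing a citation.
            problems.append(f"uncited: {sentence}")
        for pmid in cited:
            if pmid not in known_pmids:
                problems.append(f"unknown PMID {pmid}")
    return problems


summary = (
    "BRCA1 takes part in DNA repair [PMID:111]. "
    "It sits on chromosome 17 [PMID:999]. "
    "It is widely studied."
)
problems = validate_summary(summary, known_pmids={"111", "222"})
print(problems)

# A summary with any problems would be withheld in favour of structured data.
show_summary = not problems
```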

Source relevance scores are computed from a weighted blend of semantic similarity, journal impact, publication recency, and whether the AI cited the source. Preprints are clearly flagged so users can weigh evidence appropriately.
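
To make the weighted blend concrete, here is a small sketch. The four inputs come from the description above, but the weights, the 0-to-1 normalisation, and all names are assumptions chosen for illustration; the actual weights are not stated on this page.

```python
from dataclasses import dataclass


@dataclass
class Source:
    semantic_similarity: float  # assumed to be normalised to 0..1
    journal_impact: float       # assumed to be normalised to 0..1
    recency: float              # assumed: 1.0 = newest, 0.0 = oldest
    cited_by_ai: bool           # whether the summary cited this source


# Hypothetical weights that sum to 1.0.
WEIGHTS = {"semantic": 0.5, "impact": 0.2, "recency": 0.2, "cited": 0.1}


def relevance(source):
    """Weighted blend of the four signals, giving a score in 0..1."""
    return (
        WEIGHTS["semantic"] * source.semantic_similarity
        + WEIGHTS["impact"] * source.journal_impact
        + WEIGHTS["recency"] * source.recency
        + WEIGHTS["cited"] * (1.0 if source.cited_by_ai else 0.0)
    )


uncited = Source(semantic_similarity=0.8, journal_impact=0.5, recency=0.5, cited_by_ai=False)
cited = Source(semantic_similarity=0.8, journal_impact=0.5, recency=0.5, cited_by_ai=True)
print(round(relevance(uncited), 3), round(relevance(cited), 3))
```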

Explore by dimension

Beyond searching for a specific gene, you can browse the full catalogue along 12 ranking dimensions — most researched, FDA-approved drug targets, most pathogenic variants, most constrained, and more — on the Gene Rankings page. Every sort-and-filter combination is a shareable URL.
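
One way a sort-and-filter state could map to a shareable URL is sketched below. The base URL, the parameter names (`sort`, `chromosome`), and the `rankings_url` helper are all hypothetical; GeneE's real URL scheme may differ.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def rankings_url(base, sort, filters):
    """Encode a ranking dimension plus filters as query parameters."""
    params = {"sort": sort, **filters}
    # Sort keys so the same state always yields the same URL.
    return f"{base}?{urlencode(sorted(params.items()))}"


url = rankings_url(
    "https://example.org/rankings",  # placeholder host, not the real site
    sort="most_constrained",
    filters={"chromosome": "17"},
)
print(url)

# The state can be recovered from the URL when someone opens the link.
state = parse_qs(urlsplit(url).query)
```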

A chromosome ideogram on the home page and above the rankings filters lets you narrow the catalogue to a single chromosome in one click.

Limitations

Important
GeneE is an informational tool, not a diagnostic or clinical resource. AI-generated summaries should always be verified against primary sources. The current release covers human protein-coding genes only. Data freshness depends on source database update cycles and the weekly pipeline schedule.

Data sources

GeneE aggregates data from seven public databases. See the Data Sources & Attribution page for full details and license information.