Data Sources & Attribution

GeneE aggregates data from seven public databases. Attribution for each source is provided below, as required by that source's license.

NCBI Gene

Public Domain

The National Center for Biotechnology Information Gene database provides canonical gene records including symbols, names, aliases, chromosomal locations, genomic coordinates, and official gene summaries for all known human genes.

Data used by GeneE: Gene symbols, full names, aliases, chromosome locations, gene types, NCBI summaries, and cross-reference identifiers.

ncbi.nlm.nih.gov/gene →

UniProt (Swiss-Prot)

CC BY 4.0

UniProt is the universal protein resource. GeneE uses the manually reviewed Swiss-Prot subset for human proteins, which provides high-quality protein function descriptions, subcellular localization, tissue specificity, and disease associations.

Data used by GeneE: Protein function descriptions, subcellular localization, tissue specificity, disease associations, and protein keywords.

UniProt: the Universal Protein Knowledgebase in 2025. The UniProt Consortium. Nucleic Acids Research (2025). Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

uniprot.org →

Gene Ontology (GO)

CC BY 4.0

The Gene Ontology provides a structured, controlled vocabulary of gene and protein functions. GO annotations link genes to molecular functions, biological processes, and cellular components with evidence codes indicating the type of supporting evidence.

Data used by GeneE: GO term annotations (molecular function, biological process, cellular component), evidence codes, and term hierarchy.

The Gene Ontology resource: enriching a GOld mine. Gene Ontology Consortium. Nucleic Acids Research (2021). Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

geneontology.org →

PubTator 3.0 / PubMed

Public Domain

PubTator 3.0 provides pre-computed named entity recognition (NER) over PubMed abstracts, identifying genes, diseases, chemicals, and other biomedical entities. GeneE uses these mappings to find relevant publications for each gene and to build the retrieval-augmented generation (RAG) context for AI summaries.

Data used by GeneE: Gene-to-PMID mappings, PubMed abstract text, publication metadata (titles, journals, years), and entity co-occurrence data.
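
As an illustration of how gene-to-PMID mappings of this kind can be assembled, here is a minimal Python sketch. It is not GeneE's actual pipeline. The tab-separated row layout (PMID, entity type, concept ID, mentions, resource) and the function name `build_gene_to_pmids` are assumptions made for the example.

```python
from collections import defaultdict


def build_gene_to_pmids(lines):
    """Group PMIDs by gene ID from tab-separated annotation rows.

    Assumed (hypothetical) row layout:
    PMID <tab> entity type <tab> concept ID(s) <tab> mentions <tab> resource
    """
    gene_to_pmids = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed rows and non-gene entities.
        if len(fields) < 3 or fields[1] != "Gene":
            continue
        pmid, concept_ids = fields[0], fields[2]
        # Assumption: one mention may resolve to several gene IDs joined by ';'.
        for gene_id in concept_ids.split(";"):
            if gene_id:
                gene_to_pmids[gene_id].add(pmid)
    # Sort PMIDs numerically so the output is deterministic.
    return {gene: sorted(pmids, key=int) for gene, pmids in gene_to_pmids.items()}
```

Using sets while grouping means a gene mentioned several times in one abstract still maps to that PMID only once.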

ncbi.nlm.nih.gov/research/pubtator3 →

Open Targets

Apache 2.0

Open Targets Platform integrates evidence from genetics, genomics, transcriptomics, drugs, and the literature to score gene-disease associations. GeneE uses it for disease association scores, known drug targets, and tissue expression data.

Data used by GeneE: Disease association scores, drug target information (drug names, mechanisms of action, clinical trial phases), and tissue-level RNA expression values.

Open Targets Platform. Licensed under Apache License 2.0.

platform.opentargets.org →

ClinVar

Public Domain

ClinVar is an NIH archive of germline variants and their relationships to human disease, aggregating submissions from clinical testing labs, expert panels, and curated databases. GeneE uses ClinVar to surface pathogenic and likely pathogenic variants on each gene page alongside review-star quality signals.

Data used by GeneE: Variant HGVS names and coordinates, clinical significance (pathogenic / likely pathogenic / VUS / conflicting), review-star confidence, linked conditions, last-evaluated dates, and aggregate per-tier variant counts.
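
A rough Python sketch of how review-star confidence and per-tier counts could be derived from variant records follows. The status strings are assumed to match ClinVar's review-status labels. The record fields (`review_status`, `significance`) and the function names are hypothetical and are not taken from GeneE's code.

```python
from collections import Counter

# Assumed mapping from review-status text to star count.
REVIEW_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, conflicting classifications": 1,
    "criteria provided, single submitter": 1,
    "no assertion criteria provided": 0,
    "no classification provided": 0,
}


def review_stars(status):
    """Return the star count for a review status; unknown statuses get 0."""
    return REVIEW_STARS.get(status.strip().lower(), 0)


def count_by_tier(variants, min_stars=0):
    """Count variants per clinical-significance tier, optionally filtered by stars."""
    counts = Counter()
    for variant in variants:
        if review_stars(variant["review_status"]) >= min_stars:
            counts[variant["significance"]] += 1
    return dict(counts)
```

Calling `count_by_tier(variants, min_stars=2)` would, under these assumptions, keep only variants whose status is multi-submitter without conflicts or stronger.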

ncbi.nlm.nih.gov/clinvar →

gnomAD

CC BY 4.0

The Genome Aggregation Database (gnomAD) harmonizes exome and genome sequencing data from a broad spectrum of large-scale sequencing projects. GeneE uses two gnomAD v4.1 constraint metrics, LOEUF (loss-of-function observed/expected upper bound fraction) and pLI (probability of being loss-of-function intolerant), to indicate how tolerant each gene is to loss-of-function variation in the general population.

Data used by GeneE: Per-gene constraint metrics (LOEUF, pLI, and the observed/expected loss-of-function ratio with its 90% CI) from the MANE Select transcript. Values for genes on chrX and chrY are backfilled via the gnomAD GraphQL API.
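
To make the constraint metrics concrete, the Python sketch below shows one way they might be turned into a plain-language tolerance label. It is illustrative only. The `Constraint` record, the function names, and the cutoffs (LOEUF below 0.6, pLI at or above 0.9) are assumptions based on commonly quoted conventions, not thresholds confirmed for GeneE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraint:
    """Hypothetical per-gene record; field names are illustrative."""
    obs_lof: int     # observed loss-of-function variants
    exp_lof: float   # expected loss-of-function variants
    loeuf: float     # upper bound of the 90% CI of observed/expected
    pli: float       # probability of loss-of-function intolerance


def oe_ratio(c: Constraint) -> Optional[float]:
    """Observed/expected LoF ratio, or None when nothing is expected."""
    return c.obs_lof / c.exp_lof if c.exp_lof > 0 else None


def tolerance_label(c: Constraint, loeuf_cutoff: float = 0.6,
                    pli_cutoff: float = 0.9) -> str:
    """Label a gene using assumed cutoffs; low LOEUF or high pLI means intolerant."""
    if c.loeuf < loeuf_cutoff or c.pli >= pli_cutoff:
        return "likely intolerant to loss of function"
    return "no strong evidence of constraint"
```

A lower LOEUF means fewer loss-of-function variants were observed than expected, which is why the comparison runs in the opposite direction from pLI.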

gnomAD Consortium. Karczewski, K.J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature (2020). gnomAD v4.1 released by the Broad Institute. Summary constraint metrics are distributed under Creative Commons Attribution 4.0 International (CC BY 4.0).

gnomad.broadinstitute.org →