Fang L, Salami MO, Weber GM, Torvik VI. uCite: The union of nine large-scale public PubMed citation datasets with reliability filtering.
Data Brief 2025;60:111535. [PMID: 40322502; PMCID: PMC12049819; DOI: 10.1016/j.dib.2025.111535]
[Received: 01/15/2025] [Revised: 02/28/2025] [Accepted: 03/28/2025] [Indexed: 05/08/2025]
Abstract
There has been a recent push to make bibliographic citation data public, to aggregate it, and to increase its coverage. Here we describe uCite, a citation dataset containing 564 million PubMed citation pairs aggregated from nine sources: PubMed Central, iCite, OpenCitations, Dimensions, Microsoft Academic Graph, Aminer, Semantic Scholar, Lens, and OpCitance. Of these pairs, 51 million (9%) were labeled unreliable based on patterns of discrepancy between sources. These discrepancies are explained by ambiguous metadata, crosswalk errors, typographical errors, citations to future publications, and multi-paper documents. Each source contributes to improved coverage and reliability, but the sources vary dramatically in precision and recall. Estimates of both are contrasted here with those of Web of Science and Scopus.
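Two ideas in the abstract can be made concrete with a small example. The first is taking the union of citation pairs while recording which sources report each pair. The second is scoring one source's precision and recall against a reference set. The sketch below is a minimal illustration under stated assumptions. A citation pair is modeled as a `(citing PMID, cited PMID)` tuple. The helper names `union_with_provenance` and `precision_recall` are hypothetical, and the PMIDs are toy values, not uCite records. This is not the authors' pipeline and does not reproduce uCite's reliability-labeling rules.

```python
from collections import defaultdict

# Hypothetical model: a citation pair is (citing PMID, cited PMID).
Pair = tuple[int, int]


def union_with_provenance(sources: dict[str, set[Pair]]) -> dict[Pair, set[str]]:
    """Union all sources, recording the names of the sources that report each pair."""
    provenance: dict[Pair, set[str]] = defaultdict(set)
    for name, pairs in sources.items():
        for pair in pairs:
            provenance[pair].add(name)
    return dict(provenance)


def precision_recall(predicted: set[Pair], reference: set[Pair]) -> tuple[float, float]:
    """Precision and recall of one source's pairs against a reference set of pairs."""
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data only (hypothetical source names and PMIDs).
    sources = {
        "A": {(1, 2), (1, 3), (2, 3)},
        "B": {(1, 2), (2, 3), (4, 1)},
    }
    union = union_with_provenance(sources)
    print(len(union))                   # 4 distinct pairs
    print(sorted(union[(1, 2)]))        # ['A', 'B']

    # Treat a hypothetical reference set as ground truth for scoring.
    reference = {(1, 2), (1, 3), (2, 3)}
    print(precision_recall(sources["B"], reference))
```

In practice, the provenance map is what makes cross-source comparison possible. A pair reported by only some sources is where a discrepancy would show up. Deciding which discrepancies mark a pair as unreliable is the substantive step the paper addresses, and this sketch leaves it out.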