Correctness of Cell Labels in Public Single Cell Transcriptomics Datasets

Xin Lin, Minjie Lyu, Yihan Zhang, Derin B. Keskin, Lou T. Chitkushev, Guanglan Zhang, Vladimir Brusic

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

The number of single-cell transcriptomic (SCT) studies is rapidly increasing. More than 15,000 single-cell gene expression data sets are available in public repositories, and more than 2,400 of these involve peripheral blood mononuclear cells (PBMC). The main cell types in PBMC are B cells, dendritic cells, monocytes, natural killer cells, and T cells. Labels of individual PBMC are usually provided in the metadata accompanying the data sets, or are implicit in the data set partitions for sorted cells. We analyzed the correctness of the labels assigned to individual PBMC in primary reports. Label correctness was assessed using an artificial neural network (ANN) classifier and the confident learning (CL) approach. We estimate that, on average, about 2% of the cells in our data sets are mislabeled. Label accuracy varied broadly between data sets, particularly among those generated by experimental cell sorting followed by SCT.
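As a rough illustration of the confident learning idea mentioned in the abstract, the sketch below flags cells whose reported label disagrees with a class the classifier is confident about. It is a simplified, pure-Python approximation and not the authors' pipeline: the function name `find_label_issues`, the per-class threshold rule (mean predicted probability over cells carrying that label), and the toy probabilities are all assumptions made for this example. In practice the probabilities would come from a trained classifier such as an ANN, ideally as out-of-sample predictions.

```python
def find_label_issues(labels, pred_probs):
    """Return indices of samples that look mislabeled (simplified confident learning).

    labels: list of integer class labels as reported in the data set.
    pred_probs: list of per-class probability lists from some classifier.
    Assumes every class has at least one sample carrying its label.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: average probability of class k among samples labeled k.
    thresholds = []
    for k in range(n_classes):
        own = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        thresholds.append(sum(own) / len(own))

    flagged = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes for which this sample's probability reaches the class threshold.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            # Flag only when the confident class contradicts the reported label.
            if best != y:
                flagged.append(i)
    return flagged


# Toy example (made-up numbers): class 0 = "B cell", class 1 = "T cell".
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # reported as class 0, but the classifier strongly favors class 1
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
issues = find_label_issues(labels, pred_probs)
print(issues)
```

Dividing the number of flagged cells by the total number of cells gives a per-data-set mislabel estimate of the kind summarized in the abstract. Cells with no confident class are left unflagged in this sketch, which is a conservative choice.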

Original language: English
Title of host publication: Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
Editors: Yufei Huang, Lukasz Kurgan, Feng Luo, Xiaohua Tony Hu, Yidong Chen, Edward Dougherty, Andrzej Kloczkowski, Yaohang Li
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3280-3284
Number of pages: 5
ISBN (Electronic): 9781665401265
Publication status: Published - 2021
Event: 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021 - Virtual, Online, United States
Duration: 9 Dec 2021 to 12 Dec 2021

Publication series

Name: Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021

Conference

Conference: 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
Country/Territory: United States
City: Virtual, Online
Period: 9/12/21 to 12/12/21

Keywords

  • ANN
  • PBMC
  • confident learning
  • gene expression
  • mislabels analysis
  • supervised machine learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Biomedical Engineering
  • Health Informatics
  • Information Systems and Management
