HTAN Single Cell/Single Nucleus RNA Sequencing Data Standard

Overview

This page describes the data levels, metadata attributes, and file structure for single cell and single nucleus RNA sequencing assays.

Description of Assay

Single cell RNA sequencing (sc-RNA sequencing) is an emerging technology used to investigate the expression profiles of individual cells and/or nuclei. The technique is increasingly used to study the tumor microenvironment, which is composed of a heterogeneous population of cancer cells and tumor-adjacent stromal cells. In these experiments, tissues are enzymatically dissociated, and individual cells are isolated via microfluidics using oil droplet emulsion. Individual transcriptomes are then uniquely tagged and, as in bulk RNA sequencing, reverse transcribed, amplified, and sequenced. While sc-RNA sequencing captures both cytoplasmic and nuclear transcripts, single nucleus RNA sequencing (sn-RNA sequencing) measures the transcriptome of individual nuclei. Advantages of sn-RNA sequencing include differentiating cell states and identifying rare or novel cell types in heterogeneous populations.

Metadata Levels

In alignment with The Cancer Genome Atlas and the NCI Genomic Data Commons, data are divided into four levels:

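| Level Number | Definition                  | Example Data                       |
|--------------|-----------------------------|------------------------------------|
| 1            | Raw data                    | FASTQs, unaligned BAMs             |
| 2            | Aligned primary data        | Aligned BAMs                       |
| 3            | Derived biomolecular data   | Gene expression matrix files, VCFs |
| 4            | Sample level summary data   | t-SNE plot coordinates             |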
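
As a rough illustration of how the example data above map onto levels, the following Python sketch guesses a level from a file name. It is not part of the standard: the function name `guess_level`, the suffix lists, and the use of `.mtx` as a stand-in for a gene expression matrix file are all assumptions made for this example. A BAM file can be Level 1 (unaligned) or Level 2 (aligned), so the caller has to say which, and Level 4 data have no characteristic suffix, so the sketch does not try to detect them.

```python
from typing import Optional

# Compression suffixes that are stripped before looking at the file type.
# (Illustrative list; not defined by the standard.)
COMPRESSION_SUFFIXES = (".gz", ".bz2")


def guess_level(filename: str, aligned: Optional[bool] = None) -> Optional[int]:
    """Guess a data level (1-3) from a file name, or return None if unclear.

    `aligned` is only consulted for BAM files: True means aligned (Level 2),
    False means unaligned (Level 1), None means unknown.
    """
    name = filename.lower()
    for suffix in COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break

    if name.endswith((".fastq", ".fq")):
        return 1  # raw reads
    if name.endswith(".bam"):
        if aligned is None:
            return None  # cannot tell Level 1 from Level 2 by name alone
        return 2 if aligned else 1
    if name.endswith((".vcf", ".mtx")):
        return 3  # derived biomolecular data (".mtx" is an assumed matrix format)
    return None  # includes Level 4 files, which have no characteristic suffix
```

In practice the level would be recorded explicitly in the submitted metadata rather than inferred from file names; the sketch only makes the distinctions in the table concrete.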

Data Schema:
| Attribute         | Label           | Description |
|-------------------|-----------------|-------------|
| scRNA-seq Level 1 | ScRNA-seqLevel1 | Single-cell RNA-seq [EFO_0008913] |
| scRNA-seq Level 2 | ScRNA-seqLevel2 | Alignment workflows downstream of scRNA-seq Level 1 |
| scRNA-seq Level 3 | ScRNA-seqLevel3 | Gene and isoform expression files |
| scRNA-seq Level 4 | ScRNA-seqLevel4 | Relationships between cells, derived from Level 3 expression data and represented as t-SNE or UMAP coordinates per cell, plus all other cell-specific metadata (e.g., cell type) |
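
To make the Level 4 description more concrete, here is a minimal Python sketch of a per-cell table holding embedding coordinates plus one piece of cell-specific metadata. The column names (`cell_barcode`, `umap_1`, `umap_2`, `cell_type`), the `CellRecord` class, and the CSV layout are hypothetical choices for illustration only; the schema above does not prescribe them.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class CellRecord:
    """One row per cell: identifier, 2-D embedding coordinates, annotation.

    All field names are hypothetical and chosen for this example.
    """
    cell_barcode: str
    umap_1: float
    umap_2: float
    cell_type: str


def write_level4_table(records: Iterable[CellRecord]) -> str:
    """Serialize cell records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(CellRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Made-up placeholder values, not real data.
example = [
    CellRecord("CELL_0001", 1.5, -2.0, "T cell"),
    CellRecord("CELL_0002", -0.25, 3.0, "fibroblast"),
]
table_text = write_level4_table(example)
```

Keeping one row per cell means additional cell-level attributes can be added as further columns without changing the structure.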