HTAN Bulk RNA Sequencing Data Standard

Overview

This page describes the data levels, metadata attributes, and file structure for bulk RNA sequencing.

Description of Assay

Bulk RNA sequencing measures the average gene expression profile across all cells in a biological sample.

Metadata Levels

The metadata defined here reuses existing common data elements from the Genomic Data Commons (GDC). The HTAN data model currently supports Levels 1, 2, and 3 of bulk RNA sequencing data:

Level  Definition                            Example Data
1      Unaligned reads                       FASTQ
2      Aligned reads                         BAM
3      Gene-level expression, unnormalized   Gene & isoform expression-level data (.csv)
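The example formats in the levels table suggest a simple way to triage incoming files by data level. The following is a minimal sketch, not part of any HTAN tooling: the helper name `htan_level_for_file` is hypothetical, and the `.fq` shorthand and gzip-compressed FASTQ extensions (`.fastq.gz`, `.fq.gz`) are assumptions beyond the formats the table lists.

```python
from typing import Optional

# Hypothetical mapping from file extension to bulk RNA-seq data level,
# based on the example formats in the levels table. The .fq and .gz
# variants are assumed conventions, not taken from the HTAN standard.
_LEVEL_BY_SUFFIX = {
    ".fastq": 1,
    ".fastq.gz": 1,
    ".fq": 1,
    ".fq.gz": 1,
    ".bam": 2,
    ".csv": 3,
}


def htan_level_for_file(filename: str) -> Optional[int]:
    """Return the data level suggested by a file's extension, or None if unknown."""
    name = filename.lower()  # extensions are matched case-insensitively
    for suffix, level in _LEVEL_BY_SUFFIX.items():
        if name.endswith(suffix):
            return level
    return None


if __name__ == "__main__":
    for f in ("sample_R1.fastq.gz", "sample.bam", "gene_counts.csv", "notes.txt"):
        print(f, "->", htan_level_for_file(f))
```

An extension check is only a heuristic: it cannot tell, for example, a Level 3 expression matrix from any other CSV file, so the level recorded in the metadata should remain authoritative.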

Data Schema

Attribute              Label               Description
Bulk RNA-seq Level 1   BulkRNA-seqLevel1   Bulk RNA-seq [EFO_0003738]
Bulk RNA-seq Level 2   BulkRNA-seqLevel2   Bulk RNA-seq alignment protocol description
Bulk RNA-seq Level 3   BulkRNA-seqLevel3   Bulk RNA-seq gene expression matrices
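In the schema table, each label appears to be its attribute name with the spaces removed. A minimal sketch of that inferred convention follows; the function name `schema_label` is hypothetical, and the rule is inferred from the three rows shown rather than taken from the HTAN data model specification.

```python
def schema_label(attribute: str) -> str:
    """Return the attribute name with all whitespace removed (inferred convention)."""
    # Inferred from the schema table, e.g. "Bulk RNA-seq Level 1" -> "BulkRNA-seqLevel1".
    # Hypothetical helper; not part of any HTAN library.
    return "".join(attribute.split())


if __name__ == "__main__":
    for level in (1, 2, 3):
        attribute = f"Bulk RNA-seq Level {level}"
        print(f"{attribute!r} -> {schema_label(attribute)!r}")
```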