Import QIIME2 results to TreeSummarizedExperiment

Results exported from QIMME2 can be imported as a TreeSummarizedExperiment using loadFromQIIME2. Except for the featureTableFile, the other data types, taxonomyTableFile, refSeqFile and phyTreeFile, are optional, but are highly encouraged to be provided.

Import the QIIME2 artifacts to R.

loadFromQIIME2(
  featureTableFile,
  taxonomyTableFile = NULL,
  sampleMetaFile = NULL,
  featureNamesAsRefSeq = TRUE,
  refSeqFile = NULL,
  phyTreeFile = NULL,
  ...
)

readQZA(file, temp = tempdir(), ...)

Arguments

featureTableFile

a single character value defining the file path of the feature table to be imported.

taxonomyTableFile

a single character value defining the file path of the taxonomy table to be imported. (default: taxonomyTableFile = NULL).

sampleMetaFile

a single character value defining the file path of the sample metadata to be imported. The file has to be in tsv format. (default: sampleMetaFile = NULL).

featureNamesAsRefSeq

TRUE or FALSE: Should the feature names of the feature table be regarded as reference sequences? This setting will be disregarded, if refSeqFile is not NULL. If the feature names do not contain valid DNA characters only, the reference sequences will not be set.

refSeqFile

a single character value defining the file path of the reference sequences for each feature. (default: refSeqFile = NULL).

phyTreeFile

a single character value defining the file path of the phylogenetic tree. (default: phyTreeFile = NULL).

...

additional arguments:

temp: the temporary directory used for decompressing the data. (default: tempdir())
removeTaxaPrefixes: TRUE or FALSE: Should taxonomic prefixes be removed? (default: removeTaxaPrefixes = FALSE)

file

character, path of the input qza file. Only files in format of BIOMV210DirFmt (feature table), TSVTaxonomyDirectoryFormat (taxonomic table), NewickDirectoryFormat (phylogenetic tree ) and DNASequencesDirectoryFormat (representative sequences) are supported right now.

temp

character, a temporary directory in which the qza file will be decompressed to, default tempdir().

Value

A TreeSummarizedExperiment

object

matrix object for feature table, DataFrame for taxonomic table, ape::phylo object for phylogenetic tree, Biostrings::DNAStringSet for representative sequences of taxa.

Details

Both arguments featureNamesAsRefSeq and refSeqFile can be used to define reference sequences of features. featureNamesAsRefSeq is only taken into account, if refSeqFile is NULL. No reference sequences are tried to be created, if featureNameAsRefSeq is FALSE and refSeqFile is NULL.

References

Bolyen E et al. 2019: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37: 852–857. https://doi.org/10.1038/s41587-019-0209-9

https://qiime2.org

Author

Yang Cao

Examples

featureTableFile <- system.file("extdata", "table.qza", package = "mia")
taxonomyTableFile <- system.file("extdata", "taxonomy.qza", package = "mia")
sampleMetaFile <- system.file("extdata", "sample-metadata.tsv", package = "mia")
phyTreeFile <- system.file("extdata", "tree.qza", package = "mia")
refSeqFile <- system.file("extdata", "refseq.qza", package = "mia")
tse <- loadFromQIIME2(
  featureTableFile = featureTableFile,
  taxonomyTableFile = taxonomyTableFile,
  sampleMetaFile = sampleMetaFile,
  refSeqFile = refSeqFile,
  phyTreeFile = phyTreeFile
)

tse
#> class: TreeSummarizedExperiment 
#> dim: 770 34 
#> metadata(0):
#> assays(1): counts
#> rownames(770): 4b5eeb300368260019c1fbc7a3c718fc
#>   fe30ff0f71a38a39cf1717ec2be3a2fc ... 98d250a339a635f20e26397dafc6ced3
#>   1830c14ead81ad012f1db0e12f8ab6a4
#> rowData names(8): Kingdom Phylum ... Species Confidence
#> colnames(34): L1S8 L1S57 ... L6S68 L6S93
#> colData names(9): sample.id barcode.sequence ...
#>   reported.antibiotic.usage days.since.experiment.start
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (770 rows)
#> rowTree: 1 phylo tree(s) (770 leaves)
#> colLinks: NULL
#> colTree: NULL
#> referenceSeq: a DNAStringSet (770 sequences)
# Read individual files
featureTableFile <- system.file("extdata", "table.qza", package = "mia")
taxonomyTableFile <- system.file("extdata", "taxonomy.qza", package = "mia")
sampleMetaFile <- system.file("extdata", "sample-metadata.tsv", package = "mia")

assay <- readQZA(featureTableFile)
rowdata <- readQZA(taxonomyTableFile, removeTaxaPrefixes = TRUE)
coldata <- read.table(sampleMetaFile, header = TRUE, sep = "\t", comment.char = "")

# Assign rownames 
rownames(coldata) <- coldata[, 1]
coldata[, 1] <- NULL

# Order coldata based on assay
coldata <- coldata[match(colnames(assay), rownames(coldata)), ]

# Create SE from individual files
se <- SummarizedExperiment(assays = list(assay), rowData = rowdata, colData = coldata)
se
#> class: SummarizedExperiment 
#> dim: 770 34 
#> metadata(0):
#> assays(1): ''
#> rownames(770): 4b5eeb300368260019c1fbc7a3c718fc
#>   fe30ff0f71a38a39cf1717ec2be3a2fc ... 98d250a339a635f20e26397dafc6ced3
#>   1830c14ead81ad012f1db0e12f8ab6a4
#> rowData names(8): Kingdom Phylum ... Species Confidence
#> colnames(34): L1S105 L1S140 ... L6S68 L6S93
#> colData names(8): barcode.sequence body.site ...
#>   reported.antibiotic.usage days.since.experiment.start

Import QIIME2 results to `TreeSummarizedExperiment`