Gene ID Mapping Using R
Converting between gene identifiers is a central step in genomic and proteomic data analysis. Several tools are available, each with its own drawbacks. While performing enrichment analysis on mass spectrometry datasets, I always struggled to prepare the input files required by each R package. The data usually needs some tweaking and cleanup before the R tools will accept it as input. The struggle is greater with UniProt IDs, because very few applications accept them. Although UniProt provides the Retrieve/ID mapping function, it does not preserve the number of rows: any protein or gene ID that cannot be mapped is simply omitted from the output file. This makes it difficult to combine the output with the original dataset.
There are numerous tools for this kind of ID mapping. Here I describe a few R packages that I have used and that worked smoothly for me.
1. AnnotationDbi package
Use the org.Hs.eg.db annotation package for human and org.Mm.eg.db for mouse. mapIds() accepts identifiers such as UniProt IDs, gene symbols (HGNC), Ensembl IDs and Entrez IDs, and converts between them.
    library(org.Mm.eg.db)  # mouse annotation database; also loads AnnotationDbi

    # df is a data frame whose row names are gene symbols
    ensembl <- mapIds(org.Mm.eg.db, keys = rownames(df), column = "ENSEMBL", keytype = "SYMBOL", multiVals = "first")
    entrez <- mapIds(org.Mm.eg.db, keys = rownames(df), column = "ENTREZID", keytype = "SYMBOL", multiVals = "first")
    uniprot <- mapIds(org.Mm.eg.db, keys = rownames(df), column = "UNIPROT", keytype = "SYMBOL", multiVals = "first")
mapIds() returns a named vector of IDs, with the input keys as names.
The output can be merged into the original dataset with `cbind` for downstream analysis. One advantage of mapIds() is that it matches IDs row by row and inserts NA wherever it cannot find a match, for example when a UniProt ID has no gene name or symbol. The output therefore stays aligned with the input, which saves a lot of work on large datasets.
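The row-by-row alignment described above can be sketched in base R without loading any annotation package. In this minimal sketch the `entrez` vector is a hand-written stand-in for what mapIds() would return, and the data frame, its values and the IDs are illustrative assumptions, not output from a real query.

```r
# Toy table; row names are gene symbols (values are made up for illustration).
df <- data.frame(log2fc = c(1.2, -0.8, 0.3),
                 row.names = c("Trp53", "NotAGene", "Actb"))

# Stand-in for the named vector mapIds() would return for rownames(df):
# one element per key, in input order, NA where nothing was found.
entrez <- c(Trp53 = "22059", NotAGene = NA, Actb = "11461")

# Check that the names line up with the rows before binding columns.
stopifnot(identical(names(entrez), rownames(df)))

# Attach the IDs as a new column; unmapped rows keep an NA.
annotated <- cbind(df, ENTREZID = unname(entrez))
print(annotated)
```

Because the unmapped symbol is kept as NA rather than dropped, the annotated table has exactly as many rows as the input.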
2. biomaRt package
    library(biomaRt)

    # Connect to the Ensembl mart and select the mouse gene dataset
    mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")

    # data is a character vector of MGI symbols to convert
    mapping <- getBM(attributes = c("mgi_symbol", "ensembl_gene_id", "entrezgene_id"), filters = "mgi_symbol", values = data, mart = mart, uniqueRows = TRUE, bmHeader = TRUE)
Use hgnc_symbol for human and mgi_symbol for mouse.
With biomaRt, extra work is generally required after the initial mapping. In particular, biomaRt does not return the genes in the order in which they were submitted.
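One possible way to do that extra work is to realign the result to the submitted order with match(). This is a minimal base R sketch: `mapping` is a hand-written stand-in for a getBM() result, the Ensembl IDs are placeholders, and the column names assume the attribute names are used as headers rather than the descriptive headers returned with bmHeader = TRUE.

```r
# Symbols in the order they were submitted (hypothetical input).
data <- c("Trp53", "NotAGene", "Actb", "Gapdh")

# Stand-in for a getBM() result: rows in a different order, the unmapped
# symbol missing, and one symbol mapped to two IDs. All IDs are placeholders.
mapping <- data.frame(
  mgi_symbol      = c("Gapdh", "Actb", "Trp53", "Gapdh"),
  ensembl_gene_id = c("ENSMUSG_A", "ENSMUSG_B", "ENSMUSG_C", "ENSMUSG_D"),
  stringsAsFactors = FALSE
)

# Keep only the first hit per symbol, similar in spirit to multiVals = "first".
mapping_unique <- mapping[!duplicated(mapping$mgi_symbol), ]

# match() returns, for each submitted symbol, its row in the mapping (or NA).
idx <- match(data, mapping_unique$mgi_symbol)

aligned <- data.frame(
  mgi_symbol      = data,
  ensembl_gene_id = mapping_unique$ensembl_gene_id[idx],
  stringsAsFactors = FALSE
)
print(aligned)
```

Keeping only the first hit is a simplification; if one-to-many mappings matter for the analysis, they would need to be handled explicitly instead.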
3. bitr from the clusterProfiler package
The clusterProfiler package was developed by Guangchuang Yu for statistical analysis and visualization of functional profiles for genes and gene clusters. As with mapIds(), use org.Hs.eg.db for human and org.Mm.eg.db for mouse. The available key types can be listed by typing keytypes(org.Mm.eg.db).
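    library(clusterProfiler)
    library(org.Mm.eg.db)

    # Usage: bitr(geneID, fromType, toType, OrgDb, drop = TRUE)
    # data is a character vector of gene symbols
    ids <- bitr(data, fromType = "SYMBOL", toType = c("UNIPROT", "ENSEMBL", "ENTREZID"), OrgDb = "org.Mm.eg.db")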
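With drop = TRUE, bitr() leaves unmapped IDs out of the returned data frame, so it can help to check which inputs were lost and to join the result back onto the full input list. The following is a minimal base R sketch: `ids` is a hand-written stand-in for a bitr() result with a single target column, and the symbols and IDs are illustrative assumptions.

```r
# Submitted symbols (hypothetical input).
data <- c("Trp53", "NotAGene", "Actb")

# Stand-in for a bitr() result: the unmapped symbol is absent.
ids <- data.frame(
  SYMBOL   = c("Trp53", "Actb"),
  ENTREZID = c("22059", "11461"),
  stringsAsFactors = FALSE
)

# Which submitted symbols did not come back?
unmapped <- setdiff(data, ids$SYMBOL)
print(unmapped)

# Left join: keep every submitted symbol, with NA where no mapping exists.
# Note that merge() sorts the rows by the key column by default.
input  <- data.frame(SYMBOL = data, stringsAsFactors = FALSE)
merged <- merge(input, ids, by = "SYMBOL", all.x = TRUE)
print(merged)
```

If the original row order matters, the merged table can be reordered afterwards, for example with match() against the submitted symbols.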
Apart from the R functions listed above, non-programmers can use web-based tools for gene ID conversion, such as DAVID and the UCSC gene ID converter.
Translated from: https://medium.com/computational-biology/gene-id-mapping-using-r-14ff50eec9ba