- 浏览: 230240 次
- 性别:
- 来自: 天朝帝都
文章分类
最新评论
-
hanmiao:
CSDN 博客地址是这個?http://blog.csdn.n ...
将博客搬至CSDN -
chenwq:
下载了,谢谢分享!
R语言学习入门 -
bbsunchen:
今天跟英姐聊天,她verbal考了151,不够啊,数学也不高。 ...
跟我一起考GRE(三) -
bbsunchen:
qinger说得对我今年只做三件事情:考好GRE,考好TOEF ...
IT行业成功必备的素质 -
bbsunchen:
还有8天就考试了,哥还在过单词啊
跟我一起考GRE(三)
介绍如何使用 R 绘制 heatmap 的文章。
今天无意间在Flowingdata看到一篇关于如何使用 R 来做 heatmap 的文章(请移步到这里)。虽然 heatmap 只是 R 中一个很普通的图形函数,但这个例子使用了2008-2009赛季 NBA 50个顶级球员数据做了一个极佳的演示,效果非常不错。对 R 大致了解的童鞋可以直接在 R console 上敲
?heatmap
直接查看帮助即可。
没有接触过 R 的童鞋继续围观,下面会仔细介绍如何使用 R 实现 NBA 50位顶级球员指标表现热图:
关于 heatmap,中文一般翻译为“热图”,其统计意义wiki上解释的很清楚:
A heat map is a graphical representation of data where the values taken by a variable in a two-dimensional map are represented as colors.Heat maps originated in 2D displays of the values in a data matrix. Larger values were represented by small dark gray or black squares (pixels) and smaller values by lighter squares.
下面这个图即是Flowingdata用一些 R 函数对2008-2009 赛季NBA 50名顶级球员指标做的一个热图(点击参看大图):
先解释一下数据:
这里共列举了50位球员,估计爱好篮球的童鞋对上图右边的每个名字都会耳熟能详。这些球员每个人会有19个指标,包括打了几场球(G)、上场几分钟(MIN)、得分(PTS)……这样就行成了一个50行×19列的矩阵。但问题是,数据有些多,需要使用一种比较好的办法来展示,So it comes, heatmap!
简单的说明:
比如从上面的热图上观察得分前3名(Wade、James、Bryant)PTS、FGM、FGA比较高,但Bryant的FTM、FTA和前两者就差一些;Wade在这三人中STL是佼佼者;而James的DRB和TRB又比其他两人好一些……
姚明的3PP(3 Points Percentage)这条数据很有意思,非常出色!仔细查了一下这个数值,居然是100%。仔细回想一下,似乎那个赛季姚明好像投过一个3分,并且中了,然后再也没有3p。这样本可真够小的!
最后是如何做这个热图(做了些许修改):
Step 0. Download R
R 官网:http://www.r-project.org,它是免费的。官网上面提供了Windows,Mac,Linux版本(或源代码)的R程序。
Step 1. Load the data
R 可以支持网络路径,使用读取csv文件的函数read.csv。
读取数据就这么简单:
nba<- read.csv("http://datasets.flowingdata.com/ppg2008.csv", sep=",")
Step 2. Sort data
按照球员得分,将球员从小到大排序:
nba <- nba[order(nba$PTS),]
当然也可以选择MIN,BLK,STL之类指标
Step 3. Prepare data
把行号换成行名(球员名称):
row.names(nba) <- nba$Name
去掉第一列行号:
nba <- nba[,2:20] # or nba <- nba[,-1]
Step 4. Prepare data, again
把 data frame 转化为我们需要的矩阵格式:
nba_matrix <- data.matrix(nba)
Step 5. Make a heatmap
# R 的默认还会在图的左边和上边绘制 dendrogram,使用Rowv=NA, Colv=NA去掉
heatmap(nba_matrix, Rowv=NA, Colv=NA, col=cm.colors(256), revC=FALSE, scale='column')
这样就得到了上面的那张热图。
Step 6. Color selection
或者想把热图中的颜色换一下:
heatmap(nba_matrix, Rowv=NA, Colv=NA, col=heat.colors(256), revC=FALSE, scale="column", margins=c(5,10))
Bioinformatics and Computational Biology Solutions Using R and Bioconductor 第10章的
例子:
Heatmaps, or false color images have a reasonably long history, as has the notion of rearranging the columns and rows to show structure in the data. They were applied to microarray data by Eisen et al. (1998) and have become a standard visualization method for this type of data. A heatmap is a two-dimensional, rectangular, colored grid. It displays data that themselves come in the form of a rectangular matrix. The color of each rectangle is determined by the value of the corresponding entry in the matrix. The rows and columns of the matrix can be rearranged independently. Usually they are reordered so that similar rows are placed next to each other, and the same for columns. Among the orderings that are widely used are those derived from a hierarchical clustering, but many other orderings are possible. If hierarchical clustering is used, then it is customary that the dendrograms are provided as well. In many cases the resulting image has rectangular regions that are relatively homogeneous and hence the graphic can aid in determining which rows (generally the genes) have similar expression values within which subgroups of samples (generally the columns). The function heatmap is an implementation with many options. In particular, users can control the ordering of rows and columns independently from each other. They can use row and column labels of their own choosing or select their own color scheme.
> library("ALL")
> data("ALL")
> selSamples <- ALL$mol.biol %in% c("ALL1/AF4",
+ "E2A/PBX1")
> ALLs <- ALL[, selSamples]
> ALLs$mol.biol <- factor(ALLs$mol.biol)
> colnames(exprs(ALLs)) <- paste(ALLs$mol.biol,
+ colnames(exprs(ALLs)))
>library("genefilter")
> meanThr <- log2(100)
> g <- ALLs$mol.biol
> s1 <- rowMeans(exprs(ALLs)[, g == levels(g)[1]]) >
+ meanThr
> s2 <- rowMeans(exprs(ALLs)[, g == levels(g)[2]]) >
+ meanThr
> s3 <- rowttests(ALLs, g)$p.value < 2e-04
> selProbes <- (s1 | s2) & s3
> ALLhm <- ALLs[selProbes, ]
>library(RColorBrewer)
> hmcol <- colorRampPalette(brewer.pal(10, "RdBu"))(256)
> spcol <- ifelse(ALLhm$mol.biol == "ALL1/AF4",
+ "goldenrod", "skyblue")
> heatmap(exprs(ALLhm), col = hmcol, ColSideColors = spcol)
最后,可以
>help(heatmap) 查找帮助,看看帮助给提供的例子
也可以看这的例子:
http://www2.warwick.ac.uk/fac/sci/moac/students/peter_cock/r/heatmap/
Using R to draw a Heatmap from Microarray Data |
|
[c] The first section of this page uses R to analyse an Acute lymphocytic leukemia (ALL) microarray dataset, producing a heatmap (with dendrograms) of genes differentially expressed between two types of leukemia. There is a follow on page dealing with how to do this from Python using RPy. The original citation for the raw data is "Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival" by Chiaretti et al. Blood 2004. (PMID: 14684422) The analysis is a "step by step" recipe based on this paper, Bioconductor: open software development for computational biology and bioinformatics, Gentleman et al. 2004. Their Figure 2 Heatmap, which we recreate, is reproduced here: Heatmaps from R Assuming you have a recent version of R (from The R Project) and BioConductor (see Windows XP installation instructions), the example dataset can be downloaded as the BioConductor ALL package. You should be able to install this from within R as follows: > source("http://www.bioconductor.org/biocLite.R") > biocLite("ALL") Running bioCLite version 0.1 with R version 2.1.1 ... Alternatively, you can download the package by hand from here or here. If you are using Windows, download ALL_1.0.2.zip (or similar) and save it. Then from within the R program, use the menu option "Packages", "Install package(s) from local zip files..." and select the ZIP file. On Linux, download ALL_1.0.2.tar.gz (or similar) and use sudo R CMD INSTALL ALL_1.0.2.tar.gz at the command prompt. With that out of the way, you should be able to start R and load this package with the library and data commands:> library("ALL") Loading required package: Biobase Loading required package: tools Welcome to Bioconductor Vignettes contain introductory material. To view, simply type: openVignette() For details on reading vignettes, see the openVignette help page. > data("ALL") If you inspect the resulting ALL variable, it contains 128 samples with 12625 genes, and associated phenotypic data. > ALL Expression Set (exprSet) with 12625 genes 128 samples phenoData object with 21 variables and 128 cases varLabels cod: Patient ID diagnosis: Date of diagnosis sex: Gender of the patient age: Age of the patient at entry BT: does the patient have B-cell or T-cell ALL remission: Complete remission(CR), refractory(REF) or NA. Derived from CR CR: Original remisson data date.cr: Date complete remission if achieved t(4;11): did the patient have t(4;11) translocation. Derived from citog t(9;22): did the patient have t(9;22) translocation. Derived from citog cyto.normal: Was cytogenetic test normal? Derived from citog citog: original citogenetics data, deletions or t(4;11), t(9;22) status mol.biol: molecular biology fusion protein: which of p190, p210 or p190/210 for bcr/able mdr: multi-drug resistant kinet: ploidy: either diploid or hyperd. ccr: Continuous complete remission? Derived from f.u relapse: Relapse? Derived from f.u transplant: did the patient receive a bone marrow transplant? Derived from f.u f.u: follow up data available date last seen: date patient was last seen We can looks at the results of molecular biology testing for the 128 samples: > ALL$mol.biol [1] BCR/ABL NEG BCR/ABL ALL1/AF4 NEG NEG NEG NEG NEG [10] BCR/ABL BCR/ABL NEG E2A/PBX1 NEG BCR/ABL NEG BCR/ABL BCR/ABL [19] BCR/ABL BCR/ABL NEG BCR/ABL BCR/ABL NEG ALL1/AF4 BCR/ABL ALL1/AF4 ... [127] NEG NEG Levels: ALL1/AF4 BCR/ABL E2A/PBX1 NEG NUP-98 p15/p16 Ignoring the samples which came back negative on this test (NEG), most have been classified as having a translocation between chromosomes 9 and 22 (BCR/ABL), or a translocation between chromosomes 4 and 11 (ALL1/AF4). For the purposes of this example, we are only interested in these two subgroups, so we will create a filtered version of the dataset using this as a selection criteria: > eset <- ALL[, ALL$mol.biol %in% c("BCR/ABL", "ALL1/AF4")] The resulting variable, eset, contains just 47 samples - each with the full 12,625 gene expression levels. This is far too much data to draw a heatmap with, but we can do one for the first 100 genes as follows: > heatmap(exprs(eset[1:100,])) According to the BioConductor paper we are following, the next step in the analysis was to use the lmFit function (from the limma package) to look for genes differentially expressed between the two groups. The fitted model object is further processed by the eBayes function to produce empirical Bayes test statistics for each gene, including moderated t-statistics, p-values and log-odds of differential expression. > library(limma) > f <- factor(as.character(eset$mol.biol)) > design <- model.matrix(~f) > fit <- eBayes(lmFit(eset,design)) If the limma package isn't installed, you'll need to install it first: > source("http://www.bioconductor.org/biocLite.R") > biocLite("limma") Running bioCLite version 0.1 with R version 2.1.1 ... We can now reproduce Figure 1 from the paper. > topTable(fit, coef=2) ID M A t P.Value B 1016 1914_at -3.076231 4.611284 -27.49860 5.887581e-27 56.32653 7884 37809_at -3.971906 4.864721 -19.75478 1.304570e-20 44.23832 6939 36873_at -3.391662 4.284529 -19.61497 1.768670e-20 43.97298 10865 40763_at -3.086992 3.474092 -17.00739 7.188381e-18 38.64615 4250 34210_at 3.618194 8.438482 15.45655 3.545401e-16 35.10692 11556 41448_at -2.500488 3.733012 -14.83924 1.802456e-15 33.61391 3389 33358_at -2.269730 5.191015 -12.96398 3.329289e-13 28.76471 8054 37978_at -1.036051 6.937965 -10.48777 6.468996e-10 21.60216 10579 40480_s_at 1.844998 7.826900 10.38214 9.092033e-10 21.27732 330 1307_at 1.583904 4.638885 10.25731 1.361875e-09 20.89145 The leftmost numbers are row indices, ID is the Affymetrix HGU95av2 accession number, M is the log ratio of expression, A is the log average expression, t the moderated t-statistic, and B is the log odds of differential expression. Next, we select those genes that have adjusted p-values below 0.05, using a very stringent Holm method to select a small number (165) of genes. > selected <- p.adjust(fit$p.value[, 2]) <0.05 > esetSel <- eset [selected, ] The variable esetSel has data on (only) 165 genes for all 47 samples . We can easily produce a heatmap as follows (in R-2.1.1 this defaults to a yellow/red "heat" colour scheme): > heatmap(exprs(esetSel))If you have the topographical colours installed (yellow-green-blue), you can do this: > heatmap(exprs(esetSel), col=topo.colors(100)) This is getting very close to Gentleman et al.'s Figure 2, except they have added a red/blue banner across the top to really emphasize how the hierarchical clustering has correctly split the data into the two groups (10 and 37 patients). To do that, we can use the heatmap function's optional argument of ColSideColors. I created a small function to map the eselSet$mol.biol values to red (#FF0000) and blue (#0000FF), which we can apply to each of the molecular biology results to get a matching list of colours for our columns: > color.map <- function(mol.biol) { if (mol.biol=="ALL1/AF4") "#FF0000" else "#0000FF" } > patientcolors <- unlist(lapply(esetSel$mol.bio, color.map)) > heatmap(exprs(esetSel), col=topo.colors(100), ColSideColors=patientcolors) Looks pretty close now, doesn't it: To recap, this is "all" we needed to type into R to achieve this: library("ALL") data("ALL") eset <- ALL[, ALL$mol.biol %in% c("BCR/ABL", "ALL1/AF4")] library("limma") f <- factor(as.character(eset$mol.biol)) design <- model.matrix(~f) fit <- eBayes(lmFit(eset,design)) selected <- p.adjust(fit$p.value[, 2]) <0.05 esetSel <- eset [selected, ] color.map <- function(mol.biol) { if (mol.biol=="ALL1/AF4") "#FF0000" else "#0000FF" } patientcolors <- unlist(lapply(esetSel$mol.bio, color.map)) heatmap(exprs(esetSel), col=topo.colors(100), ColSideColors=patientcolors) Heatmaps in R - More Options One subtle point in the previous examples is that the heatmap function has automatically scaled the colours for each row (i.e. each gene has been individually normalised across patients). This can be disabled using scale="none", which you might want to do if you have already done your own normalisation (or this may not be appropriate for your data): heatmap(exprs(esetSel), col=topo.colors(75), scale="none", ColSideColors=patientcolors, cexRow=0.5) You might also have noticed in the above snippet, that I have shrunk the row captions which were so big they overlapped each other. The relevant options are cexRow and cexCol. So far so good - but what if you wanted a key to the colours shown? The heatmap function doesn't offer this, but the good news is that heatmap.2 from the gplots library does. In fact, it offers a lot of other features, many of which I deliberately turn off in the following example: library("gplots") heatmap.2(exprs(esetSel), col=topo.colors(75), scale="none", ColSideColors=patientcolors, key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5) By default, heatmap.2 will also show a trace on each data point (removed this with trace="none"). If you ask for a key (using key=TRUE) this function will actually give you a combined "color key and histogram", but that can be overridden (with density.info="none"). Don't like the colour scheme? Try using the functions bluered/redblue for a red-white-blue spread, or redgreen/greenred for the red-black-green colour scheme often used with two-colour microarrays: library("gplots") heatmap.2(exprs(esetSel), col=redgreen(75), scale="row", ColSideColors=patientcolors, key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5) Heatmaps from Python So, how can we do that from within Python? One way is using RPy (R from Python), and this is discussed on this page. P.S. If you want to use heatmap.2 from within python using RPy, use the syntax heatmap_2 due to the differences in how R and Python handle full stops and underscores. What about other microarray data? Well, I have also documented how you can load NCBI GEO SOFT files into R as a BioConductor expression set object. As long as you can get your data into R as a matrix or data frame, converting it into an exprSet shouldn't be too hard. |
发表评论
-
USACO Mother's Milk(milk3)题解
2014-12-05 11:31 1018每种状态下,只有六种移动的情况:A->B, A-> ... -
USACO Ski Course Design(skidesign) 题解
2014-11-27 00:44 2313这一题被我想复杂了,我自己加了一个约束条件,可是题目中并没有 ... -
USACO Wormholes(wormhole) 题解
2014-11-25 05:13 2080这里再次强烈推荐USACO,因为他们每一题的题解现在有视频了 ... -
USACO Combination Lock (combo)题解
2014-11-05 23:41 1042有人说这一题暴力搜索,O(N^3),naive,哥只要O( ... -
用标准库或者boost分割C++字符串
2014-09-30 04:41 1170使用标准库 #include <iostream& ... -
Qt 使用Dom 操作XML的范例
2013-05-09 13:41 1836//--------------------读------- ... -
USACO Prime Cryptarithm 题解
2013-05-08 15:53 1457此题很水,暴力搜索,然后根据条件判断,continue掉一些 ... -
USACO Calf Flac题解
2013-05-06 17:21 1127这一题有点烦 我一开始的思路是,回文序列么,就是正序字符串 ... -
USACO Barn Repair 题解
2013-04-26 19:22 1274题目翻译还是看USACO吧, 这题贪心,贪心都是很水的,还 ... -
USACO Mixing Milk 题解
2013-04-26 11:05 1403题目大意: 描述 由于乳制品产业利润很低, ... -
稍微说说动态规划和贪心
2013-04-25 10:16 942最近在做USACO的training,又碰到了gre ... -
USACO Dual Palindromes 题解
2013-04-23 14:41 1070一天水了两题,呵呵,题目确实挺水的 做完上一题,这一题已经 ... -
USACO Palindromic Squares 源码
2013-04-23 14:02 913这题很容易的,依然是暴力搜索,好吧,这一章都叫complet ... -
USACO Name That Number源码
2013-04-08 15:26 932这次一直没过的是,C++的char转int, 要用这一句 ... -
USACO Transformation源码
2013-04-03 20:59 714/* ID: bbsunch2 PROG: trans ... -
USACO MilkingCow源码
2013-04-03 20:57 839/* ID: bbsunch2 PROG: milk2 ... -
pan-genome analysis sample code
2012-12-25 12:50 1062This is a C++ demo for pan-geno ... -
绦虫基因组研究方法
2012-12-21 21:21 949今天跟一个同学讨论了绦虫基因组研究方法,同时我也看到一些同学在 ... -
PyDev,在Eclipse中运行python
2012-04-20 10:38 2376最近学python做高精度运算。 虽然网上有很多高精度运算的 ... -
并行计算的强大
2012-04-17 10:36 1538最近在处理一批数据,10的8次方,处理完毕大概要一个月,并且这 ...
相关推荐
heatmap3包是R语言中用于绘制热图的一个工具包,它在传统的heatmap函数基础上增加了新的特性和便利性。 heatmap3包具有以下特点: 1. 兼容性:heatmap3完全兼容R语言中原始的heatmap函数,这意味着它可以无缝替换...
R语言提供了绘制热图的功能,本文将详细介绍如何使用R绘制热图。 title: 用R绘制热图 描述:用R绘制热图是指使用R语言创建热图的过程。热图是一种visualization工具,用于展示大规模数据之间的关系和模式。通过...
R语言凭借其强大的统计分析和图形绘制能力,成为了绘制热图的首选工具。这个名为“R语言绘制SCI科研热图源代码”的压缩包文件,提供了一个方便快捷的方法来生成科研级别的热图。 首先,我们需要理解R语言的基本概念...
在本资源中,"R语言绘制SCI科研临床相关性热图源代码.zip" 提供了一种使用R语言创建科学出版物(SCI)级别的临床相关性热图的方法。这个压缩包适用于已经熟悉R语言的用户,它包含了一个示例脚本或代码,用于生成热图...
在R语言中,绘制热图最常用的包是`heatmap`和`pheatmap`。本教程将重点介绍使用`pheatmap`包进行热图绘制,因为它提供了更直观且易于自定义的选项。 首先,确保你的R环境已经安装了`pheatmap`包。如果没有,可以...
heatmap.js不仅能在平面上绘制热图,还能与地图API结合,如Google Maps、Leaflet或本例中提到的百度地图。通过设置地图容器为heatmap实例的容器,可以将热图叠加在地图上,展示地理空间数据的分布。 ```javascript ...
首先,R语言的基础包`base R`提供了一个名为`pairs`的函数,它可以用于绘制相关矩阵的热图。例如,你可以自定义面板函数来改变默认的显示方式。`panel.cor`函数用于显示两两变量之间的相关系数,字号大小与相关系数...
在MATLAB中,我们可以使用heatmap函数来创建热图,这个函数提供了丰富的选项来自定义热图的外观和行为。本文将详细介绍如何在MATLAB中绘制热图,包括基本的绘图命令、自定义图形的样式和属性,以及一些高级的热图...
我(以及其他一些人的帮助)创建了一个易于使用的Java类,该类在JPanel中绘制热图。 添加到您自己的另一个项目中应该很容易。 X坐标和Y坐标的范围仅在在轴上绘制标签时使用,并且不影响所绘制数据的范围。 绘制阵列...
- **R语言中的热图绘制:** 在R语言中,可以使用`image()`或`heatmap()`等函数来绘制热图。 **代码示例与解析:** ```r # 加载数据 data(iris) mat (iris[, 1:4]) # 绘制热图 image(mat, col = heat.colors(12), ...
heatmap illustrator 是一款专业的绘制热图软件,所绘制的热图质量高,清晰。
适用于在origin中一键绘制聚类热图
批量给图像画注意力热图
可以用于金融、股票房地产等所有需要应用聚类分析的工作
【Matplotlib调用imshow()函数绘制热图1】 在Python的数据可视化领域,Matplotlib是一个不可或缺的库,它提供了丰富的绘图功能,包括2D和3D图表。在本篇文章中,我们将聚焦于如何使用imshow()函数来绘制热图,这是...
在Python中,绘制热力图heatmap是一种常用的可视化方法,特别是在数据科学、机器学习和统计分析中,热力图能够以直观的方式展示数据集中各变量之间的相关性或关系强度。热力图通过不同颜色的方块来表示数据值的大小...
=============== heatmap3D 将2D热图转换为3D模型可观察到X,Y,Z三个方向的对象的热图。参考建设者var j_heatmap3d = new heatmap3d . create ( ) ;创建2D热图画布j_heatmap3d . createHeatmap2dCanvases ( width ,...
在 MATLAB 开发环境中,`DrawHeatmap(X,Y,Z)` 是一个自定义函数,用于根据给定的 (X,Y) 坐标及其对应的值 Z 绘制二维热图。这个功能扩展了 MATLAB 内置的可视化能力,为数据可视化的用户提供了一个更具体的工具。在...
Django-Plotly-IMDB-Heatmap 网站演示 关联: <<<<<<<头警告:最多可能需要30秒才能加载! (由于Heroku的免费套餐而受到限制) 警告1:最多可能需要90秒才能载入! (由于Heroku的免费套餐...
例如,在绘制热图时,我们可以使用双线性插值来填充数据间的空缺区域,使图像更加平滑,同时保持原有的数据趋势。 对于红外热成像,这是一种非接触式的检测技术,通过捕捉物体发出的红外辐射来形成温度分布图。这种...