I have many files in the directory "results", listed below (line count, then file name):
58052 results/TB1.genes.results
198003 results/TB1.isoforms.results
58052 results/TB2.genes.results
198003 results/TB2.isoforms.results
58052 results/TB3.genes.results
198003 results/TB3.isoforms.results
58052 results/TB4.genes.results
198003 results/TB4.isoforms.results
For example, the file TB1.genes.results looks like this:
gene_id transcript_id(s) length effective_length expected_count TPM FPKM
ENSG00000000003 ENST00000373020,ENST00000494424,ENST00000496771,ENST00000612152,ENST00000614008 2206.00 1997.20 1.00 0.00 0.01
ENSG00000000005 ENST00000373031,ENST00000485971 940.50 731.73 0.00 0.00 0.00
ENSG00000000419 ENST00000371582,ENST00000371584,ENST00000371588,ENST00000413082,ENST00000466152,ENST00000494752 977.15 768.35 1865.00 14.27 37.82
ENSG00000000457 ENST00000367770,ENST00000367771,ENST00000367772,ENST00000423670,ENST00000470238 3779.11 3570.31 1521.00 2.50 6.64
ENSG00000000460 ENST00000286031,ENST00000359326,ENST00000413811,ENST00000459772,ENST00000466580,ENST00000472795,ENST00000481744,ENST00000496973,ENST00000498289 1936.74 1727.94 1860.00 6.33 16.77
ENSG00000000938 ENST00000374003,ENST00000374004,ENST00000374005,ENST00000399173,ENST00000457296,ENST00000468038,ENST00000475472 2020.10 1811.30 6846.00 22.22 58.90
ENSG00000000971 ENST00000359637,ENST00000367429,ENST00000466229,ENST00000470918,ENST00000496761,ENST00000630130 2587.83 2379.04 0.00 0.00 0.00
ENSG00000001036 ENST00000002165,ENST00000367585,ENST00000451668 1912.64 1703.85 1358.00 4.69 12.42
ENSG00000001084 ENST00000229416,ENST00000504353,ENST00000504525,ENST00000505197,ENST00000505294,ENST00000509541,ENST00000510837,ENST00000513939,ENST00000514004,ENST00000514373,ENST00000514933,ENST00000515580,ENST00000616923 2333.50 2124.73 1178.00 3.26 8.64
The other files have the same columns. To join the "gene_id" and "expected_count" columns from all the "genes.results" files into a single text file, I ran the following command:
paste results/*.genes.results | tail -n+2 | cut -f1,5,12,19,26 > final.genes.rsem.txt
Here -f1 selects gene_id, and fields 5, 12, 19 and 26 select the expected_count column of TB1, TB2, TB3 and TB4.genes.results respectively. Each file has 7 columns, so after paste the expected_count of the k-th file lands in column 7(k-1)+5. tail -n+2 drops the first line of the pasted output, which holds all the headers.
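Because every file has the same seven columns, the field list follows a fixed pattern (1, then 5 + 7k for k = 0..N-1) and can be generated instead of typed. Below is a minimal bash sketch, assuming exactly seven tab-separated columns per file and expected_count in column 5; fields_for is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the cut field list for N pasted files.
# Assumes each file has NCOL columns (default 7) and the wanted column
# is TARGET within each file (default 5, expected_count).
fields_for() {
  local nfiles=$1 ncol=${2:-7} target=${3:-5}
  local list=1 k   # field 1 = gene_id of the first file
  for (( k = 0; k < nfiles; k++ )); do
    # expected_count of file k+1 sits at column target + ncol*k after paste
    list+=",$(( target + ncol * k ))"
  done
  printf '%s\n' "$list"
}

# Run the original pipeline only if matching files exist.
shopt -s nullglob
files=(results/*.genes.results)
if (( ${#files[@]} > 0 )); then
  # tail drops the pasted header line; cut keeps gene_id + every expected_count
  paste "${files[@]}" | tail -n +2 | cut -f "$(fields_for "${#files[@]}")" > final.genes.rsem.txt
fi
```

For four files, fields_for 4 yields the same list as the hand-written command. Note that the glob sorts file names lexicographically (TB1, TB10, TB100, TB2, ...), so the output columns follow that order unless the sample numbers are zero-padded; this approach also assumes every file lists genes in the same row order.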
"final.genes.rsem.txt" then contains gene_id plus the expected_count column selected from each file:
ENSG00000000003 1.00 0.00 3.00 2.00
ENSG00000000005 0.00 0.00 0.00 0.00
ENSG00000000419 1865.00 1951.00 5909.00 8163.00
ENSG00000000457 1521.00 1488.00 849.00 1400.00
ENSG00000000460 1860.00 1616.00 2577.00 2715.00
ENSG00000000938 6846.00 5298.00 1.00 2.00
ENSG00000000971 0.00 0.00 6159.00 7069.00
ENSG00000001036 1358.00 1186.00 6196.00 7009.00
ENSG00000001084 1178.00 1186.00 631.00 1293.00
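My question: because I only have a few samples, I wrote the column numbers into the command by hand (cut -f1,5,12,19,26). What should I do if I have more than 100 samples? How can I join them while keeping only the required columns?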
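One possible alternative, sketched here under stated assumptions rather than as a definitive method, avoids column arithmetic entirely: awk reads each file, stores expected_count keyed by gene_id, and prints one column per sample with a header taken from the file name. Unlike paste, it does not require all files to list genes in the same order. merge_counts is a hypothetical function name; the files are assumed to be tab-separated with exactly one header line each.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge expected_count (column 5) from many
# *.genes.results files into one table keyed by gene_id (column 1).
# Assumes tab-separated input with exactly one header line per file.
merge_counts() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {
      # Header line of each file: start a new sample column named after
      # the file, with the directory and ".genes.results" suffix removed.
      n++
      name = FILENAME
      sub(/.*\//, "", name)
      sub(/\.genes\.results$/, "", name)
      samples[n] = name
      next
    }
    {
      # Record the order in which gene_ids are first seen.
      if (!($1 in seen)) { seen[$1] = 1; order[++g] = $1 }
      count[$1 SUBSEP n] = $5
    }
    END {
      # Header row: gene_id followed by one column per sample.
      line = "gene_id"
      for (i = 1; i <= n; i++) line = line OFS samples[i]
      print line
      # One row per gene; "NA" where a sample lacks that gene.
      for (j = 1; j <= g; j++) {
        id = order[j]
        line = id
        for (i = 1; i <= n; i++) {
          key = id SUBSEP i
          line = line OFS ((key in count) ? count[key] : "NA")
        }
        print line
      }
    }' "$@"
}
```

Usage: merge_counts results/*.genes.results > final.genes.rsem.txt. Changing $5 to another field number selects a different column (for example TPM in column 6).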