If you are familiar with Python, I'd use pandas. It uses "DataFrames" much like R does, so you could also take the concept and apply it in R.
Assuming your data1 is a tab-delimited file formatted like this:
GeneName    ExpValue
gene1       300.0
gene2       250.0
Then you can do this to get each data type into a DataFrame:
import pandas as pd
dfblood = pd.read_csv('path/to/data1',delimiter='\t')
dftissue = pd.read_csv('path/to/data2',delimiter='\t')
dftumor = pd.read_csv('path/to/data3',delimiter='\t')
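If you want to check the expected file layout before pointing this at your real files, here is a minimal sketch that reads the same format from an in-memory string. The `sample` text is a made-up stand-in for the contents of data1, not real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of 'path/to/data1' (tab-delimited)
sample = "GeneName\tExpValue\ngene1\t300.0\ngene2\t250.0\n"

# Same call as for the real files, but reading from memory
dfblood = pd.read_csv(io.StringIO(sample), delimiter="\t")

print(dfblood.columns.tolist())  # column names parsed from the header row
print(dfblood.dtypes)            # ExpValue should be parsed as a float column
```

If ExpValue comes back as `object` instead of a float type on your real files, the delimiter or the header row is probably not what the code assumes.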
Now merge the DataFrames into one master df.
dftmp = pd.merge(dfblood,dftissue,on='GeneName',how='inner')
df = pd.merge(dftmp,dftumor,on='GeneName',how='inner')
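Rename your columns. Be careful that the new names are in the same order as the merges: after the two merges above the value columns come out as blood, tissue, tumor.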
df.columns = ['GeneName','blood','tissue','tumor']
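One way to avoid depending on column order at all is to rename the value column in each DataFrame before merging. This is just a sketch of that alternative, with made-up expression values; the `frames` dict and the use of `functools.reduce` are my own choices, not something your data requires.

```python
from functools import reduce

import pandas as pd

# Made-up values for illustration only
dfblood = pd.DataFrame({"GeneName": ["gene1", "gene2"], "ExpValue": [300.0, 250.0]})
dftissue = pd.DataFrame({"GeneName": ["gene1", "gene2"], "ExpValue": [120.0, 80.0]})
dftumor = pd.DataFrame({"GeneName": ["gene2", "gene1"], "ExpValue": [40.0, 500.0]})

frames = {"blood": dfblood, "tissue": dftissue, "tumor": dftumor}

# Give each value column its final name up front
renamed = [frame.rename(columns={"ExpValue": name}) for name, frame in frames.items()]

# Chain inner merges on GeneName across all frames
df = reduce(
    lambda left, right: pd.merge(left, right, on="GeneName", how="inner"),
    renamed,
)

print(df)
```

Because the merge is keyed on GeneName, it does not matter that the rows of dftumor are in a different order here.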
Now you can normalize your data (if it isn't already normalized) with a couple of simple commands.
df = df.set_index('GeneName') # allows you to perform computations on the entire dataset
df_norm = (df - df.mean()) / (df.max() - df.min())
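To see what that formula does, here is a small check on made-up numbers: after subtracting the column mean and dividing by the column range, each column should have a mean of about 0 and a max-minus-min of 1. The values below are invented for illustration.

```python
import pandas as pd

# Made-up expression values, indexed by gene
df = pd.DataFrame(
    {
        "blood": [300.0, 250.0, 100.0],
        "tissue": [120.0, 80.0, 40.0],
        "tumor": [500.0, 40.0, 10.0],
    },
    index=pd.Index(["gene1", "gene2", "gene3"], name="GeneName"),
)

# Mean normalization, applied column by column
df_norm = (df - df.mean()) / (df.max() - df.min())

col_means = df_norm.mean()                   # close to 0 for every column
col_ranges = df_norm.max() - df_norm.min()   # close to 1 for every column

print(col_means)
print(col_ranges)
```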
You can call df_norm.corr() to produce the results below. At this point you can also use numpy to perform more complex calculations, if needed.
           blood    tissue     tumor
blood   1.000000  0.395160  0.581629
tissue  0.395160  1.000000  0.840973
tumor   0.581629  0.840973  1.000000
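As a rough sketch of handing the data over to numpy, the same Pearson correlation matrix can be computed with `np.corrcoef`. The numbers below are made up (they will not reproduce the table above). The check at the end also illustrates that a per-column shift and rescale like the normalization step does not change the Pearson correlation.

```python
import numpy as np
import pandas as pd

# Made-up expression values, indexed by gene
df = pd.DataFrame(
    {
        "blood": [300.0, 250.0, 100.0, 20.0],
        "tissue": [120.0, 80.0, 40.0, 60.0],
        "tumor": [500.0, 40.0, 10.0, 90.0],
    },
    index=pd.Index(["gene1", "gene2", "gene3", "gene4"], name="GeneName"),
)
df_norm = (df - df.mean()) / (df.max() - df.min())

# pandas: pairwise Pearson correlation between columns
corr_pandas = df_norm.corr()

# numpy: rowvar=False treats each column as a variable
corr_numpy = np.corrcoef(df_norm.to_numpy(), rowvar=False)

# Shifting and rescaling each column leaves Pearson correlation unchanged
same_as_raw = np.allclose(corr_pandas.to_numpy(), df.corr().to_numpy())

print(corr_pandas)
print(same_as_raw)
```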
Hope this at least moves you in the right direction.
EDIT
If you want to work with log fold change (for example, ahead of a Student's t-test), you could calculate the log of the original data using numpy.log:
import numpy as np
df[['blood','tissue','tumor']] = df[['blood','tissue','tumor']]+1 # +1 to avoid taking the log of 0
df_log = np.log(df[['blood','tissue','tumor']])
To get the log fold change for each gene, subtract one log column from another. This appends new columns to your df_log DataFrame. Note that numpy.log is the natural log.
df_log['logFCBloodTumor'] = df_log['blood'] - df_log['tumor']
df_log['logFCBloodTissue'] = df_log['blood'] - df_log['tissue']
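Fold changes are often reported in base 2, so here is a sketch of the same idea with `np.log2`. Switching to base 2 is my assumption about what you may want, and the values are made up so the arithmetic is easy to follow. It also shows that the difference of the logs equals the log of the ratio.

```python
import numpy as np
import pandas as pd

# Made-up values chosen so that value + 1 is a power of two
df = pd.DataFrame(
    {"blood": [7.0, 3.0], "tumor": [1.0, 3.0]},
    index=pd.Index(["gene1", "gene2"], name="GeneName"),
)

shifted = df + 1             # pseudocount of 1 to avoid taking the log of 0
df_log2 = np.log2(shifted)   # base-2 log instead of the natural log

# Difference of logs ...
df_log2["log2FCBloodTumor"] = df_log2["blood"] - df_log2["tumor"]

# ... equals the log of the ratio
log2_ratio = np.log2(shifted["blood"] / shifted["tumor"])

print(df_log2["log2FCBloodTumor"])  # gene1: 2.0 (8 vs 2), gene2: 0.0 (4 vs 4)
```

A value of 2.0 means blood is four times tumor after the pseudocount, and 0.0 means no change.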