Hi @ebecht and @FPetitprez,
I've noticed an inconsistency between the code and the paper, and I'm wondering what the original intention was.
The paper states:
Given a set of transcriptomic markers of a given category, we computed a corresponding per-sample score, called hereafter a MCP-counter score, using the log2 geometric mean of this set of markers.
The implementation, however, just calculates an arithmetic mean:
apply(xp[intersect(row.names(xp),x),,drop=F],2,mean,na.rm=T)
If the input data were log2-transformed, the result would be close to a geometric mean, but not exactly one, because the geometric mean requires an exp() applied to the arithmetic mean of the logarithms:
geomean = exp(mean(log(X)))
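To make the distinction concrete, here is a minimal sketch with made-up values for three marker genes in a single sample (the numbers are purely illustrative and not taken from MCPcounter or the paper). It compares the arithmetic mean, the geometric mean, and the arithmetic mean of log2-transformed values:

```r
# Hypothetical expression values for three marker genes in one sample
# (illustrative only, not taken from MCPcounter or the paper).
x <- c(1, 10, 100)

# Arithmetic mean of the raw values, as the quoted apply(..., mean) would give
arith_mean <- mean(x)

# Geometric mean: exponentiate the arithmetic mean of the logarithms
geo_mean <- exp(mean(log(x)))

# Arithmetic mean of log2-transformed values
log2_geo <- mean(log2(x))

# The mean of log2 values equals the log2 of the geometric mean,
# so 2^log2_geo recovers geo_mean.
print(c(arith_mean = arith_mean, geo_mean = geo_mean, log2_geo = log2_geo))
print(all.equal(log2(geo_mean), log2_geo))
```

On raw values the arithmetic mean is pulled toward the largest value, while the mean of log2-transformed input stays on the log scale unless it is exponentiated again.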
I'm mostly asking because in immunedeconv we recommend that users supply raw TPM values, which we forward to MCPcounter unchanged, because I assumed that it calculates a geometric mean internally. However, given the actual implementation, I think it would be more appropriate to log1p-transform the TPM values first, so that more highly expressed genes do not receive disproportionate weight.
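As a rough illustration of the weighting concern, the sketch below applies the same scoring expression to a made-up TPM matrix, once on raw values and once after log1p. The matrix, gene names, and the helper `score()` are hypothetical and exist only for this example; they are not part of MCPcounter or immunedeconv.

```r
# Hypothetical TPM matrix: 3 marker genes (rows) x 2 samples (columns).
# geneC is highly expressed and identical in both samples;
# geneA and geneB double from s1 to s2.
xp <- matrix(c(1, 10, 1000,
               2, 20, 1000),
             nrow = 3,
             dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
markers <- c("geneA", "geneB", "geneC")

# Hypothetical helper wrapping the same expression as the quoted line
score <- function(m) {
  apply(m[intersect(row.names(m), markers), , drop = FALSE], 2, mean, na.rm = TRUE)
}

raw_score <- score(xp)         # arithmetic mean of raw TPM
log_score <- score(log1p(xp))  # arithmetic mean of log(1 + TPM)

# Relative change between samples on each scale
print(raw_score["s2"] / raw_score["s1"])
print(log_score["s2"] / log_score["s1"])
```

With raw TPM the score barely moves between the two samples because geneC dominates the mean, whereas after log1p the doubling of the two lowly expressed markers is clearly reflected in the score.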
For reference, the implementation line quoted above is line 14 of MCPcounter/Source/R/MCPcounter.R at commit b6eac73.