mean vs. geomean #37

Hi @ebecht and @FPetitprez, 

I've noticed an inconsistency between the code and the paper, and I'm wondering what the original intention was.

The paper states: 

> Given a set of transcriptomic markers of a given category, we computed a corresponding per-sample score, called hereafter a MCP-counter score, using the log2 geometric mean of this set of markers.

However, the implementation simply calculates an arithmetic mean:

https://github.com/ebecht/MCPcounter/blob/b6eac73e91c246fcff0bb1a5c68a816cd588fc48/Source/R/MCPcounter.R#L14

If the input data were log2-transformed, the result would resemble the geometric mean, but it would not be identical to it, because the geometric mean additionally requires applying `exp()` to the arithmetic mean of the logarithms:

```r
# geometric mean: exponentiate the arithmetic mean of the (natural) logs
geomean <- exp(mean(log(X)))
```
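A quick numeric check with made-up values (`x` is a hypothetical vector of marker expression values for one sample): averaging log2-transformed values gives exactly the log2 of the geometric mean, so the two differ only by the final back-transformation, whereas averaging the raw values gives a different quantity altogether.

```r
# Hypothetical expression values of three marker genes in one sample (made up)
x <- c(2, 8, 32)

# Geometric mean: exponentiate the arithmetic mean of the natural logs
geomean <- exp(mean(log(x)))    # 8

# Arithmetic mean of log2-transformed values
mean_log2 <- mean(log2(x))      # 3

# Arithmetic mean of the raw values, for comparison
arith_raw <- mean(x)            # 14

# The mean of the log2 values equals the log2 of the geometric mean
stopifnot(isTRUE(all.equal(mean_log2, log2(geomean))))
```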

---

I'm mostly asking because in `immunedeconv` we recommend that users supply raw TPM values, which we forward to MCPcounter unchanged; I had assumed that MCPcounter computes a geometric mean internally. Given the actual implementation, however, I think it would be more appropriate to log1p-transform the TPM values first, so that highly expressed genes are not given disproportionate weight.
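A minimal sketch of the weighting issue, assuming the score is a plain per-sample arithmetic mean over the marker rows. The TPM matrix, the gene and sample names, and the helper `score_markers()` are all hypothetical and are not part of MCPcounter's API. Relative to `base`, sample `low_up` doubles a lowly expressed marker and `high_up` doubles a highly expressed one.

```r
# Hypothetical TPM values (made up): 3 marker genes (rows) x 3 samples (columns).
# Compared with "base", "low_up" doubles the lowly expressed GENE_A,
# while "high_up" doubles the highly expressed GENE_C.
tpm <- matrix(
  c(  10,   20,   10,
      50,   50,   50,
    1000, 1000, 2000),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                  c("base", "low_up", "high_up"))
)

# Hypothetical helper: per-sample arithmetic mean over the marker rows,
# mirroring the arithmetic-mean scoring discussed above (not MCPcounter's API).
score_markers <- function(expr) colMeans(expr)

raw_scores <- score_markers(tpm)
log_scores <- score_markers(log1p(tpm))

# Score change relative to "base" caused by each doubling
raw_delta <- raw_scores[c("low_up", "high_up")] - raw_scores[["base"]]
log_delta <- log_scores[c("low_up", "high_up")] - log_scores[["base"]]

print(raw_delta)
print(log_delta)
```

On the raw scale, doubling `GENE_C` moves the score 100 times more than doubling `GENE_A` (about 3.3 vs. 333.3); after `log1p()` the two doublings shift the score by similar amounts (about 0.22 and 0.23).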
