The 450k probe uses two probe to test the methylation. One for methylated and one for unmethylated. (red channel and green channel). Then the data from this two channels will be converted into
- a
$\beta$ value which varies between 0 to 1, the higher it is, more methylated. - or M value, which had positive and negative values (0 means half methylated).
These two ways to quantify the methylation level can be acheived by function ratioConvert in package minfi:
$$
\beta_i= \frac{max(y_imeth,0)}{max(y_imeth,0) + max(y_iunmeth,0) + \alpha}
$$
$\alpha$ by default is 100 $$ M_i = log_2( \frac{max(y_i,meth,0)+\alpha}{ max(y_i,unmeth,0)+\alpha }) $$ this$\alpha$ , by default is 1
The model it uses is
$$
Y_{ij} = \beta_0(l_j) + x_i \beta_1(l_j) + \epsilon_{ij}
$$
-
$\beta_0(l_j)$ is the baseline methylation depended on the specific location$j$ -
$\beta_1(l_j)$ is the effect at the j-th position -
$\epsilon_{ij}$ is the measurement error. -
$Y_{ij}$ is the oberved methylation value
And in a case control study, the x is the case or control category, which is 0 or 1 (0 for control or 1 for case)
Note:
this
$\beta_1$ is not the same$\beta$ from the quantification section. this$\beta_1$ is the fitted coefficiency of how the methylaion changes along with the$x_i$ (In fact, the$Y_{ij}$ here is the$\beta$ in line one, the obverved methylation value)The x doen't has to be a binary factor. For instance, it can be the continuous age in the study to look into the relationship between age and methylation.
After fitting, we will have the
First, we will set a threshold for the beta hats, bumps above the threshold will be considered a candidate. The we will summerize the data by
- calculate the area under the curve
- keep two paras (length and height)
Then, we will test the bump by permutation or bootstrap, to calculate that under the null hypothesis, what area we would get and how likely we will get the observed bump or larger by chance
To identify the effect brought by our factors of interested, we commonly identified the differential methylation at three levels:
- each CpG
- DMR (Differential Methylated Region)
- differential methylated blocks (it is larger scale than DMR)
This an on-going research and there is no best practical method for this