Learn Before
Example 1: Genomics
Pharmaceutical companies are interested in determining the influence of genes on each other. A dataset may consist of jointly recorded gene activities for pairs of genes and for patients S_k = {( x_{k1} , y_{k 1 }), ⋯ , ( x_{ kn} , y_{kn} }.
Determining which gene influences which other gene is a costly experimental process, so only a subset of pairs of genes are generally labeled with ground truth of causal relationship .
The labeled dataset {( S_1 , g_1 ), ⋯ , ( S_k , g_k ), ⋯}, k = 1⋯ N , is an empirical sample of the “mother distribution” over pairs of genes and their causal labels, which may be used as training data to obtain a classifier, such that predictions of unknown causal labels can be made on new pairs of genes.
0
1
Tags
Data Science