Any languages can be used, but recommend R, Python, or Matlab Task: Infer a gene
Question
Any language can be used, but R, Python, or MATLAB is recommended.
Task: Infer a gene regulatory network from gene expression data and make a ROC plot.
Download the gene expression data from the links below. The data contain 500 samples, and
each sample has expression values for 10 genes.
http://ksuweb.kennesaw.edu/~mkang9/teaching/CS4491_CS7990/Gene_expression_1.csv
http://ksuweb.kennesaw.edu/~mkang9/teaching/CS4491_CS7990/Adj_1.csv
Task: Gene regulatory network inference using the correlation-based approach.
- Dataset:
  - Gene_expression_1.csv: contains gene expression data for task 1
  - Adj_1.csv: contains adjacency matrix of ground truth for task 1
1. Load the gene expression data (Gene_expression_1.csv) and the ground truth adjacency
matrix (Adj_1.csv).
2. Compute pairwise correlation matrix, and show the matrix. E.g., see Fig. 1.
3. For each threshold in a given range (e.g., 0, 0.1, 0.2, 0.3, …, 0.9, 1), threshold the
correlation matrix to obtain the adjacency matrix of the inferred network, and compare it
with the ground truth adjacency matrix.
4. Compute a confusion matrix for each threshold.
5. Compute TPR and FPR for each threshold.
6. Make a ROC plot. E.g., see Fig. 2.
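For steps 4 to 6, the standard definitions of the two rates can be written as below. TP, FP, TN, and FN are the confusion-matrix counts of gene pairs at a given threshold. Whether diagonal (self) pairs are counted is not stated in the task and is left open here.

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}
```

The ROC curve plots TPR (y-axis) against FPR (x-axis), one point per threshold.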
Figure 1. Correlation matrix
Figure 2. ROC in Task 1
Explanation / Answer
R and Python are equally suited to analyzing a dataset, for example finding outliers. If you want to build a web service that lets other people upload datasets and find outliers, Python is the better choice. Python is a general-purpose programming language, so modules already exist for building websites, interacting with a variety of databases, and managing users.
In general, if you want to build a tool or service that uses data analysis, Python is a better choice.
R builds in data analysis functionality by default, whereas Python relies on packages.
Because Python is a general-purpose language, most of its data analysis functionality comes from packages such as NumPy and pandas. R, by contrast, was built with statistics and data analysis in mind, so many tools that Python gains through packages are part of base R.
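To make the steps in the question concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the course's reference solution: the function names `correlation_network_roc` and `plot_roc` are hypothetical, an edge is predicted when the absolute Pearson correlation is at least the threshold, and diagonal (self) pairs are excluded from scoring. The demo uses small synthetic data rather than the actual CSV files.

```python
import numpy as np


def correlation_network_roc(expr, truth, thresholds):
    """Return (fpr, tpr), one entry per threshold.

    expr:  (n_samples, n_genes) expression matrix
    truth: (n_genes, n_genes) 0/1 ground-truth adjacency matrix
    """
    # Step 2: pairwise Pearson correlation between genes (columns).
    # Assumption: the sign is ignored, so |r| is used as the edge score.
    corr = np.abs(np.corrcoef(expr, rowvar=False))

    # Assumption: self-pairs on the diagonal are not scored.
    off_diag = ~np.eye(truth.shape[0], dtype=bool)
    scores = corr[off_diag]
    actual = np.asarray(truth)[off_diag].astype(bool)

    fpr, tpr = [], []
    for t in thresholds:
        predicted = scores >= t              # Step 3: inferred edges
        tp = np.sum(predicted & actual)      # Step 4: confusion matrix
        fp = np.sum(predicted & ~actual)
        fn = np.sum(~predicted & actual)
        tn = np.sum(~predicted & ~actual)
        tpr.append(tp / max(tp + fn, 1))     # Step 5: rates (guard against 0/0)
        fpr.append(fp / max(fp + tn, 1))
    return np.array(fpr), np.array(tpr)


def plot_roc(fpr, tpr):
    """Step 6: draw the ROC curve (needs matplotlib, imported lazily)."""
    import matplotlib.pyplot as plt

    order = np.argsort(fpr)
    plt.plot(fpr[order], tpr[order], marker="o")
    plt.xlabel("FPR")
    plt.ylabel("TPR")
    plt.title("ROC")
    plt.show()


# Synthetic demo: 500 samples, 4 genes, with gene 1 driven by gene 0.
rng = np.random.default_rng(0)
expr = rng.normal(size=(500, 4))
expr[:, 1] = expr[:, 0] + 0.1 * rng.normal(size=500)

truth = np.zeros((4, 4), dtype=int)
truth[0, 1] = truth[1, 0] = 1

thresholds = np.linspace(0.0, 1.0, 11)   # 0, 0.1, ..., 1
fpr, tpr = correlation_network_roc(expr, truth, thresholds)
```

For the real data, the two arrays could be loaded with `pandas.read_csv` or `numpy.loadtxt`. The question does not say whether the CSV files have a header row or an index column, so check that before passing the arrays in. Calling `plot_roc(fpr, tpr)` then draws the curve.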