


Question

Please provide all the steps, starting with how to obtain an RNA-seq dataset, then how to get the RNA-seq FASTQ file, and then how to perform the analysis below. Any details that help reproduce the analysis are appreciated.

This assignment will have you execute the steps of an NGS workflow using the Galaxy system.

A. QC analysis of an RNA-SEQ NGS dataset using FASTQC

B. Mapping the NGS dataset using the alignment and mapping program BWA-MEM

C. Generate statistics for your BAM dataset using SAMTools Stats

D. Design a workflow for Variant Calling (creating a VCF file) and Annotation using programs available in Galaxy

The required deliverables for the assignment are selected to reinforce the information needed to reproduce the analysis. With either option, the work that you perform should be documented sufficiently for reproduction by another individual or student. (A good way to test your approach is to have one of your peers attempt to reproduce your results. Remember, sharing insight is permitted provided attribution is given to the source of the information used.)

A written description of the analysis performed including programs and/or modules used as well as data formats

Identifiers for the datasets and their source

Statistics about your generated BAM file

A diagram of the workflow steps for the analysis you performed

A diagram of the workflow steps used to conduct variant analysis

A written description of the variant workflow including identifying programs used and data formats employed

A written discussion of challenges encountered and recommendations to improve the assignment

Explanation / Answer

A. Gene expression analysis for population genomics studies requires efficient and accurate tools. RNA-seq performed on the Illumina platform is now a mature technology, but there are still hurdles in its analysis: mapping is slow, it generates large BAM files that are inconvenient to manipulate, reads mapping to multiple locations are often simply discarded, gene coverage is uneven due to biases introduced during library preparation, and so on. A few recent methodological developments are real game-changers for the analysis and interpretation of RNA-seq data, and you will discover them in this practical.

B. To quantify gene abundances, traditional pipelines align reads to a transcriptome or genome and count how many reads overlap each gene (e.g., with BWA, Bowtie, TopHat, or STAR). This is conceptually simple, but it is slow (each seed match needs to be extended), and it leaves the user with many arbitrary choices to make: for example, how many mismatches should be allowed? What should be done with reads mapping to multiple features? New approaches to this problem have recently emerged with the pseudo-alignment concept. First, reads are split into k-mers. Second, the k-mers are mapped to the indexed transcriptome (since only perfect matches of short sequences are tested, this is done very quickly using a hash table). Finally, the individual transcripts are quantified using a probabilistic model, based on their compatibility with the k-mers found in the reads. This procedure is very fast (it can be run on your laptop!), does not generate huge intermediate SAM/BAM files, and, according to the first tests, yields results that are at least as accurate as those of traditional pipelines.
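The three pseudo-alignment steps described above can be illustrated with a toy Python sketch. This is not the code of any real tool: the function names, the two short transcripts, the reads, and the choice of k = 4 are all made up for illustration. It also simplifies heavily. A read with any k-mer missing from the index is treated as unmapped, and the probabilistic model is a bare expectation-maximization loop that ignores transcript length and sequencing errors.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(transcripts, k):
    """Hash table: k-mer -> set of transcript names containing it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def compatible_transcripts(read, index, k):
    """Intersect the transcript sets of all k-mers in the read.

    Simplification: if any k-mer is absent from the index, the read
    is reported as compatible with no transcript (empty set).
    """
    result = None
    for kmer in kmers(read, k):
        hits = index.get(kmer)
        if not hits:
            return set()
        result = set(hits) if result is None else result & hits
    return result or set()


def em_abundances(classes, names, iterations=50):
    """Toy EM: split each read among its compatible transcripts in
    proportion to the current abundance estimates, then renormalize.
    Transcript lengths are ignored (an assumption of this sketch)."""
    classes = [c for c in classes if c]  # drop unmapped reads
    abundance = {n: 1.0 / len(names) for n in names}
    if not classes:
        return abundance
    for _ in range(iterations):
        counts = dict.fromkeys(names, 0.0)
        for cls in classes:
            total = sum(abundance[n] for n in cls)
            if total == 0:
                continue
            for n in cls:
                counts[n] += abundance[n] / total
        assigned = sum(counts.values())
        if assigned == 0:
            break
        abundance = {n: counts[n] / assigned for n in names}
    return abundance


# Hypothetical toy data: two transcripts sharing only the k-mer "ACGT".
K = 4
TRANSCRIPTS = {"tx1": "ACGTACGTGA", "tx2": "ACGTTTGCAA"}
READS = ["GTACG", "CGTGA", "TTGCA", "ACGT", "GGGGG"]

index = build_index(TRANSCRIPTS, K)
classes = [compatible_transcripts(r, index, K) for r in READS]
abundance = em_abundances(classes, list(TRANSCRIPTS))

if __name__ == "__main__":
    for read, cls in zip(READS, classes):
        print(read, sorted(cls))
    print(abundance)
```

In this toy data, two reads are compatible only with tx1, one only with tx2, one with both, and one with neither. The ambiguous read is shared out according to the current estimates instead of being discarded, so the abundances settle near 2/3 for tx1 and 1/3 for tx2. No alignment file is written at any point, which is why the approach avoids large intermediate SAM/BAM files.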
