

Question

*** I am asking this question for the third time. Previously, someone answered it incorrectly, with the same theoretical explanation.

To the person who answered this question previously: PLEASE DO NOT ANSWER THIS QUESTION IF YOU DON'T KNOW Galaxy software.

Only someone with expertise in using Galaxy should answer. Galaxy is available at https://usegalaxy.org/

Please read the question below carefully and provide all details, so that a person can follow the steps in Galaxy to reproduce the analysis.****

Now here's the question:

Next-Generation Sequencing has fostered a revolution in bioinformatics, opening the pathway to understanding biology in ways only previously imagined. Through the combination of millions of small sequences extracted and enumerated, the biology of the system can be probed and interrogated in very precise ways.

This assignment will have you execute the steps of an NGS workflow using either the Galaxy system or R.

QC analysis of an RNA-SEQ NGS dataset using FASTQC

Mapping the NGS dataset using an alignment and mapping program, BWA-MEM

Generate statistics for your BAM dataset using SAMTools Stats

Design a workflow for Variant Calling (creating a VCF file) and Annotation using programs available in Galaxy

The required deliverables for the assignment are selected to reinforce the information needed to reproduce the analysis. With either option, the work that you perform should be documented sufficiently for reproduction by another individual or student. (A good way to test your approach is to have one of your peer students attempt to reproduce your results. Remember, sharing insight is permitted provided attribution is given to the source of information used.)

A written description of the analysis performed including programs and/or modules used as well as data formats

Identifiers for the datasets and their source

Statistics about your generated BAM file

A diagram of the workflow steps for the analysis you performed

A diagram of the workflow steps used to conduct variant analysis

A written description of the variant workflow including identifying programs used and data formats employed

A written discussion of challenges encountered and recommendations to improve the assignment

Explanation / Answer

Deep transcriptome sequencing (RNA-seq) provides massive and valuable information about every transcribed element in the genome. Using RNA-seq, researchers can, for example, profile gene expression, interrogate alternative splicing, identify novel transcripts and detect aberrant transcripts and coding variants. RNA-seq experiments should ideally be able to directly identify and quantify all RNA species, regardless of their size or frequency. However, current RNA-seq protocols still have several intrinsic biases and limitations, such as nucleotide composition bias, GC bias and PCR bias. These biases directly affect the accuracy of many RNA-seq applications (Benjamini and Speed, 2012; Hansen and Brenner, 2010) and can be checked directly from raw sequences using tools like FastQC. However, these raw sequence-based metrics are not sufficient to ensure the usability of RNA-seq data; other RNA-seq-specific quality control (QC) metrics, such as sequencing depth, read distribution and coverage uniformity, are even more important. For example, sequencing depth must be saturated before carrying out many RNA-seq applications, including expression profiling, alternative splicing analysis, novel isoform identification and transcriptome reconstruction. Using RNA-seq with unsaturated sequencing depth gives imprecise estimations (such as for RPKM and splicing index) and fails to detect low-abundance splice junctions, thereby limiting the precision of many analyses. Meanwhile, sequencing depth is directly related to the cost of research. For an RNA-seq dataset close to saturation, additional sequencing is not cost-effective, as it would provide little additional information.
Currently, a few tools are available for the QC of high-throughput sequencing data, but most of them (FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), htSeqTools, FASTX-ToolKit (http://hannonlab.cshl.edu/fastx_toolkit/) and SAMStat) focus only on raw sequence-related metrics (Goecks et al., 2010; Lassmann et al., 2011; Planet et al., 2012; Reich et al., 2006). RNA-SeQC is the only tool designed for RNA-seq QC, but it still lacks many important functions, such as saturation checking (Deluca et al., 2012). To address these needs, we have developed RSeQC to comprehensively assess the quality of RNA-seq experiments performed on clinical samples or other well-annotated model organisms, such as mouse, fly, Caenorhabditis elegans and yeast. RSeQC contains basic modules to evaluate raw sequence quality, RNA-seq-specific modules to perform annotation-based checking and utility modules for data visualization (Supplementary Fig. S1). Comparison with other QC tools shows not only that RSeQC is more comprehensive and efficient but also that it has several unique checks not available elsewhere (Supplementary Table S1).
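To make "raw sequence-related metrics" concrete, here is a minimal Python sketch that computes per-read GC fraction and mean base quality from FASTQ text. It is an illustration only, not how FastQC or RSeQC is implemented. The function names, the toy reads and the Phred+33 quality offset are assumptions chosen for the example.

```python
import io

# Hypothetical two-read FASTQ used only for illustration.
EXAMPLE_FASTQ = "@read1\nGGCCAATT\n+\nIIIIIIII\n@read2\nATATATAT\n+\n!!!!!!!!\n"


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:  # end of file or trailing blank line
            return
        seq = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        yield header[1:], seq, qual


def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


if __name__ == "__main__":
    for name, seq, qual in read_fastq(io.StringIO(EXAMPLE_FASTQ)):
        print(name, gc_fraction(seq), mean_phred(qual))
```

Metrics like these describe the reads themselves. They say nothing about sequencing depth or coverage, which is why annotation-based checks are needed as well.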
2 FEATURES AND METHODS
RSeQC consists of a series of Python programs to evaluate RNA-seq experiments from different aspects. The following are some selected modules from RSeQC:
"bam_stat.py" is used to check the mapping statistics of reads that are QC failed, uniquely mapped, splice mapped, mapped in proper pair, etc.
"inner_distance.py" is used to estimate the inner distance distribution between paired reads. The estimated inner distance should be consistent with gel size selection. This is an important parameter when using RNA-seq data to detect structural variation or aberrant splicing.
"geneBody_coverage.py" scales all transcripts to 100 nt and calculates the number of reads covering each nucleotide position. Finally, it generates a plot illustrating the coverage profile along the gene body (Fig. 1A).
Fig. 1. Examples of RSeQC output. (A) Coverage uniformity over gene body. All transcripts were scaled into 100 nt. (B) Saturation analysis of expression for the 25% most highly expressed genes. (C) Saturation analysis of junction detection. (D) Annotation of detected splice junctions. 'known': splice junctions with both 5′ splice site (5′ SS) and 3′ splice site (3′ SS) annotated by reference gene model; 'complete novel': splice junctions with neither 5′ SS nor 3′ SS annotated by reference gene model; 'partial novel': splice junctions with either 5′ SS or 3′ SS annotated by reference gene model.
"read_distribution.py" calculates the fraction of reads mapped to coding exons, 5′-untranslated region (UTR) exons, 3′-UTR exons, introns and intergenic regions based on the gene model provided. This module roughly reflects the uniformity of coverage; for example, reads are generally over-represented in the 3′-UTR for the polyA+ RNA-seq protocol. One can also use this module to estimate the background noise level.
"RPKM_saturation.py" determines the precision of estimated RPKMs at the current sequencing depth by resampling (jackknifing) the total mapped reads. We use percent relative error (100 × |RPKMobs - RPKMreal| / RPKMreal) to measure the precision of estimated RPKM (Fig. 1B). In practice, it is impossible to measure RPKMreal, and we use RPKM estimated from total reads to approximate RPKMreal.
"junction_saturation.py" determines whether the current sequencing depth is sufficient to perform alternative splicing analyses. The concept is similar to that of 'RPKM_saturation.py': splice junctions are detected for each resampled subset of reads, and the number of detected splice junctions increases as the resample percentage increases before finally reaching a fixed value. The junction saturation test is very important for alternative splicing analysis, as using an unsaturated sequencing depth would miss many rare splice junctions (Fig. 1C).
"infer_experiment.py" is used to infer the experimental design by sampling a subset of reads from the BAM file and comparing their genome coordinates and strands with those of the reference gene model. This module can determine whether the given RNA-seq library was sequenced with paired-end or single-end reads. The module can also gauge whether sequencing is strand-specific, and if so, how reads are stranded.
"junction_annotation.py" separates all detected splice junctions into 'known', 'complete novel' and 'partial novel' by comparing them with the reference gene model (Fig. 1D).
"RPKM_count.py" calculates the raw read count and RPKM values for each exon, intron and mRNA region defined by the reference gene model.
"bam2wig.py" can efficiently convert a BAM file into a wiggle file for visualization. Wiggle files can be easily converted to bigWig files using the UCSC wigToBigWig tool.
3 RESULTS AND CONCLUSIONS
In summary, the RSeQC package provides a number of useful modules that can comprehensively evaluate RNA-seq data. 'Basic modules' quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while 'RNA-seq specific modules' investigate the sequencing saturation status of both splice junction detection and expression estimation. These modules also investigate the mapped read-clipping profile, mapped read distribution, coverage uniformity over the gene body, reproducibility, strand specificity and splice junction annotation. Finally, RSeQC includes several useful tools to manipulate and normalize BigWig files for data visualization.
Funding: The Department of Defense Prostate Cancer Program PC094421 and the Cancer Prevention and Research Institute of Texas (RP110471-C3).
Conflict of Interest: none declared.
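The expression-saturation idea described for 'RPKM_saturation.py' can be illustrated with a small Python sketch. It is not RSeQC's code. It assumes the standard RPKM definition (reads per kilobase of transcript per million mapped reads) and uses plain random subsampling of a hypothetical list of per-read gene labels. The names `rpkm`, `percent_relative_error` and `saturation_curve` and the toy data are invented for the example.

```python
import random

# Hypothetical data: one gene label per mapped read, plus gene lengths in bp.
READ_GENES = ["geneA"] * 300 + ["geneB"] * 700
GENE_LENGTHS = {"geneA": 1500, "geneB": 2000}


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return 1e9 * read_count / (gene_length_bp * total_mapped_reads)


def percent_relative_error(rpkm_obs, rpkm_real):
    """100 * |RPKMobs - RPKMreal| / RPKMreal, as defined in the text."""
    return 100.0 * abs(rpkm_obs - rpkm_real) / rpkm_real


def saturation_curve(read_genes, gene_lengths, gene, fractions, seed=0):
    """Return (fraction, percent relative error) pairs for one gene.

    RPKM from all reads stands in for the unknown true RPKM, so the
    gene must have at least one read in the full dataset.
    """
    rng = random.Random(seed)
    total = len(read_genes)
    rpkm_real = rpkm(read_genes.count(gene), gene_lengths[gene], total)
    curve = []
    for fraction in fractions:
        size = max(1, round(fraction * total))
        subset = rng.sample(read_genes, size)  # sample without replacement
        rpkm_obs = rpkm(subset.count(gene), gene_lengths[gene], size)
        curve.append((fraction, percent_relative_error(rpkm_obs, rpkm_real)))
    return curve


if __name__ == "__main__":
    for fraction, error in saturation_curve(
        READ_GENES, GENE_LENGTHS, "geneA", [0.1, 0.25, 0.5, 0.75, 1.0]
    ):
        print(f"{fraction:.2f} of reads: {error:.2f}% relative error")
```

If the error is already small at low fractions, the dataset is close to saturation for that gene. If the error only drops near the full read set, more depth would likely still change the estimate.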