Commit 1e995499 authored by Marwan ELKREWI's avatar Marwan ELKREWI
Browse files

Update bestlocation.md

parent 098c7e6d
......@@ -11,35 +11,42 @@ The final file **AfranScaf_AsinChromLocation.txt** has the following columns:
## Map transcripts to the A. sinica genome assembly
We map transcripts of A. franciscana and A. sp. Kazakstan with pblat:
We map transcripts of A. sinica to the genome with pblat and only keep the location with the highest mapping score for each transcript:
```
module load pblat
pblat -minScore=50 Artemia_sinica_genome_29_12_2021.fasta ArtemiaSinica_EviGene_assembly.okay.cds trans_vs_sinica.blat -t=dnax -q=dnax -threads=50
```
pblat -minScore=50 Artemia_sinica_genome_29_12_2021.fasta ArtemiaSinica_EviGene_assembly.okay.cds genome_vs_trans.blat -t=dnax -q=dnax -threads=50
Then keep only the location with the highest mapping score for each transcript:
sort -k 10 genome_vs_trans.blat > genome_vs_trans.blat.sorted
```
sort -k 10 trans_vs_sinica.blat > trans_vs_sinica.blat.sorted
perl 2-besthitblat.pl trans_vs_sinica.blat.sorted
perl 1-besthitblat.pl genome_vs_trans.blat.sorted
```
The script besthit.pl is [here]().
We also keep, for each location on the genome, only the transcript with the highest mapping score (unless the two overlap by less than 20bps):
We Keep only transcripts longer than 500bps in the A. sinica transcriptome:
```
module load fafilter/v.357
faFilter -minSize=500 ArtemiaSinica_EviGene_assembly.okay.cds ArtemiaSinica_EviGene_assembly.over500bps.cds
```
We map the transcripts of A. sinica against the A. franciscana and A. sp. Kazakstan genomic scaffolds with pblat (the example is for A. sp. Kazakhstan):
```
module load pblat
pblat -minScore=50 k41_scaffolds_kazakh_male_genome_16_09_2021.scafSeq ArtemiaSinica_EviGene_assembly.over500bps.cds kgenome_vs_trans.blat -t=dnax -q=dnax -threads=40
```
The we keep only the best hit per A. sinica transcript in the Afran/Akaz genome, but also remove redundant transcripts (ie if multiple transcripts map to the same location keep only the one with the best hit).
```
sort -k 14 AsinTranscripts500_vs_AfranGenome.blat.sorted.besthit > AsinTranscripts500_vs_AfranGenome.sortedbyDB
perl 2-redremov_blat_v2.pl AsinTranscripts500_vs_AfranGenome.sortedbyDB
sort -k 10 kgenome_vs_trans.blat > kgenome_vs_trans.blat.sorted
perl 1-besthitblat.pl kgenome_vs_trans.blat.sorted
sort -k 14 kgenome_vs_trans.blat.sorted.besthit > kgenome_vs_trans.sortedbyDB
perl 2-redremov_blat_v2.pl kgenome_vs_trans.sortedbyDB
```
The script 1-besthitblat.pl is [here]().
The script redremov_blat_v2.pl is [here]().
## Script to assign best location
Usage:
```
perl AssignScaffoldLocation.pl inputfile
......@@ -59,13 +66,24 @@ Where the different columns are:
## Run pipeling to assign best location on chromosomes
### Make input file
Select only important information from the A. sinica transcript to genome blat file:
```
cat genome_vs_trans.blat.sorted.besthit | awk '($11>500)' | awk '{print $10,$14,($16+$17)/2,$1}' | sort > Asinica_GeneGenomeLocation_score.sorted
```
Select only important information from ReferenceTranscript to OutgroupGenome blat:
```
cat kgenome_vs_trans.sortedbyDB.nonredundant | awk '{print $10, $14, $1}' | sort > AsinTranscripts500_vs_AkazGenome.joinable
```
Merge files:
```
join AsinTranscripts500_vs_AkazGenome.joinable Asinica_GeneGenomeLocation_score.sorted | awk '{print $2, $1, $4, $5, $3}' | sort > AkazScaf_inputforLocAssign.txt
```
### Assign location
```
perl AssignScaffoldLocation.pl AfranScaf_inputforLocAssign.txt
mv AfranScaf_inputforLocAssign.txt.bestlocation AfranScaf_AsinChromLocation.txt
perl AssignScaffoldLocation.pl AkazScaf_inputforLocAssign.txt
mv AkazScaf_inputforLocAssign.txt.bestlocation AkazScaf_AsinChromLocation.txt
```
The final file **AfranScaf_AsinChromLocation.txt** has the following columns:
......
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment