Commit 098c7e6d authored by Beatriz Vicoso

Add new file

parent 6d91cea1
# Assign a location on the A. sinica genome to A. franciscana / A. sp. Kazakstan scaffolds based on their gene content
## Quick summary of output
The final file **AfranScaf_AsinChromLocation.txt** has the following columns:
* scaffold from outgroup species to be assigned to a chromosomal location
* chromosome in the reference species
* location on the chromosome
* number of genes/transcripts that supported this assignment
* total match score of genes that supported this assignment
## Map transcripts to the A. sinica genome assembly
We map transcripts of A. franciscana and A. sp. Kazakstan with pblat:
```
module load pblat
pblat -minScore=50 Artemia_sinica_genome_29_12_2021.fasta ArtemiaSinica_EviGene_assembly.okay.cds trans_vs_sinica.blat -t=dnax -q=dnax -threads=50
```
Then keep only the location with the highest mapping score for each transcript:
```
sort -k 10 trans_vs_sinica.blat > trans_vs_sinica.blat.sorted
perl 2-besthitblat.pl trans_vs_sinica.blat.sorted
```
The script 2-besthitblat.pl is [here]().
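As a rough illustration of this best-hit step (not the actual `2-besthitblat.pl`, whose scoring is not shown here), the sketch below keeps one line per transcript. It assumes tab-separated PSL output in which column 1 is the number of matching bases (used here as the score) and column 10 is the query name; the function name `best_hit_per_query` is hypothetical.

```shell
# Hypothetical helper: keep the highest-scoring PSL line for each query.
# Assumptions: score = PSL column 1 (matches); query name = PSL column 10.
best_hit_per_query() {
    awk -F'\t' '
        $1 !~ /^[0-9]+$/ { next }                    # skip PSL header lines
        (!($10 in best)) || (($1 + 0) > best[$10]) { # first or better hit
            best[$10] = $1 + 0
            line[$10] = $0
        }
        END { for (q in line) print line[q] }
    ' "$@" | sort -t "$(printf '\t')" -k10,10        # order output by query name
}
```

It could be called as `best_hit_per_query trans_vs_sinica.blat > trans_vs_sinica.besthit.sketch`, where the output name is only an example.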
We also keep, for each location on the genome, only the transcript with the highest mapping score (two transcripts at the same location are both kept if they overlap by less than 20 bp):
```
sort -k 14 AsinTranscripts500_vs_AfranGenome.blat.sorted.besthit > AsinTranscripts500_vs_AfranGenome.sortedbyDB
perl 2-redremov_blat_v2.pl AsinTranscripts500_vs_AfranGenome.sortedbyDB
```
The script 2-redremov_blat_v2.pl is [here]().
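The redundancy filter can be pictured with the following sketch, which is not the actual `2-redremov_blat_v2.pl`. It assumes tab-separated PSL input, uses column 1 (matches) as the score, and takes the target name, start and end from columns 14, 16 and 17. Hits are visited from highest to lowest score, and a hit is dropped if it overlaps an already kept hit on the same target by 20 bp or more. The function name `remove_redundant_hits` is hypothetical.

```shell
# Hypothetical helper: drop lower-scoring hits that overlap a better hit
# on the same target sequence by min_overlap bases or more.
# Assumptions: score = PSL column 1; target = column 14; start/end = 16/17.
remove_redundant_hits() {
    awk -F'\t' '$1 ~ /^[0-9]+$/' "$@" |   # drop PSL header lines
    sort -k1,1nr |                        # best score first
    awk -F'\t' -v min_overlap=20 '
        {
            ts = $16 + 0; te = $17 + 0; keep = 1
            for (i = 1; i <= n[$14]; i++) {
                lo = (ts > s[$14, i]) ? ts : s[$14, i]   # overlap start
                hi = (te < e[$14, i]) ? te : e[$14, i]   # overlap end
                if (hi - lo >= min_overlap) { keep = 0; break }
            }
            if (keep) {
                n[$14]++
                s[$14, n[$14]] = ts
                e[$14, n[$14]] = te
                print
            }
        }'
}
```

The pairwise comparison is quadratic in the number of hits per target, which is fine for a sketch but may be slow on large scaffolds.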
## Script to assign best location
Usage:
```
perl AssignScaffoldLocation.pl inputfile
```
The input file must be sorted by contig and have five whitespace-separated columns, which the script reads as:
```
($contig, $gene, $chrom, $coord, $score) = split(/\s/, $line);
```
Where the different columns are:
* contig = contig/scaffold from outgroup species to be assigned to a location
* gene = gene/transcript from reference species
* chrom = chromosome of gene/transcript in reference species
* coord = midpoint of the gene on the chromosome in the reference species (for instance, the mean of the start and end of the blat match)
* score = match score of gene to the scaffold to be assigned to a location
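One possible way to build a file with these five columns is sketched below. It is an assumption about how the pieces fit together, not the procedure used here. It takes two filtered, tab-separated PSL files: the reference transcripts mapped to the reference genome (giving `chrom` and `coord`) and the same transcripts mapped to the outgroup scaffolds (giving `contig` and `score`). Column 1 (matches) is used as the score, and the function name `make_locassign_input` is hypothetical.

```shell
# Hypothetical helper: make_locassign_input REF_PSL OUTGROUP_PSL
#   REF_PSL:      reference transcripts vs. reference genome
#   OUTGROUP_PSL: reference transcripts vs. outgroup scaffolds
# Assumptions: one hit per transcript in each file, REF_PSL is not empty,
# score = PSL column 1, query = column 10, target = column 14.
make_locassign_input() {
    awk -F'\t' -v OFS='\t' '
        $1 !~ /^[0-9]+$/ { next }                # skip PSL header lines
        NR == FNR {                              # first file: reference location
            chrom[$10] = $14
            coord[$10] = int(($16 + $17) / 2)    # midpoint of the blat match
            next
        }
        $10 in chrom {                           # second file: outgroup hit
            print $14, $10, chrom[$10], coord[$10], $1
        }
    ' "$1" "$2" | sort -k1,1                     # input must be sorted by contig
}
```

Transcripts that have no hit in the first file are silently skipped, since they carry no chromosomal location.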
## Run pipeline to assign best location on chromosomes
### Make input file
### Assign location
```
perl AssignScaffoldLocation.pl AfranScaf_inputforLocAssign.txt
mv AfranScaf_inputforLocAssign.txt.bestlocation AfranScaf_AsinChromLocation.txt
```
The final file **AfranScaf_AsinChromLocation.txt** has the following columns:
* scaffold from outgroup species to be assigned to a chromosomal location
* chromosome in the reference species
* location on the chromosome
* number of genes/transcripts that supported this assignment
* total match score of genes that supported this assignment
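A quick way to inspect the result is to count how many scaffolds were assigned to each reference chromosome. The sketch below assumes the file is whitespace-separated with the columns in the order listed above; the function name `summarize_by_chromosome` is hypothetical.

```shell
# Hypothetical helper: per-chromosome summary of the assignment file.
# Assumed columns: scaffold, chromosome, location, number of genes, total score.
# Prints: chromosome, number of scaffolds, total number of supporting genes.
summarize_by_chromosome() {
    awk '
        NF >= 5 {
            scaffolds[$2]++
            genes[$2] += $4
        }
        END {
            for (c in scaffolds) print c, scaffolds[c], genes[c]
        }
    ' "$@" | sort -k1,1
}
```

For example, `summarize_by_chromosome AfranScaf_AsinChromLocation.txt` prints one line per chromosome.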