Hi,
I have a set of TE protein sequences in a FASTA file (there is some redundancy among closely related sequences, but all TE classes are represented, so overall variation is high) that I am using to query genome assemblies (200 Mb to 20 Gb) to locate TE coding regions.
I am running BATH as
bathsearch --cpu 20 -o output_bath --tblout output_tab TE_proteins.fa assembly.fa
and it has been running for more than a week now.
I wonder whether clustering the sequences by TE family and building an HMM for each cluster would speed up the searches I plan to run on the next assemblies. Would the time invested pay off in the long term? Do you have an estimate of the difference in run time between the two approaches?
Thanks,
Dario