Choice between FASTA and HMM queries for alignment to a genome #13

Hi,
I have a FASTA file of protein sequences from transposable element (TE) genes. There is some redundancy among closely related sequences, but the set covers proteins from all TE classes, so overall sequence variation is high. I am using it to query genome assemblies (200 Mb to 20 Gb) to locate TE coding regions.
I am running BATH as follows:
`bathsearch --cpu 20 -o output_bath --tblout output_tab TE_proteins.fa assembly.fa`
It has now been running for more than a week.
I wonder whether clustering the sequences that belong to the same TE family and building one HMM per cluster would speed up the searches I plan to run against the next assemblies. Would the time invested in this pay off in the long term? Do you have an estimate of the runtime difference between the two options?
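As a rough illustration of the first step of that idea, the sketch below splits a protein FASTA file into one file per TE family, so that each file could then be aligned and turned into a profile HMM. It assumes a HYPOTHETICAL header convention of `<id>#<family>` (for example `seq1#LINE/L1`); headers without a `#` label go into an `unclassified` group. The alignment and model-building steps are not shown, and the function names here are illustrative only, not part of BATH.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def family_of(header):
    """HYPOTHETICAL convention: the family label follows '#', e.g. 'seq1#LINE/L1'."""
    _, _, family = header.partition("#")
    return family or "unclassified"


def group_by_family(records):
    """Collect records into a dict keyed by family label."""
    groups = defaultdict(list)
    for header, seq in records:
        groups[family_of(header)].append((header, seq))
    return dict(groups)


def write_groups(groups, outdir):
    """Write one FASTA file per family; '/' in labels is replaced for safe filenames."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for family, recs in groups.items():
        path = out / f"{family.replace('/', '_')}.fa"
        with path.open("w") as fh:
            for header, seq in recs:
                fh.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

Each per-family file would then need a multiple sequence alignment and a profile-building step before being used as a query; how much this reduces runtime depends on how many families the set collapses into, which this sketch does not estimate.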
Thanks,
Dario
