Bacterial strains:
The following bacterial strains were used in our experiments:
Our tool predicts whether a protein sequence is likely to inhibit the CRISPR-Cas systems of these bacteria. The FASTA files for these CRISPR-Cas systems are available here: link
Model options:
There are two models available for prediction:
the best-performing Long Short-Term Memory (LSTM) network
and the best-performing Convolutional Neural Network (CNN) from our work.
The LSTM model has a longer inference time, but our experiments showed
that it produces more confident probabilities than the CNN model.
We suggest trying both models to see which one works best for your needs.
Input Format:
Currently, the maximum length of each input sequence is 233 amino acids.
Any input longer than this is truncated, which
may affect the prediction and produce inaccurate results.
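If you want to know before submitting whether any of your sequences will be truncated, you can check their lengths locally. The sketch below is illustrative and not part of the tool: the helper name `truncate` is hypothetical, and it assumes the first 233 residues are kept, since this page does not say which end of a long sequence is cut.

```python
MAX_LEN = 233  # maximum input length stated in this section


def truncate(seq: str, max_len: int = MAX_LEN) -> tuple[str, bool]:
    """Return (possibly shortened sequence, whether it was shortened).

    Assumption: the first max_len residues are kept; the tool's actual
    truncation rule is not documented on this page.
    """
    return seq[:max_len], len(seq) > max_len


long_seq = "M" * 250  # hypothetical 250-residue input
short, was_cut = truncate(long_seq)
print(len(short), was_cut)  # 233 True
```

Sequences flagged this way can be split or trimmed to the region of interest before submission, so that the part the model sees is a deliberate choice rather than an automatic cut.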
Only valid entries following the protein FASTA format will be processed.
One or more entries can be passed as input. Each entry must consist
of a header line beginning with a greater-than (>) symbol, followed
by one or more sequence lines. All sequence lines must consist only of
the following uppercase amino acid symbols: ACDEFGHIKLMNPQRSTVWY.
Any 'U' (selenocysteine) or 'O' (pyrrolysine) residues are replaced with 'X' for the purposes of prediction.
For more information about FASTA formatting, you may visit NCBI's protein FASTA help page.
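The entry rules above can be checked locally before submission. The sketch below is a hypothetical pre-check, not the tool's own parser: the names `parse_fasta` and `valid_entries` are illustrative, and it assumes that invalid entries are skipped (rather than rejecting the whole input) and that 'U' and 'O' are accepted and then mapped to 'X', as described above.

```python
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
# Assumption: U and O are accepted on input and then replaced with X.
ACCEPTED = STANDARD | {"U", "O"}


def parse_fasta(text):
    """Yield (header, sequence) pairs; sequence lines are concatenated."""
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
        # sequence lines before the first '>' header are ignored
    if header is not None:
        yield header, "".join(chunks)


def valid_entries(text):
    """Keep entries with a non-empty sequence made only of accepted symbols."""
    kept = []
    for header, seq in parse_fasta(text):
        if seq and set(seq) <= ACCEPTED:
            kept.append((header, seq.replace("U", "X").replace("O", "X")))
    return kept


example = """>AcrIF1 Pseudomonas phage JBD30
MKFIKYLSTAHLNYMNIAVYENGSKIKARVENVVNGKSVGARDFDSTEQLESWFYGLPGSGLGRIENAMNEISRRENP
>bad_entry
mkfik
"""

# Only the AcrIF1 entry is kept; 'bad_entry' uses lowercase letters.
for header, seq in valid_entries(example):
    print(header)
```

Lowercase input is treated as invalid here because the rules above require uppercase symbols; if your sequences come from a source that uses lowercase, converting them with `str.upper()` first is a simple fix.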
Example FASTA input:
>AcrIF1 Pseudomonas phage JBD30
MKFIKYLSTAHLNYMNIAVYENGSKIKARVENVVNGKSVGARDFDSTEQLESWFYGLPGSGLGRIENAMNEISRRENP
Example output:
AcrIF1_Pseudomonas_phage_JBD30
Pseudomonas aeruginosa PA14 | type: (I-F) | inhibition probability : 0.872
Serratia sp. ATCC 39006 | type: (I-F) | inhibition probability : 0.765
Pectobacterium atrosepticum SCRI1043 | type: (I-F) | inhibition probability : 0.708
Escherichia coli K12 | type: (I-E) | inhibition probability : 0.470
Serratia sp. ATCC 39006 | type: (I-E) | inhibition probability : 0.420
Pseudomonas aeruginosa SMC4386 | type: (I-E) | inhibition probability : 0.243
Pseudomonas aeruginosa PaLML1 | type: (I-C) | inhibition probability : 0.213
This result indicates that the sequence with the header "AcrIF1_Pseudomonas_phage_JBD30" is likely to inhibit the type I-F CRISPR-Cas systems of Pseudomonas aeruginosa PA14, Serratia sp. ATCC 39006 and Pectobacterium atrosepticum SCRI1043, the three systems with inhibition probabilities above 0.5. Based on these predictions, it is unlikely to inhibit the other CRISPR-Cas systems listed in the output.
After you press Submit, the model's predictions are displayed in the output box below the input. You can copy them with the "Copy to Clipboard" button or download them in text format with the "Export to .txt" button.
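Exported predictions can also be filtered programmatically. The sketch below assumes the exported .txt file uses the same layout as the example output above (a header line followed by one "strain | type: (...) | inhibition probability : ..." line per system); that layout is not documented as a file format. It also treats 0.5 as the cut-off, which matches the interpretation of the example but is an assumption rather than a documented threshold. The function names `parse_predictions` and `likely_inhibited` are hypothetical.

```python
import re

# Assumed row layout, copied from the example output on this page.
ROW = re.compile(
    r"^(?P<strain>.+?)\s*\|\s*type:\s*\((?P<type>[^)]+)\)\s*\|\s*"
    r"inhibition probability\s*:\s*(?P<prob>[0-9.]+)\s*$"
)


def parse_predictions(text):
    """Map each entry header to a list of (strain, type, probability) rows."""
    results, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = ROW.match(line)
        if m is None:
            current = line  # any non-row line is treated as an entry header
            results[current] = []
        elif current is not None:
            results[current].append((m["strain"], m["type"], float(m["prob"])))
    return results


def likely_inhibited(rows, threshold=0.5):
    """Keep rows whose probability exceeds the (assumed) threshold."""
    return [row for row in rows if row[2] > threshold]


sample = """AcrIF1_Pseudomonas_phage_JBD30
Pseudomonas aeruginosa PA14 | type: (I-F) | inhibition probability : 0.872
Serratia sp. ATCC 39006 | type: (I-F) | inhibition probability : 0.765
Pectobacterium atrosepticum SCRI1043 | type: (I-F) | inhibition probability : 0.708
Escherichia coli K12 | type: (I-E) | inhibition probability : 0.470
Serratia sp. ATCC 39006 | type: (I-E) | inhibition probability : 0.420
Pseudomonas aeruginosa SMC4386 | type: (I-E) | inhibition probability : 0.243
Pseudomonas aeruginosa PaLML1 | type: (I-C) | inhibition probability : 0.213
"""

preds = parse_predictions(sample)
for strain, cas_type, p in likely_inhibited(preds["AcrIF1_Pseudomonas_phage_JBD30"]):
    print(f"{strain} ({cas_type}): {p:.3f}")
```

With the example output, this prints the three type I-F systems discussed above. Keeping the threshold as a parameter makes it easy to apply a stricter cut-off if you only want high-confidence predictions.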