AcrTransAct

Bacterial strains:
The following bacterial strains are used in our experiments:

Pseudomonas aeruginosa PA14 type I-F, short name: PA14

Pectobacterium atrosepticum SCRI1043 type I-F, short name: SCRI1043

Pseudomonas aeruginosa SMC4386 type I-E, short name: SMC4386

Escherichia coli K12 type I-E, short name: K12

Serratia species American Type Culture Collection (ATCC) 39006 types I-F and I-E, short names: ATCC39006_IF and ATCC39006_IE

Pseudomonas aeruginosa PaLML1/DVT419 type I-C, short name: PaLML1_DVT419

You can use our tool to estimate the probability that a protein sequence inhibits the CRISPR-Cas systems of these bacteria. The FASTA files for these CRISPR-Cas systems are available here: link

Model options:
There are two models available for prediction:

The first model is the best-performing Long Short-Term Memory (LSTM) network, and the second model is the best-performing Convolutional Neural Network (CNN) from our work. The LSTM model has a longer inference time, but our experiments showed that it produces more confident probabilities than the CNN model. We suggest experimenting with both models to see which one works best for your needs.

Input Format:
Currently, the maximum length of each input sequence is 233 amino acids. Any longer input will be truncated, which might affect the prediction and produce inaccurate results. Only valid entries following the protein FASTA format will be processed. One or more entries can be passed as input. Each entry must consist of a header line beginning with a greater-than (>) symbol, followed by one or more sequence lines. All sequence lines must consist of only the following capitalized amino acid symbols: ACDEFGHIKLMNPQRSTVWY. The symbols 'U' and 'O' will be replaced with 'X' for the purposes of prediction. For more information about FASTA formatting, you may visit NCBI's protein FASTA help page.

Example FASTA input:

>AcrIF1 Pseudomonas phage JBD30

MKFIKYLSTAHLNYMNIAVYENGSKIKARVENVVNGKSVGARDFDSTEQLESWFYGLPGSGLGRIENAMNEISRRENP
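
The input rules above can be checked locally before submitting sequences. The following is a minimal sketch of such a check, not the code used by the tool itself: the function names read_fasta and prepare are hypothetical, and rejecting entries that contain any symbol other than the 20 listed ones plus 'U' and 'O' is an assumption about how invalid entries are handled.

```python
from typing import Iterator, Tuple

MAX_LEN = 233                            # maximum sequence length stated above
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")   # the 20 accepted amino acid symbols
REPLACED = {"U": "X", "O": "X"}          # substitutions described above


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                     # assumption: blank lines are ignored
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)          # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def prepare(seq: str) -> str:
    """Validate a sequence, replace U/O with X, and truncate to MAX_LEN."""
    bad = set(seq) - STANDARD - set(REPLACED)
    if not seq or bad:
        # assumption: any other symbol (or an empty sequence) is invalid
        raise ValueError(f"invalid sequence, unexpected symbols: {sorted(bad)}")
    seq = "".join(REPLACED.get(c, c) for c in seq)
    return seq[:MAX_LEN]


example = """>AcrIF1 Pseudomonas phage JBD30
MKFIKYLSTAHLNYMNIAVYENGSKIKARVENVVNGKSVGARDFDSTEQLESWFYGLPGSGLGRIENAMNEISRRENP
"""

for header, seq in read_fasta(example):
    print(header, len(prepare(seq)))
```

A check like this makes truncation visible before prediction: if the length of the raw sequence exceeds MAX_LEN, the prediction will be based on only the first 233 residues.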

Output Format:
For each input sequence, the model generates the probability of that sequence inhibiting each CRISPR-Cas system. Each line begins with the name of the CRISPR-Cas system, followed by its type and the probability of inhibition. For instance, consider the following output:

AcrIF1_Pseudomonas_phage_JBD30

Pseudomonas aeruginosa PA14 | type: (I-F) | inhibition probability : 0.872

Serratia sp. ATCC 39006 | type: (I-F) | inhibition probability : 0.765

Pectobacterium atrosepticum SCRI1043 | type: (I-F) | inhibition probability : 0.708

Escherichia coli K12 | type: (I-E) | inhibition probability : 0.470

Serratia sp. ATCC 39006 | type: (I-E) | inhibition probability : 0.420

Pseudomonas aeruginosa SMC4386 | type: (I-E) | inhibition probability : 0.243

Pseudomonas aeruginosa PaLML1 | type: (I-C) | inhibition probability : 0.213

This result indicates that the input sequence with the header "AcrIF1_Pseudomonas_phage_JBD30" is predicted to inhibit the type I-F CRISPR-Cas systems from Pseudomonas aeruginosa PA14, Serratia sp. ATCC 39006 and Pectobacterium atrosepticum SCRI1043. Based on these predictions, "AcrIF1_Pseudomonas_phage_JBD30" is unlikely to inhibit the other CRISPR-Cas systems listed in the output. The predictions of the model are displayed in the output box below the input after pressing submit. The predictions can be copied with the "Copy to Clipboard" button or downloaded in text format with the "Export to .txt" button.
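
An exported .txt file can be post-processed with a short script. The sketch below parses lines in the format shown in the example output and keeps the systems whose inhibition probability passes a cutoff. The names Prediction, parse_predictions and THRESHOLD are hypothetical, and the cutoff of 0.5 is an assumption chosen for illustration; the tool itself reports probabilities only and does not define a threshold here.

```python
import re
from typing import List, NamedTuple


class Prediction(NamedTuple):
    system: str
    cas_type: str
    probability: float


# Matches lines such as:
# "Escherichia coli K12 | type: (I-E) | inhibition probability : 0.470"
LINE_RE = re.compile(
    r"^(?P<system>.+?)\s*\|\s*type:\s*\((?P<type>[^)]+)\)\s*\|"
    r"\s*inhibition probability\s*:\s*(?P<prob>\d+(?:\.\d+)?)\s*$"
)


def parse_predictions(text: str) -> List[Prediction]:
    """Extract predictions; header and blank lines are skipped."""
    preds = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            preds.append(Prediction(m["system"], m["type"], float(m["prob"])))
    return preds


THRESHOLD = 0.5  # hypothetical cutoff, not defined by the tool

output = """AcrIF1_Pseudomonas_phage_JBD30
Pseudomonas aeruginosa PA14 | type: (I-F) | inhibition probability : 0.872
Serratia sp. ATCC 39006 | type: (I-F) | inhibition probability : 0.765
Pectobacterium atrosepticum SCRI1043 | type: (I-F) | inhibition probability : 0.708
Escherichia coli K12 | type: (I-E) | inhibition probability : 0.470
Serratia sp. ATCC 39006 | type: (I-E) | inhibition probability : 0.420
Pseudomonas aeruginosa SMC4386 | type: (I-E) | inhibition probability : 0.243
Pseudomonas aeruginosa PaLML1 | type: (I-C) | inhibition probability : 0.213
"""

likely = [p for p in parse_predictions(output) if p.probability >= THRESHOLD]
for p in likely:
    print(f"{p.system} ({p.cas_type}): {p.probability:.3f}")
```

With this cutoff, the three type I-F systems from the example are retained and the type I-E and I-C systems are dropped, which matches the interpretation given above. A different cutoff may suit other use cases.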
