Assemblies


What data files are required for assembly submissions?

See the European Nucleotide Archive’s (ENA’s) documentation for details about the types of files that can be submitted for assembly submissions.


How do I assign locus tags to assemblies?

Hint

Each profile in COPO is known as a study or project in ENA (after reads have been submitted).

Note

Reads must be submitted before a locus tag can be assigned, because the ENA project is created only after the reads submission is complete.

You can assign a custom locus tag when creating a profile in COPO, using the locus tag field in the profile form.

Figure: Profile form: Adding locus tag

If a locus tag is not assigned, ENA will automatically assign one to your assembly after it has been submitted in COPO and deposited in ENA.

See ENA’s documentation for more details, including the rules that a locus tag prefix must conform to.
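Before entering a custom locus tag prefix in the profile form, it can help to check it locally. The sketch below is only an illustration: it assumes the prefix rules are that the prefix starts with a letter, is 3 to 12 characters long, and contains only upper-case letters and digits. The function name is hypothetical and is not part of COPO or ENA; confirm the current rules in ENA’s documentation.

```python
import re

# Assumed rules (verify against ENA's documentation):
#   - starts with a letter
#   - 3 to 12 characters in total
#   - upper-case letters and digits only, no symbols such as - _ *
LOCUS_TAG_PREFIX = re.compile(r"[A-Z][A-Z0-9]{2,11}")


def is_valid_locus_tag_prefix(prefix: str) -> bool:
    """Return True if the prefix satisfies the assumed rules above."""
    return LOCUS_TAG_PREFIX.fullmatch(prefix) is not None


if __name__ == "__main__":
    # Made-up prefixes, for illustration only.
    for candidate in ["ABC123", "1ABC", "AB", "AB_C"]:
        print(candidate, is_valid_locus_tag_prefix(candidate))
```

A check like this only catches formatting problems; it cannot tell whether a prefix is already registered to another project.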


What should I select from the SAMPLE dropdown in the “Add Assembly” form?

Hint

When submitting assemblies, the sample accession, also known as the sraAccession, follows the format ERSXXXXXXXX.

  • The SAMPLE dropdown menu in the Add Assembly form will display the sraAccession(s) that are associated with samples that have been submitted in COPO.

  • The sraAccession is displayed in the sraAccession column of any data table that is associated with the profile and its samples. For assembly submissions, the sraAccession is displayed in the data table on the Reads page (once reads have been submitted).
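If you keep a list of sample accessions outside COPO, a quick format check can help confirm that a value is an sraAccession rather than some other identifier. The sketch below assumes that the accession is the prefix ERS followed by six or more digits. The function name and the example values are hypothetical.

```python
import re

# Assumption: an sraAccession is "ERS" followed by six or more digits.
ERS_PATTERN = re.compile(r"ERS\d{6,}")


def is_sra_sample_accession(value: str) -> bool:
    """Return True if the value looks like an ERS sample accession."""
    return ERS_PATTERN.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    # Made-up values, for illustration only.
    for value in ["ERS1234567", "SAMEA1234567", "ERS12"]:
        print(value, is_sra_sample_accession(value))
```

The check confirms the format only. Whether the accession belongs to a sample in your profile still has to be verified in the data table on the Reads page.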


Are assemblies and sequence annotations submitted together?

No, assemblies and sequence annotations are submitted separately in COPO.

The impression that they are submitted together may arise from the EMBL flat file format, which combines assemblies and sequence annotations in a single file.

If you are submitting sequence annotations directly to ENA, EMBL files must be used, as they include both assemblies and annotations together.

On the other hand, sequence annotations can be submitted separately to ENA if your data files are in formats such as .gff or .fasta.

Note

Data file submissions depend on how users prepare and generate their data. For instance, FASTA files are still essential for storing and sharing sequence data, but they are not sufficient for representing detailed genomic annotations.

For annotation tasks, formats like GFF, GTF and BED are more appropriate because they provide structured information about genomic features, gene structures and functional elements. Thus, while FASTA is not outdated, it is often used alongside more specialised formats for annotation purposes.

Please refer to ENA’s documentation for more information.


Are accessions assigned to assembly submissions after studies are published?

No, accessions are assigned after assembly submissions have been completed.

Publishing a profile (or study) only makes the submissions under the profile public and accessible in repositories such as the European Nucleotide Archive (ENA) and the National Center for Biotechnology Information (NCBI).

See the Finding Data Submission IDs section for more information.