Glossary

API

A software interface that allows two or more applications to communicate with each other.

ASG

A project that aims to provide the genomic foundations needed by scientists to answer key questions about the ecology and evolution of symbiosis in marine and freshwater species, where at least one partner is a microbe.

Assembly

Any sequence record, from coding or non-coding regions to fully assembled chromosomes.

Barcoding manifest

In COPO, barcoding manifest submissions contain short assembled and annotated sequences that represent interesting features or gene regions. In ENA, the barcoding manifests are referred to as targeted sequences.

BGE

A project that undertakes a comprehensive application of genomic science to biodiversity research in order to drive fundamental advances in conservation science and policy.

Biocuration

The extraction of unstructured biological data from manifests into a structured, computable form.

Compliance field

A compliance field in COPO is a mandatory field that cannot be updated once a manifest has been uploaded or submitted, regardless of whether the samples have since been accepted or rejected by a sample manager.

COPO

COPO is a web-based tool for creating and managing metadata for research objects.

COPO profile

Also known as work profile. A collection of ‘research objects’ or components that contain data generated on a biological research project or study.

There are two general types of profiles in COPO: Genomics profiles and ToL profiles. ASG, DToL and ERGA are the ToL (primary) projects brokered through COPO.

CRUD operations

Comprises posting data (creating and/or updating data), reading data (e.g. making queries) and deleting data.
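As a rough illustration, the three groups of operations can be sketched against an in-memory store. The RecordStore class, its method names and the sample values are hypothetical and are not part of COPO.

```python
class RecordStore:
    """Hypothetical in-memory store illustrating CRUD operations."""

    def __init__(self):
        self._records = {}

    def post(self, record_id, data):
        """Create the record if it is new, otherwise merge the update into it."""
        existing = self._records.get(record_id, {})
        self._records[record_id] = {**existing, **data}
        return self._records[record_id]

    def read(self, **query):
        """Return the IDs of records whose fields match every query term."""
        return [
            record_id
            for record_id, record in self._records.items()
            if all(record.get(key) == value for key, value in query.items())
        ]

    def delete(self, record_id):
        """Remove a record and report whether it existed."""
        return self._records.pop(record_id, None) is not None


store = RecordStore()
store.post("s1", {"species": "Quercus robur"})  # create
store.post("s1", {"status": "pending"})         # update
matching = store.read(status="pending")         # read
deleted = store.delete("s1")                    # delete
```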

DataTables

A jQuery plug-in that displays tabular data. It is used in COPO to display lists of research objects. See DataTables for more information.

DB

A database is an organised collection of data, generally stored and accessed electronically from a computer system.

DNA

The molecule inside cells that contains the genetic information responsible for the development and function of an organism.

Docker

A set of platform as a service (PaaS) products that use OS-level virtualisation to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels.

Docker Engine is the software that oversees the hosting of the containers.

Dockerfile

A text-based file with no file extension that contains a script of instructions. Docker uses this script to build a container image. See Dockerfile reference.

DToL

A project that aims to sequence the genomes of 70,000 species of eukaryotic organisms in Britain and Ireland.

It is one of the ToL projects brokered through COPO. The project is a collaboration between biodiversity, genomics and analysis partners that is transforming the manner by which biology, conservation and biotechnology are conducted.

DToL may sometimes be referred to as DTOL.

EBI

EBI is a UK government-funded public repository for biological data that provides free access to biomedical and genomic information.

EI

The Earlham Institute is a hub of life science research, training, and innovation focused on understanding the natural world through the lens of genomics. EI supports several projects, including the COPO project.

EI is one of four Norwich BioScience Institutes (NBI) international centres based in the Norwich Research Park in Eastern England. It is also one of eight institutes that receive strategic funding from the Biotechnology and Biological Sciences Research Council (BBSRC), part of UKRI, as well as support from other research funders.

The other NBI centres are the John Innes Centre (JIC), The Sainsbury Laboratory (TSL) and Quadram Institute Bioscience (QIB).

EMBL

EMBL is a European intergovernmental organisation that performs basic research in molecular biology and provides services to the scientific community in its member states.

ENA

ENA is a repository for nucleotide sequence data that provides free and unrestricted access to annotated DNA and RNA sequences. It also stores complementary information such as experimental procedures, details of sequence assemblies and other metadata related to sequencing projects.

ENA is part of the International Nucleotide Sequence Database Collaboration (INSDC), which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at the National Center for Biotechnology Information (NCBI).

ERGA

A project that forms a pan-European scientific response to current threats to biodiversity. It studies reference genomes, which provide the most complete insight into the genetic basis of each species and represent a powerful resource for understanding how biodiversity functions.

FAIR

The ability to find, access, interoperate and reuse data with no or minimal human intervention.

GAL

Partners or companies that perform genome sequencing.

Genome

A complete set of genetic material stored in long molecules of DNA in living organisms such as a virus, an oak tree or an elephant.

Genomics

The study of all or a substantial portion of the genes of an organism as a dynamic system, over time, to determine how those genes interact and influence biological pathways, networks, and physiology.

HTTP

An application-layer protocol for exchanging requests and responses between clients and servers on the web. HTTPS is its encrypted counterpart.

See also: HTTPS protocol

IP

A network-layer protocol that addresses and routes packets of data so that different systems can communicate.

Locus tag

Adapted from ENA’s definition: Locus tags are identifiers applied systematically to every gene in a sequencing project.

MacOS

A series of proprietary graphical operating systems developed and marketed by Apple Inc. since 2001.

Manifest

A CSV file or Microsoft (MS) Excel spreadsheet that contains metadata regarding a research object.

The manifest is used by scientists to upload metadata into COPO.

Manifest checklist

A list of fields that are required to be filled in for a sample to be considered valid.
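A minimal sketch of how a manifest row might be checked against such a list, assuming a CSV manifest. The field names in REQUIRED_FIELDS and the missing_fields helper are illustrative only and are not taken from a real COPO checklist.

```python
import csv
import io

# Hypothetical checklist: these field names are illustrative only.
REQUIRED_FIELDS = ["SPECIMEN_ID", "SCIENTIFIC_NAME", "DATE_OF_COLLECTION"]


def missing_fields(row, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a manifest row."""
    return [name for name in required if not (row.get(name) or "").strip()]


manifest_text = (
    "SPECIMEN_ID,SCIENTIFIC_NAME,DATE_OF_COLLECTION\n"
    "SP001,Quercus robur,2023-05-01\n"
    "SP002,,2023-05-02\n"
)

# Map each data row number to its missing fields; an empty list means the row passes.
rows = csv.DictReader(io.StringIO(manifest_text))
report = {line_no: missing_fields(row) for line_no, row in enumerate(rows, start=1)}
```

Here the second row fails because its SCIENTIFIC_NAME cell is blank.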

Manifest ID

A unique identifier assigned to each manifest record in COPO.

Metadata

In-depth and controlled contextual information about when, where, how and why data has been collected, such as geographical location, time of collection, tube or well identification and specimen identification. Metadata can relate to research elements such as samples, assemblies, annotations or experiments.

In life sciences, metadata facilitates biocuration, which revolves around structuring datasets in a way that allows automated search, query and retrieval.

MIT licence

A permissive free software licence from the Massachusetts Institute of Technology that places few restrictions on the reuse of software.

MongoDB

A document-oriented database program that uses JSON-like documents with optional schemas.
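To illustrate what "JSON-like documents with optional schemas" means, the sketch below uses plain Python dictionaries and the standard json module. No MongoDB server or driver is involved, and the field names and values are made up for the example.

```python
import json

# Two documents that could sit in the same collection despite having different fields.
documents = [
    {"_id": 1, "name": "sample A", "tissue": "leaf"},
    {"_id": 2, "name": "sample B", "collectors": ["X", "Y"], "location": {"country": "UK"}},
]

# Each document survives a round trip through JSON text unchanged.
restored = [json.loads(json.dumps(doc)) for doc in documents]

# Only some fields are common to both documents; nothing forces a shared schema.
shared_fields = set(documents[0]) & set(documents[1])
```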

NCBI

NCBI is a US government-funded public repository for biological data that provides free access to biomedical and genomic information.

Profile component

Also known as research object. It forms part of a research project or study.

Templates for creating or describing research objects can be found here.

Profile Types legend

This describes the types of the profiles that have been created. It is located at the right of the Work Profiles web page.

PyCharm

A Python IDE (Integrated Development Environment) that provides code analysis, a graphical debugger, an integrated unit tester, integration with version control systems and support for web development with Django.

Read the Docs

A documentation hosting service based around Sphinx. COPO documentation is hosted on Read the Docs.

Reads

A research object that holds raw read files and sequencing methods. A read is the DNA sequence obtained from a small section of DNA.

It can be associated with one or more files, assemblies and sequence annotations.

Research

Systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalisable knowledge.

REST

An architectural style that relies on stateless, client-server and cacheable communication, typically over HTTP. In COPO, it is used to communicate with the COPO API to perform CRUD operations using HTTP requests.
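One common convention for expressing CRUD operations as HTTP requests is sketched below using Python's standard urllib.request module. The requests are only built, not sent. The base URL, the build_request helper and the mapping of operations to methods are assumptions made for illustration; they do not describe the actual COPO API.

```python
import json
import urllib.request

# Placeholder base URL; not the real COPO API endpoint.
BASE_URL = "https://example.org/api/sample"

# A conventional (assumed) mapping from CRUD operations to HTTP methods.
METHODS = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def build_request(operation, sample_id=None, payload=None):
    """Build (but do not send) the HTTP request for a CRUD operation."""
    url = BASE_URL if sample_id is None else f"{BASE_URL}/{sample_id}"
    body = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if body else {}
    return urllib.request.Request(url, data=body, headers=headers, method=METHODS[operation])


read_request = build_request("read", sample_id="s1")
create_request = build_request("create", payload={"species": "Quercus robur"})
```

Each request carries everything the server needs (method, URL and body), which is what makes the exchange stateless.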

RNA-seq

A technique based on next-generation sequencing (NGS) data that has become the de facto standard for the analysis of gene expression at the level of the whole transcriptome.

RO-Crate

RO-Crate is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data, to large data-intensive computational research environments.

See more information about RO-Crate here.

Sample

Also known as biosample. A research object that represents biological samples collected and sequenced in real life.

Sample checklist

The checklist of metadata that the sample was registered with.

Sample manager

A sample manager is a person who is responsible for accepting or rejecting samples in a research project.

This person can also upload manifests on behalf of sample submitters.

Sample submitter

A sample submitter is a person who submits or uploads samples to a research project.

Sample submitters may also be referred to as manifest providers or manifest submitters.

Sequence annotation

A research object that is used to describe the process of marking specific features in a DNA, RNA or protein sequence with descriptive information about structure or function.

It can be associated with one or more files, reads and assemblies.

Singular stage

In datafile description, a singular stage is a stage of the description wizard in which all the files in the description bundle are constrained (by the system) to share the same metadata.

SOP

A manual compiled by various profile groups to help scientists fill in a manifest correctly.

See the SOPs section for more information.

Specimen

Also known as biospecimen. It is a piece or portion of tissue, urine or other biologically derived material used for diagnosis and analysis.

SRA accession

A unique identifier assigned to a sample by the Sequence Read Archive (SRA) database. It usually starts with ‘ERS’ followed by a number.
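The format described above can be checked with a simple pattern. This is a loose sketch that assumes only what the definition states, the prefix 'ERS' followed by digits; the looks_like_ers_accession helper is hypothetical and the real accession rules may be stricter.

```python
import re

# Assumed pattern: the literal prefix "ERS" followed by one or more digits.
ERS_PATTERN = re.compile(r"ERS\d+")


def looks_like_ers_accession(value):
    """Return True if the whole string matches the assumed ERS pattern."""
    return ERS_PATTERN.fullmatch(value) is not None
```

A check like this only confirms the shape of the string, not that the accession exists in the archive.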

Studio3T

Studio3T, formerly known as Robo3T, is a GUI for MongoDB.

Visit Studio3T to download an appropriate version for your OS.

ToL

A worldwide collaborative effort of biologists and nature enthusiasts to provide information about biodiversity, the characteristics of different groups of organisms and their evolutionary history (phylogeny).

Ubuntu

A Linux distribution based on Debian and composed mostly of free and open-source software.

URI

A string of characters that unambiguously identifies a particular resource.

See also: Uniform Resource Identifier
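As a small illustration, Python's standard urllib.parse module can split a URI into its components. The address below is a placeholder chosen for the example.

```python
from urllib.parse import urlparse

# Placeholder URI; example.org is reserved for documentation use.
parts = urlparse("https://example.org/api/sample/s1?format=json")

scheme = parts.scheme  # the protocol used to reach the resource
host = parts.netloc    # the authority (host) that holds the resource
path = parts.path      # the location of the resource on that host
query = parts.query    # optional parameters that refine the request
```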

Virtual sample

A virtual sample is a research object that is submitted to COPO under a Genomics profile. It represents samples in a run sequencing file.

See the Virtual sample submission section for more information.

VSCode

VSCode is a lightweight but powerful source code editor which runs on your desktop and is available on Windows, macOS and Linux.