BcForms is a toolkit for concretely describing the molecular structure (atoms and bonds) of macromolecular complexes, including non-canonical monomeric forms, circular topologies, and crosslinks. BcForms was developed to help describe the semantic meaning of whole-cell computational models.
BcForms includes a grammar for describing forms of macromolecular complexes composed of DNA, RNA, protein, and small molecule subunits and crosslinks between the subunits. The DNA, RNA, and protein subunits can be described using BpForms and the small molecule subunits can be described using SMILES. BcForms also includes four software tools for verifying descriptions of complexes and calculating physical properties of complexes such as their molecular structure, formula, molecular weight, and charge: this website, a JSON REST API, a command line interface, and a Python API. BcForms is available open-source under the MIT license.
BcForms has the following features:
BcForms represents each complex as a set of subunits, including their stoichiometries, and a set of interchain/intersubunit crosslinks. Furthermore, BcForms can be combined with BpForms and SMILES descriptions of subunits to calculate properties of complexes.
BcForms descriptions of complexes consist of two parts:
The BcForms grammar is defined in Lark syntax, which is based on EBNF syntax.
complex: sub_a + sub_b
sub_a: bpforms.ProteinForm(AC)
sub_b: bpforms.ProteinForm(MK)
Structure: C[C@H]([NH3+])C(=O)N[C@H](C(=O)O)CS.CSCC[C@H]([NH3+])C(=O)N[C@@H](CCCC[NH3+])C(=O)O
Formula: C17H38N5O6S2
Molecular weight: 472.64
Charge: 3
complex: 2 * sub_c | x-link: [ l-bond-atom: sub_c(1)-1S11 | l-displaced-atom: sub_c(1)-1H11 | r-bond-atom: sub_c(2)-1S11 | r-displaced-atom: sub_c(2)-1H11 ]
sub_c: bpforms.ProteinForm(CA)
Structure: C(=O)([C@@H]([NH3+])CSSC[C@@H](C(=O)N[C@@H](C)C(=O)O)[NH3+])N[C@@H](C)C(=O)O
Formula: C12H24N4O6S2
Molecular weight: 384.466
Charge: 2
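The formula, molecular weight, and charge reported for the disulfide-linked dimer can be checked with simple arithmetic. The sketch below is not the BcForms implementation. It assumes that each CA subunit has the formula C6H13N2O3S and a charge of +1 (read off the SMILES structure above), that the crosslink removes one uncharged hydrogen from each copy, and that abridged standard atomic weights are used. All function names are hypothetical.

```python
import re

# Abridged standard atomic weights (an assumption; BcForms may use other values).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula):
    """Parse a formula such as 'C6H13N2O3S' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def complex_formula(subunit_formula, stoichiometry, displaced_elements):
    """Multiply the subunit formula by its stoichiometry, then remove
    one atom for each element listed in displaced_elements."""
    counts = {el: n * stoichiometry
              for el, n in parse_formula(subunit_formula).items()}
    for element in displaced_elements:
        counts[element] -= 1
    return counts


def format_formula(counts):
    """Write the counts as a string with elements in alphabetical order."""
    return "".join(f"{el}{n if n != 1 else ''}"
                   for el, n in sorted(counts.items()) if n)


def molecular_weight(counts):
    return sum(ATOMIC_WEIGHT[el] * n for el, n in counts.items())


# Two copies of the CA subunit; the disulfide displaces 1H11 from each copy.
dimer = complex_formula("C6H13N2O3S", 2, ["H", "H"])
dimer_charge = 2 * (+1)  # displaced hydrogens carry no charge

print(format_formula(dimer))             # C12H24N4O6S2
print(f"{molecular_weight(dimer):.3f}")  # 384.466
print(dimer_charge)                      # 2
```

The same functions reproduce the first example when the two subunit formulas are summed without displacing any atoms.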
The x-link attribute can be used to indicate a bond between atoms from different subunits. For example, this attribute can describe interchain disulfide bonds between cysteines in proteins and interstrand crosslinks in DNA.
Each crosslink can be described by enclosing attributes which indicate the atoms involved in the bond within square brackets and delimiting the attributes with pipes (e.g., "| x-link: [l-bond-atom: sub_a(1)-1C1 | r-bond-atom: sub_b(1)-3C2 | ...]").
BcForms allows two ways of defining inter-subunit crosslinks: inline definition and definition using our ontology of crosslinks.
Each crosslink can be described using the following attributes:
Each crosslink can have one or more left and right bond atoms, and zero or more left and right displaced atoms. Each crosslink must have the same number of left and right bond atoms.
| x-link: [ l-bond-atom: sub_c(1)-1S11 | l-displaced-atom: sub_c(1)-1H11 | r-bond-atom: sub_c(2)-1S11 | r-displaced-atom: sub_c(2)-1H11 ]
| x-link: [ l-bond-atom: b(1)-4C2 | r-bond-atom: a(2)-1N1-1 | l-displaced-atom: b(1)-4O1 | l-displaced-atom: b(1)-4H1 | r-displaced-atom: a(2)-1H1+1 | r-displaced-atom: a(2)-1H1 ]
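The inline syntax and the counting rules above can be illustrated with a small parser. This is a minimal sketch and not the Lark grammar that BcForms uses: the regular expression is an assumption inferred from the two examples above (subunit name, copy number in parentheses, monomer position, element, atom position, optional charge), and the function names are hypothetical.

```python
import re

# Assumed shape of an atom reference, e.g. "sub_c(1)-1S11" or "a(2)-1H1+1".
ATOM_RE = re.compile(
    r"^(?P<subunit>\w+)\((?P<copy>\d+)\)-(?P<monomer>\d+)"
    r"(?P<element>[A-Z][a-z]?)(?P<position>\d+)(?P<charge>[+-]\d+)?$"
)

ATTRIBUTES = ("l-bond-atom", "r-bond-atom", "l-displaced-atom", "r-displaced-atom")


def parse_crosslink(text):
    """Parse '| x-link: [ key: atom | ... ]' into {key: [atom dicts]}."""
    body = text[text.index("[") + 1:text.rindex("]")]
    attrs = {key: [] for key in ATTRIBUTES}
    for part in body.split("|"):
        key, _, value = part.partition(":")
        key, value = key.strip(), value.strip()
        if key not in attrs:
            raise ValueError(f"unknown attribute: {key!r}")
        match = ATOM_RE.match(value)
        if match is None:
            raise ValueError(f"malformed atom reference: {value!r}")
        atom = match.groupdict()
        for field in ("copy", "monomer", "position"):
            atom[field] = int(atom[field])
        atom["charge"] = int(atom["charge"] or 0)
        attrs[key].append(atom)
    return attrs


def validate_crosslink(attrs):
    """Check the rule stated above: at least one bond atom on each side,
    and equal numbers of left and right bond atoms."""
    n_left, n_right = len(attrs["l-bond-atom"]), len(attrs["r-bond-atom"])
    if n_left < 1 or n_right < 1:
        raise ValueError("a crosslink needs left and right bond atoms")
    if n_left != n_right:
        raise ValueError("left and right bond atom counts differ")


disulfide = parse_crosslink(
    "| x-link: [ l-bond-atom: sub_c(1)-1S11 | l-displaced-atom: sub_c(1)-1H11"
    " | r-bond-atom: sub_c(2)-1S11 | r-displaced-atom: sub_c(2)-1H11 ]"
)
validate_crosslink(disulfide)
print(disulfide["l-bond-atom"][0]["element"])  # S
```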
Each crosslink can alternatively be described by using our ontology with three attributes. The list of crosslinks defined in the ontology is available at bpforms.org/crosslink.
Each crosslink must have one type, one left monomeric form, and one right monomeric form.
Complexes can have zero, one, or more crosslinks.
| x-link: [ type: disulfide | l: sub_c(1)-1 | r: sub_c(2)-1 ]
| x-link: [ type: glycyl_lysine_isopeptide | l: b(1)-4 | r: a(2)-1 ]
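The relationship between the two notations can be sketched as a lookup that expands an ontology-based crosslink into its inline form. The two-entry table below is a hypothetical excerpt: its atoms are inferred by assuming that the ontology examples above correspond to the inline disulfide and isopeptide examples shown earlier, and it is not the actual content of the crosslink ontology. The function name is also hypothetical.

```python
# Hypothetical excerpt of a crosslink ontology. Atom labels are inferred from
# the inline examples in this document, not copied from bpforms.org/crosslink.
CROSSLINK_ONTOLOGY = {
    "disulfide": {
        "l-bond-atom": ["S11"],
        "l-displaced-atom": ["H11"],
        "r-bond-atom": ["S11"],
        "r-displaced-atom": ["H11"],
    },
    "glycyl_lysine_isopeptide": {
        "l-bond-atom": ["C2"],
        "r-bond-atom": ["N1-1"],
        "l-displaced-atom": ["O1", "H1"],
        "r-displaced-atom": ["H1+1", "H1"],
    },
}


def expand_crosslink(xlink_type, left, right):
    """Expand (type, l, r) into an inline x-link string.

    `left` and `right` are monomeric-form coordinates such as 'sub_c(1)-1'.
    """
    parts = []
    for key, atoms in CROSSLINK_ONTOLOGY[xlink_type].items():
        # Attributes starting with "l-" refer to the left monomeric form.
        monomer = left if key.startswith("l-") else right
        parts.extend(f"{key}: {monomer}{atom}" for atom in atoms)
    return "| x-link: [ " + " | ".join(parts) + " ]"


print(expand_crosslink("disulfide", "sub_c(1)-1", "sub_c(2)-1"))
```

Under these assumptions, the ontology form is shorter and less error-prone because the atom-level details are stored once per crosslink type rather than repeated in every complex.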
Each subunit, residue, and atom represented by BcForms has a unique coordinate. The coordinates of repeated subunits range from one to the stoichiometry of the subunit. The coordinate of each residue is a two-tuple of the coordinate of its parent subunit and its position within the residue sequence of its parent subunit. The coordinate of each atom is a three-tuple of the coordinate of its parent subunit, the position of its parent residue within the residue sequence of its parent polymer, and its position within the canonical SMILES ordering of its parent residue (which can be displayed by Open Babel).
The example below illustrates the atom coordinates for the modified amino acid N5-methyl-L-arginine.
[id: "AA0305" | name: "N5-methyl-L-arginine" | structure: "OC(=O)[C@H](CCCN(C(=[NH2])N)C)[NH3+]" | l-bond-atom: N16-1 | l-displaced-atom: H16+1 | l-displaced-atom: H16 | r-bond-atom: C2 | r-displaced-atom: O1 | r-displaced-atom: H1 ]
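The coordinate scheme for subunits, residues, and atoms can be sketched with a few small helpers. This is an illustration only: it handles just the stoichiometry part of a complex (for example "2 * sub_c" or "sub_a + sub_b"), ignores crosslinks, and uses hypothetical names that are not part of the BcForms Python API.

```python
import re
from typing import NamedTuple


class AtomCoordinate(NamedTuple):
    """Three-part coordinate: subunit copy, residue position, atom position."""
    subunit: str   # subunit name, e.g. "sub_c"
    copy: int      # 1 .. stoichiometry of the subunit
    residue: int   # position in the residue sequence of the subunit
    position: int  # position in the canonical SMILES ordering of the residue


def parse_stoichiometry(expression):
    """Parse '2 * a + b' into {'a': 2, 'b': 1}."""
    counts = {}
    for term in expression.split("+"):
        match = re.fullmatch(r"\s*(?:(\d+)\s*\*\s*)?(\w+)\s*", term)
        if match is None:
            raise ValueError(f"malformed term: {term!r}")
        name = match.group(2)
        counts[name] = counts.get(name, 0) + int(match.group(1) or 1)
    return counts


def subunit_coordinates(expression):
    """List (name, copy) pairs, with copies numbered from one."""
    return [(name, copy)
            for name, n in parse_stoichiometry(expression).items()
            for copy in range(1, n + 1)]


def format_atom(coord, element):
    """Render a coordinate in the notation used by the x-link attribute."""
    return f"{coord.subunit}({coord.copy})-{coord.residue}{element}{coord.position}"


print(subunit_coordinates("2 * sub_c"))  # [('sub_c', 1), ('sub_c', 2)]
print(format_atom(AtomCoordinate("sub_c", 2, 1, 11), "S"))  # sub_c(2)-1S11
```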
To support quality control of information about macromolecules, the BcForms user interfaces include methods for verifying the syntactic and semantic correctness of descriptions of complexes:
BcForms includes four software interfaces for verifying descriptions of complexes and calculating properties such as their molecular structures, formulae, molecular weights, and charges.
BcForms can be used in conjunction with commonly used standards in systems biology. BcForms is also easy to embed into documents such as Excel workbooks and comma-separated tables.
BcForms can be used to concretely describe the meaning of CellML components which represent complexes. BcForms can be used with the RDF element of component objects.
...
<component cmeta:id="complex" name="complex">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="#complex">
      <bcforms:BcForm xmlns:bcforms="https://bcforms.org">
        3 * subunit
      </bcforms:BcForm>
    </rdf:Description>
  </rdf:RDF>
</component>
<component cmeta:id="subunit" name="subunit">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="#subunit">
      <bpforms:ProteinForm xmlns:bpforms="https://bpforms.org">
        LID{AA0037}MAN{AA0037}FVGTR
      </bpforms:ProteinForm>
    </rdf:Description>
  </rdf:RDF>
</component>
...
BcForms can be used to concretely describe the meaning of Systems Biology Markup Language (SBML) species elements which represent complexes. BcForms can be used with the annotation element of species elements.
...
<species name="complex">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="#complex">
        <bcforms:BcForm xmlns:bcforms="https://bcforms.org">
          2 * subunit
        </bcforms:BcForm>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
<species name="subunit">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="#complex-a">
        <bpforms:ProteinForm xmlns:bpforms="https://bpforms.org">
          A{U}CR
        </bpforms:ProteinForm>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
...
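Annotations embedded this way can be read back with a standard XML parser. The sketch below uses Python's built-in xml.etree.ElementTree on a minimal fragment modeled on the SBML example above. The listOfSpecies wrapper is added here only to make the fragment well-formed, the function name is hypothetical, and the fragment has no default namespace, whereas a real SBML file would require the SBML namespace in the species lookup.

```python
import xml.etree.ElementTree as ET

BCFORMS_NS = "https://bcforms.org"

# Minimal fragment modeled on the example above (no SBML default namespace).
SBML_FRAGMENT = """
<listOfSpecies xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <species name="complex">
    <annotation>
      <rdf:RDF>
        <rdf:Description rdf:about="#complex">
          <bcforms:BcForm xmlns:bcforms="https://bcforms.org">
            2 * subunit
          </bcforms:BcForm>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
  </species>
</listOfSpecies>
"""


def extract_bcforms(xml_text):
    """Return {species name: BcForms string} for annotated species."""
    root = ET.fromstring(xml_text)
    forms = {}
    for species in root.iter("species"):
        # ElementTree addresses namespaced tags as "{namespace-uri}localname".
        node = species.find(f".//{{{BCFORMS_NS}}}BcForm")
        if node is not None and node.text:
            forms[species.get("name")] = node.text.strip()
    return forms


print(extract_bcforms(SBML_FRAGMENT))  # {'complex': '2 * subunit'}
```

The extracted string can then be passed to any of the BcForms interfaces for validation and property calculation.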
Below are several resources which can be helpful for determining the subunit and crosslink composition of complexes.