Dopamine aptamer structure determination

Eric Largy

1 Methods

1.1 System preparation

The system was prepared using the leap program from the Amber24 suite (Case et al. 2023). The OL21 and AMBER General Force Field for organic molecules (Version 1.81) were used to describe the complex (J. Wang et al. 2004; Zgarbová, Šponer, and Jurečka 2021; Love et al. 2023). The structure was explicitly solvated in a truncated octahedral box of water molecules, using the OPC model with the ad hoc Li/Merz ion parameters of atomic ions (12-6 set)(Izadi, Anandakrishnan, and Onufriev 2014; Li et al. 2020), with a minimum of 14 Å between the solute and the box edge. For simulations in absence of salts (case of unrestrained simulations), the system was neutralized by adding 25 Na⁺ counter-cations. In cases where 140 mM NaCl were added to better reflect the SELEX conditions (restrained simulations), the number of required ions (\(N_{\pm}\); here around 35 Na⁺ and 10 Cl^-) was determined following the SLTCAP method (Schmit et al. 2018), using Equation 1 simplified by Machado and Pantano into Equation 2, where \(\nu_w\) is the water volume of the simulation box (around \(3\times 10^5\) Å³ here) in reduced units, \(c_0\) the salt concentration, \(Q\) the total charge of the complex (here -25: 26 phosphates on the aptamer and 1 ammonium group on the dopamine), and \(N_0 = \frac{N_w \times c_0}{55.5}\), with \(N_w\). the number of water molecules in the simulation box (here, around \(9\times10^4\)) (Machado and Pantano 2020).

\[N_{\pm}=\nu_w c_0 e^{\mp ArcSinh(\frac{Q}{2 \nu_w c_0})} \tag{1}\] \[N_{\pm}=N_0\sqrt{1+(\frac{Q}{2N_0})^2 \mp \frac{Q}{2}} \tag{2}\]

Note that the SPLIT method describe by Machado and Pantano cannot be applied as our system does not satisfy the \(N_0 \gg Q\) condition; however it yields identical values in most cases or deviate by a single ion.

1.2 Restrained minimization, heating and molecular dynamics

Using an in-house R script (R Core Team 2023; Wickham et al. 2019), the distance restraints obtained in ARIA were converted to an 8-column format suitable for its processing by the makeDIST_RST function from the sander module of Amber. A custom map file was prepared to define common names for groups of protons sharing a given restraints (e.g. 3 H from a same methyl group). The resulting DISANG restraints file was then applied to all steps below.

All simulation steps were performed with pmemd.cuda (v. 18.0.0) from the CUDA version of AMBER (Götz et al. 2012; Salomon-Ferrer et al. 2013; Le Grand, Götz, and Walker 2013), on an NVIDIA H100 PCIe Tensor core GPU (CUDA version: 12.4) from the DOREMI CALI v3 cluster of the Mésocentre de Calcul Intensif Aquitain (Université de Bordeaux). The system was minimized for 20000 cycles using the steepest descent algorithm for the first 4000 steps and the conjugate gradient for the next 16000 steps. The weights of the restraints were kept constant at 100 kcal.mol^-1.Å^-2. The system was then heated at constant volume from 0 to 298 K over 18 ps then kept for 2 ps at the final temperature, using a time step of 2 fs, the Langevin thermostat with a 2.0 ps^-1 collision frequency and a different seed for the pseudo-random number generation for every run to avoid synchronization artifacts (Sindhikara et al. 2009), an 8 Å non-bonded cutoff, and the bonds involving hydrogen were constrained with the SHAKE algorithm. The system was further equilibrated five times at 298 K with the parameters above and the restraint weights ramping down from 100 to 5 kcal.mol^-1.Å^-2. The pressure was kept at 1 bar with the Berendsen barostat (Berendsen et al. 1984).

Restrained molecular dynamics for subsequent minimization were ran for 10 ns with the parameters above, and restraints weights set at 20 kcal.mol^-1.Å^-2. The final coordinates were further minimized, as described above except for the restraint weights set at 20 kcal.mol^-1.Å^-2. Remaining NMR violations were summarized with the sviol function from the Amber package.

Production MD were run over a microsecond with and without restraints, as well as in absence of dopamine, with the parameters above.

1.3 Data analysis

The bio3d package was used for minimized structure and trajectory files cleanup, alignment, filtering, averaging and analysis (B. J. et al. 2006). The determination of RMSD and RMSF was performed with the rmsd and rmsf functions. Dihedral angles, sugar pucker angles and amplitudes \(\theta_M\) were obtained with in-house R functions leveraging the torsion.xyz function, following Equation 3, where the pucker P is determined by Equation 4 and the sugar torsion angles \(\nu_i\) are defined by four atoms as shown below.

\[ \theta_M = \frac{\nu_2}{\cos P} \tag{3}\]

\[ \tan P = \frac{(\nu_4 + \nu_1)-(\nu_3 + \nu_0)}{2\nu_2(\sin(\frac{\pi}{5}) + \sin(\frac{2\pi}{5}))} \tag{4}\]

\[\nu_0=C4'-O4'-C1'-C2'\] \[\nu_1=O4'-C1'-C2'-C3'\] \[\nu_2=C1'-C2'-C3'-C4'\] \[\nu_3=C2'-C3'-C4'-O4'\] \[\nu_4=C3'-C4'-O4'-C1'\]

All atom-atom and ring-ring distances and angles were measured with the dist.xyz and angle.xyz functions. In-house R scripts were used to infer the formation of H-bonds from these values on full trajectories, by first selecting realistic donor/acceptor atoms for each residue/ligand, then verifying that the donor/acceptor distances and angles were compatible with H-bond formation.

Base pairs were further characterized with DSSR 2.3.2 (Lu, Bussemaker, and Olson 2015; Lu 2020), using the web API through the httr2 R package (Wickham 2023).

Principal component analysis on the final structures and trajectories (all atoms except hydrogen) was performed with the pca function from bio3d. The results were clustered with the k-means method using Euclidean distance and the best number of clusters determined by Nbclust (Charrad et al. 2014). Cross-correlation analysis was carried out with the dccm function of bio3d.

Molecular structures images were created in PyMOL 2.5 (Schrödinger, LLC 2021), interactive molecular structures with mol*, and ligand binding site diagram with LigPlot⁺ (Laskowski and Swindells 2011). All further data processing and all plotting was performed in R 4.3. Apache Arrow was used to write/read processed data files in feather format (Richardson et al. 2024).

2 Results

Hereafter, are described the results for the restrained simulations over 10 ns for the fifteen starting structures, the first of which was also simulated for a microsecond. As a control, the latter was also performed in absernce of restraints.

2.1 Trajectory

2.1.1 Visualization

2.1.1.1 Microsecond simulation

Below can be seen a comparison between the restrained and unrestrained microsecond simulations. One frame every 10 ns was extracted.

2.1.1.2 Production simulation for minimization

All twenty restrained molecular dynamic trajectories (10 ns) can be visualized below. Only one every 5 frames were extracted. Click on the top-left play button to animate the trajectory.

2.1.2 Average structures

Twenty structures of the microsecond restrained simulation were averaged over 1250 frames of the simulation, without further minimization. It is not incredibly useful as it was not further minimized, but it can be done. The minimized structures can be found in Section 2.2.

2.1.3 RMSD

2.1.3.1 Restrained simulations

The trajectory of almost all fifteen simulations seem to converge after the first hundreds of ps (Figure 1). Only structure 2 requires around 5 ns to converge. Minimization of the coordinates after ten nanoseconds of simulation is appropriate.

Figure 1: RMSD on all residues (including 5’ and 3’-ends) without hydrogens across the restrained simulations (10 ns), where the first frame is used as reference.

The microsecond simulation has a very stable RMSD vs. the first frame.

Figure 2: RMSD on all residues (including 5’ and 3’-ends) without hydrogens across the restrained simulation (1 microsecond), where the first frame is used as reference.

Two frames with similar one-dimensional RMSD are not necessarily similar. Pairwise RMSD better captures the extent of structural diversity across the simulation and avoids the bias from comparing all frames to a single (first) conformer. A few clusters of structures are visible in the RMSD matrices, suggesting that the simulations have converged to a few distinct states (Figure 3). However, there are no very significant variations along the trajectories (RMSD never exceeds ~ 2 Å). The nature of the visited states is investigated for the structure 1 in the sections below.

Figure 3: All-to-all RMSD (all residues and ligand atoms except H) for the fifteen simulations

Figure 4: All-to-all RMSD (all residues and ligand atoms except H) for the microsecond simulation

2.1.3.2 Unrestrained MD

The trajectory is relatively stable across the unrestrained microsecond simulation, although with a larger RMSD (Figure 5, Figure 6).

Figure 5: RMSD on all residues (including 5’ and 3’-ends) without hydrogens across the unrestrained simulation, where the first frame is used as reference.

Figure 6: All-to-all RMSD (all residues and ligand atoms except H) of the unrestrained simulation

2.1.4 PCA

Given the RMSD heatmaps shown above, it is likely that the aptamer visits different conformations. To isolate discrete states from the microsecond simulation, principal component analysis was performed.

Figure 7: Principal component analysis on the coordinates of the restrained microsecond trajectory. Scree plot (A) showing the contribution of each principal component on the total variance and the cumulative variance labelled on selected data points. (B) Score plots along the first two principal components, colored by kmeans clusters. (C) Sum of absolute loadings of residues for the first two first principal components

From the first four principal components, the data is best clustered in three groups. The states closest to the centroids of each cluster are shown below. The largest differences are at the 3’-end region (G36:T37 and G13:A14, different in cluster 3), the T23 chain reversal loop (different in cluster 1), and - to a lower extent - the G25. This correlates well with the root mean squared flucturations measured during the simulation (Figure 8)

Representative structures of the two PCA clusters. Differences are highlighted.

Figure 8: Root mean squared fluctuations (RMSF), highlighting the residues with the largest mobility during the restrained simulation.

2.1.5 Cross-correlation analysis

The correlation of atomic movements was assessed by calculation of the atom-wise cross-correlation matrix, shown in Figure 9 as a heatmap. Negative values indicate that the atoms/residues move in opposite directions.

Here, it is possible to observe a positive correlation of the dopamine binding site residues T18 and T19 H-bonded to the cathecol, A27 and G28 to the ammonium, and the stacked C26, T32 and G33 (see Section 2.5 below).

Figure 9: Dynamical cross-correlation map (DCCM) of the microsecond simulation across atoms (A) or residues (B)

2.1.6 Domain analysis

The identification of geometrically rigid regions of the aptamer in the microsecond simulation was performed with GeoStaS algorithm (Romanowska, Nowiński, and Trylska 2012). In Figure 10, we see that most of the aptamer is rigid, with the exception of the T23 loop and, to a lower extent, the nearby T30 loop.

Figure 10: Atomic movement similarity matrix obtained with the GeoStaS analysis for the microsecond simulation across atoms (A) or residues (B). The calculation was carried out on 1000 evenly-spaced frames.

2.2 Minimized structures

2.2.1 Structures and energies

The fifteen minimized structures of the dopamine aptamer obtained by restrained molecular dynamics are displayed in Figure 11 (mean RMSD across structures: 1.3 \(\pm\) 0.14 Å). The longer sequence (with duplex stem at the bottom) are shown alongside (also minimized after 10 ns rMD ; see Section 2.7).

Figure 11: Fifteen minimized structures obtained by restrained molecular dynamics and minimization (left) and corresponding structures for the longer sequence calculated with and without A38•T37 base pairing. The dopamine ligand is shown in yellow, guanines in tan, thymines in green, adenines in blue and cytosine in purple.

The summary of the simulation energies is given in Table 1.

The inter-states RMSD are given in Table 2

Table 1: Energies

Table 2: Pairwise RMSD between the top 15 structures

2.2.2 PCA

It is apparent that some structures deviate more from others, e.g. 1, 4, and 8 and, to a lower extent, 9,12 and 14. We aimed at identifying clusters of structures to summarize the conformational space explored by the aptamer. Below are shown the results of a principal component analysis based on all atoms except hydrogen atoms.

Figure 12: Principal component analysis on the coordinates of the 15 minimized structures. Scree plot (A) showing the contribution of each principal component on the total variance and the cumulative variance labelled on selected data points, and (B) to (D): score plots along two dimensions for the first three principal components. Clusters determined by kmeans are colored.

Figure 13: Sum of absolute loadings of residues for the first three first principal components

As expected from the analysis of trajectories, the key differences between the clusters are the loop thymine T23, the 3’end, and G25.

Minimized structures colored by clusters

2.2.3 NMR violations

The average violations observed across the 20 minimized structures, with a cut-off of 0.15 Å are presented in Table 3. No violation exceeds a mean error of 0.25 Å. Out of the 10 violations above the cutoff, 8 are present in more than half of the structures.

Dopamine is involved in 0 violation(s).

Table 3: NMR restraint violations (mean error > 0.15 Å)

2.2.4 Base pairs

The base pairs identified in the 20 minimized structures are named, where possible, in Table 4, and further described using three nomenclatures.

The Saenger nomeclature is a list of 28 (plus one added later) base pairs identified by Roman numerals (Saenger 1984; Gesteland, Cech, and Atkins 2006). A canonical Watson-Crick base pair is XIX for GC, and XX for AT.

The Leontis-Westhof classification (LEONTIS and WESTHOF 2001), latter expanded by Lemieux and Major (Lemieux 2002), describes base-pairing by considering whether H-bonds are established on the Watson-Crick, Hoogsteen or sugar edges, and what is the glycosidic orientation (cis/trans: c/t) of the nucleotides. A canonical Watson-Crick base pair would then be cWW. Note that the sugar edge has been defined with RNA in mind, and therefore with 2’-OH that are absent here. This sugar edge is absent for syn glycosidic dihedral angles (see section Section 2.3 below), which is only the case of G31 here.

The DSSR classification is similar to the Leontis-Westhof, as it indicates the cis/trans (c/t) orientation followed by the interacting edges, which can be Watson-Crick (W), major (M; ~Hoogsteen) and minor grooves (m; ~sugar). It additionally incorporate the +/- base ’face indication as in the Saenger nomenclature. Here, a canonical Watson-Crick base pair is called cW-W.

Table 4: Base pairs identified by DSSR in the 15 minimized structures. Abbreviations. +/-: the paired nucleotides have same/opposite faces (as in canonical antiparallel Watson-Crick pairing); WC and W: Watson-Crick pairing/edge; H: Hoogsteen edge; S: sugar edge; M/m: Major/minor groove edges; .: undefined edge; r: reverse; c/t: cis/trans orientation along the glycosidic bond.

Remarkably, the whole structure only contains two canonical Watson-Crick base pairs (C24-G29 and C26-G33; Figure 14 (a)). One GC (C20-G28) and two AT ((T12-A35 and T19-A27) base pairs have rWC configurations (Figure 14 (b)). Note that T19 is bound by the dopamine ligand as well (see Section 2.5).

2.3 Dihedral angles

All residues are in the anti configuration except for G31 in syn (Table 5). G31 forms a trans Watson-Crick base pair with G21 (tW+W; Figure 15).

Table 5: Dihedral χ angles of the 15 minimized structures

Figure 15: Non-canonical base pair between G31 and G21 (tW+W), within a triplet with G29 (tm+m)

Dihedral angles are stable along the simulation (Figure 16). The same is observed during the unrestrained simulation, albeit with more variance (Figure 17).

Figure 16: Dihedral angles of all residues for the rMD, wrapped in [-180;180], along the trajectory (from center to outside of circle)

Figure 17: Dihedral angles of all residues for the unrestrained MD, wrapped in [-180;180], along the trajectory (from center to outside of circle)

2.4 Puckers

Most nucleotides have their sugar adopting a C2’-endo (typical in B DNA) or close to it (C1’-exo, C3’-exo; Table 6, Figure 18). Among those, T23 also visits the C1’-endo pucker during a large portion of the simulation (Figure 20), and can be found in several of the minimized structures; however this is of little significance given that this residue does not form any intra- or inter-molecular contacts.

G25, whose amino group binds to the N7 of G28 (tm-M; Figure 21 (a)), is found in half the structures in C1’-exo or C2’-endo but adopts a variety of puckers in others. During the microsecond simulation, it mostly exist in C1’-exo, and only visits the other configurations for a short period of time.

T18, which is stacked on G17, is tightly bound by the dopamine ligand (Figure 21 (b)) and forms an off-plane tW-M base pair with G33, has a C1’-endo configuration in most conformers, which has been observed for dsDNA around intercalator drugs or when gaining increasing flexibility (A. H. J. Wang et al. 1987; Shen et al. 2011). It is therefore likely that the backbone of T18 adopts a non-canonical conformation to allow dopamine binding. It remains stable during the simulation.

G16, C20, and G34 mostly exist in the C4’-exo/O4’-endo configurations, which is not very common for DNA (Anosova et al. 2015). G16 pairs with the Hoogsteen face of G34 (Figure 21 (c); tW-M) and their puckering is quite unstable during the simulation (Figure 20). C20 is involved in a triplet made of a reverse Watson-Crick base pair with G28 (tW+W) and a single H-bond to T32 (tW-.; Figure 21 (d)).

G17 has an extremely varied pucker depending on the minimized structure (9 unique puckers observed!), and has a remarkably diffuse pucker angle and amplitude during the course of the microsecond simulation. G17 is involved in a non coplanar triplet with the Hoogsteen face of G33 (tW-M; stacked under the dopamine) and T11 (cW-m; Figure 21 (e)). It also binds to G34 (tW-M) and is stacked on G16, whose puckering is also fairly unstable.

A15 is the only nucleotide that mostly adopts a C4’-endo pucker. It forms an asymetric homopurine base pair with A35 (Figure 21 (f); tW-M) , and the sugar pucker is stable during the simulation (Figure 20).

Figure 18: Sugar puckering of the 15 minimized structures. Puckers are presented from most to least frequent.

Table 6: Sugar puckering of the 15 minimized structures. Puckers are presented from most to least frequent.

Figure 19: Sugar puckering calculated with the Altona & Sundaralingam method, wrapped from 0 to 360 for the restrained simulation. The position of the points relative to the center/border of the circles scales with the pucker amplitude (the larger the closer to the border). The circled points represent the median pucker/amplitude. Calculation performed on 5000 evenly spaced frames.

Figure 20: Pucker angles of all residues for the rMD, along the trajectory (from center to outside of circle). Residues with relative standard deviation above 20% are labeled as flexible

(a) Sheared base pair between the sugar edge of G25 (left) and the Hoogsteen edge of G28 (tm-M)

In the unrestrained simulations, almost all nucleotides have their sugar adopting a C2’-endo (typical in B DNA) or close to it (C1’-exo; Figure 22). Only T18 adopts a C4’-exo configuration. The non-canonical puckers observed above are therefore imposed by the restraints. For the specific case of T18, the more RNA-like pucker observed in the unrestrained simulation still differs from the C1’-endo pucker of its restrained counterpart.

Figure 22: Sugar puckering calculated with the Altona & Sundaralingam method, wrapped from 0 to 360 for the unrestrained simulation. The position of the points relative to the center/border of the circles scales with the pucker amplitude (the larger the closer to the border). The circled points represent the median pucker/amplitude.

2.5 Dopamine binding

The dopamine is intercalated between T32 and G33/C16, and forms H-bond contacts with the O4s of T18 and T19 with its two hydroxy groups (O1, O2; Figure 25). The ammonium group (N1 forms two additional H-bonds, with the N3 of of A27 (and O4’; although not always canonically since the bond angle is sometimes below 120°; Figure 24) and the deoxyribose O4’ of G28.

The H-bonds between O1/O2 of the dopamine and T18/T19 are globally stable during the restrained simulation (Figure 23), in particular the short LDP:O2/T18:O4 (Figure 28). The ammonium group (N1) also establishes stable H-bonds with G28 and A27.

In absence of restraints, the H-bond with A27 and G28 are longer and less stable. Conversely, the bond to T19 is shorter and stable. H-bond angles are also significantly much less stable.

Figure 23: Lengths of H-bonds (donor-acceptor distance) involved in dopamine binding during the restrained (top) and unrestrained (bottom) simulations. The lines are the result of local polynomial regression fitting with a span of 0.01. The canonical C26•G33 bp was used as a reference.

Figure 24: Angles of H-bonds (donor-H-acceptor atoms) established by the dopamine ligand during the microsecond simulation, with the canonical C26•G33 bp used as a reference. When multiple interconverting hydrogen atoms are involved, all are considered and each data point reflect the angle formed by the hydrogen most likely to establish an H-bond. The lines are the result of local polynomial regression fitting with a span of 0.01. The angle stays below 140 ° during the simulation in the case of A27:O4’, which is suboptimal to establish an H-bond. Note that the hydrogen atoms interconvert frequently during the simulation; all three atoms were considered and .

T18 is off-plane as it also binds to the Hoogsteen face of G33 stacked ‘under’ the dopamine. G33 is itself base pairing canonically to C26 (see also Figure 14 (a)).

Conversely, T19 is coplanar to the dopamine, and the two form a triplet with A27. T19 and A27 are arranged in a reverse WC configuration (see also Figure 14 (b)).

In absence of NMR restraints, we noticed that the ammonium side-chain flips between configurations ‘above’ or ‘below’ the catechol plane. We performed PCA on the binding pocket (T18:T19-C26:G28-T32:G33 and dopamine) to identify frames representative of the different configurations.

Figure 26: Principal component analysis on the coordinates of the binding pocket in the unrestrained microsecond trajectory. Scree plot (A) showing the contribution of each principal component on the total variance and the cumulative variance labelled on selected data points. (B) Score plots along the first two principal components, colored by kmeans clusters. (C) Sum of absolute loadings of residues for the first two first principal components

Four clusters were identified and the structures closest to their centroids are shown in Figure 27. The main difference if the configuration of the ammonium chain. The ammonium binds to either:

The O4’ and N3 of G33, which is stacked under the dopamine (top-left structure)
the O2 of T32, which is stacked above the dopamine (top-right structure)
the O4’ and N3 of G28 (bottom-left structure)
the O2 of T32 and N3 of G28, in a configuration midway between the two above (bottom-right structure)

The first configuration only accounts for 2.6% of the simulation. In the three latter case, T32 overlaps more with the catechol moiety, which increases \(\pi-\pi\) interactions. The fourth configuration is the more frequent (41.9% of the frames are in this cluster).

Neither of the four configurations follows exactly the restrained one. The ‘above’ configurations are closer to it, but the ammonium binds differently in each three case. It is folded back above the chain (akin to a scorpio) and does not bind to A27. However, beyond these four discrete examples, we evidence below that the ammonium is still able to bind to A27:N3 in around of third of the frames (Figure 28). For reference, it is almost always bound when restrained (Figure 28).

Figure 27: Binding modes in absence of NMR restraints. Four main binding modes have been evidenced by PCA, mostly varying by the configuration of the amino chain.

2.6 H-bond formation and stability

2.6.1 Complex

Besides those from the binding site, the formation of all H-bonds was monitored over the course of the simulation. Below, the list of monitored bonds is based on H-bonds detected in the minimized structures obtained by rMD; therefore Figure 28 ignores by design bonds formed exclusively in the uMD simulation and/or in absence of dopamine. The purpose of Figure 28 is therefore to establish the stability of H-bonds during the rMD simulation and assess whether they ‘survived’ in absence of restraints and dopamine.

The tight binding of dopamine to T18:O4 by the catechol moiety is confirmed. It is still present in absence of restraints but less frequently. The ammonium group almost always bind the sugar of G28 and the N3 of A27. Similarly, the frequency of H-bond formation is decreased in absence of restraints.

The less canonical binding to the sugar of the A27 is also observed, with low frequency but across the whole simulation, in the rMD. It is almost absent in absence of restraints, as seen above.

Conversely to the observations above, the H-bonding of dopamine to T19 is less frequent and tight in the rMD than in the uMD, as expected from bond length analysis above (Figure 23).

Figure 28: Formation of H-bonds over the course of the simulations with restraints (rMD), without restraints (uMD) and unrestrained with no dopamine (no LDP). The list of H-bonds was established on the minimized structures from the rMD simulations, with a maximum donor-acceptor distance of maximum 3.1 Å, keeping those formed in at least 20% of structures. H-bonds in the 5’- and 3’-end regions (residues 11:17 and 33:37() are colored in blue, and those involving the dopamine in pink. Top: each line is a frame (one frame every 0.04 ns); the darker a frame, the shorter and more linear the H-bond (the transparency follows linearly the bond angle/length ratio). Bottom: Cumulative frequencies. The frequency of bond formation is shown for relaxed (3.5 Å and 120°; light colors) and more stringent (3.0 Å and 140°; dark colors) cut-offs. Only inter-residue H-bonds with a ‘stringent’ frequency above 0.10 are shown.

2.6.2 (Pre-folded) free DNA

Generally speaking, the residues in the 5’ and 3’-end regions (in blue in Figure 28) are remarkably well conserved in absence of dopamine, when starting the simulation from the folded structure. This may indicate that the oligonucleotide can prefold into a short stem intermediate before the binding occurs between this stem and the very dynamic loops, after which the latter could finish folding.

PCA on the trajectory yielded three clusters (Figure 29), whose centroid coordinates are shown in Figure 30. The terminus region can be nicely aligned across clusters, but also against the complex, further demonstrating that the stem is very stable during the simulation, and retains the complex structure.

The loop region also maintains the same local folding in all clusters (it can be locally aligned), however its orientation relative to the terminus stem changes dramatically. It can be orthogonal (cluster 1; Figure 30; left), yielding a linear configuration (‘open’), or parallel, forming a more globular structure (‘closed’), closer to that of the complex (cluster 3; Figure 30; right). The second cluster contain intermediate conformations between these two extrema (cluster 2; Figure 30; center).

Figure 29: Principal component analysis on the coordinates of the unbound oligonucleotide in the unrestrained microsecond trajectory. Scree plot (A) showing the contribution of each principal component on the total variance and the cumulative variance labelled on selected data points. (B) Score plots along the first two principal components, colored by kmeans clusters. (C) Sum of absolute loadings of residues for the first two first principal components

Figure 30: Structures of the centroids of each cluster (1 to 3 from left to right) superimposed with the first minimized structure of the complex (grey).

The unbound aptamer spends most of its time in the ‘closed’ (43.8%) and ‘intermediate’ (43.5%) configurations, and only occasionally visits the ‘open’, linear conformation (12.7%) (Figure 31). The ‘closed’ conformation has a mean lifetime of 0.7 ns and a max residency time of 51 ns. The ‘open’ configuration has a lower mean lifetime 0.54 ns, and is never visited more than 24 ns. The so-called ‘intermediate’ conformation is aptly named as it has an even lower mean lifetime 0.5 ns, and visits never exceed 9.2 ns.

Figure 31: Opening/closing of the unbound oligonucleotide during the microsecond simulation. Top: Cluster assignment for each frame. Bottom: Distance between the C1’ atoms of C24 (loop region) and A35 (stem region). The distance measured for each cluster centroid is shown as a dashed line.

2.7 Long aptamer

The long aptamer was studied from starting structure including (doplong8) or devoid of (doplong7) base pairing between A8 and T37. The minimized structures shown in Figure 11 were obtained by minimizing the last frame of 10-ns *restrained* simulations. The dynamics of these systems was explored with 1 microsecond unrestrained simulations.

2.7.1 RMSD

2.7.1.1 10 ns restrained simulations (for minimization)

The RMSD (vs. the first frame) is quickly plateauing when the A38•T37 is pre-formed (and restrained), but never does in absence of this base pair (Figure 32, Figure 33).

Figure 32: RMSD on all residues (including 5’ and 3’-ends) without hydrogens across the 10 ns restrained simulation of the long aptamer, where the first frame is used as reference.

Figure 33: All-to-all RMSD (all residues and ligand atoms except H) for the 10 ns restrained simulation of the long aptamer.

2.7.1.2 Unrestrained microsecond simulations

In absence of restraints, doplong7 settles at RMSD values close to that of the final frame from the restrained 10 ns simulation (Figure 34). The pairwise data also reveals that an event clearly occurs after 56 ns (Figure 35), that is close to the start of the microsecond simulation but past the endpoint of the restrained 10 ns simulation. After this conformational change, the pairwise RMSD remains apparently stable, but other changes still occur (see below).

In presence of the pre-formed A38•T37 (doplong8), the simulation plateaus much faster (Figure 34)., as seen above for the retrained production run, but at about twice the RMSD. A very significant event is visible at 576 ns, resulting in a doubled RMSD vs. the first frame. The RMSD then varies much more significantly than before, with another significant change at 663 ns.

Figure 34: RMSD on all residues (including 5’ and 3’-ends) without hydrogens across the microsecond unrestrained simulation of the long aptamer, where the first frame is used as reference. The arrows point to conformational changes that are not all evident from this metric alone.

Figure 35: All-to-all RMSD (all residues and ligand atoms except H) for the unrestrained microsecond simulation. The white arrows point to significant events. The arrows point to conformational changes that are not all evident from this metric alone.

2.7.2 Base pairing

2.7.2.1 A38•T37 formation

Analysis of base pairing along the unrestrained simulation provides a straightforward explanation for the large conformational change of doplong7: the A8 and T37, initially far apart, rapidly base pair (canonical cW-W). This is accompanied by the formation of a base pair between G9 and the Hoogsteen face of G13. The latter is also rapidly formed in doplong8, wherein the A38•T37 bp is initially formed; however the base pairing is ‘inverted’:

doplong7: tW-M (VII in Saenger nomenclature), with G9•N2-G13:O6 and G9•N1-G13-N7 H-bonds
doplong8: a cW+M (VI in Saenger nomenclature), with G9•N2-G13-N7 and G9•N1-G13:O6 H-bonds

Note that this base pair is fairly unstable in doplong8, but A38•T37 remains formed nonetheless (reminder: there is no restraint here).

Figure 36: Length of potential A38•T37, and G9•G13 H-bonds during the simulation. Note that the G9•G13 pairing differs. The pink arrow shows the onset of basepairing for doplong7.

2.7.2.2 Other events

The above does not explain what happens in the second part of doplong8’s simulation, nor is the only event occuring for doplong7.

doplong7: Two discrete events occur at around 375 and 553 ns
- First, two of the three H-bonds from C24•G29 bonds get disrupted
- Then the third gives up as well. The neighboring G25•G28 base pair is also disrupted at this last onset (Figure 40).
doplong8: Two discrete events as well, at around 576 and 663 ns.
- First occurs the formation of the consecutive A15•A35, G16•G34 and G17•G33 base pairs in lieu of G34•T11, which is the expected base pair from NMR data of the short aptamer.
- Second, the C24•G29 and G25•G28 base pairs are lost, similar to doplong7 (Figure 40).
- Note also the transient flipping out of G9 after these events. The G36 base pair remains in contact, although it now adopts a co-planar configuration.

All of these conformational changes occur within the region of the aptamer present in the smaller version, and are very likely to be the consequence of the absence of retraints in this microsecond simulation; since all lost base pairs are present in the restrained simulations.

Figure 37: H-bond lengths for selected nucleotides undergoing base pair formation/disruption during the unrestrained microsecond simulation. The vertical lines indicate the onset of formation/disruption and correspond to the arrows in the figures above.

2.7.3 PCA

Principal component analysis allows to extract coordinates characteristic of the conformers prior and after the conformational changes.

Figure 38: Principal component analysis on the coordinates of the long aptamer doplong7 (starting without A38•T37 bp) in the unrestrained microsecond trajectory. Scree plot (A) showing the contribution of each principal component on the total variance and the cumulative variance labelled on selected data points. (B) Score plots along the first two principal components, colored by kmeans clusters. (C) Sum of absolute loadings of residues for the first two first principal components

Figure 39: Principal component analysis on the coordinates of the long aptamer doplong8 (starting with A38•T37 bp) in the unrestrained microsecond trajectory. Scree plot (A) showing the contribution of each principal component on the total variance and the cumulative variance labelled on selected data points. (B) Score plots along the first two principal components, colored by kmeans clusters. (C) Sum of absolute loadings of residues for the first two first principal components

**pcadoplong7:** The A38•T37 base pair is initially absent but canonically formed in cluster 2. However, the latter loses the C24•G29 G25•G28 base pairs. The two events are not directly linked as they occur around half a microsecond from one another.

References

Anosova, Irina, Ewa A. Kowal, Matthew R. Dunn, John C. Chaput, Wade D. Van Horn, and Martin Egli. 2015. “The Structural Diversity of Artificial Genetic Polymers.” Nucleic Acids Research 44 (3): 1007–21. https://doi.org/10.1093/nar/gkv1472.

B. J., Grant, Rodrigues A. P. C., ElSawy K. M., McCammon J. A., and Caves L. S. D. 2006. “Bio3D: An r Package for the Comparative Analysis of Protein Structures.” 22.

Berendsen, H. J. C., J. P. M. Postma, W. F. van Gunsteren, A. DiNola, and J. R. Haak. 1984. “Molecular Dynamics with Coupling to an External Bath.” The Journal of Chemical Physics 81 (8): 3684–90. https://doi.org/10.1063/1.448118.

Case, David A., Hasan Metin Aktulga, Kellon Belfon, David S. Cerutti, G. Andrés Cisneros, Vinícius Wilian D. Cruzeiro, Negin Forouzesh, et al. 2023. “AmberTools.” Journal of Chemical Information and Modeling 63 (20): 6183–91. https://doi.org/10.1021/acs.jcim.3c01153.

Charrad, Malika, Nadia Ghazzali, Véronique Boiteau, and Azam Niknafs. 2014. “NbClust: An r Package for Determining the Relevant Number of Clusters in a Data Set” 61. https://www.jstatsoft.org/v61/i06/.

Gesteland, R. F., T. Cech, and J. F. Atkins. 2006. The RNA World: The Nature of Modern RNA Suggests a Prebiotic RNA World. Cold Spring Harbor Monograph Series. Cold Spring Harbor Laboratory Press. https://books.google.fr/books?id=3mREVdXNzFcC.

Götz, Andreas W., Mark J. Williamson, Dong Xu, Duncan Poole, Scott Le Grand, and Ross C. Walker. 2012. “Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born.” Journal of Chemical Theory and Computation 8 (5): 1542–55. https://doi.org/10.1021/ct200909j.

Izadi, Saeed, Ramu Anandakrishnan, and Alexey V. Onufriev. 2014. “Building Water Models: A Different Approach.” The Journal of Physical Chemistry Letters 5 (21): 3863–71. https://doi.org/10.1021/jz501780a.

Laskowski, Roman A., and Mark B. Swindells. 2011. “LigPlot+: Multiple LigandProtein Interaction Diagrams for Drug Discovery.” Journal of Chemical Information and Modeling 51 (10): 2778–86. https://doi.org/10.1021/ci200227u.

Le Grand, Scott, Andreas W. Götz, and Ross C. Walker. 2013. “SPFP: Speed Without CompromiseA Mixed Precision Model for GPU Accelerated Molecular Dynamics Simulations.” Computer Physics Communications 184 (2): 374–80. https://doi.org/10.1016/j.cpc.2012.09.022.

Lemieux, S. 2002. “RNA Canonical and Non-Canonical Base Pairing Types: A Recognition Method and Complete Repertoire.” Nucleic Acids Research 30 (19): 4250–63. https://doi.org/10.1093/nar/gkf540.

LEONTIS, NEOCLES B., and ERIC WESTHOF. 2001. “Geometric Nomenclature and Classification of RNA Base Pairs.” RNA 7 (4): 499–512. https://doi.org/10.1017/s1355838201002515.

Li, Zhen, Lin Frank Song, Pengfei Li, and Kenneth M. Merz. 2020. “Systematic Parametrization of Divalent Metal Ions for the OPC3, OPC, TIP3P-FB, and TIP4P-FB Water Models.” Journal of Chemical Theory and Computation 16 (7): 4429–42. https://doi.org/10.1021/acs.jctc.0c00194.

Love, Olivia, Rodrigo Galindo-Murillo, Marie Zgarbová, Jiří Šponer, Petr Jurečka, and Thomas E. Cheatham. 2023. “Assessing the Current State of Amber Force Field Modifications for DNA─2023 Edition.” Journal of Chemical Theory and Computation 19 (13): 4299–4307. https://doi.org/10.1021/acs.jctc.3c00233.

Lu, Xiang-Jun. 2020. “DSSR-Enabled Innovative Schematics of 3D Nucleic Acid Structures with PyMOL.” Nucleic Acids Research, May. https://doi.org/10.1093/nar/gkaa426.

Lu, Xiang-Jun, Harmen J. Bussemaker, and Wilma K. Olson. 2015. “DSSR: An Integrated Software Tool for Dissecting the Spatial Structure of RNA.” Nucleic Acids Research, July, gkv716. https://doi.org/10.1093/nar/gkv716.

Machado, Matías R., and Sergio Pantano. 2020. “Split the Charge Difference in Two! A Rule of Thumb for Adding Proper Amounts of Ions in MD Simulations.” Journal of Chemical Theory and Computation 16 (3): 1367–72. https://doi.org/10.1021/acs.jctc.9b00953.

R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Richardson, Neal, Ian Cook, Nic Crane, Dewey Dunnington, Romain François, Jonathan Keane, Dragoș Moldovan-Grünfeld, Jeroen Ooms, Jacob Wujciak-Jens, and Apache Arrow. 2024. Arrow: Integration to ’Apache’ ’Arrow’. https://github.com/apache/arrow/.

Romanowska, Julia, Krzysztof S. Nowiński, and Joanna Trylska. 2012. “Determining Geometrically Stable Domains in Molecular Conformation Sets.” Journal of Chemical Theory and Computation 8 (8): 2588–99. https://doi.org/10.1021/ct300206j.

Saenger, Wolfram. 1984. Principles of Nucleic Acid Structure. Springer New York. https://doi.org/10.1007/978-1-4612-5190-3.

Salomon-Ferrer, Romelia, Andreas W. Götz, Duncan Poole, Scott Le Grand, and Ross C. Walker. 2013. “Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald.” Journal of Chemical Theory and Computation 9 (9): 3878–88. https://doi.org/10.1021/ct400314y.

Schmit, Jeremy D., Nilusha L. Kariyawasam, Vince Needham, and Paul E. Smith. 2018. “SLTCAP: A Simple Method for Calculating the Number of Ions Needed for MD Simulation.” Journal of Chemical Theory and Computation 14 (4): 1823–27. https://doi.org/10.1021/acs.jctc.7b01254.

Schrödinger, LLC. 2021. The PyMOL Molecular Graphics System, Version 2.5. http://www.pymol.org/pymol.

Shen, X., B. Gu, S. A. Che, and F. S. Zhang. 2011. “Solvent Effects on the Conformation of DNA Dodecamer Segment: A Simulation Study.” The Journal of Chemical Physics 135 (3). https://doi.org/10.1063/1.3610549.

Sindhikara, Daniel J., Seonah Kim, Arthur F. Voter, and Adrian E. Roitberg. 2009. “Bad Seeds Sprout Perilous Dynamics: Stochastic Thermostat Induced Trajectory Synchronization in Biomolecules.” Journal of Chemical Theory and Computation 5 (6): 1624–31. https://doi.org/10.1021/ct800573m.

Wang, Andrew H. J., Giovanni Ughetto, Gary J. Quigley, and Alexander Rich. 1987. “Interactions Between an Anthracycline Antibiotic and DNA: Molecular Structure of Daunomycin Complexed to d(CpGpTpApCpG) at 1.2-.ANG. Resolution.” Biochemistry 26 (4): 1152–63. https://doi.org/10.1021/bi00378a025.

Wang, Junmei, Romain M. Wolf, James W. Caldwell, Peter A. Kollman, and David A. Case. 2004. “Development and Testing of a General Amber Force Field.” Journal of Computational Chemistry 25 (9): 1157–74. https://doi.org/10.1002/jcc.20035.

Wickham, Hadley. 2023. “Httr2: Perform HTTP Requests and Process the Responses.” https://CRAN.R-project.org/package=httr2.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse” 4: 1686. https://doi.org/10.21105/joss.01686.

Zgarbová, Marie, Jiří Šponer, and Petr Jurečka. 2021. “Z-DNA as a Touchstone for Additive Empirical Force Fields and a Refinement of the Alpha/Gamma DNA Torsions for AMBER.” Journal of Chemical Theory and Computation 17 (10): 6292–6301. https://doi.org/10.1021/acs.jctc.1c00697.

Reuse

CC BY-SA 4.0