1 Methods
1.1 System preparation
The system was prepared using the leap program from the Amber24 suite (Case et al. 2023). The OL21 and AMBER General Force Field for organic molecules (Version 1.81) were used to describe the complex (J. Wang et al. 2004; Zgarbová, Šponer, and Jurečka 2021; Love et al. 2023). The structure was explicitly solvated in a truncated octahedral box of water molecules, using the OPC model with the ad hoc Li/Merz ion parameters of atomic ions (12-6 set)(Izadi, Anandakrishnan, and Onufriev 2014; Li et al. 2020), with a minimum of 14 Å between the solute and the box edge. For simulations in absence of salts (case of unrestrained simulations), the system was neutralized by adding 25 Na+ counter-cations. In cases where 140 mM NaCl were added to better reflect the SELEX conditions (restrained simulations), the number of required ions (\(N_{\pm}\); here around 35 Na+ and 10 Cl-) was determined following the SLTCAP method (Schmit et al. 2018), using Equation 1 simplified by Machado and Pantano into Equation 2, where \(\nu_w\) is the water volume of the simulation box (around \(3\times 10^5\) Å3 here) in reduced units, \(c_0\) the salt concentration, \(Q\) the total charge of the complex (here -25: 26 phosphates on the aptamer and 1 ammonium group on the dopamine), and \(N_0 = \frac{N_w \times c_0}{55.5}\), with \(N_w\). the number of water molecules in the simulation box (here, around \(9\times10^4\)) (Machado and Pantano 2020).
\[N_{\pm}=\nu_w c_0 e^{\mp ArcSinh(\frac{Q}{2 \nu_w c_0})} \tag{1}\] \[N_{\pm}=N_0\sqrt{1+(\frac{Q}{2N_0})^2 \mp \frac{Q}{2}} \tag{2}\]
Note that the SPLIT method describe by Machado and Pantano cannot be applied as our system does not satisfy the \(N_0 \gg Q\) condition; however it yields identical values in most cases or deviate by a single ion.
1.2 Restrained minimization, heating and molecular dynamics
Using an in-house R script (R Core Team 2023; Wickham et al. 2019), the distance restraints obtained in ARIA were converted to an 8-column format suitable for its processing by the makeDIST_RST function from the sander module of Amber. A custom map file was prepared to define common names for groups of protons sharing a given restraints (e.g. 3 H from a same methyl group). The resulting DISANG restraints file was then applied to all steps below.
All simulation steps were performed with pmemd.cuda (v. 18.0.0) from the CUDA version of AMBER (Götz et al. 2012; Salomon-Ferrer et al. 2013; Le Grand, Götz, and Walker 2013), on an NVIDIA H100 PCIe Tensor core GPU (CUDA version: 12.4) from the DOREMI CALI v3 cluster of the Mésocentre de Calcul Intensif Aquitain (Université de Bordeaux). The system was minimized for 20000 cycles using the steepest descent algorithm for the first 4000 steps and the conjugate gradient for the next 16000 steps. The weights of the restraints were kept constant at 100 kcal.mol-1.Å-2. The system was then heated at constant volume from 0 to 298 K over 18 ps then kept for 2 ps at the final temperature, using a time step of 2 fs, the Langevin thermostat with a 2.0 ps-1 collision frequency and a different seed for the pseudo-random number generation for every run to avoid synchronization artifacts (Sindhikara et al. 2009), an 8 Å non-bonded cutoff, and the bonds involving hydrogen were constrained with the SHAKE algorithm. The system was further equilibrated five times at 298 K with the parameters above and the restraint weights ramping down from 100 to 5 kcal.mol-1.Å-2. The pressure was kept at 1 bar with the Berendsen barostat (Berendsen et al. 1984).
Restrained molecular dynamics for subsequent minimization were ran for 10 ns with the parameters above, and restraints weights set at 20 kcal.mol-1.Å-2. The final coordinates were further minimized, as described above except for the restraint weights set at 20 kcal.mol-1.Å-2. Remaining NMR violations were summarized with the sviol function from the Amber package.
Production MD were run over a microsecond with and without restraints, as well as in absence of dopamine, with the parameters above.
1.3 Data analysis
The bio3d package was used for minimized structure and trajectory files cleanup, alignment, filtering, averaging and analysis (B. J. et al. 2006). The determination of RMSD and RMSF was performed with the rmsd and rmsf functions. Dihedral angles, sugar pucker angles and amplitudes \(\theta_M\) were obtained with in-house R functions leveraging the torsion.xyz function, following Equation 3, where the pucker P is determined by Equation 4 and the sugar torsion angles \(\nu_i\) are defined by four atoms as shown below.
\[ \theta_M = \frac{\nu_2}{\cos P} \tag{3}\]
\[ \tan P = \frac{(\nu_4 + \nu_1)-(\nu_3 + \nu_0)}{2\nu_2(\sin(\frac{\pi}{5}) + \sin(\frac{2\pi}{5}))} \tag{4}\]
\[\nu_0=C4'-O4'-C1'-C2'\] \[\nu_1=O4'-C1'-C2'-C3'\] \[\nu_2=C1'-C2'-C3'-C4'\] \[\nu_3=C2'-C3'-C4'-O4'\] \[\nu_4=C3'-C4'-O4'-C1'\]
All atom-atom and ring-ring distances and angles were measured with the dist.xyz and angle.xyz functions. In-house R scripts were used to infer the formation of H-bonds from these values on full trajectories, by first selecting realistic donor/acceptor atoms for each residue/ligand, then verifying that the donor/acceptor distances and angles were compatible with H-bond formation.
Base pairs were further characterized with DSSR 2.3.2 (Lu, Bussemaker, and Olson 2015; Lu 2020), using the web API through the httr2 R package (Wickham 2023).
Principal component analysis on the final structures and trajectories (all atoms except hydrogen) was performed with the pca function from bio3d. The results were clustered with the k-means method using Euclidean distance and the best number of clusters determined by Nbclust (Charrad et al. 2014). Cross-correlation analysis was carried out with the dccm function of bio3d.
Molecular structures images were created in PyMOL 2.5 (Schrödinger, LLC 2021), interactive molecular structures with mol*, and ligand binding site diagram with LigPlot+ (Laskowski and Swindells 2011). All further data processing and all plotting was performed in R 4.3. Apache Arrow was used to write/read processed data files in feather format (Richardson et al. 2024).
2 Results
Hereafter, are described the results for the restrained simulations over 10 ns for the fifteen starting structures, the first of which was also simulated for a microsecond. As a control, the latter was also performed in absernce of restraints.
2.1 Trajectory
2.1.1 Visualization
2.1.1.1 Microsecond simulation
Below can be seen a comparison between the restrained and unrestrained microsecond simulations. One frame every 10 ns was extracted.
2.1.1.2 Production simulation for minimization
All twenty restrained molecular dynamic trajectories (10 ns) can be visualized below. Only one every 5 frames were extracted. Click on the top-left play button to animate the trajectory.
2.1.2 Average structures
Twenty structures of the microsecond restrained simulation were averaged over 1250 frames of the simulation, without further minimization. It is not incredibly useful as it was not further minimized, but it can be done. The minimized structures can be found in Section 2.2.
2.1.3 RMSD
2.1.3.1 Restrained simulations
The trajectory of almost all fifteen simulations seem to converge after the first hundreds of ps (Figure 1). Only structure 2 requires around 5 ns to converge. Minimization of the coordinates after ten nanoseconds of simulation is appropriate.
The microsecond simulation has a very stable RMSD vs. the first frame.
Two frames with similar one-dimensional RMSD are not necessarily similar. Pairwise RMSD better captures the extent of structural diversity across the simulation and avoids the bias from comparing all frames to a single (first) conformer. A few clusters of structures are visible in the RMSD matrices, suggesting that the simulations have converged to a few distinct states (Figure 3). However, there are no very significant variations along the trajectories (RMSD never exceeds ~ 2 Å). The nature of the visited states is investigated for the structure 1 in the sections below.
2.1.3.2 Unrestrained MD
The trajectory is relatively stable across the unrestrained microsecond simulation, although with a larger RMSD (Figure 5, Figure 6).
2.1.4 PCA
Given the RMSD heatmaps shown above, it is likely that the aptamer visits different conformations. To isolate discrete states from the microsecond simulation, principal component analysis was performed.
From the first four principal components, the data is best clustered in three groups. The states closest to the centroids of each cluster are shown below. The largest differences are at the 3’-end region (G36:T37 and G13:A14, different in cluster 3), the T23 chain reversal loop (different in cluster 1), and - to a lower extent - the G25. This correlates well with the root mean squared flucturations measured during the simulation (Figure 8)
2.1.5 Cross-correlation analysis
The correlation of atomic movements was assessed by calculation of the atom-wise cross-correlation matrix, shown in Figure 9 as a heatmap. Negative values indicate that the atoms/residues move in opposite directions.
Here, it is possible to observe a positive correlation of the dopamine binding site residues T18 and T19 H-bonded to the cathecol, A27 and G28 to the ammonium, and the stacked C26, T32 and G33 (see Section 2.5 below).
2.1.6 Domain analysis
The identification of geometrically rigid regions of the aptamer in the microsecond simulation was performed with GeoStaS algorithm (Romanowska, Nowiński, and Trylska 2012). In Figure 10, we see that most of the aptamer is rigid, with the exception of the T23 loop and, to a lower extent, the nearby T30 loop.
2.2 Minimized structures
2.2.1 Structures and energies
The fifteen minimized structures of the dopamine aptamer obtained by restrained molecular dynamics are displayed in Figure 11 (mean RMSD across structures: 1.3 \(\pm\) 0.14 Å). The longer sequence (with duplex stem at the bottom) are shown alongside (also minimized after 10 ns rMD ; see Section 2.7).
The summary of the simulation energies is given in Table 1.
The inter-states RMSD are given in Table 2
2.2.2 PCA
It is apparent that some structures deviate more from others, e.g. 1, 4, and 8 and, to a lower extent, 9,12 and 14. We aimed at identifying clusters of structures to summarize the conformational space explored by the aptamer. Below are shown the results of a principal component analysis based on all atoms except hydrogen atoms.
As expected from the analysis of trajectories, the key differences between the clusters are the loop thymine T23, the 3’end, and G25.
2.2.3 NMR violations
The average violations observed across the 20 minimized structures, with a cut-off of 0.15 Å are presented in Table 3. No violation exceeds a mean error of 0.25 Å. Out of the 10 violations above the cutoff, 8 are present in more than half of the structures.
Dopamine is involved in 0 violation(s).
2.2.4 Base pairs
The base pairs identified in the 20 minimized structures are named, where possible, in Table 4, and further described using three nomenclatures.
The Saenger nomeclature is a list of 28 (plus one added later) base pairs identified by Roman numerals (Saenger 1984; Gesteland, Cech, and Atkins 2006). A canonical Watson-Crick base pair is XIX for GC, and XX for AT.
The Leontis-Westhof classification (LEONTIS and WESTHOF 2001), latter expanded by Lemieux and Major (Lemieux 2002), describes base-pairing by considering whether H-bonds are established on the Watson-Crick, Hoogsteen or sugar edges, and what is the glycosidic orientation (cis/trans: c/t) of the nucleotides. A canonical Watson-Crick base pair would then be cWW. Note that the sugar edge has been defined with RNA in mind, and therefore with 2’-OH that are absent here. This sugar edge is absent for syn glycosidic dihedral angles (see section Section 2.3 below), which is only the case of G31 here.
The DSSR classification is similar to the Leontis-Westhof, as it indicates the cis/trans (c/t) orientation followed by the interacting edges, which can be Watson-Crick (W), major (M; ~Hoogsteen) and minor grooves (m; ~sugar). It additionally incorporate the +/- base ’face indication as in the Saenger nomenclature. Here, a canonical Watson-Crick base pair is called cW-W.
Remarkably, the whole structure only contains two canonical Watson-Crick base pairs (C24-G29 and C26-G33; Figure 14 (a)). One GC (C20-G28) and two AT ((T12-A35 and T19-A27) base pairs have rWC configurations (Figure 14 (b)). Note that T19 is bound by the dopamine ligand as well (see Section 2.5).
2.3 Dihedral angles
All residues are in the anti configuration except for G31 in syn (Table 5). G31 forms a trans Watson-Crick base pair with G21 (tW+W; Figure 15).
.
Dihedral angles are stable along the simulation (Figure 16). The same is observed during the unrestrained simulation, albeit with more variance (Figure 17).
2.4 Puckers
Most nucleotides have their sugar adopting a C2’-endo (typical in B DNA) or close to it (C1’-exo, C3’-exo; Table 6, Figure 18). Among those, T23 also visits the C1’-endo pucker during a large portion of the simulation (Figure 20), and can be found in several of the minimized structures; however this is of little significance given that this residue does not form any intra- or inter-molecular contacts.
G25, whose amino group binds to the N7 of G28 (tm-M; Figure 21 (a)), is found in half the structures in C1’-exo or C2’-endo but adopts a variety of puckers in others. During the microsecond simulation, it mostly exist in C1’-exo, and only visits the other configurations for a short period of time.
T18, which is stacked on G17, is tightly bound by the dopamine ligand (Figure 21 (b)) and forms an off-plane tW-M base pair with G33, has a C1’-endo configuration in most conformers, which has been observed for dsDNA around intercalator drugs or when gaining increasing flexibility (A. H. J. Wang et al. 1987; Shen et al. 2011). It is therefore likely that the backbone of T18 adopts a non-canonical conformation to allow dopamine binding. It remains stable during the simulation.
G16, C20, and G34 mostly exist in the C4’-exo/O4’-endo configurations, which is not very common for DNA (Anosova et al. 2015). G16 pairs with the Hoogsteen face of G34 (Figure 21 (c); tW-M) and their puckering is quite unstable during the simulation (Figure 20). C20 is involved in a triplet made of a reverse Watson-Crick base pair with G28 (tW+W) and a single H-bond to T32 (tW-.; Figure 21 (d)).
G17 has an extremely varied pucker depending on the minimized structure (9 unique puckers observed!), and has a remarkably diffuse pucker angle and amplitude during the course of the microsecond simulation. G17 is involved in a non coplanar triplet with the Hoogsteen face of G33 (tW-M; stacked under the dopamine) and T11 (cW-m; Figure 21 (e)). It also binds to G34 (tW-M) and is stacked on G16, whose puckering is also fairly unstable.
A15 is the only nucleotide that mostly adopts a C4’-endo pucker. It forms an asymetric homopurine base pair with A35 (Figure 21 (f); tW-M) , and the sugar pucker is stable during the simulation (Figure 20).
In the unrestrained simulations, almost all nucleotides have their sugar adopting a C2’-endo (typical in B DNA) or close to it (C1’-exo; Figure 22). Only T18 adopts a C4’-exo configuration. The non-canonical puckers observed above are therefore imposed by the restraints. For the specific case of T18, the more RNA-like pucker observed in the unrestrained simulation still differs from the C1’-endo pucker of its restrained counterpart.
2.5 Dopamine binding
The dopamine is intercalated between T32 and G33/C16, and forms H-bond contacts with the O4s of T18 and T19 with its two hydroxy groups (O1, O2; Figure 25). The ammonium group (N1 forms two additional H-bonds, with the N3 of of A27 (and O4’; although not always canonically since the bond angle is sometimes below 120°; Figure 24) and the deoxyribose O4’ of G28.
The H-bonds between O1/O2 of the dopamine and T18/T19 are globally stable during the restrained simulation (Figure 23), in particular the short LDP:O2/T18:O4 (Figure 28). The ammonium group (N1) also establishes stable H-bonds with G28 and A27.
In absence of restraints, the H-bond with A27 and G28 are longer and less stable. Conversely, the bond to T19 is shorter and stable. H-bond angles are also significantly much less stable.
T18 is off-plane as it also binds to the Hoogsteen face of G33 stacked ‘under’ the dopamine. G33 is itself base pairing canonically to C26 (see also Figure 14 (a)).
Conversely, T19 is coplanar to the dopamine, and the two form a triplet with A27. T19 and A27 are arranged in a reverse WC configuration (see also Figure 14 (b)).
In absence of NMR restraints, we noticed that the ammonium side-chain flips between configurations ‘above’ or ‘below’ the catechol plane. We performed PCA on the binding pocket (T18:T19-C26:G28-T32:G33 and dopamine) to identify frames representative of the different configurations.
Four clusters were identified and the structures closest to their centroids are shown in Figure 27. The main difference if the configuration of the ammonium chain. The ammonium binds to either:
The O4’ and N3 of G33, which is stacked under the dopamine (top-left structure)
the O2 of T32, which is stacked above the dopamine (top-right structure)
the O4’ and N3 of G28 (bottom-left structure)
the O2 of T32 and N3 of G28, in a configuration midway between the two above (bottom-right structure)
The first configuration only accounts for 2.6% of the simulation. In the three latter case, T32 overlaps more with the catechol moiety, which increases \(\pi-\pi\) interactions. The fourth configuration is the more frequent (41.9% of the frames are in this cluster).
Neither of the four configurations follows exactly the restrained one. The ‘above’ configurations are closer to it, but the ammonium binds differently in each three case. It is folded back above the chain (akin to a scorpio) and does not bind to A27. However, beyond these four discrete examples, we evidence below that the ammonium is still able to bind to A27:N3 in around of third of the frames (Figure 28). For reference, it is almost always bound when restrained (Figure 28).
2.6 H-bond formation and stability
2.6.1 Complex
Besides those from the binding site, the formation of all H-bonds was monitored over the course of the simulation. Below, the list of monitored bonds is based on H-bonds detected in the minimized structures obtained by rMD; therefore Figure 28 ignores by design bonds formed exclusively in the uMD simulation and/or in absence of dopamine. The purpose of Figure 28 is therefore to establish the stability of H-bonds during the rMD simulation and assess whether they ‘survived’ in absence of restraints and dopamine.
The tight binding of dopamine to T18:O4 by the catechol moiety is confirmed. It is still present in absence of restraints but less frequently. The ammonium group almost always bind the sugar of G28 and the N3 of A27. Similarly, the frequency of H-bond formation is decreased in absence of restraints.
The less canonical binding to the sugar of the A27 is also observed, with low frequency but across the whole simulation, in the rMD. It is almost absent in absence of restraints, as seen above.
Conversely to the observations above, the H-bonding of dopamine to T19 is less frequent and tight in the rMD than in the uMD, as expected from bond length analysis above (Figure 23).
2.6.2 (Pre-folded) free DNA
Generally speaking, the residues in the 5’ and 3’-end regions (in blue in Figure 28) are remarkably well conserved in absence of dopamine, when starting the simulation from the folded structure. This may indicate that the oligonucleotide can prefold into a short stem intermediate before the binding occurs between this stem and the very dynamic loops, after which the latter could finish folding.
PCA on the trajectory yielded three clusters (Figure 29), whose centroid coordinates are shown in Figure 30. The terminus region can be nicely aligned across clusters, but also against the complex, further demonstrating that the stem is very stable during the simulation, and retains the complex structure.
The loop region also maintains the same local folding in all clusters (it can be locally aligned), however its orientation relative to the terminus stem changes dramatically. It can be orthogonal (cluster 1; Figure 30; left), yielding a linear configuration (‘open’), or parallel, forming a more globular structure (‘closed’), closer to that of the complex (cluster 3; Figure 30; right). The second cluster contain intermediate conformations between these two extrema (cluster 2; Figure 30; center).
The unbound aptamer spends most of its time in the ‘closed’ (43.8%) and ‘intermediate’ (43.5%) configurations, and only occasionally visits the ‘open’, linear conformation (12.7%) (Figure 31). The ‘closed’ conformation has a mean lifetime of 0.7 ns and a max residency time of 51 ns. The ‘open’ configuration has a lower mean lifetime 0.54 ns, and is never visited more than 24 ns. The so-called ‘intermediate’ conformation is aptly named as it has an even lower mean lifetime 0.5 ns, and visits never exceed 9.2 ns.
2.7 Long aptamer
The long aptamer was studied from starting structure including (doplong8) or devoid of (doplong7) base pairing between A8 and T37. The minimized structures shown in Figure 11 were obtained by minimizing the last frame of 10-ns *restrained* simulations. The dynamics of these systems was explored with 1 microsecond unrestrained simulations.
2.7.1 RMSD
2.7.1.1 10 ns restrained simulations (for minimization)
The RMSD (vs. the first frame) is quickly plateauing when the A38•T37 is pre-formed (and restrained), but never does in absence of this base pair (Figure 32, Figure 33).
2.7.1.2 Unrestrained microsecond simulations
In absence of restraints, doplong7 settles at RMSD values close to that of the final frame from the restrained 10 ns simulation (Figure 34). The pairwise data also reveals that an event clearly occurs after 56 ns (Figure 35), that is close to the start of the microsecond simulation but past the endpoint of the restrained 10 ns simulation. After this conformational change, the pairwise RMSD remains apparently stable, but other changes still occur (see below).
In presence of the pre-formed A38•T37 (doplong8), the simulation plateaus much faster (Figure 34)., as seen above for the retrained production run, but at about twice the RMSD. A very significant event is visible at 576 ns, resulting in a doubled RMSD vs. the first frame. The RMSD then varies much more significantly than before, with another significant change at 663 ns.
2.7.2 Base pairing
2.7.2.1 A38•T37 formation
Analysis of base pairing along the unrestrained simulation provides a straightforward explanation for the large conformational change of doplong7: the A8 and T37, initially far apart, rapidly base pair (canonical cW-W). This is accompanied by the formation of a base pair between G9 and the Hoogsteen face of G13. The latter is also rapidly formed in doplong8, wherein the A38•T37 bp is initially formed; however the base pairing is ‘inverted’:
doplong7: tW-M (VII in Saenger nomenclature), with G9•N2-G13:O6 and G9•N1-G13-N7 H-bonds
doplong8: a cW+M (VI in Saenger nomenclature), with G9•N2-G13-N7 and G9•N1-G13:O6 H-bonds
Note that this base pair is fairly unstable in doplong8, but A38•T37 remains formed nonetheless (reminder: there is no restraint here).
2.7.2.2 Other events
The above does not explain what happens in the second part of doplong8’s simulation, nor is the only event occuring for doplong7.
doplong7: Two discrete events occur at around 375 and 553 ns
First, two of the three H-bonds from C24•G29 bonds get disrupted
Then the third gives up as well. The neighboring G25•G28 base pair is also disrupted at this last onset (Figure 40).
doplong8: Two discrete events as well, at around 576 and 663 ns.
First occurs the formation of the consecutive A15•A35, G16•G34 and G17•G33 base pairs in lieu of G34•T11, which is the expected base pair from NMR data of the short aptamer.
Second, the C24•G29 and G25•G28 base pairs are lost, similar to doplong7 (Figure 40).
Note also the transient flipping out of G9 after these events. The G36 base pair remains in contact, although it now adopts a co-planar configuration.
All of these conformational changes occur within the region of the aptamer present in the smaller version, and are very likely to be the consequence of the absence of retraints in this microsecond simulation; since all lost base pairs are present in the restrained simulations.
2.7.3 PCA
Principal component analysis allows to extract coordinates characteristic of the conformers prior and after the conformational changes.
Superimposed coordinates of the cluster centroids, aligned by their duplex stem. Clusters 1 and 2 and on the left (with blue dopamine) and right (with orange dopamine), respectively.


















































