Nature. 2022 Feb;602(7897):414-419.
doi: 10.1038/s41586-021-04301-9. Epub 2022 Feb 16.

Magnetic control of tokamak plasmas through deep reinforcement learning

Jonas Degrave et al. Nature. 2022 Feb.

Abstract

Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a promising path towards sustainable energy. A core challenge is to shape and maintain a high-temperature plasma within the tokamak vessel. This requires high-dimensional, high-frequency, closed-loop control using magnetic actuator coils, further complicated by the diverse requirements across a wide range of plasma configurations. In this work, we introduce a previously undescribed architecture for tokamak magnetic controller design that autonomously learns to command the full set of control coils. This architecture meets control objectives specified at a high level while satisfying physical and operational constraints. This approach has unprecedented flexibility and generality in problem specification and yields a notable reduction in the design effort needed to produce new plasma configurations. We successfully produce and control a diverse set of plasma configurations on the Tokamak à Configuration Variable (TCV) [1,2], including elongated, conventional shapes, as well as advanced configurations, such as negative triangularity and 'snowflake' configurations. Our approach achieves accurate tracking of the location, current and shape for these configurations. We also demonstrate sustained 'droplets' on TCV, in which two separate plasmas are maintained simultaneously within the vessel. This represents a notable advance for tokamak feedback control and shows the potential of reinforcement learning to accelerate research in the fusion domain; the tokamak is one of the most challenging real-world systems to which reinforcement learning has been applied.
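The abstract states that control objectives are specified at a high level and turned into a learning signal. As a minimal sketch only, assuming each objective's tracking error is mapped into (0, 1) by a logistic shaping curve and the scores are combined by a weighted mean scaled so that 100 means every objective is met (the scale used in Extended Data Fig. 5), a reward could be assembled as below. The `Objective` class, the `good`/`bad` thresholds, the logistic shaping and the weighted-mean combination are all hypothetical illustrations, not the reward definition used in the paper.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Objective:
    """One hypothetical control objective, e.g. a plasma-current or shape error."""
    name: str
    error: float       # absolute tracking error at this time step
    good: float        # error that should score 0.95 (assumed threshold)
    bad: float         # error that should score 0.05 (assumed threshold)
    weight: float = 1.0


def shaped_score(error: float, good: float, bad: float) -> float:
    """Logistic map from error to (0, 1): small errors score near 1."""
    if bad <= good:
        raise ValueError("'bad' must be larger than 'good'")
    mid = 0.5 * (good + bad)
    # Slope chosen so that score(good) = 0.95 and score(bad) = 0.05.
    slope = 2.0 * math.log(19.0) / (bad - good)
    z = max(-50.0, min(50.0, slope * (error - mid)))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(z))


def combined_reward(objectives: Sequence[Objective]) -> float:
    """Weighted mean of objective scores, scaled to 0..100.

    The value approaches 100 only when every error is well below its
    'good' threshold.
    """
    total_weight = sum(o.weight for o in objectives)
    weighted = sum(o.weight * shaped_score(o.error, o.good, o.bad) for o in objectives)
    return 100.0 * weighted / total_weight
```

For example, `combined_reward([Objective("shape", error=0.01, good=0.005, bad=0.05)])` scores a hypothetical 1 cm boundary error. A weighted mean is only one way to combine objectives; a soft minimum would instead let the worst-met objective dominate.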

Conflict of interest statement

B.T., F.C., F.F., J.B., J.D., M.N., R.H. and T.E. have filed a provisional patent application about the contents of this manuscript. The remaining authors declare no competing interests.

Figures

Fig. 1. Representation of the components of our controller design architecture.
a, Depiction of the learning loop. The controller sends voltage commands on the basis of the current plasma state and control targets. The resulting interaction data are sent to the replay buffer, which feeds data to the learner to update the policy. b, Our environment interaction loop, consisting of a power supply model, sensing model, environment physical parameter variation and reward computation. c, Our control policy is a multilayer perceptron (MLP) with three hidden layers that takes measurements and control targets and outputs voltage commands. d–f, The interaction of TCV and the real-time-deployed control system, implemented using either a conventional controller composed of many subcomponents (f) or our architecture, which uses a single deep neural network to control all 19 coils directly (e). g, A depiction of TCV and the 19 actuated coils. The vessel is 1.5 m high, with major radius 0.88 m and vessel half-width 0.26 m. h, A cross section of the vessel and plasma, with the important aspects labelled.
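Panel c describes the policy as an MLP with three hidden layers that maps measurements and control targets to voltage commands for the 19 coils. The following is a minimal pure-Python sketch of such a forward pass, assuming ReLU hidden activations, a tanh-bounded output scaled to a symmetric voltage limit, and illustrative layer widths; the caption gives none of these details, so `MLPPolicy`, `hidden` and `v_max` are hypothetical, and the real policy's input composition and output parameterization may differ.

```python
import math
import random

N_COILS = 19  # number of actuated TCV coils (Fig. 1g)


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); weights has n_out rows of n_in uniform values."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


class MLPPolicy:
    """Three hidden layers mapping [measurements, targets] to one voltage per coil."""

    def __init__(self, n_inputs, hidden=64, v_max=1.0, seed=0):
        rng = random.Random(seed)
        sizes = [n_inputs, hidden, hidden, hidden, N_COILS]
        self.layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
        self.n_inputs = n_inputs
        self.v_max = v_max  # hypothetical symmetric voltage limit, normalized units

    def __call__(self, measurements, targets):
        x = list(measurements) + list(targets)
        if len(x) != self.n_inputs:
            raise ValueError(f"expected {self.n_inputs} inputs, got {len(x)}")
        for layer in self.layers[:-1]:  # the three hidden layers
            x = relu(dense(x, layer))
        out = dense(x, self.layers[-1])  # linear output layer, one unit per coil
        # tanh squashing keeps every command inside [-v_max, v_max]
        return [self.v_max * math.tanh(o) for o in out]
```

Usage: `MLPPolicy(n_inputs=12, hidden=32)([0.1] * 9, [0.0] * 3)` returns 19 bounded voltage commands. Bounding the output in the network itself is one simple way to respect actuator limits; in practice those limits may instead be enforced by the power supply model or the control system.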
Fig. 2. Fundamental capability demonstration.
Demonstration of plasma current, vertical stability, position and shape control. Top, target shape points with 2 cm radius (blue circles), compared with the post-experiment equilibrium reconstruction (black continuous line in contour plot). Bottom left, target time traces (blue traces) compared with reconstructed observation (orange traces), with the window of diverted plasma marked (green rectangle). Bottom right, picture inside the vessel at 0.6 s showing the diverted plasma with its legs.
Fig. 3. Control demonstrations.
Control demonstrations obtained during TCV experiments. Target shape points with 2 cm radius (blue circles), compared with the equilibrium reconstruction plasma boundary (black continuous line). In all panels, the first time slice shows the handover condition. a, Elongation of 1.9 with vertical instability growth rate of 1.4 kHz. b, Approximate ITER-proposed shape with neutral beam heating (NBH) entering H-mode. c, Diverted negative triangularity of −0.8. d, Snowflake configuration with time-varying control of the bottom X-point, where the target X-points are marked in blue. Extended traces for these shots can be found in Extended Data Fig. 2.
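The shapes in Fig. 3 are characterized by elongation κ (1.9 in panel a) and triangularity δ (−0.8 in panel c). A minimal sketch of the common geometric definitions, computed from (R, Z) points on the plasma boundary: with a = (R_max − R_min)/2 and R_geo = (R_max + R_min)/2, κ = (Z_max − Z_min)/(2a), and δ is the mean of the upper and lower triangularities (R_geo − R at Z_max)/a and (R_geo − R at Z_min)/a. These are textbook conventions; the reconstruction code used for TCV may define them slightly differently. The helper `d_shaped_boundary` is a hypothetical analytic test shape, not TCV data.

```python
import math


def shape_parameters(boundary):
    """Return (elongation, triangularity) for a closed boundary of (R, Z) points."""
    rs = [r for r, _ in boundary]
    zs = [z for _, z in boundary]
    r_min, r_max = min(rs), max(rs)
    a = 0.5 * (r_max - r_min)       # minor radius
    r_geo = 0.5 * (r_max + r_min)   # geometric major radius
    kappa = (max(zs) - min(zs)) / (2.0 * a)
    # Radial positions of the highest and lowest boundary points
    r_top = max(boundary, key=lambda p: p[1])[0]
    r_bot = min(boundary, key=lambda p: p[1])[0]
    delta_upper = (r_geo - r_top) / a
    delta_lower = (r_geo - r_bot) / a
    return kappa, 0.5 * (delta_upper + delta_lower)


def d_shaped_boundary(r0, a, kappa, delta, n=720):
    """Test shape: R = r0 + a cos(t + asin(delta) sin t), Z = kappa a sin t."""
    x = math.asin(delta)
    points = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        points.append((r0 + a * math.cos(t + x * math.sin(t)), kappa * a * math.sin(t)))
    return points
```

For instance, `shape_parameters(d_shaped_boundary(0.88, 0.25, 1.9, -0.8))` recovers κ ≈ 1.9 and δ ≈ −0.8 for an illustrative boundary; the analytic shape is built so that its top and bottom points sit at R = r0 − a·δ.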
Fig. 4. Droplets.
Demonstration of sustained control of two independent droplets on TCV for the entire 200-ms control window. Left, control of Ip for each independent lobe up to the same target value. Right, a picture in which the two droplets are visible, taken from a camera looking into the vessel at t = 0.55 s.
Extended Data Fig. 1. Pictures and illustration of the TCV.
a,b, Photographs showing the part of the TCV inside the bioshield. c, CAD drawing of the vessel and coils of the TCV. d, View inside the TCV (Alain Herzog/EPFL), showing the limiter tiling, baffles and central column.
Extended Data Fig. 2. A larger overview of the shots in Fig. 3.
We plot the reconstructed values of the normalized pressure βp and safety factor qA, along with the ranges over which these variables were randomized during training (in green), as listed in Extended Data Table 2. We also plot the growth rate, γ, and the plasma current, Ip, along with the associated target value. Where relevant, we plot the elongation κ, the neutral beam heating, the triangularity δ and the vertical position of the bottom X-point ZX and its target.
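The green bands in this figure are the domain-randomization ranges for βp and qA used in training. As a minimal sketch, assuming each randomized parameter is drawn once per training episode, uniformly and independently, from its range, the sampling and the check of a reconstructed experimental value against that range could look like this. The range values below are placeholders, not the ones in Extended Data Table 2, and uniform independent sampling is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamRange:
    low: float
    high: float


# PLACEHOLDER ranges for illustration only; the ranges actually used in
# training are listed in Extended Data Table 2 and are not reproduced here.
HYPOTHETICAL_RANGES = {
    "beta_p": ParamRange(0.1, 0.5),
    "q_A": ParamRange(1.5, 3.0),
}


def sample_episode_params(ranges, rng):
    """Draw each parameter once per episode, uniformly and independently (assumed)."""
    return {name: rng.uniform(r.low, r.high) for name, r in ranges.items()}


def within_training_range(name, value, ranges):
    """True if a reconstructed experimental value lies inside the randomized range."""
    r = ranges[name]
    return r.low <= value <= r.high
```

Usage: `sample_episode_params(HYPOTHETICAL_RANGES, random.Random(0))` gives one simulator configuration per episode; `within_training_range` mirrors the visual comparison in the figure between reconstructed values and the green training bands.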
Extended Data Fig. 3. Control variability.
To illustrate the variability of the performance that our deterministic controller achieves on the plant, we plot the trajectories of one policy that was used twice: in shot 70599 (in blue) and shot 70600 (in orange). The dotted line indicates where the vessel cross sections are shown. The trajectories are shown from the handover at 0.0872 s until 0.65 s after the breakdown; after this point, the neutral beam heating was turned on in shot 70600 and the two shots diverge. The green line shows the root-mean-square error (RMSE) distance between the last closed-flux surfaces (LCFS) in the two experiments, providing a direct measure of the shape similarity between the two shots. This illustrates the repeatability of experiments, both in shape parameters such as elongation κ and triangularity δ and in the error achieved with respect to the targets for plasma current Ip and the shape of the last closed-flux surface.
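The green trace compares the two shots through an RMSE distance between their LCFS. A minimal sketch, assuming both boundaries are sampled at the same number of matched points (for example, at equal poloidal angles about a common centre); the caption does not state how points were paired, and unmatched samplings would need a nearest-point distance instead. `lcfs_rmse` is a hypothetical helper name.

```python
import math


def lcfs_rmse(boundary_a, boundary_b):
    """RMSE distance between two boundaries given as matched (R, Z) point lists.

    The result is in the same length unit as the inputs (e.g. metres).
    """
    if len(boundary_a) != len(boundary_b) or not boundary_a:
        raise ValueError("boundaries must be non-empty and equally sampled")
    squared = [
        (ra - rb) ** 2 + (za - zb) ** 2
        for (ra, za), (rb, zb) in zip(boundary_a, boundary_b)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Evaluating this at every time step for shots 70599 and 70600 would give a single shape-similarity trace of the kind plotted in green.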
Extended Data Fig. 4. Further observations.
a, When asked to stabilize the plasma without further specifications, the agent creates a round shape. The agent is in control from t = 0.45 s and changes the shape while trying to attain the Ra and Za targets. This discovered behaviour is indeed a good solution, as this round plasma is intrinsically stable, with a growth rate γ < 0. b, When not given a reward for keeping similar currents in both ohmic coils, the algorithm tended to use the E coils to obtain the same effect as the OH001 coil. This is indeed possible, as can be seen from the coil positions in Fig. 1g, but it causes electromagnetic forces on the machine structures. Therefore, in later shots, a reward was added to keep the currents in both ohmic coils close together. c, Voltage requests by the policy to prevent the E3 coil current from sticking when crossing 0 A. As can be seen in, for example, Extended Data Fig. 4b, the currents can get stuck at 0 A for low voltage requests, a consequence of how these requests are handled by the power system. As this behaviour was hard to model, we introduced a reward to keep the coil currents away from 0 A. The control policy produces a high voltage request to move through this region quickly. d, An illustration of the difference in cross sections between two shots in which the only difference is that the policy on the right was trained with an additional reward for avoiding X-points in vacuum.
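Panels b and c describe two extra reward terms: one for keeping the currents in both ohmic coils close together, and one for keeping coil currents away from 0 A. A minimal sketch of what such terms could look like, with a Gaussian falloff for the ohmic imbalance and a linear penalty band around zero chosen purely for illustration; the functional forms, the `tolerance` and `margin` parameters and the function names are hypothetical, not the paper's definitions.

```python
import math


def ohmic_balance_score(i_oh1, i_oh2, tolerance):
    """Score in (0, 1]; equals 1 when the two ohmic coil currents match (assumed Gaussian)."""
    diff = i_oh1 - i_oh2
    return math.exp(-0.5 * (diff / tolerance) ** 2)


def away_from_zero_score(coil_currents, margin):
    """Score in [0, 1]; drops linearly as any |current| enters the band |I| < margin.

    Only the worst coil counts, so lingering near 0 A is penalized, which
    encourages the policy to cross that band quickly rather than avoid it entirely.
    """
    if not coil_currents:
        return 1.0
    penalties = [max(0.0, 1.0 - abs(i) / margin) for i in coil_currents]
    return 1.0 - max(penalties)
```

Scores in [0, 1] like these can be combined with the tracking objectives into one episodic reward; the currents and margins in any usage are hypothetical values in amperes.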
Extended Data Fig. 5. Training progress.
Episodic reward for the deterministic policy, smoothed across 20 episodes with parameter variations enabled, in which 100 means that all objectives are perfectly met. a, Comparison of the learning curve for the capability benchmark (as shown in Fig. 2) using our asymmetric actor-critic versus a symmetric actor-critic, in which the critic uses the same real-time-capable feedforward network as the actor. In blue is the performance with the default recurrent critic of 718,337 parameters. In orange, we show the symmetric version, in which the critic has the same feedforward structure and size (266,497 parameters) as the policy (266,280 parameters). When we keep the feedforward structure of the symmetric critic and scale it up, we find that widening it to 512 units (in green, 926,209 parameters) or even 1,024 units (in red, 3,425,281 parameters) does not close the performance gap with the smaller recurrent critic. b, Comparison between using various numbers of actors for stabilizing a mildly elongated plasma. Although the policies in this paper were trained with 5,000 actors, this comparison shows that, at least for simpler cases, the same level of performance can be achieved with much lower computational resources.
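The learning curves are smoothed across 20 episodes. A minimal sketch, assuming a trailing moving average in which the first few points average over however many episodes are available; the caption does not specify the smoothing kernel, so this choice and the `smooth` name are assumptions.

```python
from collections import deque


def smooth(rewards, window=20):
    """Trailing moving average of episodic rewards over the last `window` episodes."""
    recent = deque(maxlen=window)  # oldest episode drops out automatically
    smoothed = []
    for r in rewards:
        recent.append(r)
        smoothed.append(sum(recent) / len(recent))
    return smoothed
```

A trailing window only uses past episodes, so the smoothed curve lags the raw rewards slightly; a centred window would remove the lag at the cost of needing future episodes.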

References

    1. Hofmann F, et al. Creation and control of variably shaped plasmas in TCV. Plasma Phys. Control. Fusion. 1994;36:B277. doi: 10.1088/0741-3335/36/12B/023.
    2. Coda S, et al. Physics research on the TCV tokamak facility: from conventional to alternative scenarios and beyond. Nucl. Fusion. 2019;59:112023. doi: 10.1088/1741-4326/ab25cb.
    3. Anand H, Coda S, Felici F, Galperti C, Moret J-M. A novel plasma position and shape controller for advanced configuration development on the TCV tokamak. Nucl. Fusion. 2017;57:126026. doi: 10.1088/1741-4326/aa7f4d.
    4. Mele A, et al. MIMO shape control at the EAST tokamak: simulations and experiments. Fusion Eng. Des. 2019;146:1282–1285. doi: 10.1016/j.fusengdes.2019.02.058.
    5. Anand H, et al. Plasma flux expansion control on the DIII-D tokamak. Plasma Phys. Control. Fusion. 2020;63:015006. doi: 10.1088/1361-6587/abc457.
