Nat Commun. 2023 Nov 6;14(1):7138. doi: 10.1038/s41467-023-42901-3.

Realizing a deep reinforcement learning agent for real-time quantum feedback

Kevin Reuer et al.
Abstract

Realizing the full potential of quantum technologies requires precise real-time control on time scales much shorter than the coherence time. Model-free reinforcement learning promises to discover efficient feedback strategies from scratch without relying on a description of the quantum system. However, developing and training a reinforcement learning agent able to operate in real-time using feedback has been an open challenge. Here, we have implemented such an agent for a single qubit as a sub-microsecond-latency neural network on a field-programmable gate array (FPGA). We demonstrate its use to efficiently initialize a superconducting qubit and train the agent based solely on measurements. Our work is a first step towards adoption of reinforcement learning for the control of quantum devices and more generally any physical device requiring low-latency feedback.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Concept of the experiment.
A reinforcement learning (RL) agent, realized as a neural network (NN, red) on a field-programmable gate array (FPGA), receives observations s (Observ., blue trace) from a quantum system, which constitutes the reinforcement learning environment. Here, the quantum system is realized as a transmon qubit coupled to a readout resonator fabricated on a chip (see photograph). The agent processes observations on sub-microsecond timescales to decide in real time on the next action a applied to the quantum system. The update of the agent’s parameters is performed by processing experimentally obtained batches of observations and actions on a PC.
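The control loop sketched in Fig. 1 can be mirrored in a short, self-contained example: a small policy network picks one of a few actions from each observation trace (the fast path realized on the FPGA in the experiment), while a batch of recorded episodes is used offline for a REINFORCE-style parameter update (the PC side). This is a toy illustration only, not the authors' firmware or training code; the environment model, network sizes, reward shaping and the specific update rule are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

N_OBS, N_HIDDEN, N_ACTIONS = 10, 16, 3      # actions: 0 = idle, 1 = flip, 2 = terminate

# Toy policy network (stand-in for the neural network on the FPGA).
W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_OBS)); b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, (N_ACTIONS, N_HIDDEN)); b2 = np.zeros(N_ACTIONS)

def policy(obs):
    """Return action probabilities and hidden activations for one observation trace."""
    h = np.maximum(0.0, W1 @ obs + b1)                  # ReLU hidden layer
    logits = W2 @ h + b2
    p = np.exp(logits - logits.max())
    return p / p.sum(), h

def fake_readout(excited):
    """Toy readout trace: noisy samples whose mean depends on the qubit state (assumed)."""
    return rng.normal(1.0 if excited else -1.0, 1.0, N_OBS)

def run_episode(max_cycles=10):
    """Fast loop: measure -> network -> action, repeated until 'terminate' is chosen."""
    excited = rng.random() < 0.15                       # start in a thermal-like mixture
    trajectory = []
    for _ in range(max_cycles):
        obs = fake_readout(excited)
        probs, _ = policy(obs)
        a = rng.choice(N_ACTIONS, p=probs)
        trajectory.append((obs, a))
        if a == 1:
            excited = not excited                       # pi-pulse flips the qubit
        elif a == 2:
            break
    reward = (0.0 if excited else 1.0) - 0.05 * len(trajectory)   # reward reaching |g> quickly
    return trajectory, reward

def update_from_batch(episodes, lr=0.1):
    """PC-side REINFORCE-style update; only the output layer is updated to keep it short."""
    global W2, b2
    dW2, db2 = np.zeros_like(W2), np.zeros_like(b2)
    baseline = np.mean([r for _, r in episodes])        # variance-reducing baseline
    for trajectory, reward in episodes:
        for obs, a in trajectory:
            probs, h = policy(obs)
            grad_logits = -probs
            grad_logits[a] += 1.0                       # d log pi(a | obs) / d logits
            dW2 += (reward - baseline) * np.outer(grad_logits, h)
            db2 += (reward - baseline) * grad_logits
    W2 += lr * dW2 / len(episodes); b2 += lr * db2 / len(episodes)

for it in range(50):
    batch = [run_episode() for _ in range(64)]          # collect experimentally obtained batch
    update_from_batch(batch)                            # offline parameter update on the PC
    if it % 10 == 0:
        print(f"iteration {it}: mean reward {np.mean([r for _, r in batch]):.3f}")
```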
Fig. 2. Schematic of neural-network-based real-time feedback control.
a Timing diagram of an experimentally realized reinforcement learning episode. In each cycle j, the observation sj resulting from a measurement (Meas., blue) is continuously fed into a neural network (NN, red) which determines the next action aj (green). The process is terminated after a number of cycles determined by the agent. Then, a verification measurement is performed. b Schematic of the neural network implemented on an FPGA. The neural network consists of fully connected (red lines) layers of feed-forward neurons (red dots) and input neurons (blue dots for observations, green dots for actions). The first layers form the preprocessing network (yellow background). During the evaluation of the low-latency network (blue background), new data points from the signal trace sj are fed into the network as they become available. The network outputs the action probabilities for the three actions. Only the execution of the last layer (red background) contributes to the overall latency.
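A minimal sketch of the streaming evaluation idea in panel (b) is given below: early layers fold newly arriving samples of the trace sj into a running hidden state while the measurement is still in progress, and only a small output stage remains to be evaluated once the last samples are in, so only that stage contributes to the feedback latency. The chunk size, layer widths, activation and the fold-style update are simplifications assumed for this sketch, and the previous-action inputs shown in the schematic are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

CHUNK = 4          # samples of the readout trace that arrive per processing step (assumed)
N_CHUNKS = 8       # trace length = CHUNK * N_CHUNKS samples
H_PRE, H_LOW, N_ACTIONS = 16, 8, 3

# Preprocessing network: runs concurrently with the measurement, one chunk at a time.
W_in  = rng.normal(0.0, 0.1, (H_PRE, CHUNK))
W_rec = rng.normal(0.0, 0.1, (H_PRE, H_PRE))
# Low-latency network: evaluated while the final samples stream in.
W_low = rng.normal(0.0, 0.1, (H_LOW, H_PRE))
# Output layer: the only part whose evaluation adds to the overall feedback latency.
W_out = rng.normal(0.0, 0.1, (N_ACTIONS, H_LOW))

def action_probabilities(trace):
    """Stream the trace chunk by chunk through the network, return P(a) over 3 actions."""
    h = np.zeros(H_PRE)
    for k in range(N_CHUNKS):
        chunk = trace[k * CHUNK:(k + 1) * CHUNK]
        h = np.maximum(0.0, W_in @ chunk + W_rec @ h)   # fold new samples into the state
    z = np.maximum(0.0, W_low @ h)                      # low-latency hidden layer
    logits = W_out @ z                                  # final layer on the critical path
    p = np.exp(logits - logits.max())
    return p / p.sum()

trace = rng.normal(size=CHUNK * N_CHUNKS)               # stand-in for one readout trace s_j
print(action_probabilities(trace))                      # e.g. [P(idle), P(flip), P(terminate)]
```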
Fig. 3. Experimental data for reinforcement learning with a network-based real-time agent.
a Initialization error 1 − Pg and b average number of cycles 〈n〉 until termination vs. number of training episodes NTrain, when preparing an equilibrium state (red squares) and when inverting the population with a π-pulse (dark blue circles) for three independent training runs (solid and transparent points). Each data point is obtained from an independent validation data set with ~180,000 episodes. c Probability of choosing an action P(a) vs. the integrated measurement signal V. Actions chosen by the threshold-based strategy are shown as background colors (also for (e)). d Initialization error 1 − Pg vs. average number of cycles 〈n〉 until termination for an equilibrium state for the reinforcement learning agent (red circles) and the threshold-based strategy (black crosses). Stars indicate the strategies used for the experiments in (c) and (e). The dot-dashed black line indicates the thermal equilibrium (thermal eq.). Error bars indicate the standard deviation of the fitted initialization error 1 − Pg. e Histogram of the integrated measurement signal V for the initial equilibrium state (blue circles), for the verification measurement (red triangles) and for the measurement in which the agent terminates (green diamonds). Lines are bimodal Gaussian fits, from which we extract ground state populations as shown in the inset. The dashed black line in (d) and (e) indicates the rethermalization (retherm.) limit (see main text).
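The ground-state populations quoted in panel (e) come from bimodal Gaussian fits to histograms of the integrated signal V. The snippet below shows one plausible way to perform such a fit on synthetic data; the Gaussian centers, widths and the "true" population are made-up values for the example, not the experimental parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)

# Synthetic integrated readout signals V: ground and excited states give two
# Gaussian-distributed outcomes (centers and widths here are assumptions).
p_g_true = 0.93
n = 180_000
is_ground = rng.random(n) < p_g_true
V = np.where(is_ground, rng.normal(-1.0, 0.4, n), rng.normal(+1.0, 0.4, n))

def bimodal(v, p_g, mu_g, mu_e, sigma):
    """Two Gaussians with shared width; p_g is the ground-state population."""
    g = np.exp(-(v - mu_g) ** 2 / (2 * sigma ** 2))
    e = np.exp(-(v - mu_e) ** 2 / (2 * sigma ** 2))
    return (p_g * g + (1 - p_g) * e) / (sigma * np.sqrt(2 * np.pi))

counts, edges = np.histogram(V, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

popt, pcov = curve_fit(bimodal, centers, counts, p0=[0.5, -1.0, 1.0, 0.5])
p_g_fit, p_g_err = popt[0], np.sqrt(pcov[0, 0])     # fitted population and its standard deviation
print(f"fitted ground-state population: {p_g_fit:.4f} +/- {p_g_err:.4f}")
```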
Fig. 4. Reinforcement learning results for weak measurements and three-level systems.
a Probability P(a) of selecting the action indicated in the top left corner vs. the signal of the current Vt and the previous Vt−1 cycle if the agent is permitted to access information from l = 2 previous cycles. The radii of the black circles indicate the standard deviation around the means (black dots) of the fitted bi-modal Gaussian distribution. Black lines are the state discrimination thresholds (normalized to 0, see Supplementary Note 2). P(a) is shown for each bin with at least a single count. Empty bins are colored white (also for (d)). b Initialization error 1 − Pg vs. 〈n〉 for weak measurements for an initially mixed state for l = 2 (red circles), l = 0 (green triangles) of the neural network (NN) and a thresholding strategy (black crosses). c Probability P(a) of choosing the indicated action vs. V and W. Black circles indicate the standard deviation ellipse around the means (black dots) of the fitted tri-modal Gaussian distribution. Black lines are state discrimination thresholds (see Supplementary Note 2). d Initialization error 1 − Pg for a completely mixed qutrit state vs. 〈n〉 when the agent can select to idle, ge-flip and terminate (red circles), and when the agent can in addition perform a gf-flip (blue triangles). The dashed black line in (b) and (d) indicates the rethermalization (retherm.) limit (see main text), the solid black line indicates the thermal equilibrium (thermal eq.). Error bars indicate the standard deviation of the fitted initialization error 1 − Pg.
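For reference, a threshold-based baseline strategy of the kind mentioned in the captions of Figs. 3 and 4 can be written down in a few lines: compare the integrated signal of the current cycle (optionally combined with previous cycles, cf. panel (a)) against a state-discrimination threshold and choose between idling, flipping and terminating. The sign convention, the ambiguity margin and the weighting of previous cycles are assumptions for this sketch, not the thresholds used in the experiment.

```python
# Hypothetical threshold-based reset strategy, used here only as an illustration of the
# baseline the reinforcement learning agent is compared against.
IDLE, FLIP, TERMINATE = 0, 1, 2

def threshold_policy(V, margin=0.3):
    """Map one integrated readout signal V to an action.

    V < -margin : confidently ground  -> terminate
    V > +margin : confidently excited -> flip (pi-pulse), then re-measure
    otherwise   : ambiguous           -> idle and measure again
    The sign convention and the margin are assumptions for this sketch.
    """
    if V < -margin:
        return TERMINATE
    if V > margin:
        return FLIP
    return IDLE

def threshold_policy_with_memory(V_t, V_prev, weight=0.5, margin=0.3):
    """Variant using the previous cycle's signal as well (cf. l = 2 in Fig. 4a);
    the linear combination is an assumed, illustrative choice."""
    return threshold_policy(V_t + weight * V_prev, margin=margin)

print(threshold_policy(-0.8), threshold_policy(0.9), threshold_policy(0.1))
```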
