D-Wave’s quantum annealers can only work with problems formulated as collections of binary variables, that is, values that can only be 0 or 1. Every aspect of your problem must be translated into this framework. If you’re scheduling nurses, each potential assignment becomes a binary variable: “Nurse Sarah works Monday morning shift” is either true (1) or false (0). If you are routing delivery trucks, “Truck goes from Location A to Location B” is either yes (1) or no (0).
This seems simple enough, but the requirement runs deeper. Not only must your variables be binary, but the relationships between variables and your optimization objective must be expressible as an energy function over these binary states. Essentially, you need to map your entire problem onto a mathematical structure that looks like a system of interacting magnets, where each magnet can point up or down, and you’re searching for the lowest-energy configuration.
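As a rough illustration of what such an energy function looks like, here is a minimal Python sketch of a quadratic binary energy function (commonly called a QUBO) for a toy version of the nurse example. The two-nurse setup, the coefficient values, and the function names are assumptions made up for this illustration, and the search is a plain brute-force loop on a classical machine, not a call to any D-Wave API.

```python
from itertools import product

# Hypothetical toy problem: exactly one of two nurses must cover a shift.
# x[0] = 1 means nurse A works the shift, x[1] = 1 means nurse B does.
#
# The rule "exactly one nurse" becomes the penalty (x0 + x1 - 1)^2.
# Because x*x == x for binary x, this expands to:
#     -x0 - x1 + 2*x0*x1 + 1
# We also assume (for illustration) that nurse B costs 0.5 more,
# which raises B's linear coefficient from -1.0 to -0.5.
Q = {
    (0, 0): -1.0,  # linear term for nurse A
    (1, 1): -0.5,  # linear term for nurse B (penalty plus extra cost)
    (0, 1): 2.0,   # interaction term: both working is penalized
}
OFFSET = 1.0       # constant left over from expanding the penalty


def qubo_energy(x, Q, offset=0.0):
    """Energy of binary assignment x under coefficients Q."""
    return offset + sum(c * x[i] * x[j] for (i, j), c in Q.items())


def brute_force_minimum(Q, n, offset=0.0):
    """Try all 2**n assignments and return the lowest-energy one."""
    return min(product((0, 1), repeat=n),
               key=lambda x: qubo_energy(x, Q, offset))


best = brute_force_minimum(Q, 2, OFFSET)
print(best, qubo_energy(best, Q, OFFSET))
```

The lowest-energy assignment here is the one where only nurse A works. Everything about the problem, including the rule and the preference, had to be folded into the numbers in Q. That is the translation step the paragraph above describes.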
Problems That Don’t Naturally Fit
Many important computational problems simply don’t work this way. Modern neural network training operates on continuous gradients—small adjustments to weights that can take any value. You can’t easily represent “adjust this weight by 0.0037” as a binary choice. Large language models process sequences of text through continuous vector representations and attention mechanisms that involve matrix multiplication and floating-point arithmetic throughout. There’s no natural mapping to binary variables.
Consider the example of image processing. A photograph contains millions of pixels, each with continuous color values. While you could theoretically encode these into binary representations, you’d need many binary variables per pixel (perhaps 24 bits to represent RGB color depth), and the operations you want to perform (convolution, pooling, or normalization) don’t align with the energy minimization framework D-Wave provides.
Even some standard optimization problems resist this format, and are therefore difficult to program. If you’re optimizing a chemical process where temperature, pressure, and flow rate can vary continuously within ranges, forcing these into discrete binary choices means either losing precision (maybe you can only set the temperature to 200°C or 250°C, nothing in between) or creating an explosion of binary variables to simulate continuity (using many bits to represent a range, which quickly exceeds QPU hardware capacity).
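The trade-off between precision and variable count can be made concrete with a short sketch. It assumes a simple fixed-step binary encoding, which is only one of several possible encodings; the function names are hypothetical, and the 0.1-degree resolution is an arbitrary choice for illustration.

```python
import math


def bits_needed(low, high, resolution):
    """Binary variables needed to cover [low, high] in steps of `resolution`."""
    levels = round((high - low) / resolution) + 1
    return max(1, math.ceil(math.log2(levels)))


def decode(bits, low, resolution):
    """Map a tuple of bits (least significant first) back to a value."""
    return low + resolution * sum(b << k for k, b in enumerate(bits))


# Temperature between 200 and 250 degrees C at three resolutions.
coarse = bits_needed(200, 250, 50)    # only 200 or 250 allowed
medium = bits_needed(200, 250, 1)     # one-degree steps
fine = bits_needed(200, 250, 0.1)     # tenth-of-a-degree steps
print(coarse, medium, fine)

# Bit pattern (1, 0, 1) is the integer 5, so it decodes to 205 degrees.
print(decode((1, 0, 1), 200, 1))
```

Under this encoding a single variable goes from 1 bit to 9 bits as the step shrinks from 50 degrees to 0.1 degrees, and each additional continuous quantity (pressure, flow rate) needs its own set of bits. Any product of two encoded quantities in the objective also requires a coupling between every pair of their bits, so the number of interactions grows faster than the number of variables.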
Why the Binary Constraint Matters for AI
This constraint is why D-Wave’s technology remains (and will likely always be) peripheral to modern AI development. The breakthrough approaches (transformers for language, convolutional networks for vision, and reinforcement learning for decision-making) all rely on continuous optimization in high-dimensional spaces. Training involves calculating gradients, updating millions or billions of parameters with small continuous adjustments, and processing data through layers of non-linear transformations.
The Japan Tobacco drug discovery project is illustrative. They didn’t replace their transformer architecture or their gradient-based training. They used D-Wave for one specific component, the restricted Boltzmann machine (RBM), which naturally fits a binary structure and probabilistic sampling. The rest of the system (the part doing the actual heavy lifting of learning molecular patterns and generating candidates) runs on conventional hardware using conventional algorithms.
For D-Wave to become relevant to mainstream AI, either AI architectures would need to fundamentally shift toward discrete, binary-structured approaches (which seems unlikely given current trends), or D-Wave would need to develop quantum systems that handle continuous optimization and gradient-based learning (which would require gate logic instead of annealing). The binary constraint isn’t just a minor technical detail. It represents a fundamental mismatch between what quantum annealing does well and what modern AI requires.