Wednesday, April 3, 2024

Solving the Bongard Problem

[Illustration: a Bongard problem, with convex shapes in the left panel and concave shapes in the right panel]

In the 1960s, a Soviet computer scientist named Mikhail Bongard created a collection of puzzle images. Each puzzle was a pair of panels, and each panel contained several small images. The images in one panel differed from those in the other according to some rule: round shapes versus shapes with corners, for example. Humans have an easy time categorizing such things even when they cannot state the rule explicitly. In the above illustration, the rule is that the shapes on the left are all convex and the ones on the right are all concave. The question Bongard posed was: how could a computer recognize the pattern? There is no general algorithm to apply, and the image recognition problem by itself is difficult enough. Any machine that could solve Bongard problems as well as a human would represent a new milestone in artificial intelligence. Did anybody have the right idea about how to make such a device? 

These are the things Hal Cray pondered as he walked to the institute. It was located on the coast in central California. The area was specifically chosen to be relaxing so as to facilitate the creativity of the scientists and other intellectuals who worked there. It was all funded by a consortium of big tech companies under the auspices of the government. The zeitgeist was fear over the escalating arms race in cyberspace, and so there was a hue and cry to do something. Thus the institute was founded, and it became a Mecca for all sorts of eccentric tech types. Once inside the entrance, Hal sequestered himself in an empty conference room. These rooms were known as caves in the building, which was what everyone called the institute; the name had a vaguely ominous sound. It was common to hear workers ask each other, "So where do you work in the building?" The institute was so large and employed so many people that no one knew exactly what was going on. Kafka in his wildest dreams could not have conceived of such a bureaucracy.

Even so, the powers that be kept paying them, and that kept the workers showing up. A cast of regulars had been gathering in Hal's cave for several weeks. There were only two of them besides Hal, but for that place, three was definitely a crowd. All of them spent most of their waking hours in the cave. Sometimes they drew on one of the whiteboards; other times they scribbled on paper at the table or just stared off into space. Flowcharts were their common language, though even as the charts became increasingly elaborate, a solution seemed as elusive as ever. There was a promising lead one day when Hal proposed that the best starting point would be language recognition. Recognizing language, after all, is an essential feature of any true artificial intelligence. The key was getting the computer to understand the question "How are these two sets of shapes different?" 

Hal explained it to his fellows like so: even the simplest thought has an elaborate logical underpinning. For example, the meaning of the word "different" is evident to humans even without one or more concrete examples. It was time for a neural net, a kind of program that roughly mimics the human brain. To train the net, it would be given a very large number of shape pairs and asked to classify each pair as being of the same kind or of different kinds. This sort of fuzzy classification would then be honed through steady refinement. The trio of researchers embarked upon this path, and in a few days they developed a neural net that could identify whether two objects were similar with better than 99% accuracy. This was a major breakthrough and was quickly shared with the rest of the staff. There was already a large language model that seemed to understand human-generated text well enough. The next question was how to pair it with the neural net. 
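As a rough illustration of what that first net might look like, here is a toy sketch in Python using PyTorch. Everything in it is an assumption made for clarity: the feature size, the layer widths, and the idea that each shape has already been reduced to a small feature vector.

import torch
import torch.nn as nn

class SamePairNet(nn.Module):
    # Toy "same kind or different kind?" classifier for a pair of shapes.
    def __init__(self, n_features: int = 8):
        super().__init__()
        # The two shape feature vectors are concatenated and passed through a small MLP.
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_features, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, shape_a: torch.Tensor, shape_b: torch.Tensor) -> torch.Tensor:
        # Returns a logit: positive means "same kind", negative means "different kinds".
        return self.mlp(torch.cat([shape_a, shape_b], dim=-1)).squeeze(-1)

def train(model, pairs, labels, epochs=10):
    # pairs: iterable of (shape_a, shape_b) feature tensors
    # labels: matching float tensors, 1.0 for "same kind", 0.0 for "different kinds"
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for (a, b), y in zip(pairs, labels):
            opt.zero_grad()
            loss = loss_fn(model(a, b), y)
            loss.backward()
            opt.step()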

As an intermediate step, the large language model was trained with exercises in which it was asked to describe shapes. After that, it was trained to identify a shape based on a description. This work took months of trial and error, but at last the project succeeded, and the model was far more advanced than its predecessor. The next stage was to give the model verbal descriptions of object pairs and see if it could correctly explain the difference between them. Again, progress was slow, but this too was accomplished in time. The final piece of the puzzle was image recognition. This required a brand new neural net that took as input a grid of pixels and returned an object label. The team was closing in on a solution to the image recognition problem. Unsurprisingly, it turned out that image recognition required a much more complex neural net, because natural language processing was baked into the problem. 
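To picture that pixel-to-object step, here is another toy sketch in the same vein, this time a small convolutional net that maps a grid of pixels to an object name. The label set, the 64 by 64 input size, and the layer sizes are all invented for the example; the story does not specify any of them.

import torch
import torch.nn as nn

LABELS = ["circle", "square", "triangle", "ellipse"]  # hypothetical label set

class ShapeRecognizer(nn.Module):
    def __init__(self, n_labels: int = len(LABELS)):
        super().__init__()
        # Two conv/pool stages shrink a 64x64 grayscale grid down to 16x16 feature maps.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_labels)

    def forward(self, pixels: torch.Tensor) -> torch.Tensor:
        # pixels: (batch, 1, 64, 64) grid of grayscale values
        x = self.features(pixels)
        return self.classifier(x.flatten(start_dim=1))

def recognize(model: ShapeRecognizer, pixels: torch.Tensor) -> str:
    # Pick the most likely label for a single 1x64x64 image.
    with torch.no_grad():
        logits = model(pixels.unsqueeze(0))
    return LABELS[int(logits.argmax(dim=-1))]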

The process from start to finish was as follows: the perceptron scanned the image pair, identified both objects, and returned a word pair. That word pair was fed into the large language model in the form of a prompt like "How is the circle on the right different from the square on the left?" That was an easy question for the model to answer. After some finishing touches, the model was released to the public, and it revolutionized every facet of life. It was later compressed into an app installed on millions of phones. Hal and his team were honored with a great celebration, as they had done most of the heavy lifting on the project. Hal was asked to give a speech, and with characteristic brevity he dryly observed that it is not necessary to think like a machine, but it is essential to understand how machines think. The great Edsger Dijkstra once noted that asking whether there will ever be a machine that can think as well as a human is like asking whether there will ever be a submarine that can swim as well as a fish. 
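For completeness, the start-to-finish process described above is easy to write down as a few lines of glue code. This is only a sketch of the wiring the story describes: recognize (image in, object word out) and ask_language_model (prompt in, answer out) are stand-ins for the two models, not real APIs.

def solve_bongard_pair(left_image, right_image, recognize, ask_language_model):
    # 1. The vision net names the object in each image.
    left_word = recognize(left_image)
    right_word = recognize(right_image)
    # 2. The word pair becomes a prompt for the language model.
    prompt = (
        f"How is the {right_word} on the right different "
        f"from the {left_word} on the left?"
    )
    # 3. The language model explains the difference in plain English.
    return ask_language_model(prompt)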

