I was briefly thinking about the visual invariance problem, which is the question of how the brain recognizes objects even when the same object (a banana) looks very different from different viewing angles, e.g. end-on vs. side-on.
My thought is somewhat separate: how does the brain recognize 2D images? For example, if shown a picture of a banana (never having seen one before) and told what it is, you can go on to recognize a real one easily enough. But purely in terms of visual data, a 2D image contains very different information from a 3D object, yet the brain copes. Why? This matters to those of us who run physiological experiments by showing 2D images: if these are being processed differently from real 3D objects, then we're not quite asking the question we thought we were :).
My suspicion is that whenever the brain sees a 2D image, it constructs a 3D model from it. So when you see the image of a banana you build a 3D model of one, which is (somehow) stored appropriately in memory and can then have the usual recognition methods applied to it, dealing with problems such as visual invariance along the way. This would, after all, make a sort of evolutionary sense, as the brain evolved not to deal with pictures (which are an incredibly recent invention) but with real 3D objects.
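If I sketch the idea as a toy pipeline (every name here is a made-up placeholder, nothing to do with real neural mechanisms), it comes out something like this:

```python
# Toy sketch of the hypothesis: recognition always operates on a
# reconstructed 3D model, never on the raw 2D image directly.
# All names here are hypothetical placeholders, not brain mechanisms.

def reconstruct_3d(image_2d):
    """Stand-in for the brain's depth-cue machinery (shading, perspective, ...)."""
    return {"shape": image_2d["outline"], "depth": "inferred"}

def recognize(image_2d, memory_of_3d_models):
    model = reconstruct_3d(image_2d)           # 2D input -> internal 3D model
    for name, stored in memory_of_3d_models.items():
        if stored["shape"] == model["shape"]:  # crude matching stand-in
            return name
    return "unknown"

memory = {"banana": {"shape": "curved-elongated", "depth": "inferred"}}
picture = {"outline": "curved-elongated"}      # a flat picture of a banana
print(recognize(picture, memory))              # -> "banana"
```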
.
.
.
Edit: went away and had a cup of tea… sorted it out now. It occurred to me that in reality the brain never ever actually sees a 3D world; the eyes report data in 2D form, as that is all they can do by pure observation of the world. The brain then constructs a 3D world view from that 2D scene through a large number of processes and mechanisms (e.g. depth from shading and shadows), and then uses that 3D model internally. So what is really happening when you see a 2D image is that the brain automatically deploys this same set of mechanisms to turn it into a 3D object. So I believe that for the brain (post visual processing) there IS no such thing as a 2D object; everything is processed into 3D automatically, thus solving the problem.
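A minimal pinhole-camera sketch makes the point concrete (this is just the standard textbook projection, not a model of the retina): the depth coordinate simply never makes it into the 2D data, which is also why two quite different 3D scenes can produce identical input:

```python
# Pinhole projection: a 3D point (x, y, z) lands on the image plane at
# (f*x/z, f*y/z). The z (depth) coordinate never reaches the "retina";
# it has to be reconstructed from cues afterwards.

def project(point_3d, focal_length=1.0):
    x, y, z = point_3d
    return (focal_length * x / z, focal_length * y / z)

# Two points at very different depths produce the same 2D datum:
near_small = (1.0, 0.5, 2.0)   # small, close object
far_large  = (2.0, 1.0, 4.0)   # object twice the size, twice as far
print(project(near_small))     # (0.5, 0.25)
print(project(far_large))      # (0.5, 0.25) -- identical 2D input
```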
Edit 2: post another cup of tea (never underestimate the powers of tea), this ought to be experimentally testable. You should, for example, be able to fool the recognition system by presenting an image of a novel object (say a banana) with false perspective cues, which would make it look, say, fatter than it really is, and then compare people's perceptions of the difference between their expectations and the real object.
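You can even put rough numbers on the prediction using the classic size-constancy relation from the psychophysics textbooks (perceived size ≈ retinal size × assumed distance; the figures below are made up purely for illustration):

```python
# Toy size-constancy model: perceived_size ~ retinal_size * assumed_distance.
# False perspective cues that exaggerate distance should inflate the
# perceived size of a novel object by the same factor.

def perceived_size(retinal_size, assumed_distance):
    return retinal_size * assumed_distance

retinal = 0.02          # angular size of the banana image (arbitrary units)
true_distance = 1.0     # distance implied by honest cues
faked_distance = 1.5    # distance implied by the false perspective cues

honest = perceived_size(retinal, true_distance)     # 0.02
fooled = perceived_size(retinal, faked_distance)    # 0.03
print(f"expected inflation: {fooled / honest:.0%}") # 150% -> looks 'fatter'
```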
Of course, a problem still to deal with here is that we can also recognise an object from someone giving us a verbal description, presumably by constructing a 3D model from that. Quite how a brain that has only faced this problem in the last 100,000 years or so (tiny in evolutionary terms) is able to cope is unknown. But it is definitely one of the reasons I remain fascinated by it!