Here are some images showing dominant orientations.
| image 1 |
| image 2 |
| image 2 |
Very low-level. As a human, I can disambiguate the nearby road, three or four horizons and mountain slopes (?) and the road apparently curving to the right in the distance. Can you try doing that much algorithmically?
Sure, but the degree to which one can reason symbolically about these features depends on knowing the method by which they were produced, which is not supplied here. But I'd like to change the direction of this question. It should not be CBR vs. "some other method" for analyzing such features. We can use CBR as a type of reasoning about intermediate level features such as these, but more importantly we can reason, in terms of the past, about higher level features and the analysis systems that derived them. I think where vision is concerned it is precisely the place of vision techniques to analyze the scenes and yield high level knowledge. It is the place of the CBR system to take this knowledge (and low/high level features from other sensors) and consider the relevance of past knowledge that may not have seemed relevant to the vision system or the designers of that system at design time.
I'll stop being abstract for one moment...A great example situation is one Doug was using yesterday...CMU's car hit a concrete pillar, which it kept pushing until it pushed the pillar over, but this did significant damage to the vehicle. The vehicle knew of no other possibility than the output its command unit gave it: to keep going forward. A CBR unit might find in its case base a past case of getting stuck in the mud a month earlier. This past case has features sufficiently similar to the current situation that its solution, to back the vehicle up, is worth considering as a possible solution here. A vision system w/o CBR might not be designed/trained to consider the relevance of a situation where there is a large object in front of the vehicle to a past situation where there is no such object. The problem solving technique of CBR is specifically a layer for understanding such relationships and adapting this knowledge to the current situation.
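To make the pillar/mud example a bit more concrete, here is a minimal sketch of the retrieval step only, in Python. Everything in it is an assumption for illustration: the feature names (`commanded`, `making_progress`, `obstacle_ahead`), the weights, and the two stored cases are hypothetical, and a real case base would use whatever features the vehicle's sensors and analyzers actually output. It only shows how a past case can be judged relevant even though one feature (the obstacle) differs.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    features: dict   # hypothetical symbolic features describing the situation
    solution: str    # what was done (or should have been done) in that case


def similarity(a, b, weights):
    """Weighted fraction of features on which two situations agree."""
    total = sum(weights.values())
    matched = sum(w for key, w in weights.items() if a.get(key) == b.get(key))
    return matched / total


def retrieve(case_base, query, weights):
    """Return the stored case most similar to the current situation."""
    return max(case_base, key=lambda c: similarity(c.features, query, weights))


# Hypothetical case base: one "stuck in the mud" case and one normal driving case.
case_base = [
    Case("stuck_in_mud",
         {"commanded": "forward", "making_progress": False, "obstacle_ahead": False},
         "back up"),
    Case("normal_cruise",
         {"commanded": "forward", "making_progress": True, "obstacle_ahead": False},
         "continue forward"),
]

# Assumed weights: lack of progress matters more than the other features.
weights = {"commanded": 1.0, "making_progress": 2.0, "obstacle_ahead": 1.0}

# Current situation: pushing against the pillar.
query = {"commanded": "forward", "making_progress": False, "obstacle_ahead": True}

best = retrieve(case_base, query, weights)
score = similarity(best.features, query, weights)
print(best.name, best.solution, score)
```

The mud case wins here because it agrees with the current situation on the heavily weighted "no progress" feature, despite the mismatch on the obstacle. The adaptation step (deciding whether backing up is actually safe now) is not sketched.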
Another possible benefit, beyond providing alternative corrective solutions, and the one I'm really interested in, is CBR as an aid to us, the designers of the system, over the course of the next year as the vehicle performs and its performance is analyzed. This vehicle is bound to have many streams of sensor data and higher level features, many models and black-box algorithms. CBR is sometimes combined with such complex systems as a method of explaining to humans the decisions the system is making (explanation in terms of the past, and of the differences between the present and the past). I'll save the rest of this for another thread sometime...
To return to where we started, Alex's features could be useful (if the algorithms used were also provided). But more useful from the CBR standpoint would be to have the scene, these intermediate level features, the associated algorithms, and the high level output of a vision system Alex builds to understand the scene. The CBR system could then use this knowledge (along with other sensors and analyzers) to find relevant past experiences and provide this knowledge or some view of this knowledge to the system and to us.
In the short term, if we were provided with scenes and the ranges of classifications and high level features that will be output from the systems the vision group builds (as well as other sensors/analysers)...this would be very helpful for designing a CBR system.
What you have in the image right now is a bunch of unrelated primitive elements (short lines representing "dominant orientations," whatever that is). In themselves, they tell you next to nothing about the logical content of what you are looking at.
Next step would be to group these elements into somewhat more macroscopic features (something that is usually called "segmentation" in vision jargon), i.e. aggregates of pixels that are amenable to labeling and interpretation. I suspect that a lot of that is hard-wired in animal vision, and in computer vision it leads to ad-hoc criteria as to what constitutes an edge, a blob, a region etc.
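As a rough illustration of this grouping step, here is a small Python sketch that grows regions of neighboring pixels whose orientations are close to each other. It is only one ad-hoc criterion among many, and its inputs are assumptions: the grid of per-pixel orientations in degrees (with `None` where no orientation was found), the 4-neighbor connectivity, and the `tol_deg` tolerance are all hypothetical choices, not the method used to produce the images above.

```python
from collections import deque


def angle_diff(a, b):
    """Smallest difference between two orientations, treated modulo 180 degrees."""
    d = abs(a - b) % 180
    return min(d, 180 - d)


def group_orientations(grid, tol_deg=10):
    """Group 4-connected cells whose neighboring orientations differ by <= tol_deg."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    segments = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None or (r, c) in seen:
                continue
            # Start a new segment and grow it breadth-first.
            seen.add((r, c))
            queue = deque([(r, c)])
            segment = []
            while queue:
                cr, cc = queue.popleft()
                segment.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if (nr, nc) in seen or grid[nr][nc] is None:
                        continue
                    if angle_diff(grid[cr][cc], grid[nr][nc]) <= tol_deg:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            segments.append(segment)
    return segments


# Toy grid: a roughly horizontal patch on the left, a roughly vertical one on the right.
grid = [
    [0,    0,    None, 90],
    [5,    0,    None, 90],
    [None, None, None, 85],
]

segments = group_orientations(grid)
print([len(s) for s in segments])
```

Each resulting segment is a candidate "feature" that a later stage could try to label. Whether such segments correspond to anything meaningful (road edge, horizon) is exactly the interpretation problem.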
Next you need criteria for interpreting these features. Does a line across the image represent the horizon? Do the two converging lines represent the nearby road? This is where vision merges with the cognitive process, and it could be either hard-wired or learned, for our purposes.
In my previous message, I tried to illustrate the second and third step. Beyond that, you have an essentially cognitive process of dealing with the given situation/context/whatever. This is where I assume that something like CBR would come into play.
Suppose I refine the question to:
I agree [with Todd]. That's why low-level features like these are useless unless we "organize" them into higher-level features. And that's a difficult vision problem. [Referring to Todd's comment about training criteria] We may not have enough data for training. So a set of heuristics may be more effective for recognition.
I do not in any way disagree with Todd's perspective on the most promising application of CBR. But I also think it (or another learning system) has potential for deriving abstract features from primitive ones, and at other points.
I do not want anyone to waste their time working on something that they don't believe will pan out, but at this time I think it is valuable to build system prototypes if it can be done relatively quickly, so that more people can get involved in configuring systems.
Concerning the comments of several who doubt that we have enough information to initiate training, I think we should look at interactive training. For instance, Danko said he could make out three higher features in the picture(s) Alex provided. Could he not just tell a learning system what they are?
Danko mentioned classical mid-level features - blobs, curves, etc. Note that what Alex has plotted are not yet curves (not even lines), but are simply dominant orientations at each pixel.
Organizing these primitive features into mid-level features is a difficult problem, usually solved by minimizing a measure of error (discrepancy) between the model of those mid-level features and the existing primitives. Basically, it entails a search in the space of the free parameters of the mid-level features for values that give a good fit to the data. For example, the classical Hough transform finds straight lines in the image by doing a search in the space of all possible lines.
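For readers who have not seen it, the Hough idea can be sketched in a few lines of Python. This is a bare-bones version under stated assumptions: lines are parameterized in the usual normal form rho = x*cos(theta) + y*sin(theta), the parameter space is discretized in 1-degree and 1-pixel steps, and the input is a hypothetical list of (x, y) edge points rather than the orientation data discussed above. Each point votes for every line that could pass through it, and the most-voted (theta, rho) cell is taken as the best-fitting line.

```python
import math
from collections import Counter


def hough_peak(points, theta_step_deg=1, rho_step=1.0):
    """Vote in a discretized (theta, rho) space and return the strongest line.

    Lines are written as rho = x*cos(theta) + y*sin(theta), theta in [0, 180) degrees.
    Returns (theta_deg, rho, votes) for the accumulator cell with the most votes.
    """
    votes = Counter()
    for theta_deg in range(0, 180, theta_step_deg):
        theta = math.radians(theta_deg)
        c, s = math.cos(theta), math.sin(theta)
        for x, y in points:
            rho = x * c + y * s
            votes[(theta_deg, round(rho / rho_step))] += 1
    (theta_deg, rho_bin), count = votes.most_common(1)[0]
    return theta_deg, rho_bin * rho_step, count


# Hypothetical data: 100 points on the horizontal line y = 5, plus a few outliers.
points = [(x, 5) for x in range(100)] + [(3, 40), (70, 12), (20, 60)]

theta_deg, rho, count = hough_peak(points)
print(theta_deg, rho, count)
```

For this toy input the peak should fall at theta = 90 degrees and rho = 5, i.e. the line y = 5, with the outliers contributing nothing to that cell. Real implementations also use the local orientation at each point to restrict the votes, which is where per-pixel orientation data could help.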
It would certainly be interesting if we could get a learning system to do this work for us.