Monday, April 17, 2006

The interface is all but finished. Kam provided a nice layout, which I modified a little and coded up in Java using Swing. It is intentionally simple, to minimize how much someone needs to know about the background machinery before actually using the system. For now it is simply a search engine over the index of cached news articles. Lucene has some amazingly powerful query abilities, including both boolean operators and wildcards.



The hooks are there for Kam's interesting phrases. For now, the list of files on the left is the result of the query against the index. The pane in the lower right is where the article is displayed. I try to pull out as much of the article by itself as possible, mostly because displaying the entire page in a JEditorPane is incredibly ugly thanks to all the fancy things the news sites do. There is still considerable noise, generally at the top and bottom of each article (sometimes extra links or advertisements), but the point is not to make it perfect; it is to direct the user to the article. It would be possible to pull out just the text, but that would require a lot more work to determine how each news site structures its HTML.

Because the article pane displays HTML, it is easy to add more tags to highlight different words. I wrote a function that highlights a passed-in phrase with a passed-in color. It is used to highlight the query terms, and it will also highlight the interesting phrases themselves if they appear in an article as it is browsed.
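
As a rough illustration of that kind of highlight function, here is a minimal sketch. It is not the actual code from the project: the class name PhraseHighlighter, the method signature, and the choice of a <font color> tag are all assumptions made for the example. It wraps each case-insensitive occurrence of the phrase in a colored tag and skips over anything inside existing HTML tags, so attributes such as link targets are left alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; the names here are illustrative, not from the project. */
final class PhraseHighlighter {

    private PhraseHighlighter() {}

    /**
     * Wraps every case-insensitive occurrence of phrase in html with a
     * <font color="..."> tag. Text inside existing tags is left untouched.
     * Assumes phrase is plain text as it appears in the page (no escaping is done).
     */
    static String highlight(String html, String phrase, String color) {
        if (phrase == null || phrase.isEmpty()) {
            return html; // nothing to highlight
        }
        // Group 1 matches a whole tag; group 2 matches the literal phrase.
        Pattern pattern = Pattern.compile(
                "(<[^>]*>)|(" + Pattern.quote(phrase) + ")",
                Pattern.CASE_INSENSITIVE);
        Matcher m = pattern.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                replacement = m.group(1); // copy tags through unchanged
            } else {
                // Keep the original casing of the matched text.
                replacement = "<font color=\"" + color + "\">" + m.group(2) + "</font>";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With something like this, the article pane could be updated with a call along the lines of pane.setText(PhraseHighlighter.highlight(articleHtml, queryTerm, "red")), assuming the JEditorPane has its content type set to text/html. Calling it once per phrase, with a different color for query terms and interesting phrases, would layer the two kinds of highlighting.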
