Updated 6/16/06
Follow these instructions to get an idea of the interface we are building.
Demonstration Video - You can also watch our recently added instructional
video demonstrating the use of the NoraVis interface. The video is available
at the following links: Hi-Res Video | Low-Res Video
1) Start the Application
- Launch Nora Visualization.
  (If the link doesn't work, you need to install or update Java on your system.)
- When the Java security dialog asks, say that you trust the nora project.
- The application starts. It reads the metadata of the Emily Dickinson
  Collection and displays a table of documents.
2) Think of a topic of interest and a corresponding classification task
- (e.g. topic: Erotics in Emily Dickinson; classification: does this document
  show signs of erotic language?)
3) Read and Classify Some Documents
- Click on any document. The document is loaded from the web and appears
  on the right.
- Rate the document for "erotic content" by clicking one of the colored
  buttons at the bottom of the screen, ranging from false to true.
- Repeat until you think you have created a good training set.
  (For demo purposes you can speed up this process by loading an
  already-rated project and opening it from the nora interface. An example
  file can be found at: emily-rated.nora)
- You have now defined a training set.
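The ratings above amount to building a labeled training set by hand. A minimal sketch of that idea in Python (the document ids, the `rate` helper, and the labels are all hypothetical illustrations, not nora's actual data model):

```python
# Each manual rating pairs a document with a true/false label for the
# chosen classification task (here: "does this document show signs of
# erotic language?"). All names below are made up for illustration.

training_set = []  # list of (document_id, label) pairs

def rate(document_id, label):
    """Record one manual rating, as clicking a colored button would."""
    training_set.append((document_id, label))

rate("doc-001", True)
rate("doc-002", False)
rate("doc-003", True)

true_count = sum(1 for _, label in training_set if label)
false_count = len(training_set) - true_count
print(true_count, false_count)  # prints: 2 1
```

The point is only that every click adds one (document, label) pair; the classifier in step 4 learns from exactly this kind of list.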
4) Run the Analysis/Prediction
- Go to the top menu bar and select 'Analysis'.
- Select 'Prediction'.
- You will be reminded to balance the training set so that there is an
  equal number of documents rated true and false. (Unbalanced training
  sets produce lower-quality predictions.)
- A progress window opens and gives you an idea of what is happening. It
  starts with "getting Proxy" and follows with other log information.
  Be patient. There is a "Cancel" button; when the run is done, it changes
  to a "Close" button and some FAQs appear. You can close both of these
  windows when you are ready.
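One common way to balance a training set is to downsample the larger class until the true/false counts match. This is only a sketch of that general idea; nora's interface may balance differently:

```python
import random

def balance(training_set, seed=0):
    """Downsample the majority class so true/false counts match.
    A generic balancing sketch, not necessarily nora's strategy."""
    pos = [ex for ex in training_set if ex[1]]
    neg = [ex for ex in training_set if not ex[1]]
    n = min(len(pos), len(neg))
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.sample(pos, n) + rng.sample(neg, n)

# Hypothetical ratings: three true, one false.
rated = [("d1", True), ("d2", True), ("d3", True), ("d4", False)]
balanced = balance(rated)
# balanced now contains one true and one false example
```

Downsampling throws information away, which is why it is better to rate a few more documents of the underrepresented class instead, as the reminder suggests.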
5) Review Results and Continue Rating
- The screen now includes a feature list on the left with the hot/not-hot
  words (100 each). Indicators of hot documents are at the top (with
  positive scores); indicators of not-hot documents are at the bottom
  (with negative scores).
- All documents that have not been rated manually now carry a color code
  representing the predicted rating (in our example, bright purple for
  documents predicted to be hot).
- If you click on a document, the words deemed hot appear brightly
  colored, while the not-hot ones appear dark.
- If you click on a word in the list on the left, a column is added to the
  list of documents, marking in red which documents include that word.
- While reviewing results and reading the documents, you can add more
  manual ratings; these update the training set used the next time you
  run the prediction.
6) Eventually, Find Correlations
- Using the "Views" menu you can open the scatterplot view. We have not
  done enough work on it yet, so it is hard to use, but you are welcome
  to explore: click on the different tabs on the right, and try changing
  the attributes mapped to the x and y axes, or the size of the dots.
  Eventually this view might allow you to explore possible correlations
  between hotness and the metadata/features available at the document
  level, and look for explanations.
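The kind of hotness-vs-metadata relationship the scatterplot might eventually surface can be quantified with a plain Pearson correlation. The attribute values below are invented for illustration and are not real nora data:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical example: predicted hotness score vs. a made-up
# numeric metadata attribute for four documents.
hotness = [0.9, 0.7, 0.4, 0.1]
attribute = [1861, 1862, 1863, 1865]
r = pearson(hotness, attribute)  # negative here: hotness falls as
                                 # the attribute rises
```

A value of `r` near +1 or -1 would suggest a pattern worth reading into; near 0, none. Visually, that is what a trend (or its absence) in the scatterplot would show.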