Here is a diagram that shows the entities of Noti (Area, Site, Category) and their relationships. Basically, an Area can contain zero, one or many Categories, and a Category must belong to one (and only one) Site.
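To make the cardinalities concrete, here is a minimal sketch of the three entities in Python. It is only an illustration, not Noti's actual schema: the field names (`name`, `url`, `categories`, `site`) are assumptions, and the real logic layer is written in PHP on top of a database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Site:
    # Field names here are hypothetical; the source only names the entity.
    name: str
    url: str


@dataclass
class Category:
    name: str
    # No default value: a Category cannot be built without exactly one Site.
    site: Site


@dataclass
class Area:
    name: str
    # Starts empty, so an Area may hold zero, one or many Categories.
    categories: List[Category] = field(default_factory=list)


site = Site(name="Example News", url="http://news.example.com")
area = Area(name="Sports")                       # zero Categories is valid
area.categories.append(Category("Football", site))
area.categories.append(Category("Tennis", site))
```

Making `site` a required field is what expresses "one (and only one) Site": leaving it out raises a `TypeError` when the Category is constructed.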
Component architecture
We're almost done coding the alpha version, which consists of:
a logic layer written in PHP, which communicates with the DB, handles all the heavy logic, and provides the Noti API
the web interface which, in turn, consists of:
templates, which define the skin of the application
processes that receive data from the web interface (which is defined by the templates) and merge it into the system using the Noti API
the crawlers, written in Python, which fetch and parse the news websites and talk to the web interface (via XML) to upload the information they gather.
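As a rough illustration of the crawler side, the sketch below shows how a crawler might package the items it has parsed into an XML document before sending it to the web interface. The XML format is not specified here, so the element and attribute names (`upload`, `category`, `item`, `title`, `link`) and the function name `build_upload_xml` are purely hypothetical. The fetching, parsing and HTTP upload steps are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def build_upload_xml(category: str, items: Iterable[Dict[str, str]]) -> str:
    """Serialize parsed news items into an XML string.

    The element names are assumptions made for this example,
    not the actual Noti upload format.
    """
    root = ET.Element("upload", {"category": category})
    for item in items:
        node = ET.SubElement(root, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


payload = build_upload_xml(
    "Football",
    [
        {"title": "Cup final: home & away", "link": "http://news.example.com/1"},
        {"title": "Transfer rumours", "link": "http://news.example.com/2"},
    ],
)
```

Building the document with `ElementTree` rather than string concatenation means special characters in titles (such as `&`) are escaped automatically, so the receiving process always gets well-formed XML.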