Software and People

I've been reading some interesting grumbling about the Prevayler project and the concept it espouses of "object prevalence", in particular from Ted Neward and Mike Spille (and the growing posse of commenters).

Rather than join in the comment stream, I thought I'd summarise my views here.

A lot of the arguments seem to bounce back and forth between "here are a bunch of reasons why Prevayler is worse than a database", and "here are a bunch of reasons why Prevayler is better than a database". Along the way there is a lot of nitpicking about how much memory might be needed by an application, whether you get more "future-expansion" with a database, and so on. In short, a lot of head-bumping.

It seems to me that a wiser route may be to take for granted that there are likely to be situations where a relational database is a good choice, situations where some sort of "prevalent" approach is a good choice, and also situations where other options (plain text files, XML files, object database, JavaSpaces, paper index cards, no persistence at all, etc.) are a good choice. With this in mind, let's look for some situations where "prevalence" might be a useful component of a system.

In any system with a "master copy" and one or more "shadow copies" issues arise about how to keep the shadows in sync with the master. These issues include things like how to transfer information, how the shadows get to know that the master has changed, how up-to-date the shadows need to be, what happens when a new shadow is created, and so on. One way of thinking of "prevalence" is as a system where the in-memory representation is the "master copy". This is in contrast to typical database persistence where the information on the filesystem, managed by the database server, is the "master copy".

Everyone who has created a database-backed application is aware of these issues: tough choices between caching large chunks of data that can be retrieved efficiently (but risking them getting "stale" if the underlying database changes) and clogging the app/database channel by fetching each data item every time it's needed; worrying about whether (and how often) to poll the database for changes to cached data; struggling to reduce response times by tuning queries; and so on. Recommending database solutions for their large capacity and shareability while glossing over these problems is very dangerous, but nonetheless common. When did you last see serious discussion of these kinds of issues on the web site of a database vendor? The same kind of optimism can be found among advocates of "prevalence", who point out the speed of fetching data from memory, and the way it gets round some of the problems of databases, without stopping to highlight the potential "show-stoppers".
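To make the stale-versus-chatty trade-off concrete, here is a minimal sketch of a cache in front of an "external master". Everything in it is my own illustrative assumption: the ExpiringCache name, the loader function standing in for a database query, and the maximum-age setting. A short maximum age means fresher data but more traffic to the database; a long one means the reverse.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: a shadow copy that is re-fetched once it is older than maxAgeMillis.
class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long fetchedAt;
        Entry(V value, long fetchedAt) {
            this.value = value;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader;  // stands in for a query against the master copy
    private final LongSupplier clock;     // supplies the current time in milliseconds
    private final long maxAgeMillis;      // how stale a shadow copy is allowed to get

    ExpiringCache(Function<K, V> loader, LongSupplier clock, long maxAgeMillis) {
        this.loader = loader;
        this.clock = clock;
        this.maxAgeMillis = maxAgeMillis;
    }

    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.fetchedAt >= maxAgeMillis) {
            // Missing or too old: go back to the master and remember when we did.
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

Note that nothing here tells the cache when the master actually changes; between fetches it is simply hoping, which is exactly the worry described above.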

There seems to be a certain amount of (conscious or unconscious) selection of battlegrounds between the "prevalence" and "database" camps, though.

One of the biggest benefits of something like a database is that it can be accessed in many different ways by different applications. In some ways this attribute is shared by other externally-specified data storage (such as CSV or XML files, JavaSpaces, WebDAV or CVS, and so on). Data persistence that needs to provide stable access to multiple, unrelated applications naturally suits making the stored data the "master copy". There are obviously many classes of application (such as typical once-a-week or once-a-day business reports) that don't need to concern themselves with the master/shadow issues mentioned above. Another common benefit of a database is the capacity. With the likes of RFID tracking and on-line catalogue browsing generating massive streams of data, holding all of this in a mere few gigabytes of memory would be impossible. Luckily, terabyte disk arrays are commonplace and (relatively) cheap, making external storage of such bulk data a natural choice, especially when reading and processing of such "logged" data is relatively infrequent.

So, if your problem implies multiple unrelated accesses, large data volumes, or a large proportion of writes compared with reads, choosing an "external master" system such as a database seems a good choice. Arguing the merits of an "internal master" system such as "prevalence" is unlikely to be successful or worthwhile in these cases.

One of the biggest benefits of "prevalence" is that the master data is private to the application. A "prevalent" system can safely assume that the only changes to its data will come via itself. The code does not have to defend against external applications tinkering with its data while in the middle of operations. "External master" data storage such as databases and flat files is by nature public, and making assumptions that what you put in a while ago is still the same now has caused many a hard-to-find application bug. When an application has complete control over the state of the data it can afford to run fast and loose.

So, for applications which completely manage their own data, don't have huge amounts of data to manage, and have a large proportion of reads compared with writes, an "internal master" system such as "prevalence" is a natural winner.
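As a rough illustration of what I mean by an "internal master", here is a minimal sketch. It is not the Prevayler API; the JournalledNotes class, its tab-separated journal format and its method names are all invented for this example. The in-memory map is the master copy, every change is appended to a journal file before it is applied, and a restart rebuilds the map by replaying that journal. Reads never touch the disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of an "internal master"; this is not the Prevayler API.
class JournalledNotes {
    private final Map<String, String> notes = new HashMap<>(); // the in-memory master copy
    private final Path journal;                                // append-only record of every change

    JournalledNotes(Path journal) {
        this.journal = journal;
        if (!Files.exists(journal)) {
            return; // first run: nothing to replay
        }
        try {
            // Rebuild the master copy by replaying each recorded change in order.
            for (String line : Files.readAllLines(journal, StandardCharsets.UTF_8)) {
                int tab = line.indexOf('\t');
                if (tab >= 0) {
                    notes.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Changes are recorded in the journal first, then applied to memory, one at a time.
    synchronized void put(String key, String value) {
        String both = key + value;
        if (key.contains("\t") || both.contains("\n") || both.contains("\r")) {
            throw new IllegalArgumentException("this sketch cannot journal tabs in keys or line breaks");
        }
        try {
            Files.write(journal, Collections.singletonList(key + "\t" + value),
                    StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        notes.put(key, value);
    }

    // Reads are served straight from memory.
    synchronized String get(String key) {
        return notes.get(key);
    }
}
```

The sketch leaves out snapshots (so the journal grows without limit) and only works because nothing else writes to the journal file, which is precisely the "private master data" assumption.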

I would imagine that many hard-core database developers will find it hard to believe that there are many real-life cases that fit the niche for "prevalent" persistence. Every day they deal with large data repositories, each of which has many different processes accessing it. It's easy to assume this is the great majority of applications. I would like to suggest that this may not be the case. To see why, I suggest (ironically enough) considering the common uses of the very popular MySQL database.

MySQL is installed as standard with most Linux systems, and is the default persistent data storage system for many development languages. When you look at the data stored in these MySQL databases, it becomes apparent that a huge number of them have at most a few tens or hundreds of kilobytes of information stored in them. MySQL forms the data storage for wikis, blogs, to-do lists, bulletin boards, bug trackers, DNS servers, LDAP servers, message queues, mail delivery servers and an almost infinite array of similar small, self-contained, low-data-volume applications. These applications are not usually developed and maintained by database programmers and DBAs, but I bet the same DBAs and database programmers use them every day.

Typically, MySQL is used as the persistence layer for these applications because it is available and generally reliable. It's somewhere to put some stuff where it can be found again later. However, adding MySQL support to an application is tricky, even when it's built in to a development language or environment. Mapping an arbitrary arrangement of data to a clean relational schema is not a trivial task. So there are lots more of the same class of application that either take a faster and simpler (but more risky) approach of accumulating changes in memory and occasionally writing state out to flat files, or take the performance hit of writing a complete updated flat file for every state change.

To me, these are the class of applications that would most benefit from a "prevalent" approach. They gain the fast response and programming flexibility of an in-memory master, and don't hit the problems of parallel unrelated access and large data volumes.

Following on from my recent wondering about "Do libraries want your old books?" and the longevity and reliability of "lifetime web space", I think I'm now even more worried.

I read the following in Bruce Landon's "landonline":

For the generation now coming of age, Google defines a sort of continental shelf. Whatever is on that shelf is considered accessible. Whatever isn't fades into the murky unfathomable depths. But when we can beam the halogen light into those depths and search them, we'll be reminded that -- whatever online access can or cannot be offered now, and however long it takes to make complex and sensitive adjustments to the copyright system -- the physical books exist, and are available for our use.

This is a fine observation at the moment. Looking into the future, though, what reasons might libraries have to actually keep physical books when "everything" is online and hardly anyone uses their services? How much are people likely to pay (either in direct cash, or in electing officials on a "keep the libraries, skimp on other stuff" platform) for the theoretical benefit of being able to access original material?

As I have noted in my previous article, many libraries are happily getting rid of "unpopular" books to make space or reduce costs. I can see this process accelerating as more and more of the bulk of scholarship activities moves online. In generally slower-moving areas of study, this effect is probably not much more than a ripple, but in the fast-paced world of IT it is becoming harder and harder to do historical research even going as little as 20 years back. Books, journals and magazines are dumped as "obsolete" as soon as their peak of demand has passed.

Sure, quick access to digitised versions is convenient, but we should always remember that on-line information will simply "go away", if electricity or funding is removed. Remember how the "dejanews" usenet archive was almost lost until Google "bought" it. To read a book you only need light and eyesight, and once you have it in your hand it's hardly going to vanish just because you are late with a subscription or hosting fee.

In one of the comments to a recent article from Sébastien Paquet about Britannica and Wikipedia, I read the following words from Cindy Hoong:

Recently I took old books from my collections to two local libraries. They were delighted with the donations.

I've often thought that it would be nice to donate old books to libraries, but all the libraries I've tried don't want them. In fact, a fair number of the books in my collection came from libraries! All my local libraries seem so stuffed full of books, that they are selling off the older or less popular ones to make space for new books, periodicals, DVDs and so on.

In my youth I cherished the naive idea that the job of a library was to keep books so that they would be available to readers forever. I am always saddened to see old books being cast out from libraries, even though I understand the practicalities.

A while ago, I wrote about the banning of "human research" and some confusions and issues surrounding it. Then last night I was idly watching some food programme on the TV, and I encountered a whole different set of issues.

The premise of the item was interesting enough. One of the show's presenters went through a simulated Christmas day to see the effect it has on the body. Before he started he was given a battery of tests including an MRI scan and swallowing a "capsule-cam" which could relay live video from inside his gut (eww!). Then he hit all the food, drink, and lounging in front of the telly that happens on a typical overindulgent Christmas day. Then he was given a further series of tests over the next few days to see how his system was coping. All good fun TV.

But, if "human research" is such a terrible thing in academia, how can it be that it is socially acceptable (and even considered as entertainment) when it's on the box? Surely a large proportion of what is known as "reality television" these days is little else if not "human research" (albeit not performed very rigorously)?

I was interested to read what John Tangney wrote about his views on Misuse of Java visibility declaration. Unfortunately, some of his statements worry me.

The first part of his article:

The public, protected, and private keywords (as well as "default") visibility in Java are there for a reason. Their purpose is to make the compiler catch mistakes, implement encapsulation and data hiding, and protect sensitive code from unauthorized callers.

seems reasonable, until you get to the bit about "protect sensitive code from unauthorized callers". This seems a common misapprehension, but following its logic can lead to all sorts of design conflicts. I would be happier if it read something like "protect sensitive code and data from inadvertent access". After all, marking something private or protected is really only a suggestion - a determined programmer can always get round the "protection" using reflection. Anyone thinking of these keywords as a security technique is missing the point.
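To show what I mean by "only a suggestion", here is a small sketch using the standard java.lang.reflect API. The Account and ReflectionDemo classes are invented for the example, and it assumes an ordinary setup in which no security manager or module boundary refuses the setAccessible call.

```java
import java.lang.reflect.Field;

// Hypothetical class with a field that is "protected" by the private keyword.
class Account {
    private int balance = 100;

    int balance() {
        return balance;
    }
}

class ReflectionDemo {
    // Overwrites the private field from outside Account and returns the value read back.
    static int pokePrivateField(Account account, int newBalance) {
        try {
            Field field = Account.class.getDeclaredField("balance");
            field.setAccessible(true); // ask the runtime to skip the visibility check
            field.setInt(account, newBalance);
            return field.getInt(account);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflection was refused in this environment", e);
        }
    }
}
```

Calling ReflectionDemo.pokePrivateField(new Account(), 5) changes a value that no ordinary caller could reach, which is why I read these keywords as protection against accidents rather than against attackers.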

My biggest worry about the article, though, is where John writes:

I am seeing more and more Java code in which methods are blindly, lazily declared as public.

To assume such motives for someone based on something like reading a few public methods seems a bit naive, especially from someone who describes himself as an "Extreme Programmer". In my experience, Extreme Programming (and the closely allied techniques of refactoring and test-driven development) thrive where more of the codebase is separately accessible for testing and refactoring. Marking a method "private" severely limits its testability and reusability, and can complicate refactoring by requiring code sections to be extracted to other, public, methods before they can be shared in other contexts or moved to more appropriate classes.

Maybe I am blind or lazy, but I have found my views on visibility changing. In my early Java programming I rigorously followed the approach of "everything is private unless I really need to share it". These days, encouraged by refactoring tools such as Eclipse and IDEA, and by the TDD approach of building only the minimum needed to pass each test, I have moved to a more open "everything is public unless it can break stuff when called independently". This in turn leads to minimising the amount of code that can break in such a way, as the occasional private method really sticks out.

YMMV, of course, but I have found that the more of my code that is safely public, the more malleable and refactorable the code becomes.

It made me smile a little, when I read that Ben Hammersley sent out his review drafts by email.

If ever something seemed ripe for RSS, it's distributing drafts to a pool of reviewers. Why clog up their inboxes when a feed can simply let them know that the files are available for secure download, or included as an "enclosure"? And the old chestnut of "the reviewers might not use a feed reader" is blatantly daft. If a reviewer of a book called Content Syndication with RSS and Atom, Second Edition can't read a feed, then I'd suggest that he or she shouldn't really be on the team.

Via several sources, including A Writing Teacher's Blog, I read about the case of Steve Geluso who was unhappy with the marks he got from his "LA exit exam", so he posted the original submission and the comments from the markers on his blog for comments and support. I'd not heard of the term, but apparently "LA" stands for "Language Arts". I'm still not sure how an "exit exam" differs from any other exam, though.

From the point of view of the community of bloggers, wading in with opinions, support and criticism, this is not unusual. People complain about stuff on weblogs all the time. From the point of view of the educational establishment and the teachers/assessors involved, however, this is astonishing and outrageous. Based on my own small experience I can see a storm coming. This kind of exposure and public criticism will not sit well with those used to exercising power from "on high", or those harbouring hidden doubts about the quality and consistency of their teaching and marking. And make no mistake, self-doubt is endemic among teachers. Despite the aura of competency donned like a mask for lessons, teaching is an isolated profession. Most teachers receive very little feedback on their performance, instead being left to find their own way and mark work based on their own assumptions about the curriculum.

I have found that I am a natural "sharer". I have posted (or am in the process of posting) most of my work as a student over the last few years to a web site to help others in similar situations. I am often surprised by how secretive people can be - given how many students there are, and how much coursework they produce, I would expect the internet to be awash with such "past papers". Instead, such sites seem rare, and are often looked on with distrust. I am currently in the slightly unusual situation of being both a student and a member of staff at the college where I teach. Despite making no secret of the existence of my site, I received an email last year from the "Dean of Management, Arts and Sciences", stating:

It names {names of tutors} yet they did not give their permission for this material to be disseminated. If permission had been sought, it would have been denied. Plagiarism is a big industry and this web site is giving detailed material out for free.

and

Please could you remove all material gained from your PCGE from the public domain. However, we would be happy to support a more local notice board.

When I followed up this email (which came out of the blue), it transpired that some unnamed person at the college had found my web site and simply assumed that I must be reposting handouts and other material provided by the course tutors. When I responded forcefully by explaining clearly that this is all my own work and research, and that I have never signed anything transferring copyright to the college, to the tutors, or to anyone else, I was allowed to continue. At the time I felt that the approval was somewhat grudging, and that several people viewed this as behaviour unbecoming of a teacher - that publishing my "answers" is some sort of incitement to plagiarism.

One of the things that boggled me, though, was the reluctance of the course tutors to even be mentioned by name (a preference I have, nonetheless, honoured in this article). Surely the names of the tutors for a course are already public knowledge? I had assumed that naming the tutors was equivalent to naming the college or the course. Such is the power of the fear of exposure.

In Steve Geluso's case I can see several things happening soon:

  • He will be told to remove the markers' comments from his web site(s). This is fairly straightforward, although pointless. After all, he doesn't "own" the words, and the originator has some right to refuse publication. It's too late now, though; that "genie is out of the bottle".
  • He might be asked to remove the whole discussion. A possible justification is that it might prejudice some sort of enquiry. In practice, this is most likely to be "damage limitation", though.
  • The educational establishment will start to put measures and policies in place to prevent such an embarrassment in the future. This may even go so far as requiring students to sign over copyright of their assessed work or refusing to return work to the creator.
  • He will get a reputation as a troublemaker.

Those of you working in education should keep a very sharp ear out for the aftershocks of this. I strongly believe that each writer owns his words, even when they are created for an assessment task. The imposition of wide-ranging "solutions" to the potential for embarrassment of publishing assessed work and marks would be yet another nail in the coffin of sharing, research and open discussion.

Finally, as others have pointed out, the irony of this case is mainly that through the process of writing about his grievances, he has definitely shown that he is a competent writer, despite having been unsuccessful with the official assessment instrument.

Alan at 'cogdogblog' is apparently in a gloomy mood about RSS. We've all seen the once proud institutions of usenet, email, the web, search technology, wiki and blog feedback exploited and polluted with the likes of spam. Alan worries that the same will happen to RSS.

My take on this is that RSS (and I use the term to include associated feed technologies such as RDF and Atom) may succumb, but that it has things in its favour that the others either did not have, or at least did not use well. And if RSS does collapse, the next collaboration technology might get it right.

First, RSS is a "pull" technology. "Push" technologies (such as email and usenet) allow anyone with access to the stream to pollute it. "Pull" technologies provide the opportunity for the consumer to be the one making the choices. If I decide that I don't like information from a certain source, I simply never bother to fetch information from that source again. In itself, however, just being "pull" is not enough to save a system. The web is "pull" but is still filled with junk which it is very hard to avoid.

Second, RSS is (largely) semantically marked up. Although the original idea of the world wide web was to use semantic markup, it rapidly became the norm to use cosmetic markup instead. Separating desired from undesired content has become a heuristic process, which is easy to exploit and "loophole". Although semantic markup for RSS is still in relatively early stages, it does offer a stronger base for processing and filtering of content.

Third, RSS generally operates on the basis of a "web of respect". I am much more likely to add to my feed reader, feeds that have been recommended on other feeds that I respect. This is currently the main advantage of RSS compared to the web - the web is one single pool of sites and pages, any practical search technology has to search it all. Because the web is a single pool, anyone can pollute it. With RSS it is much simpler to search only those feeds that are part of my "web of respect". Inappropriate content will only be found if one or more of my respected sources is compromised somehow. The down side with this approach, of course, is that it limits the finding of new sources and content. For that, we still need people hunting around the web, usenet, and mailing lists, as well as actually speaking to each other.

Fourth, although currently common practice, allowing external comments on blogs and other feed sources is not essential to the medium. In many ways making a post in your own weblog that references the material you wish to comment on seems more in the weblog "spirit". If everyone "owns their own words", then it is up to the feed reader software to weave material from diverse sources into coherent conversations. In conjunction with the "web of respect" factor, this makes it much easier to only see comments from respected sources.

In conclusion, I feel that RSS is almost there as a robust collaboration technology. It contains aspects that act against most of the obvious failings of previous systems. It is, however, still vulnerable to identity theft, "social engineering" and lower-level network spoofing, as well as the propagation of unhealthy "memes" and untruths.

In general, I love Firefox. It seems solid, behaves well, and handles very-nearly-all pages gracefully. However, there's one thing that's bugging me, and I wonder if there's a simple way round it.

I use several applications that "pop up" separate windows using JavaScript. These are not junk ads but an integral, useful, part of the application. Now, the authors of such applications almost always seem to follow tradition, and remove the normal "window furniture" from these child windows. No buttons, location bar, menu, status-throbber, quick links, etc. I detest this habit.

For regular browser windows I control (via preferences, skins, themes or whatever) how I want my browser to look. For someone to completely ignore and override those settings, just because they have made an erroneous analogy between a browser window and a modal dialog box, really irritates me. In most cases it doesn't even actually help the application by preventing me from doing things like reverting to the previous page (a.k.a. "back button"). These operations are still possible, just irritatingly fiddly.

The particular problem that has got me incensed at the moment concerns "tabs". I have found myself using "open in new tab" from the context menu more and more to take a note of an interesting URL without interrupting my reading of the original page. So it's natural that I also do this on such stripped-down windows. Except that it doesn't seem to work properly. The page gets opened, but I have found no way of actually displaying the phantom tabs it creates. On regular windows, opening a new tab causes the tab bar to magically appear. On a window without buttons and location bar, this never seems to happen.

I'm open to suggestions. Can anyone offer (a) a way of navigating to tabs that have been opened without the tab bar appearing, or even better (b) a way of configuring Firefox to ignore the window furniture settings on a JavaScript window open call, and always use my own chosen preferences instead?

Thanks for any help.

I've seen some discussion recently about "e-portfolios" and the idea of providing everyone with "lifetime web space". See this educause article and comments on it here, and here, for example. While initially appealing, I have some misgivings about the idea, which I'd like to work through here.

The basic premise of these suggestions is laudable enough - each student to be provided with some sort of online repository, in which to store and publish work, qualifications and research. Such a repository forms a globally accessible indication of their development and abilities. This can be viewed as a next evolutionary step in the "read/write web" after the likes of FTP/HTTP and blogging, replacing the traditional functions of "transcript", "resumé", and "Curriculum Vitae" with a detailed and searchable "e-portfolio". Expanding this idea beyond the walls of existing learning establishments to incorporate all forms of lifelong learning leads to the suggestion of a more generic "lifetime web space" capable of storing, connecting, and searching anything.

However, I can see a few problems with this. In common with the original authors, I'll skip over the technical issues (things like storage provision, authentication and bandwidth) as they are all problems that are actively being pursued in the context of existing internet services. I'm more concerned with more general issues.

One major problem is in the word "lifetime". Out here in the "real world", artefacts often persist beyond the death of the creator, and this has immense value. Not just the "big names" such as the drawings of da Vinci or the plays of Shakespeare, but the details of individual lives illuminate the study of history and human society. Any form of "lifetime web space" would really need to be perpetually maintained, to avoid losing this priceless archive of information.

Another problem is how to manage the content in these personal repositories. The web as a whole is filling up with obsolete information, broken links, spam and other junk. I find it hard to believe that any future "lifetime web space" will be any different. Generally, people have neither the time, the skills, nor the motivation to keep such an intangible repository "tidy", especially if it is effectively limitless. People produce an enormous amount of stuff during their lives. Why make hard evaluation choices and laborious categorization or tagging, when you can just upload everything to your "infinite" lifetime web space? Including all the hundreds of cute kiddie scribbles, thousands of letters, notes and memos, millions of digital snapshots, months of video footage and so on. Add to this the issue of referencing external material. If you link to an external resource you can't rely on it in the long term - it might go away, it might change, it might begin to require authorization or payment. The only way to gather a collection of information and artefacts that make sense is to take a copy. Which is bound to lead to massive duplication and turn the lifetime space into even more of a hairball.

A third problem is one of longevity. The oldest web sites in the world are currently about 10 years old. Despite the efforts of the "wayback machine", a large proportion of the content that was once available is simply no longer in existence. Digital data is very easy to lose. Unlike books, paintings, or even the Vindolanda tablets, digital information needs to be actively maintained. If a hard drive stops spinning, the information is unavailable. If you have no reader for a digital format or medium, the information is unavailable. I have some software I once wrote, stored on an eight-inch floppy disk. I haven't seen an eight-inch disk drive in twenty years. I have some video files I cannot play, because they were compressed using a hardware codec that is no longer available. We have reached this state in a few short years, and as both the amount of digital information to store, and the size and complexity of storage systems, increase, the problem will only become greater. And all this is without a catastrophic accident or terrorist event such as an electromagnetic pulse which "takes out" a large area of the global network. It's not unusual to find a several-hundred-year-old book or painting in an attic, and still be able to make sense of it. To achieve that kind of longevity in a digital archive needs a major, coordinated, pessimistic approach. And that costs.

In conclusion, I can see that the idea of an "e-portfolio", with the specific limited purpose of representing achievement and skills in certain areas, and acting as an online extension of the ideas of transcripts, resumés and CVs has merit. This approach is also likely to be achievable. The purpose of representing the owner when applying for work or study opportunities will act as a force to manage, tidy and hone the collection, filtering out inappropriate material, and emphasising the best work. Broadening the scope to include everything researched or produced during a lifetime both massively increases the complexity, and at the same time reduces the pressure to manage the information. I would worry very strongly that such repositories would become write-only virtual junk rooms, fragile, costly to maintain and so poorly organized that any value is hidden and largely inaccessible.

Just had an intriguing discussion with a colleague about marking and states of mind. Apparently he was once "told off" by a non-teacher who felt that marking students' work in odd moments (such as while waiting for a bus) was doing the students a disservice.

Sometimes, though, the tickle of distraction of needing to be peripherally aware of whether the bus has arrived actually helps achieve the ideal egoless state of mind for marking. The task of waiting for the bus can be given to the "I" with all its subjective and personal opinions, while the rest of the full concentration can be given to analysing the work, including the tricky task of looking for omissions and logical errors.

I guess this separation of mental processes is not a trick that everyone can employ. I can't say that I've used it for marking, but many times I have used just such techniques to think through thorny software development problems. I don't do it so much at the moment, but for many years one of my top ways of applying maximum brain power to a problem was to go swimming. The routine of swimming lengths for an hour or so enabled me to hive off the fussy self-critic into keeping track of not breathing water and turning round at the ends, while the rest of my thought processes could focus on the problem at hand.

I've not done any of the repetitive martial arts, but it seems likely that this same approach is part of that, too.

Like many other Wiki operators, I'm having a problem with "wiki spam" - wildly off-topic entries posted solely in the hope of distorting search engine link counts or attracting gullible customers. As the author of a Wiki implementation, I'm also concerned with how we can change the Wiki software to reduce or eliminate this problem.

Three types of solution spring immediately to mind, and I'd be interested to hear of any other suggestions, and also of people's opinions on which (if any) of these options they prefer. The next release of Friki will definitely have some form of anti-spam measures, but I would really like to make sure I'm taking the right approach.

My possible solution types so far are:

  • Authentication: things such as requiring a login, or a confirmation email before accepting an edit.
  • Filtering by originator: things such as a "blacklist" or "whitelist" of IP addresses.
  • Filtering by content: things such as banning posts containing certain URL patterns or phrases.
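
To make the third option a little more concrete, here is a minimal Java sketch of a content filter. The class name SpamContentFilter and the example patterns are invented purely for illustration. This is not code from Friki, just one way such a check might be arranged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical content filter: rejects an edit if any banned pattern occurs in it. */
class SpamContentFilter {
    private final List<Pattern> banned = new ArrayList<>();

    /** Each entry is a regular expression, matched case-insensitively. */
    SpamContentFilter(List<String> bannedRegexes) {
        for (String regex : bannedRegexes) {
            banned.add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
        }
    }

    /** Returns true only if none of the banned patterns is found in the text. */
    boolean isAcceptable(String text) {
        for (Pattern p : banned) {
            if (p.matcher(text).find()) {
                return false;
            }
        }
        return true;
    }
}
```

A wiki's save handler could call isAcceptable(newPageText) before storing an edit. The obvious weakness is that someone has to keep the pattern list up to date.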

Suggestions? Opinions?

I was recently talking to a colleague, and the conversation came round to the perceived lack of marketing by the college where I work. One of my pet grumbles is the way that there seems to be no equivalent to the concept of the long-term "customer relationship" sought by many other forms of business. Students browse a prospectus (if they can find it, can be bothered, do it at the right time of year, etc.), sign up for a course if one takes their fancy, then leave at the end. My usual suggestion for this is some way of maintaining a public register of suggestions for, and interest in, potential courses, rather than just offering some based on guesswork and hoping we get enough students to make them viable.

This time round, though, the conversation led a different way, toward the idea of lifetime learning as personal development. It suddenly occurred to me that there are other businesses that can be viewed in this manner, but that seem to have found a different business model. If you feel unfit or overweight, and decide that some guided exercise is in order, you sign up for a gym or fitness club.

A gym typically has qualified specialist trainers and supervisors, admin staff and maintenance staff. It has equipment and resources which need to be provisioned, maintained and scheduled so that they are available when the patrons need them. A customer will typically choose a gym based on a referral from another customer, or based on its location or reputation. And above all, it's recognized that you only get out from a gym what you put in - if you are willing to work hard you can achieve great things, but you might just want to go along for a gentle workout and a chat. It's generally obvious that you can get fit on your own if you have the commitment, but it's easier with the input of a trainer and the support of your peers.

This sounds very much like a typical community college to me. The big difference seems to be that the gym makes its money by selling subscriptions - access to some or all of the facilities for a period of time. A college typically registers a student for a particular course, then kicks them out when it's finished. It's well known that lots of people sign up with a gym with good intentions, but then rarely attend. I'm sure that an equivalent number would like the idea of learning and personal development enough to sign up with a college in the same way.

So, is anyone aware of any colleges or universities that offer this sort of subscription service? Can anyone point out any gaping holes in my analogy?

As I was entering another long message on one of the various bulletin boards (BB) I take part in, I came to musing about the differences between the way I take part in a BB community, and the way I take part in a community of weblogs.

Some of the differences are fairly obvious. My BB posts are almost all phrased as "replies" - participations in existing topics of conversation started by others. My blog posts are almost all phrased as starting points for their own conversations or trains of thought.

Some of the more subtle differences, though, are to do with perceived and practical ownership of the words, and the ability to follow the development of one participant's train of thought. Making a BB post implies giving up some aspects of ownership. None of the BBs I use have the ability to sign up to an RSS feed of my posts. None have the ability for me to "back-up" my own posts in case the BB crashes or goes away. These are basic features for most blog software.

I tend to think of my blog as my own "voice", and take care when creating each article to ensure that, if it is read out of context, it won't misrepresent me or my ideas. BB posts are never without the context of the topic, so I tend to feel less constrained, and more able to choose and reply to specific issues in previous posts in the thread. The "voice" of a BB is the combined voice of all the participants, even when they disagree.

Does anyone know of any BB software that provides the ability to read the postings of a single participant in a more blog-like manner? Being able to download a zip (or whatever) of all one person's messages, and/or an RSS feed for each contributor would be excellent, too.

Suggestions?

Over at Total Perspective Vortex, Matt Cox is musing on the practice of passing a single Map of named parameters into a method. It's a technique I've used in the past. It's also one that I have recommended from time to time at The Java Ranch. Matt doesn't seem to like the idea much.

In many cases, I agree. The hassle of constructing a HashMap and populating it, just to pass in a bunch of named constant values can certainly complicate the code and leave it wide open to bugs that only appear at runtime. However, I also think that this technique has its uses, and not all of them are immediately obvious.

One thing to bear in mind is that Map is actually an interface. This allows the caller to pass in named collections of parameters with the same API, but different run-time semantics.

Imagine that the parameters you need to pass in to a method are relatively expensive to populate. With a traditional long-string-of-parameters API the caller only has two choices: taking the hit of populating each parameter with real data before the call, or using a shortcut and passing null or a dummy value for the parameters that are "not needed". Unfortunately, for the caller to know that a parameter is "not needed" violates encapsulation and can easily break if the code is refactored or changed. Passing all the parameters is the only safe and future-proof solution, but it can be prohibitively slow. Catch 22.

So, one solution is to pass in a lazy-populated Map that knows how to fetch these expensive values, but only does so if it is asked. If the code being called doesn't need a particular value in this case, there is no penalty to fetch it. Adding cacheing of expensive values in the Map is usually sensible in this case.
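
As a rough sketch of what such a lazy Map might look like in Java: the class name LazyMap and its define method are my own invention for this example, and using Supplier is just one convenient way to describe "how to fetch a value".

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical Map whose values are computed on first request, then cached. */
class LazyMap extends AbstractMap<String, Object> {
    private final Map<String, Supplier<?>> suppliers = new HashMap<>();
    private final Map<String, Object> cache = new HashMap<>();

    /** Registers how to compute a value, without computing it yet. */
    LazyMap define(String name, Supplier<?> supplier) {
        suppliers.put(name, supplier);
        return this;
    }

    @Override
    public Object get(Object key) {
        if (cache.containsKey(key)) {
            return cache.get(key);          // already fetched once
        }
        Supplier<?> supplier = suppliers.get(key);
        if (supplier == null) {
            return null;                    // unknown name, like a plain Map
        }
        Object value = supplier.get();      // the expensive step happens here
        cache.put((String) key, value);
        return value;
    }

    @Override
    public boolean containsKey(Object key) {
        return suppliers.containsKey(key);
    }

    @Override
    public Set<Map.Entry<String, Object>> entrySet() {
        // Iterating over the whole map forces every value to be fetched.
        Map<String, Object> all = new HashMap<>();
        for (String name : suppliers.keySet()) {
            all.put(name, get(name));
        }
        return all.entrySet();
    }
}
```

The called method just sees a Map and calls get("whatever"). Values it never asks for are never computed.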

Following on from the same idea, imagine that some of the values are volatile, and can't be known until they are used. They might even give different values when evaluated at different times. Again a custom Map implementation comes to our aid. In this case, any cacheing should probably be done in the code that uses the Map.
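
A minimal sketch of the volatile case, again with invented names (LiveMap, define): the Map re-evaluates the value on every get, so a caller that needs a stable value reads it once into a local variable.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical Map whose values are re-evaluated on every lookup. */
class LiveMap extends AbstractMap<String, Object> {
    private final Map<String, Supplier<?>> sources = new HashMap<>();

    LiveMap define(String name, Supplier<?> source) {
        sources.put(name, source);
        return this;
    }

    @Override
    public Object get(Object key) {
        Supplier<?> source = sources.get(key);
        return source == null ? null : source.get();   // no caching here
    }

    @Override
    public boolean containsKey(Object key) {
        return sources.containsKey(key);
    }

    @Override
    public Set<Map.Entry<String, Object>> entrySet() {
        // A snapshot of all values as they are at this moment.
        Map<String, Object> snapshot = new HashMap<>();
        for (String name : sources.keySet()) {
            snapshot.put(name, get(name));
        }
        return snapshot.entrySet();
    }
}
```

Code that wants one consistent reading would do something like Object now = params.get("timestamp") and then reuse the local variable rather than asking the Map again.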

Imagine again that your application contains a general purpose "context", into which separate parts of the application can store named values for later use by other parts. Sure it sounds like those despised global variables, but this pattern is rife in (for example) the servlet API. Now imagine even further that your application has loaded a set of internal defaults, then "layered" some application configs from an optional config file, then "layered" some user and session preferences over the top of these application-wide settings. Utility code needs to always use whatever setting is "at the top" for each named configuration - this may fall all the way through to the internal default, or it may have been overridden by later settings. Placing code all over the application to check all these repositories and work out which to use is nonsense. So, let's build an object that does it for us. The natural API for this is get(name), which is pretty close to the API for a Map. We can simply and easily pass this Map in to utility methods, knowing that each one can fetch and use any system parameters it needs, and that it will always get the one "on top".
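
Here is one way the layering might be sketched in Java. LayeredMap and addLayer are names made up for this example. Each layer is an ordinary Map, and a lookup simply walks from the top layer down until it finds the name.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical Map that answers each lookup from the topmost layer defining the name. */
class LayeredMap extends AbstractMap<String, Object> {
    // The most recently added layer sits at the front of the deque.
    private final Deque<Map<String, ?>> layers = new ArrayDeque<>();

    /** Adds a layer on top of all existing layers. */
    LayeredMap addLayer(Map<String, ?> layer) {
        layers.addFirst(layer);
        return this;
    }

    @Override
    public Object get(Object key) {
        for (Map<String, ?> layer : layers) {       // top layer first
            if (layer.containsKey(key)) {
                return layer.get(key);
            }
        }
        return null;                                // not set in any layer
    }

    @Override
    public boolean containsKey(Object key) {
        for (Map<String, ?> layer : layers) {
            if (layer.containsKey(key)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public Set<Map.Entry<String, Object>> entrySet() {
        // Flatten from the bottom up so that upper layers overwrite lower ones.
        Map<String, Object> flat = new HashMap<>();
        Iterator<Map<String, ?>> bottomUp = layers.descendingIterator();
        while (bottomUp.hasNext()) {
            flat.putAll(bottomUp.next());
        }
        return flat.entrySet();
    }
}
```

Building it as new LayeredMap().addLayer(defaults).addLayer(appConfig).addLayer(userPrefs) gives utility code a single Map to query, with user preferences winning over application config, and application config winning over the internal defaults.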

Finally, I've found this technique very useful in applications that dynamically load classes at runtime (for example the user-specified transformation actions used to extend and change the way text is converted to HTML in my Friki Wiki without restarting the server.) In this case I actually use a much lighter-weight equivalent to a Map that I call a Fetcher. A plain Fetcher has just one method Object getObject(String name) which returns a stored object, or null if not found (like a Map). In my StringTree utilities project, I have a whole ecosystem of Fetcher implementations providing things like returning an instance of a named class, or wrapping another Fetcher in a cache, as well as a collection of typed extensions such as StringFetcher which adds a String get(String name) that returns the stored object as a String, or "" if not found.
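
To give a flavour of the idea, here is a small sketch. The two interfaces follow the method signatures described above, but the implementations (MapFetcher, CachedFetcher) are written fresh for this example and should not be taken as the actual StringTree code.

```java
import java.util.HashMap;
import java.util.Map;

/** The basic contract: a stored object, or null if not found. */
interface Fetcher {
    Object getObject(String name);
}

/** Typed extension: the stored object as a String, or "" if not found. */
interface StringFetcher extends Fetcher {
    String get(String name);
}

/** Illustrative implementation backed by an ordinary Map. */
class MapFetcher implements StringFetcher {
    private final Map<String, Object> values;

    MapFetcher(Map<String, Object> values) {
        this.values = values;
    }

    public Object getObject(String name) {
        return values.get(name);
    }

    public String get(String name) {
        Object value = getObject(name);
        return value == null ? "" : value.toString();
    }
}

/** Illustrative wrapper that remembers results from another Fetcher. */
class CachedFetcher implements Fetcher {
    private final Fetcher inner;
    private final Map<String, Object> cache = new HashMap<>();

    CachedFetcher(Fetcher inner) {
        this.inner = inner;
    }

    public Object getObject(String name) {
        if (cache.containsKey(name)) {
            return cache.get(name);
        }
        Object value = inner.getObject(name);
        if (value != null) {
            cache.put(name, value);     // misses are not cached in this sketch
        }
        return value;
    }
}
```

Because the contract is a single method, wrappers like this can be stacked freely, and a dynamically loaded class only needs to know about Fetcher to get at anything the application chooses to expose.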

Defining an API in terms of a single parameter of a very simple type, but with great expressive possibilities, allows really creative use of run-time loading to extend the main application, without having to build knowledge of all the future possibilities into the application itself.

Maybe I'm cynical, but I'm astonished that people are reacting at all positively to the announcement of a so-called anti-spam screensaver.

The premise, reported uncritically by lots of major news sources including the BBC and The Register, is that by downloading and installing a special screensaver, users can join with Lycos in sending massive amounts of traffic to the web sites of "spammers". The list of sites targeted by this proposed worldwide array will be continually managed and updated by Lycos.

In real terms, this means that anyone who installs this software is effectively handing over the keys to their network connection, allowing Lycos, an unknown third party, to use it to persecute whomever they please, whenever and as much as they please. And guess what, that's exactly the same goal that the writers and distributors of countless "DDOS zombie" viruses and worms have been pursuing for years.

An array of networked computers with the ability to target and consume large amounts of bandwidth is plainly and simply a weapon. Even the threat of the use of such a weapon can have a significant effect, opening the possibility of blackmail, extortion, and "protection rackets". To get people to voluntarily join in with this effort, meekly handing over control of such power, would be a coup of "social engineering".

Although I personally feel very strongly that this is a crazy idea that nobody in their right mind should subscribe to, I don't mean any slur on Lycos. They are probably acting in good faith, with the genuine aim of threatening only known "spammers". But such a system would have an enormous number of security problems. Any of the central systems, from the ones hosting the lists of "known spammers", to the ones prioritising and randomising potential targets, to the ones managing delivery of target lists to the "zombies", could be the weak link that allows a hacker to gain control of the system. Even claiming the ability to mobilise such a system would be powerful in itself.

And power corrupts...