I’m tarting up arts & ego to include semantic metadata. This will allow data hoovering robots and the like to understand a little more about the content, making it easier for things such as search engines to reference this site when relevant.
I’m doing so using, amongst other things, the schema.org definitions. They are supported by a number of larger organisations, including Google, Microsoft, Yahoo and Yandex. It’s a complex and detailed specification of what information should be applied when and how, clearly intended to cover most kinds of knowledge in some depth.
I do have something of a doubt that all human knowledge can be easily categorised and put into predefined pigeon holes, which makes me think the schema.org system is merely an early step towards properly supporting metadata on the web. But since it’s supported by a number of big search engines, and by implication reflects the current state of search engine technology, I’ll put aside my doubts and work with it.
Most of the records and fields make sense to me, although it’s quite obvious that their arts section wasn’t put together by consulting people who live for the arts. For example, they don’t seem to have grasped literature, particularly poetry, except for the concept of articles in periodicals. This is a nuisance, because they’re in effect saying literature is irrelevant on the internet, which of course it isn’t. Still, that kind of thing is repairable with future extensions.
The schema.org metadata includes lots of fields specifying various kinds of detailed information, such as names, dates, places, etc. Most are obvious; many can be ignored or applied as necessary. Some are confusing, especially when the sponsors of the schema system seem to directly contradict it, as happens with Google’s schema test utility (there are alternatives), which doesn’t grasp linked definitions and alternative definition types.
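To make this concrete, here’s a minimal sketch of the kind of mark–up involved, in JSON–LD (one of the formats schema.org supports). The particular name, photographer and date are invented for illustration; the property names are real schema.org properties.

```json
{
  "@context": "https://schema.org",
  "@type": "Photograph",
  "name": "Seaside portrait",
  "creator": {
    "@type": "Person",
    "name": "A. Photographer"
  },
  "dateCreated": "2016-03-01",
  "contentLocation": {
    "@type": "Place",
    "name": "Brighton"
  }
}
```

A search engine reading this knows the page contains a photograph, who took it, when, and where, without having to guess from the surrounding prose.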
One particular field, though, is causing me some philosophical grief. It’s called isFamilyFriendly. You can give it a value of yes or no. I read that as marking whether content is suitable for viewing by children. I can see there might be minefields there, which probably explains why the meaning of the field is carefully not defined. Minefields? Let’s step in ….
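For illustration, here is roughly how the field would appear, this time as inline microdata (the article content is invented; schema.org types the property as a Boolean, hence true/false rather than yes/no in the mark–up itself):

```html
<article itemscope itemtype="https://schema.org/Photograph">
  <h1 itemprop="name">Seaside portrait</h1>
  <!-- a bare true/false: no way to say “it depends on the viewer” -->
  <meta itemprop="isFamilyFriendly" content="true">
  <img src="portrait.jpg" itemprop="contentUrl" alt="Seaside portrait">
</article>
```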
The difficulty is that what is family friendly in one place most decidedly is not in another. Pictures of ladies who’ve forgotten to put their clothes on will, in most cases, be marked no, but what about pictures of ladies who’ve forgotten to put their facemasks on? In some cultures, those should be marked no too. In parts of the USA, semi–naked people would be marked no, but in parts of Europe, they’re on advertising hoardings, so should be marked yes. On the other hand, pictures of people enjoying playing with automatic weaponry would be acceptable in most of the USA, but would be over the boundary in parts of Europe.
Clearly, what is family friendly is culture dependent. Having a choice of yes or no for such a field is less than useless. If I use it to mark photographs of forgetful women as not being family friendly, does that mean that photographs I do not mark are family friendly? I might mark some other photos as being family friendly, such as photos of children in bars (which is normal in most of Europe), but that would contradict the norm in many other countries, including the USA. If I use it anywhere on this site, I am saying that I have considered the question: page x is definitely ok, page y is a no–no, and page z is unknown. I can say no such thing, because what is acceptable depends entirely on where and when the viewer lives.
This site contains photographs of forgetful ladies. These photos are, amongst other things, exercises in photography. They are records of people enjoying and expressing their humanity. These are records of confident women and confident men. They are erotica. I would, if judging them purely from my own cultural perspective, mark all but the naked images as family friendly. However, I know that, in many other cultures, they would not be regarded so. That’s why, in fact, I’ve protected that entire section of photos with questions about the viewer’s age.
It is not possible to use this isFamilyFriendly field to provide correct information, because the answer depends on where I, as the material’s creator, am coming from and where the viewer is coming from. Users will often trust the value of the field, because it shows the creator, that’s me, has considered the question. Unfortunately, using it implies that I know the viewer’s cultural context. I do not, cannot, and can never know that. Using a simple yes/no value for something this complex is worse than useless; it is worse than ignorant; it is brain dead. The only course of action is to not use the field. So I do not.
But if I’m going to respect other people’s cultures, especially parents’ fears for their children, ignoring the question isn’t good enough. Simple courtesy, at the very least, requires a notice of some kind. An alternative approach is needed.
Most countries have censorship boards that specify who is and who is not allowed to watch what. The standards applied by these boards differ between countries, supporting my point about cultures. They also differ between times. Unfortunately, it is insufficient to replace isFamilyFriendly by a censorship rating and board identifier, because that would require the viewing software to know all censorship ratings for all countries, over time. It would also presume a crude one–size–fits–all per country solution is sufficient. Also, of course, it suggests that I accept the need for censorship and banning things, rather than simply respecting people’s choices in different cultures with different perspectives.
If a solution to this question is worth finding, then a more appropriate direction might be to consider an anthropological approach. One could identify a list of subjects that are sensitive somewhere in the world, and mark any material that covers any of those subjects appropriately. That way the viewer is able to specify what they find offensive, software can compare their preferences with the content, give any warnings that might be appropriate, and the viewer can then decide, based on those warnings, whether or not to proceed. Parents could preset responses to such warnings to protect their children.
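The comparison step could be sketched as follows. This is entirely hypothetical: no such vocabulary of sensitive subjects exists, and the subject tags and preference lists here are invented for illustration.

```python
# Hypothetical sketch: compare the sensitive subjects a page declares
# with the sensitivities a viewer (or a parent's preset) has listed,
# and report the overlap so the software can warn before showing the page.
# The subject tags are invented; there is no standard vocabulary.

def content_warnings(page_subjects, viewer_sensitivities):
    """Return, sorted, the declared subjects the viewer asked to be warned about."""
    return sorted(set(page_subjects) & set(viewer_sensitivities))

# A page marked up with the subjects it touches on ...
page = {"nudity", "alcohol"}

# ... and a viewer's preferences, perhaps preset by a parent.
viewer = {"firearms", "nudity"}

warnings = content_warnings(page, viewer)
# The browser could now display the warnings and let the viewer
# decide whether to proceed.
```

The point of the sketch is that the judgement moves to the viewer’s side: the creator only states facts about the content, and the viewer’s own settings decide what counts as a problem.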
And let me make it clear, there would be situations where I would ignore the warnings of a browser. For example, I detest bigotry and appreciate courtesy, so I would normally avoid expressions of racism & rants about the awfulness of courtesy (the term often used by the ranters for courtesy is political correctness). However, as I write, the US political parties are selecting candidates for the forthcoming US presidential election. I suspect almost everything from the leading Republican candidate, a mister Hermann Trumpton, would be marked as offensive. Worse, he’d be marked as childish, yet he’s not the kind of example I’d want children to follow. However, it is always necessary to have one’s opinions challenged, just as it is useful to understand what’s going on in the world, so I would read some content about Hermann Trumpton despite browser warnings.
Having said all this, I think the problem will eventually resolve itself. Computer software is getting better at analysing text. There have long been programs that can produce a précis of an article, as readers of popular newspapers may already unwittingly know. The thing that’s missing in software now is an understanding of the semantics and the context. Software can process the word ‘cat’, for example, but without context it will not know that a cat is a jazz fan, and will have no concept of jazz, let alone feel the natural human reaction to music. Artificial intelligence is developing at speed, and semantic intelligence is a known problem that’s being researched. I am confident it will be resolved, although not immediately: one issue is simply a matter of scale. I believe that in a number of years’ time there will be no point in marking up a web site to identify what might be of concern to other people in other cultures, especially over something that a web site creator might consider quite innocent: an AI bot could do the work directly from the text, without schema mark–up.
I rather suspect isFamilyFriendly was foisted on schema.org by one or more of its sponsors. Its failure to consider basic questions that any half decent analysis should have raised suggests to me that its inclusion is for reasons other than the purpose of schema.org itself. That’s why I’d guess a sponsor threatened to throw its toys out of the pram if isFamilyFriendly wasn’t included as is. Of course, this hint of corruption (e.g. poor decisions forced through for money) raises the worrying suggestion that other, more subtle, corruption might have skewed the structure significantly, reducing its utility. However, the presence of such big industrial names in the list of funders means there’s nothing an individual like me can do about it without an almighty fight, and this isn’t worth an almighty fight. Anyway, if there is other corruption, I can’t see it.
So I’m going to mark up my website with the schema.org fields, but will ignore the almighty mess that is isFamilyFriendly. I think there is a case for adding schema to achieve what isFamilyFriendly fails to achieve, to build in awareness of different sensitivities to content, but it will be a lot of work and research to define that correctly, so I doubt it will happen.