[vos-d] multiple languages

Karsten Otto otto at inf.fu-berlin.de
Wed Apr 26 05:46:44 EDT 2006


Hello,

well, my actual point was that actions should be
machine-understandable strings, which natural language terms are not.

Language tagging for property values in general is a different  
problem, but anyway:
I think there is a rather general solution in the way XML (or RDF)  
handles this, i.e. every element/node and every subelement/literal  
may have an xml:lang tag. The tag does not change the semantics of  
the element/property though: <dc:title xml:lang="en">... and  
<dc:title xml:lang="de">... both specify "the" title, with no  
preference of one *form* over the other.
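
To make the xml:lang point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The sample record, its titles, and the helper name titles_by_lang are made up for illustration; none of this is VOS code. Both elements come back as the same dc:title property, told apart only by their language tag.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: Dublin Core elements, and the predefined "xml" prefix.
DC = "http://purl.org/dc/elements/1.1/"
XML = "http://www.w3.org/XML/1998/namespace"

# Made-up sample document with one title per language.
DOC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title xml:lang="en">The Castle</dc:title>
  <dc:title xml:lang="de">Das Schloss</dc:title>
</record>"""


def titles_by_lang(xml_text):
    """Map each xml:lang tag to the text of the dc:title carrying it."""
    root = ET.fromstring(xml_text)
    return {
        el.get("{%s}lang" % XML): el.text
        for el in root.findall("{%s}title" % DC)
    }


titles = titles_by_lang(DOC)
print(titles)
```

Neither entry is marked as "the original" here; the tag only qualifies the value.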

In practice, I have the impression that all this language tagging is  
rarely used; people just don't seem to care about multilingual  
documents. Also note that language tagging is really nasty when it  
comes to changing values: I may be able to change the English and  
German titles, but I am at a loss when it comes to the Hindi and  
Cantonese ones. In this case the result is not only a language  
problem, but a semantic mismatch problem too: some properties have  
the "current" title and some still retain the "old" one.
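
The mismatch can be shown with a small sketch. It assumes, purely for illustration, that the tagged titles live in a plain dictionary keyed by language tag; the tags, values, and the helper update_titles are placeholders, not anything from VOS.

```python
# Made-up data: one title value per (illustrative) language tag.
titles = {
    "en": "Old title",
    "de": "Alter Titel",
    "hi": "(old Hindi title)",
    "zh-yue": "(old Cantonese title)",
}


def update_titles(current, new_values):
    """Apply new values; return the updated map and the tags left stale."""
    updated = dict(current)
    updated.update(new_values)
    # Every tag the editor could not supply still holds the old meaning.
    stale = sorted(tag for tag in current if tag not in new_values)
    return updated, stale


# The editor only speaks English and German.
updated, stale = update_titles(titles, {"en": "New title", "de": "Neuer Titel"})
print(stale)
```

After the update, the entries listed in stale still carry the old title, so readers in different languages see different content.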

Now if you still want to go for it, IMHO the cleanest solution would  
be to add a dedicated language member to the property, separate from  
metatype, actual value, and MIME type.
In most cases this member will be just a NULL pointer, i.e. no value,  
but for a tagged property it contains the language code (RFC 3066:  
en-us, cn, de, and so on). This is close to (2), with all its advantages  
and disadvantages.
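
A rough sketch of what such a member could look like, written in Python rather than against the real VOS API. The class Property, its field names, the metatype string, and the fallback policy in pick (exact tag first, then the untagged property) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Property:
    """Hypothetical stand-in for a VOS property; not the real class."""
    name: str
    metatype: str
    value: str
    mime_type: str
    lang: Optional[str] = None  # None plays the role of the NULL pointer


def pick(props, name, wanted_lang):
    """Return the property tagged wanted_lang, else the untagged one, else None."""
    candidates = [p for p in props if p.name == name]
    for p in candidates:
        if p.lang is not None and p.lang.lower() == wanted_lang.lower():
            return p
    for p in candidates:
        if p.lang is None:
            return p
    return None


props = [
    Property("misc:title", "string", "Untitled", "text/plain"),
    Property("misc:title", "string", "The Castle", "text/plain", lang="en-us"),
    Property("misc:title", "string", "Das Schloss", "text/plain", lang="de"),
]
print(pick(props, "misc:title", "de").value)
```

Note that pick still has to iterate over the properties to find a language, which is the same disadvantage noted for (2).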

I would not mash this in with the MIME type, however, as it is really  
something entirely different. The same goes for the metatype (1): the  
language does not really change the type, only qualifies it. Finally,  
(3) does not tag/qualify properties with equivalent meaning, but  
expresses *translations* of properties. While this is interesting, it  
is not what we want.

Regards,
Karsten Otto (kao)


Am 26.04.2006 um 02:34 schrieb Reed Hedges:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Karsten Otto wrote:
>> The point here is that the action *must not* be a simple string. While
>> an English speaking human may understand the string "open", someone who
>> only speaks Chinese will not understand it. Same with your average
>> agent, who does not speak *any* language at all :-)
>>
>
> You actually bring up a very interesting problem that I would *love* to
> find a general solution for, before there are too many existing users of
> VOS -- supporting multiple translations in human-oriented metadata.
> This would probably be part of the MetaData vobject properties, but
> could also be used anywhere there is text for humans to read.
>
> I have three ideas on it:
>
> 1. Have separate properties, with the language indicated in the property
> name. E.g. misc:title.en-us, misc:title.de, misc:title.cn.  If your
> language isn't present, you would use the first one in the list. You
> could always also omit the language, and just say misc:title; some
> metadata strings would not really benefit much from translation in some
> cases.
>
> 2. Have separate properties with the same name, but indicate the
> language in the property datatype.  The definition of a property
> datatype leaves room for this in that MIME types can have "parameters".
> E.g. misc:title with data type "text/plain(lang=en/US)", misc:title with
> data type "text/html(lang=de)".
>
> The disadvantage of (2) is you have to iterate over the properties
> before you find your language.  Peter is proposing including the
> datatype in with the property vobject type; if you could do a site
> search query on vobject types, you could search for your language.  (But
> that forces you to do a search query, which a site is not required to
> support; you'd have to fall back on iterating all properties if search
> is not supported.)
>
> 3. Another idea is to allow a metadata property to have children with
> translations.  I kind of like this. It lets you know what the original
> or native property value is (the parent property's value), and you can
> craft search queries not to descend it if you don't care about
> translations.  You would probably then just name all the child
> translation properties after their language ("en", "de", etc.), and you
> would probably add a type to the parent property to indicate that it has
> child translations.  You could even represent a whole tree of
> translations, if, for example, the original value is in Russian, and you
> then make a French translation, then someone uses that to make a
> derivative English translation.
>
> The disadvantage of (3) is that it adds complexity to the data
> structure, especially if you represent the whole family tree of
> translations.  Not sure if there are any actual problems with this yet
> though.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQFETsAkFK83gN8ItOQRAuVuAJ43BcjxBjK86QuGQ3/atHTl41wlbQCeJow1
> on4bcrVRkCHkwscz9B1+KlQ=
> =Z1Gt
> -----END PGP SIGNATURE-----
>



