eaiovnaovbqoebvqoeavibavo package Locale::MakePhrase; our $VERSION = 0.4; our $DEBUG = 0; =head1 NAME Locale::MakePhrase - Language translation facility =head1 SYNOPSIS These group of modules are used to translate application text strings, which may or may not include values which also need to be translated, into the prefered language of the end-user. Example: use Locale::MakePhrase::BackingStore::Directory; use Locale::MakePhrase; my $bs = new Locale::MakePhrase::BackingStore::Directory( directory => '/some/path/to/language/files', ); my $mp = new Locale::MakePhrase( language => 'en_AU', backing_store => $bs, ); ... my $color_count = 1; print $mp->translate("Please select [_1] colors.",$color_count); Output: Please select a colour. Notice that a) the word 'color' has been localised to Australian English, and b) that the argument has influenced the resultant output text to take into account the display of the singular version. =head1 DESCRIPTION This aim of these modules are to implement run-time evaluation of an input phrase, including program arguments, and have it generate a suitable output phrase, in the language and encoding specified by the user of the application. Since this problem has been around for some time, there are a number of sources of useful information available on the web, which describes why this problem is hard to solve. The problem with most existing solutions is that each design suffers some form of limitation, often due to the designer thinking that there are enough commonalities between all/some langugaes that these commonalities can be factored into a various rules which can be implemented in programming code. However, each language has it own history and evolution. Thus it is pointless to compare two different languages unless they have a common history and a common character set. I continuing to read this document, you really should read the following info on the L Perl module: http://search.cpan.org/~sburke/Locale-Maketext-1.08/lib/Locale/Maketext.pod and at the slides presented here: http://www.autrijus.org/webl10n/ The L modules are based on a design similar to the L module, except that this new implementation has taken a different approach, that being... Since it is possible (and quite likely) that the application will need to be able to understand the language rules of any specific language, we want to use a run-time evaluation of the rules that a linguist would use to convert one language to another. Thus we have coined the term I as a means to describe this technique. These rules are used to decide which piece of text is displayed, for a given input text and arguments. =head1 REQUIREMENTS The L module was initially designed to meet the requirements of a web application (as opposed to a desktop application), which may display many languages in the HTML form at any given instance. Its design is modelled on a similar design of using language lexicons, which is in use in the existing L Perl module. The reason for building a new module is because: =over 2 =item * We wanted to completely abstract the language rule capability, to be programming language agnostic so that we could re-implement this module in other programming languages. =item * We needed run-time evaluation of the rules, since the translations may be updated at any time; new rules may be added whenever there is some ambigutiy in the existing phrase. Also, we didn't want to re-start the application whenever we updated a rule. =item * We would like to support various types of storage mechanisms for the translations. The origonal design constraint prefered the use of a PostgreSQL database to hold the translations - most existing language translation systems use flat files. =item * We want to store/manipulate the current text phrase, only encoded in UTF-8 (ie: we dont want to store the text in a locale-specific encoding). This allows us to output text to any other character set. =back As an example of application usage, it is possible for a Hebrew speaking user to be logged into a web-form which contains Japanese data. As such they will see: =over 2 =item * Menus and tooltips will be translated into the users' language (ie: Hebrew). =item * Titles will be in the language of the dataset (ie: Japanese). =item * Some of the data was in Latin character set (ie: English). =item * If the user prefered to see the page as RTL rather than LTR, the page was altered to reflect this preference. =back =head1 BACKGROUND When implementing any new software, it is necessary to understand the problem domain. In the case of language translation, there are a number of requirements that we can define: =over 4 =item 1. Quite a few people speak multiple languages; we would like the language translation system to use the users preferred language localisation, or if we don't know which language that is, try to make an approximate guess, based on application capabilites. =over 4 =item Eg: In a web-browser, the user normally sets their prefered language/dialect. The browser normally sends this information to a web-server during the request for a page. The server may choose to show the page contents in the language the user prefers. =back =item 2. Since some people speak multiple languages, the application may not have been localised to their prefered localisation. We should try to fallback to using a language which is similar. =over 4 =item Eg: If there are no Spanish translations available, we should fallback to Mexican, since Mexican and Spanish have many words in common. =back =item 3. Some languages support the notion of a dialect for that language. A good example is that the English language is used in many countries, but countries such as the United States, Australia and Great Britain each have their own localised version ie. the dialect is specified as the country or region. The language translation mechanism needs to be able to use the users' preferred dialect when looking up the text to display. If no translation is found, then it should fall back to the parent language. =over 4 =item Eg: The language/dialect of Australia is defined as 'en_AU' - when we lookup a text translation, if we fail we should try to lookup the 'en' translation. =back =item 4. Some languages are written using a script which displays its output as right-to-left text (as used by Arabic, Hebrew, etc), rather than left-to-right text (as used by English, Latin, Greek, etc). The language translation mechanism should allow the text display mechanism to change the text direction if that is a requirement (which is another reason for mandating the use of UTF-8). =item 5. The string to be translated should support the ability to re-order the wording of the text. =over 4 =item Eg: In English we would normally say something like "Please enter your name"; in Japanese the equivalent translation would be something like "Enter your name, please" (although it would be in Japanese, not English). =back =item 6. The text translation mechanism should support the ability to show arguments supplied to the string (by the application), within the correct context of the meaning of the string. =over 4 =item Eg: We could say something like "You selected 4 balls" (where the number 4 is program dependant); in another language you may want to say the equivalent of "4 balls selected". =back Notice that the numeric position has moved from being the third mnemonic, to being the first mnemonic. The requirement is that we would like to be able to rearrange the order/placement of any mnemonic (including any program arguments). =item 7. We would like to be able to support an arbitrary number of argument replacements. We shouldn't be limited in the number of replacements that need to occur, for any given number program arguments. =over 4 =item Eg: We want to have an unlimited number of placeholders as exemplified by the string "Select __ balls, __ bats, __ wickets, plus choose __ people, ___ ..." and so on. =back =item 8. Most program arguments that are given to strings are in numeric format (i.e. they are a number). We would also like to support arguments which are text strings, which themselves should be open to language translation (but only after rule evaluation). The purpose being that the output phrase should make sense within the current context of the application. =item 9. In a lot of languages there is the concept of singular and plural. While in other languages there is no such concept, while in others still there is the concept of duality. There is also the concept that a phrase can be descriptive when discussing the zero of something. Thus we want to display a specific phrase, depending on the value of an argument. =over 4 =item Eg: In English, the following text "Selected __ files" has multiple possible outputs, depending on the program value; we can have: 0 case: "No files selected" - no numeric value 1 case: "One file selected" - 'files' is singular 2 case: "Selected two files" - the '__' is a text value, not a number more than 2 case: "Lots of selections" - no direct comparison to the original text =back ...as we can see, this is just for translating a single text string, from English to English. To counter this problem, the translation system needs to be able to apply linguistic rules to the original text, so that it can evaluate which piece of text should be displayed, given the current context and program argument. =item 10. When updating a specific phrase for language translation, the next screen re-draw should show the new translation text. Thus translations need to be dynamically changeable, and run-time configurable. =back =head1 INTERNAL TEXT ENCODING This module uses UTF-8 text encoding internally, thus it requires a minimum of Perl 5.8. So, for any given application string and user language combination, we require the backing store look-up the combination, then return a list of L objects, which must be created with the key and translated strings being stored in the UTF-8 encoding. Thus, to simplify the string-load functionality, we recommend to load / store the translated strings as UTF-8 encoded strings. See L for more information. =over 4 =item ie. The PostgreSQL backing store assumes that the database instance stores strings in the UNICODE encoding (rather than, say, ASCII); this avoids the need to translate every string when we load it. =back =head1 OUTPUT TEXT ENCODING L uses UTF-8 encoding internally, as described above. This is also the default output encoding. You can choose to have a different output encoding, such as ISO-8859-1. Normlly, if the output display mechanism can display UNICODE (encoded as UTF-8), then text will be rendered in the correct language and correct text direction (ie. left-to-right or right-to-left). By supplying the encoding as a constructor argument, L will transpose the translated text from UTF-8, into your output-specific encoding (using the L module). This is useful in cases where font support within an application, hasn't yet evolved to the same level as a language-specific font. See the L module for a list of available output encodings. Default output character set encoding: B =head1 WHAT ARE LINGUISTIC RULES? Since the concept of a linguistic rule is at the heart of this translation module, its documentation is located in L. It explains the syntax of the rule expressions, how rules are sorted and selected, as well as the operators and functions that are available within the expressions. You should read that information, before continuing. =over 2 =item Available operators: B<==>, B, B>, B>, B=>, B=>, B, B =item Available functions: B, B, B, B, B, B, B, B, B, B =back =head1 Object API The following methods are part of the L object API: =cut { no warnings; require v5.8.0; } use strict; use warnings; use utf8; use integer; use base qw(Exporter); use Data::Dumper; use I18N::LangTags 0.21 (); use Encode; use Encode::Alias; use Locale::MakePhrase::BackingStore; use Locale::MakePhrase::RuleManager; use Locale::MakePhrase::LanguageRule; use Locale::MakePhrase::Numeric; use Locale::MakePhrase::Utils qw(is_number die_from_caller); our $default_language = "en"; our $default_backing_store = "Locale::MakePhrase::BackingStore"; our $default_rule_manager = "Locale::MakePhrase::RuleManager"; our $default_malformed_mode = Encode::FB_PERLQQ; our $default_numeric_format = Locale::MakePhrase::Numeric->DOT; our $internal_encoding = "utf-8"; our $this = undef; local $Data::Dumper::Indent = 1 if $DEBUG; # Exported symbols our @EXPORT_OK = qw(mp __ print); # constants sub MALFORMED_MODE_ESCAPE { return Encode::FB_PERLQQ; } sub MALFORMED_MODE_HTML { return Encode::FB_HTMLCREF; } sub MALFORMED_MODE_XML { return Encode::FB_XMLCREF; } # We add the 'utf-8' alias for the 'utf8' encoding, # as we support both syntactical forms. Encode::Alias::define_alias('utf-8' => 'utf8'); #-------------------------------------------------------------------------- =head2 new() Construct new instance of Locale::MakePhrase object. Takes the following named parameters (ie: via a hash or hashref): =over 2 =item C =item C Specify one or more languages which are used for locating the correct language string (all forms are supported; first found is used). They take either a string (eg 'en'), a comma-seperated list (eg 'en_AU, en_GB') or an array of strings (eg ['en_AU','en_GB']). The order specified, is the order that phrases are looked up. These strings go through a manipulation process (using the Perl module L) of: =over 3 =item 1. The strings are converted to RFC3066 language tags; these become the primary tags. =item 2. Superordinate tags are retrieved for each primary tag. =item 3. Alternates of the primary tags are then retrieved. =item 4. Panic language tags are retrieved for each primary tag (if enabled). =item 5. The fallback language is retrieved (see 'fallback language'). =item 6. Duplicate language tags are removed. =item 7. All tags are converted to lowercase, and '-' are changed to '_'. =back This leaves us with a list of at least the fallback language. =item C =item C This option (both forms are supported; first found is used) allows you to change the output character set encoding, to something other than UTF-8, such as ISO-8859-1. See L for more information. =item C Takes either a reference to a backing store instance, or to a string which can be used to dynamically construct the instance. The final backing store instance must have a type of L. Default: use a L =item C Takes either a reference to a rule manager instance, or to a string which can be used to dynamically construct the instance. The final manager instance must have a type of L. Default: use a L =item C Perl normally outputs \x{HH} for malformed characters (or \x{HHHH}, \x{HHHHHH}, etc. for wide characters). Setting this value, changes the behaviour to output alternative character entity formats. Note that if you are using L to generate strings used within web pages / HTML, you should set this parameter to CMALFORMED_MODE_HTML>. =item C This option allows the user to control how numbers are output. You can set the output to be one of a number of forms of stringification defined in L, eg: =over 2 =item '.', ',', '(', ')' Place comma seperators before every third digit; use brackets for negative, as in: (10,000,000.1) =back This takes either a string format or an array reference containing the format. Default: dont format; show decimal as full-stop =item C Set this option to true to make L die if the translated string is incorrectly formatted (eg: too many argument place holders are specified) or the expression is not valid. The alternative is to output the phrase BINVALID TRANSLATIONE> or BINVALID EXPRESSIONE>. Die'ing here means that translations have the ability to abort your code. If you dont have control over the quality of the phrases added to your dictionary, you should probably use the default behaviour. Note that an invalid expression or translation generates a warning to STDERR. Default: dont die; output the appropriate error phrase =item C Set this option to false to make L not translate the applied arguments, before applying them to the output of the engine. This saves you from having to call translate() for each argument, within your own code. Default: do translate arguments =item C Set this option to true to make L automatically add newline characters to the end of every translated string. The reason for having this is to allow your translation-key to not require the OS-dependent newline character(s), and to not require newline character(s) on the target-translation. Note that the API provides alternate method calls so as to allow you to add newline character(s) as necessary. Default: dont add any newline characters =item C Set this option to true to make L load 'panic' languages as defined by L. Basically it provides a mechanism to allow the engine to return a language string from languages which has a similar heritage to the primary language(s), if a translation from the primary language hasn't been found. eg: Spanish has a similar heritage as Italian, thus if no translations are found in Italian, then Spanish translations will be used. Default: dont lookup panic-languages =item Notes: If the arguments aren't a hash or hashref, then we assume that the arguments are languages tags. If you dont supply any language, the fallback language will be used. Default language: B =back =cut sub new { my $proto = shift; my $class = ref($proto) || $proto; my $self = bless {}, $class; # We allow different forms of argument passing. # The only argument we really need is the language, but we should use other arguments as well. my %options; if (@_ == 1 and ref($_[0]) eq "HASH") { %options = %{$_[0]}; } elsif (@_ > 1 and not(@_ % 2)) { %options = @_; } elsif (@_ > 0) { my @languages = split(',',$_[0]); $options{languages} = \@languages; }; print STDERR "Arguments to ". $class .": ". Dumper(\%options) if $DEBUG > 5; $self->{options} = \%options; # allow sub-class to control construction $self = $self->init(); return undef unless $self; # process options, and initialise module $self->{encoding} = $self->_get_encoding(); $self->{malformed_character_mode} = $self->_get_malformed_mode(); $self->{numeric_format} = $self->_get_numeric_format(); $self->{die_on_bad_translation} = (exists $options{die_on_bad_translation}) ? ($options{die_on_bad_translation} ? 1 : 0) : $self->{die_on_bad_translation} ? 1 : 0; $self->{translate_arguments} = (exists $options{translate_arguments}) ? ($options{translate_arguments} ? 1 : 0) : (exists $self->{translate_arguments}) ? ($self->{translate_arguments} ? 1 : 0) : 1; $self->{add_newline} = (exists $options{add_newline}) ? ($options{add_newline} ? 1 : 0) : $self->{add_newline} ? 1 : 0; $self->{panic_language_lookup} = (exists $options{panic_language_lookup}) ? ($options{panic_language_lookup} ? 1 : 0) : $self->{panic_language_lookup} ? 1 : 0; $self->{languages} = $self->_get_languages(); $self->{rule_manager} = $self->_get_rule_manager(); $self->{language_modules} = $self->_load_language_modules(); $self->{backing_store} = $self->_attach_backing_store(); $this = $self; print STDERR "Resultant $class object: ". Dumper($self) if $DEBUG > 7; return $self; } #-------------------------------------------------------------------------- =head2 $self init([...]) Allow sub-class a chance to control construction of the object. You must return a reference to $self, to 'allow' the construction to complete. At this point of construction you can call C<$self-Eoptions()> which returns a reference to the current constructor options. This allows you to add/modify any existing options; for example you may want to inject something specific... =cut sub init { shift } #-------------------------------------------------------------------------- =head2 $string context_translate($context, $string [, ...]) [ C<$context> is either a text string or an object reference (which then gets stringified into its class name). ] This is a primary entry point; call this with your application context, your string and any program arguments which need to be translated. Note however that in most cases you will most likely want to call the L function instead; see below. In some cases you will find that you will use the same text phrase in one part of your application, in a seperate part of your application, but the meaning of the phrase is different (due to the different application context); supplying a context will allow your backing store to use the extra context information, to return the correct language rules. The steps involved in a string translation are: =over 3 =item 1. Fetch all possible translation rules for all language tags (including alternates and the fallbacks), from the backing store. The store will return a list reference of LanguageRule objects. =item 2. Sort the list based on the implementation defined in the L module. =item 3. The the rule instance for which the rule-expression evaluates to B for the supplied program arguments (if there is no expression, the rule is always true). =item 4. If no rules have been selected, then make a rule from the input string. =item 5. Apply the program arguments to the rules' translated text. If the argument is a text phrase, it (optionally) undergoes the language translation procedure. If the argument is numeric, it is formatted by one of your language sub-classes, or the L module. =item 6. We apply the output character set encoding to convert the text from UTF-8 into the prefered character set. If the output encoding is UTF-8 (thus matching the internal encoding), this item does nothing. =back =cut sub context_translate { die_from_caller("context_translate() requires at least two parameters") unless @_ > 2; my ($self,$context,$key) = (shift,shift,shift); die_from_caller("context_translate() requires a valid key") unless (defined $key and length $key); $context = ref($context) if (defined $context and ref($context) ne 'SCALAR'); print STDERR "Translation key: $key\n" if $DEBUG; my $backing_store = $self->{backing_store}; my $languages = $self->{languages}; # Get all possible translations/rules from backing store my $rule_objs = $backing_store->get_rules($context,$key,$languages); # Sort the rules according to the linguistic rule algorithms $rule_objs = $self->_sort_rules($rule_objs,$languages); # Select the specific rule, based on the linguistic rules for each rule my $rule_obj = $self->_select_rule($rule_objs, @_); # at this point we can clean up some resources $backing_store = undef; $languages = undef; $rule_objs = undef; # If no rule found, use input key $rule_obj = new Locale::MakePhrase::LanguageRule( language => $self->fallback_language, translation => $key, ) unless (defined $rule_obj); # Apply arguments to translated text my $translated_text = $self->_apply_arguments($rule_obj, @_); # apply encoding $translated_text = $self->_apply_encoding($translated_text); print STDERR "Translated text: $translated_text\n" if $DEBUG; return $translated_text.$\ if $self->{add_newline}; return $translated_text; } #-------------------------------------------------------------------------- =head2 $string translate($string [, ...]) This is a primary entry point; call this with your string and any program arguments which need to be translated. This function is a wrapper around the C function, where the context is set to undef (which is usually what you want). =cut sub translate { die_from_caller("translate() requires at least one parameter") unless @_ > 1; return shift->context_translate(undef,@_); } #-------------------------------------------------------------------------- =head2 $string context_translate_ln($context, $string [, ...]) This is a primary entry point; call this with your context, string and any program arguments which need to be translated. This function is a wrapper around the C function, but this adds newline character(s) to the output. =cut sub context_translate_ln { die_from_caller("translate() requires at least two parameters") unless @_ > 2; my $s = shift->context_translate(@_); return $s.$\; } #-------------------------------------------------------------------------- =head2 $string translate_ln($string [, ...]) This is a primary entry point; call this with your string and any program arguments which need to be translated. As above, this function is a wrapper around the C function, where the context is set to undef, but this adds newline character(s) to the output. =cut sub translate_ln { die_from_caller("translate() requires at least one parameter") unless @_ > 1; my $s = shift->context_translate(undef,@_); return $s.$\; } #-------------------------------------------------------------------------- =head2 $string format_number($number,$options) This method implements the numbers-specific formatting, by calling into L's C method. To provide custom handling of number formatting, you can do one of: =over 2 =item * Define a L number formatting option. =item * Implement 'per-language' number formatting, by sub-classing the L module, then implementing a C method. =back =cut sub format_number { my ($self, $number, $options) = @_; $options = {} unless $options; if (ref($self)) { unless (exists $options->{numeric_format}) { if ($self->{numeric_format}) { $options->{numeric_format} = $self->{numeric_format}; } else { $options->{numeric_format} = Locale::MakePhrase::Numeric->DOT; } } # Allow the custom language-handling module a chance at formatting the number my $modules = $self->{language_modules}; foreach my $module (@$modules) { my $can = $module->can('format_number'); next unless $can; print STDERR "Found language specific number formatter on module: ". ref($module) ."\n" if $DEBUG > 3; return &$can($self,$number,$options); } } return Locale::MakePhrase::Numeric->stringify($number,$options); } #-------------------------------------------------------------------------- =head2 $backing_store fallback_backing_store() Backing store to use, if not specified on construction. You can overload this in a sub-class. =cut sub fallback_backing_store { $default_backing_store } #-------------------------------------------------------------------------- =head2 $string fallback_language() Language to fallback to, if all others fail (this defaults to 'B'). You can override this method in a sub-class. Usually this will be the language that you are writing your application code (eg: you may be coding using German rather than English). Note that this must return a RFC-3066 compliant language tag. =cut sub fallback_language { $default_language } #-------------------------------------------------------------------------- =head2 $string_array language_classes() This method returns a list of possible class names (which must be sub-classes of L) which can get prepended to the language tags for this instance. L will then try to dynamically load these modules during construction. The idea being that you simply need to put your language-specific module in the same directory as your sub-class, thus we will find the custom modules. Alternatively, you can sub-class this method, to return the correct class heirachy name. =cut sub language_classes { my ($self) = @_; my $class = ref($self); my $superclass = $class; $superclass =~ s/(.*)::.+$/$1/; my @classes = ( $class, $class."::Language", $class."::Languages", $superclass, $superclass."::Language", $superclass."::Languages" ); return \@classes; } #-------------------------------------------------------------------------- =head2 $format numeric_format($format) This method allows you to set and/or get the format that is being used for numeric formatting. You can supply an array, an array ref, or a string. =cut sub numeric_format { my $self = shift; if (@_) { my $mode; if (@_ > 1) { @$mode = @_; } else { ($mode) = @_; } my $options = { numeric_format => $mode }; $self->{numeric_format} = $self->_get_numeric_format($options); } return $self->{numeric_format} ? $self->{numeric_format} : Locale::MakePhrase::Numeric->DOT; } #-------------------------------------------------------------------------- =head2 Accessor methods =over 2 =item $hash B Returns the options that were supplied to the constructor. =item $string_array B Returns a list of the language tags that are in use. =item $object_list B Returns a list of the loaded language modules. =item $object B Returns the loaded backing store instance. =item $object B Returns the loaded rule manager instance. =item $string B Returns the output character set encoding. =item $int B Returns the current UTF-8 malformed character output mode. =item $bool B Returns the current state of 'L'. =item $bool B Returns the current state of 'L'. =item $bool B Returns the current state of 'L'. =item $bool B Returns the current state of 'L'. =back =cut sub options { shift->{options} } sub languages { shift->{languages} } sub language_modules { shift->{language_modules} } sub backing_store { shift->{backing_store} } sub rule_manager { shift->{rule_manager} } sub encoding { shift->{encoding} } sub malformed_character_mode { shift->{malformed_character_mode} } sub die_on_bad_translation { shift->{die_on_bad_translation} } sub translate_arguments { shift->{translate_arguments} } sub add_newline { shift->{add_newline} } sub panic_language_lookup { shift->{panic_language_lookup} } #-------------------------------------------------------------------------- =head1 Function API The following items are helper functions, which can be used to simplify the usage of L objects. =cut #-------------------------------------------------------------------------- =head2 $string mp($string [, ...]) This is a helper function to the translate() function call. It will use the last-constructed instance of L to invoke the translate function on. eg: print mp("This is test no: [_1]",$test_no); could produce: This is the first test. =cut sub mp { local $Locale::MakePhrase::Utils::DIE_FROM_CALLER = 1; die_from_caller("You must construct at least one Locale::MakePhrase object, before using this function.") unless $this; die_from_caller("mp() requires at least one parameter") unless @_ > 0; return $this->context_translate(undef,@_); } #-------------------------------------------------------------------------- =head2 $string __ $string [, ...] This function is the same as the previous helper function, except that it makes you code easier to read and easier to write. eg: print __"This is test no: [_1]",$test_no; could produce: This is test no: 4 Note that we use double-underscore as this makes search-n-replace tasks easier than if we used a single-underscore. =cut sub __ { local $Locale::MakePhrase::Utils::DIE_FROM_CALLER = 1; die_from_caller("You must construct at least one Locale::MakePhrase object, before using this function.") unless $this; die_from_caller("__() requires at least one parameter") unless @_ > 0; return $this->context_translate(undef,@_); } =cut #-------------------------------------------------------------------------- =head2 NOTE The previous functions use a reference to an internal variable. If you are using this module from within Apache (say under mod_perl), make sure that you construct a new instance of a L object, in the child Apache processes. =cut #-------------------------------------------------------------------------- # The following methods are not part of the API - they are private. # # This means that everything above this code-break is allowed/designed # to be overloaded. #-------------------------------------------------------------------------- #-------------------------------------------------------------------------- # # Load/construct the backing store. # # We can pass in a string name of a backing store to use, # or an object reference to a previously constructed backing store. # sub _attach_backing_store { my ($self) = @_; my $options = $self->{options}; my $backing_store; my $store; # use default backing store if none defined if (exists $options->{backing_store}) { $backing_store = $options->{backing_store}; } elsif (exists $options->{backing_store}) { $backing_store = $self->{backing_store}; } else { $backing_store = $self->fallback_backing_store; die_from_caller("Failed to locate a default backing store") unless $backing_store; print STDERR "Using fallback backing store ($backing_store)\n" if $DEBUG > 1; } # if not a reference, try to construct one unless (ref($backing_store)) { ## see if perl module can be loaded eval "use $backing_store"; die_from_caller("Failed to load backing store: $backing_store") if ($@); ## try constructing it eval '$store = '. "$backing_store" .'->new()'; die_from_caller("Failed to construct backing store: $backing_store") if ($@); die_from_caller("Backing store connection failure: $backing_store") unless ($store); # use a passed in reference to a backing store } else { $store = $backing_store; } $options->{backing_store} = ref($store) if (exists $options->{backing_store}); ## make sure backing store ISA Locale::MakePhrase::BackingStore object die_from_caller("Backing store is not of type Local::MakePhrase::BackingStore") unless ($store->isa('Locale::MakePhrase::BackingStore')); return $store; } #-------------------------------------------------------------------------- # # Return an rule_manager object that is to be used in subsequent rule evaluations # # We can pass in a string name of a rule manager to use, # or an object reference to a previously constructed rule manager. # sub _get_rule_manager { my ($self) = @_; my $options = $self->{options}; my $rule_manager; my $manager; # use default manager if none defined if (exists $options->{rule_manager}) { $rule_manager = $options->{rule_manager}; } elsif (exists $self->{rule_manager}) { $rule_manager = $self->{rule_manager}; } else { print STDERR "Using default rule_manager ($default_rule_manager)\n" if $DEBUG > 1; $rule_manager = $default_rule_manager; } # if its not a reference, try constructing it unless (ref($rule_manager)) { # see if perl modle can be loaded eval "use $rule_manager"; die_from_caller("Failed to load rule manager: $rule_manager") if ($@); # try constructing it eval '$manager = '. "$rule_manager" .'->new()'; die_from_caller("Failed to construct rule manager: $rule_manager") if ($@ or not $manager); # use passed in rule_manager } else { $manager = $rule_manager; } $options->{rule_manager} = ref($manager) if (exists $options->{rule_manager}); # make sure rule_manager ISA Locale::MakePhrase::RuleManager object die_from_caller("Rule manager is not of type Locale::MakePhrase::RuleManager") unless ($manager->isa('Locale::MakePhrase::RuleManager')); return $manager; } #-------------------------------------------------------------------------- # # Return list of languages that we want to handle (highest to lowest priority). # # This implementation does the following: # # a) grab the required language(s) by looking for optins (in order) of: # - language -> string, # - languages -> string array, # - languages -> string containing a comma seperated list, # # b) then convert those language/dialect(s) into 'languages tags' # # c) generate 'super ordinate' language tags for results from b) # # d) generate 'alternate' languages tags for result from c) # # e) add the 'panic' language tags from the results of d) (if enabled) # # f) add the fallback language (after converting it to a language tag) # # g) strip off any duplicate tags # # h) make sure all tags only contain [a-z0-9_], by # - stripping unknown characters # - converting uppercase to lowercase # - converting '-' to '_' # sub _get_languages { my ($self) = @_; my $options = $self->{options}; my @languages; ## get prefered language(s) if (exists $options->{language}) { push @languages, $options->{language}; } elsif (exists $options->{languages} && ref($options->{languages}) eq "ARRAY") { @languages = @{$options->{languages}}; } elsif (exists $options->{languages}) { @languages = split(',',$options->{languages}); } elsif (exists $self->{language}) { push @languages, $self->{language}; } elsif (exists $self->{languages} && ref($self->{languages}) eq "ARRAY") { @languages = @{$self->{languages}}; } elsif (exists $self->{languages}) { @languages = split(',',$self->{languages}); } # Lookup real language/dialect definitions, from supplied language @languages = map I18N::LangTags::locale2language_tag($_), @languages; push @languages, map I18N::LangTags::super_languages($_), @languages; # catch alternations @languages = map { $_, I18N::LangTags::alternate_language_tags($_) } @languages; # get at least an approximate language if ($self->{panic_language_lookup}) { push @languages, I18N::LangTags::panic_languages(@languages); } # add a fallback language, just in case specified languages dont work my $fallback = $self->fallback_language; die_from_caller("Must implement something valid for 'fallback_language' method") unless $fallback; $fallback = I18N::LangTags::locale2language_tag($fallback); push @languages, $fallback; # strip off duplicate languages { my @langs; LOOP: foreach my $lang (@languages) { foreach my $l (@langs) { next LOOP if I18N::LangTags::same_language_tag($l,$lang); } push @langs, $lang; } @languages = @langs; } # final bit of processing: { my @langs; foreach my $lang (@languages) { $lang =~ tr<-A-Z><_a-z>; # lc, and turn - to _ $lang =~ tr<_a-z0-9><>cd; # remove all but a-z0-9_ next unless $lang; push @langs, $lang; } @languages = @langs; } print STDERR "Available languages: ", join(',',@languages), "\n" if $DEBUG > 1; return \@languages; } #-------------------------------------------------------------------------- # # If the user specified a charset (and its not UTF-8), we want to be # able to encode the output translation into that charset/encoding, # before returning. Here we simply capture that info. # sub _get_encoding { my ($self) = @_; my $options = $self->{options}; my $encoding = $internal_encoding; if (exists $options->{charset}) { $encoding = $options->{charset}; } elsif (exists $options->{encoding}) { $encoding = $options->{encoding}; } elsif (exists $self->{charset}) { $encoding = $self->{charset}; } elsif (exists $self->{encoding}) { $encoding = $self->{encoding}; } die_from_caller("Invalid encoding specified") unless $encoding; $encoding =~ tr<_A-Z><-a-z>; # lc, and turn _ to - return $encoding; } #-------------------------------------------------------------------------- # # Figure out what to do when there is a malformed character in the string, # when transcoding from UTF-8 to another charset/encoding. # sub _get_malformed_mode { my ($self) = @_; my $options = $self->{options}; my $mode = $default_malformed_mode; if (exists $options->{malformed_character_mode}) { $mode = $options->{malformed_character_mode}; } elsif (exists $self->{malformed_character_mode}) { $mode = $self->{malformed_character_mode}; } if (!defined $mode or ($mode != MALFORMED_MODE_ESCAPE and $mode != MALFORMED_MODE_HTML and $mode != MALFORMED_MODE_XML)) { die_from_caller("Unknown malformed-character mode:",$mode); } return $mode; } #-------------------------------------------------------------------------- # # Figure out what numeric formatting we want # sub _get_numeric_format { my ($self,$options) = @_; $options = $self->{options} unless $options; my $mode = $default_numeric_format; if (exists $options->{numeric_format}) { $mode = $options->{numeric_format}; } elsif (exists $self->{numeric_format} and defined $self->{numeric_format}) { $mode = $self->{numeric_format}; } $mode = [] unless (defined $mode); @$mode = split('',$mode) if (ref($mode) eq ''); die_from_caller("Unknown numeric formatting mode") unless (ref($mode) eq 'ARRAY'); $mode->[0] = '.' if (@$mode == 0); $mode->[1] = '' if (@$mode == 1); $mode->[2] = '-' if (@$mode == 2); $mode->[3] = '' if (@$mode == 3); die_from_caller("Unknown numeric-formatting mode:",join(',',@$mode)) unless (@$mode == 4); for (0..scalar(@$mode)-1) { die_from_caller("Undefined numeric format in placeholder: $_") unless (defined $mode->[$_]); } $mode = undef if (join(',',$mode) eq join(',',Locale::MakePhrase::Numeric->DOT)); return $mode; } #-------------------------------------------------------------------------- # # We try to load all the languages that user is able to use. # This allows the application to install their own method calls into the # language (even at run-time), if they really deem it to be necessary. # sub _load_language_modules { my ($self) = @_; my $languages = $self->{languages}; my $classes = $self->language_classes(); my @language_modules; foreach my $language (@$languages) { # try loading the language specific module # (we try various module-names to see which one resolves) my $module; foreach my $class (@$classes) { $module = $class ."::". $language; print STDERR "Trying to load language module: $module\n" if $DEBUG > 2; eval "use ". $module; last unless ($@); $module = undef; } next unless $module; print STDERR "Loaded language module: $module\n" if $DEBUG > 2; # try constructing the language specific module my $object; eval '$object = '. "$module" .'->new()'; next if ($@); # coool - special code for this language print STDERR "Found custom language handling object for language: $language\n" if $DEBUG > 1; push @language_modules, $object; } return \@language_modules; } #-------------------------------------------------------------------------- # # Take the list of all rules, then sort them by language, then by priority, # then by non-specified rule/language # sub _sort_rules { my ($self,$rule_objs,$languages) = @_; return undef if (not defined $rule_objs or @{$rule_objs} < 1); return $rule_objs if (@{$rule_objs} == 1); # sort the rules by language my $manager = $self->{rule_manager}; return $manager->sort($rule_objs,$languages); } #-------------------------------------------------------------------------- # # Select one of the rules, by applying the arguements # sub _select_rule { my ($self,$rule_objs,@args) = @_; return undef unless $rule_objs; my $manager = $self->{rule_manager}; # run manager on the translation rules foreach my $r_obj (@$rule_objs) { my $expression = $r_obj->expression; return $r_obj unless (length $expression); my $result = eval { $manager->evaluate($expression,@args); }; if ($@) { die $@ if $self->{die_on_bad_translation}; print STDERR $@; next; } return $r_obj if $result; } # no rule matched return undef; } #-------------------------------------------------------------------------- # # Apply the arguments, to the new translated text # sub _apply_arguments { my ($self,$rule_obj) = (shift,shift); my $translation = $rule_obj->translation(); my $manager = $self->{rule_manager}; return $manager->apply_arguments($self,$translation,@_); } #-------------------------------------------------------------------------- # # Apply the current encoding to the translated text # sub _apply_encoding { my ($self,$text) = @_; return $text if ($self->{encoding} eq $internal_encoding); $text = encode("UTF8", $text, $self->{malformed_character_mode}); from_to($text, "UTF8", $self->{encoding}); return $text; } #-------------------------------------------------------------------------- # # We want to see if the function called, appears in the language specific sub-class. # If so, then we execute that function, in _that class_'s scope, then return the results. # # We want to do this so as to allow functionality such as: # $mp->y_or_n( get_user_input() ); # sub AUTOLOAD { my $func = our $AUTOLOAD; $func =~ s/^.*:://; my $self = $this; $self = shift unless ($func eq 'mp' or $func eq '_'); my $language_modules = $self->{language_modules}; # See if the language-specific module contains this function name, and if so, run it foreach my $module (@$language_modules) { print STDERR "Trying to find function \"$func\" on module: ". ref($module) ."\n" if $DEBUG > 1; my $can = $module->can($func); next unless $can; return &$can($module,@_); } # generate error from caller perspective, if we couldn't execute the function my $languages = $self->{languages}; die_from_caller("No function \"$func\" found for languages:", join(',',@$languages)); } #-------------------------------------------------------------------------- # # Whenever use AUTOLOAD, we need to implement DESTROY # sub DESTROY {} 1; __END__ #-------------------------------------------------------------------------- =head1 SUB-CLASSING These modules can be used standalone, or they can be sub-classed so as to control certain aspects of its behaviour. Each inidividual module from this group, is capable of being sub-classed; refer to each modules' specific documentation, for more details. In particular the L module is designed to be sub-classed, so as to support, say, language-specific keyboard input handling. =head2 Construction control Due to the magic of inheritance, there are two primary ways to control construction any of these modules: =over 3 =item 1. Overload the C method =over 2 =item * Implement the C method in your sub-class =item * call C so as to execute the parent class constructor =item * re-bless the returned object =back For example: sub new { my $class = shift; ... my $self = $class->SUPER::new(...sub-class specific arguments...); $self = bless $self, $class; ... return $self; } =item 2. Overload the C method. =over 2 =item * implement the C method in your sub-class =item * return a reference to the current object. =back For example: sub init { my $self = shift; ... return $self; } =back =head2 Sub-classing this module This module (C) has a number of methods which can be overloaded: =over 2 =item * init() =item * fallback_backing_store() =item * fallback_language() =item * language_classes() =item * format_number() =back =head1 DEBUGGING Since this module and framework are relativley new, it is quite likely that a few bugs may still exist. By setting the module-specific C variable, you can enable debug messages to be sent to STDERR. Set the value to zero, to disable debug. Setting progressively higher values (up to a maximum value of 9), results in more debug messages being generated. The following variables can be set: $Locale::MakePhrase::DEBUG $Locale::MakePhrase::RuleManager::DEBUG $Locale::MakePhrase::LanguageRule::DEBUG $Locale::MakePhrase::BackingStore::Cached::DEBUG $Locale::MakePhrase::BackingStore::File::DEBUG $Locale::MakePhrase::BackingStore::Directory::DEBUG $Locale::MakePhrase::BackingStore::PostgreSQL::DEBUG =head1 NOTES =head2 Text directionality This module internally uses UTF-8 character encoding for text storage for a number of reasons, one of them being for the ability to encode the directionality within the text string using Unicode character glyphs. However it is up to the application drawing mechanism to support the correct interpretation of these Unicode glyphs, before the text can be displayed in the correct direction. =head2 Localised text layout In some languages there may be a requirement that we layout the application interface, using a different layout scheme than what would normally be available. This requirement is known as layout localisation. An example might be, Chinese text should prefer to layout top-to-bottom left-to-right, (rather than left-to-right top-to-bottom). This module doesn't provide this facility, as that is up to the application layout mechanism to handle the differences in layout. eg: A web-browser uses HTML as a formatting language; web-browsers do not implement top-to-bottom text layout. =head1 SEE ALSO L is made up of a number of modules, for which there is POD documentation for each module. Refer to: =over 2 =item . L =item . L =item . L =item . L =item . L =item . L =item . L =item . L =item . L =item . L =item . L =back It also uses the following modules internally: =over 2 =item . L =item . L =item . L =back You can (and should) read the documentation provided by the L module. =head1 BUGS =head2 Multiple levels of quoting The rule expression parser cannot handle multiple levels of quoting. It needs modification to support this (however, this may make the parser slower). =head2 Expression parsing failure The rule expression parser splits the rule into sub-expressions by chunking on ' && '. This means it will fail to parse a text evaluation containing these characters. For example this will fail to parse: _1 eq ' && ' Since the ' && ' is not a common text expression, this bug will probably never be fixed. =head1 TODO Need to add support for male / female context of phrase. This could be implemented using a context specific translation, however the better way would be to add native support for gender. =head1 CREDITES This module was written for NetRatings, Inc.; they paid for part of my time to develop this module. Various suggestions and bug fixes were also provided by: =over 4 =item Brendon Oliver =item John Griffin =back =head1 LICENSE This module was written by Mathew Robertson L for NetRatings, Inc. LEwww.netratings.com>. Copyright (C) 2006 This module is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License version 2 (or at your option, any later version) as published by the Free Software Foundation LEwww.fsf.org>. This module is distributed WITHOUT ANY WARRANTY WHATSOEVER, in the hope that it will be useful to others. =cut