Automated localization build tool

From CDOT Wiki
Revision as of 14:55, 1 November 2007 by Rueen (Project Details)

Want to help? - Project Contributor(s)

  • I would like to apply multiple regular expression substitutions to a string. I have seen something like re.compile(...). Could I collect a set of regular expressions in a list and iterate through it, applying each substitution to the string, or something like that?
  • We want to find out where our tool should be located inside the source tree
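The first question above can be answered with a plain Python list of (compiled pattern, replacement) pairs applied one after another. This is a minimal sketch using the two rules from the Regular expressions section below; the names `RULES` and `apply_rules` are hypothetical, not taken from Parser.py.

```python
import re

# Each rule is a (compiled pattern, replacement) pair; add new rules to the list.
RULES = [
    (re.compile(r'([Cc])olor'), r'\1olour'),     # color(s) -> colour(s)
    (re.compile(r'([Dd])ialogue'), r'\1ialog'),  # dialogue -> dialog
]

def apply_rules(instring):
    """Apply every substitution in RULES, in list order, and return the result."""
    for pattern, replacement in RULES:
        instring = pattern.sub(replacement, instring)
    return instring

print(apply_rules('Color dialogue'))  # Colour dialog
```

The rules run in sequence, so a later pattern sees the output of the earlier ones; keep that in mind if two rules could overlap.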


Project Leader(s)


Project Name

automated localization tool - This tool, given a locale and a set of rules, creates a build of the same language for a different region (e.g. en-IN from en-GB)


Project Description

  • THE BUG - Bug 399014 – we need an l10n-merge tool
  • Learn python
  • Understand the scripts from the test l10n tools
  • Understand the l10n build system
  • Reproduce en-IN from en-GB
  • Determine what our Python based system will "do" in 0.1 release


Releases

We call our Python script a "tool", although I think it shouldn't be called that, since in the bug Axel was referring to something else. We will keep the name until a better one comes along.

Tool's 0.2 Release Functionality & Features

  1. Given any directory as a starting point, should walk through all sub-directories and files and make changes based on translation rules (Done)
  2. Add ability to update Properties files (Done)
  3. Fix key changing problem (may go to 0.3)
  4. Add more "rules", for now we are testing by changing color --> colour (may go to 0.3)
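Items 2 and 3 above are connected: in a Properties file a rule should change only the value, never the key. The sketch below assumes the simplest `key = value` form with `#` comments; real .properties files also allow `:` separators, `!` comments and backslash line continuations, which it ignores. `fix_properties_line` is a hypothetical helper, not the PropertiesParser class from Parser.py.

```python
import re

COLOR = re.compile(r'([Cc])olor')

def fix_properties_line(line):
    """Apply color -> colour to the value of a `key = value` line, not the key.

    Simplified sketch: assumes one entry per line, '=' as the separator and
    '#' comments; other .properties syntax is passed through untouched.
    """
    stripped = line.lstrip()
    if not stripped or stripped.startswith('#') or '=' not in line:
        return line  # blank lines, comments and unexpected lines are left alone
    key, value = line.split('=', 1)  # split at the first '=' only
    return key + '=' + COLOR.sub(r'\1olour', value)

print(fix_properties_line('colorPicker.label = Pick a color'))
# colorPicker.label = Pick a colour
```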

Tool's 0.1 Release Functionality & Features

  1. Should be able to accept a localization (Done)
  2. Should be able to accept a Firefox build (e.g. en-GB or en-US) (Done)
  3. Read through every DTD and Properties file in the current directory with the "Parser.py" file (Done)
  4. Change the word "color" to "colour" in every DTD file and save the result (Done)

Project Details

  • Our script for now: we will also be posting it on bug 399014. Get our code from the latest release.
  • We are also waiting for some code that dynamis has been working on in Japan. dynamis seems to be missing in action; I have heard he might be moving to another location, which might be why he has been difficult to contact.
  • Notes from Axel(pike) about the project
  • Team notes - we collect notes related to the project
  • Armen's MozDev process - diary - You can read notes of what Armen has been trying
  • The l10n tools are in mxr.m.o/mozilla/source/testing/l10n
  • How to check out the l10n tools using CVS:
    cvs -z3 -d:pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot co mozilla/tools/l10n mozilla/testing/tests/l10n
  • The file we have been using is Parser.py, specifically its DTDParser and PropertiesParser classes. Both are derived from the Parser class.
  • mozilla/tools/l10n/l10n.py might give you an idea of what it takes to copy existing data over to a new location
  • Some notes from trying to get en-GB (read more):
* make -f client.mk l10n-checkout MOZ_CO_PROJECT=browser MOZ_CO_LOCALES=en-GB LOCALES_CO_TAG=HEAD
* An option for the .mozconfig: mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../en-GB
  • To get en-US:
* 
  • To check the completeness of your localization (this takes a long time):
* make -f tools/l10n/l10n.mk check-l10n
  • To fill in what is missing from the source of your localization:
* MOZ_CO_PROJECT=browser make -f tools/l10n/l10n.mk create-en-GB

Regular expressions

>> color(s) -> colour(s) -- re.sub(r'([Cc])olor', r'\1olour', instring)
>> dialogue  -> dialog -- re.sub(r'([Dd])ialogue', r'\1ialog', instring)
>> Go forward -> Go forwards -- 
>> Minimize -> Minimise
>> Center -> Centre
>> Organize -> Organise
>> Customize -> Customise
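The word-for-word rules above (Minimize, Center, Organize, Customize) can also be applied in a single pass with one combined pattern and a replacement function. This is an alternative technique to one `re.sub` per rule, sketched under assumptions: the `RULES` dictionary and `apply_rules` are hypothetical, and only a leading capital is preserved (all-caps words become capitalized).

```python
import re

# Word rules from the list above (US -> British spelling), stored lower-case.
RULES = {
    'minimize': 'minimise',
    'center': 'centre',
    'organize': 'organise',
    'customize': 'customise',
}

# One alternation of all keys; \b stops matches inside longer words such as
# "centered", which a bare ([Cc])enter rule would turn into "centreed".
WORDS = re.compile(r'\b(' + '|'.join(map(re.escape, RULES)) + r')\b',
                   re.IGNORECASE)

def _replace(m):
    word = m.group(0)
    new = RULES[word.lower()]
    # Keep a leading capital: "Center" -> "Centre".
    return new.capitalize() if word[0].isupper() else new

def apply_rules(text):
    return WORDS.sub(_replace, text)

print(apply_rules('Center the window, then minimize it'))
# Centre the window, then minimise it
```

The trade-off of the word boundaries is that inflected forms such as "minimized" are not changed; they would need their own entries.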

DTD regular expression analysis

  • An analysis of the DTD-specific regular expression from Parser.py
self.key = re.compile('<!ENTITY\s+([\w\.]+)\s+(\"(?:[^\"]*\")|(?:\'[^\']*)\')\s*>', re.S)
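  \s+ - one or more whitespace characters (spaces, tabs, newlines and other whitespace)
  ([\w\.]+) - one or more characters, each either a word character (letter, digit or underscore) or a dot; this is group 1, the entity name
  \s+ - again, one or more whitespace characters
  (\"(?:[^\"]*\")|(?:\'[^\']*)\') - the difficult part: group 2, the quoted value
        the alternatives on either side of '|' are tried left to right; if the left one leads to a match, the right one is never tried
    (\"(?:[^\"]*\") - the left alternative: a value enclosed in double quotes, "..."
         (?:[^\"]*\") - "(?:" starts a non-capturing group, so what it matches cannot be referenced later
             [^\"]* - zero or more characters that are not a double quote (the backslash only escapes the quote; it is not part of the set), followed by \" , the closing quote
    (?:\'[^\']*)\') - the right alternative: the same thing for a value enclosed in single quotes, '...'

  \s* - zero or more whitespace characters
  re.S - the same as re.DOTALL: makes '.' match newlines too. This pattern has no bare '.' (the dot in [\w\.] is a literal dot), so the flag changes nothing here; negated classes such as [^\"] already match newlines, which is what lets a value span several lines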
Example of a matching line:
  <!ENTITY  colorsDialog.title              "Colors">
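To check the analysis above, the same pattern can be run against the example line. The regular expression below is the one quoted from Parser.py, written as a raw string (identical regex; the r prefix only avoids Python escape-sequence warnings); `ENTITY` is just a local name for this sketch.

```python
import re

# The DTD entity pattern quoted above from Parser.py, as a raw string.
ENTITY = re.compile(
    r'<!ENTITY\s+([\w\.]+)\s+(\"(?:[^\"]*\")|(?:\'[^\']*)\')\s*>', re.S)

line = '<!ENTITY  colorsDialog.title              "Colors">'
m = ENTITY.search(line)
print(m.group(1))  # colorsDialog.title  (group 1: the entity key)
print(m.group(2))  # "Colors"  (group 2: the value, quotes included)
```

A single-quoted value such as <!ENTITY foo 'bar'> goes through the right-hand alternative instead, and group 2 is then 'bar' with its single quotes.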

Related regular expressions theory

* (...) - whatever is inside the parentheses forms a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence
* (?...) - This is an extension notation - Extensions usually do not create a new group; (?P<name>...) is the only exception to this rule. Following are the currently supported extensions.
* List of supported extensions: (?iLmsux), (?:...), (?P<name>...), (?P=name), (?#...), (?=...), (?!...), (?<=...), (?<!...), (?(id/name)yes-pattern|no-pattern)
* \number - Matches the contents of the group of the same number. Groups are numbered starting from 1.
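The three ideas above (numbered groups, the non-capturing (?:...) extension with a named group, and \number inside a pattern) can be seen in a few lines. The entity name and sample strings are only illustrations.

```python
import re

# Group 1 ([Cc]) is reused in the replacement as \1, keeping the original case.
fixed = re.sub(r'([Cc])olor', r'\1olour', 'Color and color')
print(fixed)  # Colour and colour

# (?:...) groups without capturing, so the named group (?P<key>...) is group 1.
m = re.match(r'(?:<!ENTITY\s+)(?P<key>[\w.]+)', '<!ENTITY brandShortName')
print(m.group(1))      # brandShortName
print(m.group('key'))  # brandShortName

# \1 inside the pattern matches the same text that group 1 captured earlier.
doubled = re.search(r'\b(\w+) \1\b', 'the the cat')
print(doubled.group(0))  # the the
```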

Project news

Common news from the collaborators should be written here rather than split between the collaborators' own notes:

  • Oct. 31, 2007 - It seems that l10n tools are going to be discussed a lot, both in general ("L10n tools talking" in Google Groups) and specifically in Bug 399014, which is only loosely "related" to our project. We will have to read dynamis' code (he has finally appeared; his name is Asai Tomoya) and see what new direction our script should take.
WE ARE A LITTLE CONFUSED, BUT AFTER THE WEEKEND SOME LIGHT MIGHT BE SHED ON THE PROJECT
  • Oct. 30, 2007 - Axel has replied on the bug, and it seems that our project shouldn't be called l10n-merge; the bug is about something else. Some doubts have arisen about what our tool should really be doing.
  • Oct. 25, 2007 - Meeting with Michal; she explained to us the different tools out there created by the community, and by next week she will give us more information on which direction to follow.
  • Oct. 23, 2007 - Added 0.2 Release Functionality & Features section. Team meeting held (6 hours) - our tool is now able to, given a localization directory (or any directory for that matter), walk through all sub-directories and files and make changes based on our translation rules. We've also determined a few other things to do before our 0.2 release and even brainstormed 0.3 ideas (incorporating l10n's setup.py script into our tool like Axel suggested).
  • Oct. 12, 2007 - Updated 0.1 Release Functionality & Features section. A lot of the 0.1 code has been done.
  • Oct. 07, 2007 - Added a 0.1 Release Functionality & Features section to the wiki so we have a clear description of what our project's 0.1 release should be able to do.
  • Oct. 05, 2007 - Python will be our language of choice for this project, which is a great opportunity to learn it thoroughly since it will be our first time using it. Determined some main tasks ahead of us before the 0.1 release (tasks listed in Project Description).
  • Sep. 24, 2007 - We are going to have a conference call with Michal from the Toronto office

Bugs

- There are no bugs related specifically to the project, but the following were mentioned in our conversations


Old want to help

  • We had a problem when matching a pattern inside the DTDs; see below:

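re.sub(r'([Cc])olor', r'\1olour', instring) changes every occurrence of 'Color' or 'color' to 'Colour' or 'colour' (the Canadian way). Therefore a line like:
  <!ENTITY colorsDialog.title "Colors">
becomes:
  <!ENTITY coloursDialog.title "Colours">
WE DON'T WANT coloursDialog.title!!! A better regular expression could avoid this, but we are not good with regular expressions yet (see the DTD regular expression analysis section).
SOLVED: re.sub(r'(.*)([Cc])olor', r'\1\2olour', instring)
Note that (.*) is greedy, so this replaces only the last 'color' on each line. That fixes the example above because the value comes after the key, but a hypothetical line whose only 'color' is in the key (e.g. <!ENTITY colorPicker.label "Pick">) still gets its key changed, and a value containing 'color' twice keeps the first one unchanged.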
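A sturdier way to keep keys intact is to match whole entities and apply the rule only inside the quoted value. This sketch reuses the shape of the Parser.py entity pattern, regrouped so the key part and the value are separate groups; `ENTITY`, `COLOR` and `localize_values` are hypothetical names, not part of Parser.py.

```python
import re

# Same entity shape as the Parser.py pattern, regrouped:
# group 1 = "<!ENTITY key ", group 2 = quoted value, group 3 = closing ">".
ENTITY = re.compile(r'(<!ENTITY\s+[\w.]+\s+)("[^"]*"|\'[^\']*\')(\s*>)')
COLOR = re.compile(r'([Cc])olor')

def localize_values(text):
    """Apply color -> colour only inside entity values, never in the keys."""
    def fix(m):
        value = COLOR.sub(r'\1olour', m.group(2))  # every occurrence in the value
        return m.group(1) + value + m.group(3)
    return ENTITY.sub(fix, text)

print(localize_values('<!ENTITY colorsDialog.title "Colors">'))
# <!ENTITY colorsDialog.title "Colours">
```

Unlike the greedy (.*) version, this changes every 'color' inside a value and none in the key.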

  • We would like to walk through subfolders and make changes in each one; any good Python example?

SOLVED: this already works in the 0.2 release
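For reference, walking a directory tree in Python is usually done with os.walk. This is a minimal Python 3 sketch, not the 0.2 release code: `localize_tree` and its `transform` parameter are hypothetical, and it assumes the DTD and Properties files are UTF-8 encoded.

```python
import os

def localize_tree(root, transform, extensions=('.dtd', '.properties')):
    """Apply transform(text) -> text to every matching file under root.

    os.walk visits root and every sub-directory below it; a file is
    rewritten in place only when transform actually changed its contents.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue  # skip files that are neither DTD nor Properties
            path = os.path.join(dirpath, name)
            # newline='' keeps the original line endings; UTF-8 is assumed
            with open(path, encoding='utf-8', newline='') as f:
                text = f.read()
            new_text = transform(text)
            if new_text != text:
                with open(path, 'w', encoding='utf-8', newline='') as f:
                    f.write(new_text)
```

Usage, with a hypothetical path: localize_tree('l10n/en-IN', lambda s: s.replace('color', 'colour')). Passing the rule in as a function keeps the walking code separate from the translation rules.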