Automated localization build tool

From CDOT Wiki
Revision as of 22:51, 10 January 2008 by Rueen (talk | contribs)

Project Name

Automatic Localization Fork Tool - This script, given a locale and a set of rules, will create an l10n source tree for the same language but a different region (e.g. en-IN from en-GB).


Project Description

Project Leader(s)

Releases

The project will be called Auto Fork Localization (l10nFork).

Release functionality & features

  • INSTRUCTIONS - run the script like this:
  1. Download and unzip the 0.3.2 release (which contains supporting classes and an l10n tree)
  2. Download the 0.3.3 release and overwrite the 0.3.2 version with it
  3. Run it this way: $> python l10nFork.py en en-XX
0.3.3 - added command-line argument parsing; no longer prompts for user input
0.3.2 - no more .bak files; creates a duplicate l10n tree with the changes applied
0.3.1 - added more regular expressions
0.2.3 instructions 0.2.3
0.2.2 read above
0.1 instructions 0.1
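
The 0.3.3 invocation above passes the source locale and the new locale as positional arguments. A minimal sketch of that kind of argument handling follows. It uses Python's standard argparse module, and the names parse_args, source_locale and target_locale are hypothetical; they are not taken from l10nFork.py itself.

```python
import argparse


def parse_args(argv=None):
    """Parse 'l10nFork.py <source_locale> <target_locale>' style arguments."""
    parser = argparse.ArgumentParser(
        prog='l10nFork.py',
        description='Fork an l10n tree into a new regional locale.')
    # Hypothetical argument names, chosen for this sketch only.
    parser.add_argument('source_locale', help='existing locale, e.g. en')
    parser.add_argument('target_locale', help='locale to create, e.g. en-XX')
    return parser.parse_args(argv)


# Equivalent to running: python l10nFork.py en en-XX
args = parse_args(['en', 'en-XX'])
```

Passing the argument list explicitly, instead of reading sys.argv, keeps the function easy to test.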

Release notes

  • 0.3 release
    • Use of command line arguments rather than receiving user's input
    • Fix the key changing problem
    • Add more words to change in an l10n tree but they are hard-coded :(
    • Stop using .bak file (and make the changes permanent)
    • Generate a second l10n tree out of original with the changes
    • Read from a file the changes to be applied and apply them - NOT IMPLEMENTED
  • 0.2 release
    • Given any directory as a starting point, should walk through all sub-directories and files and make changes based on translation rules
    • Add ability to update Properties files
    • Fix key changing problem (In progress/May go to 0.3 release)
    • Add more "rules", for now we are testing by changing color --> colour (May go to 0.3 release)
    • Allow the user to enter a localization folder (e.g. en-GB) as input
  • 0.1 release
    • Should be able to accept a localization
    • Should be able to accept an l10n tree (e.g. en-GB or en-US)
    • Read through every DTD and Properties file in the current directory with the "Parser.py" file
    • Change the word "color" to "colour" in every DTD file and save the result
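
The release notes describe walking every sub-directory and, from 0.3, writing the changes into a second l10n tree instead of editing the original. A rough sketch of that idea, under stated assumptions, is given here. The function name fork_tree and its parameters are hypothetical, and the substitution is the naive color/colour rule from the early releases, so it does not yet protect keys.

```python
import os
import re
import shutil

COLOR = re.compile(r'([Cc])olor')


def fork_tree(src_root, dst_root, extensions=('.dtd', '.properties')):
    """Copy src_root into dst_root, rewriting DTD and Properties files.

    Returns the paths (relative to dst_root) of files whose text changed.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        # Mirror the current directory inside the new tree.
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if name.endswith(extensions):
                with open(src, encoding='utf-8') as f:
                    text = f.read()
                new_text = COLOR.sub(r'\1olour', text)
                with open(dst, 'w', encoding='utf-8') as f:
                    f.write(new_text)
                if new_text != text:
                    changed.append(os.path.relpath(dst, dst_root))
            else:
                # Other files are copied unchanged.
                shutil.copy2(src, dst)
    return changed
```

Because the original tree is only read, no .bak files are needed to undo a run.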

Project Contributor(s)

The following people contributed to our project in some beneficial manner. We appreciate all the advice received from the Mozilla l10n community, as well as from Seneca students and professors. If you would like to be a contributor to our project, just get in touch with us through email or Seneca's IRC channel. The Contribution Info sub-section below outlines our current project needs. Potential contributors can pick one and we can talk about how to proceed.

Potential Ways of Contributing

  1. Python regular expressions help
  2. Bug: When parsing files that use key/value pairs - DTD or Properties - our tool not only changes the value but the key as well. We don't want the key to be changed. (I think this problem is nearly solved though so this bug may be obsolete very soon thanks to Armen)
  3. Testing our script when needed. (Usually right after a release)
  • Timothy Joseph Duavis
    • Reviewed our Python script(s) and suggested ways to improve several methods, in particular process() and callback().
  • Vijey Balasundaram
    • Tested our tool and looked for bugs or any potential future pitfalls we might run into. Helped with some documentation for first time tool users.
  • Mozilla l10n Community
    • Provided strong leadership and support. We wouldn't be able to get this project off the ground without them.
  • Seneca Professors
    • Put us in contact with the Mozilla l10n community and helped us get this project off the ground through structure and guidance.

Contribution Info

  • I would like to apply multiple regular expression substitutions to a string. I have seen something like re.compile(bla bla bla). Could I collect a set of regular expressions in an array and iterate through it, applying each substitution to the string, or something like that?
  • Timothy Joseph Duavis (here to be of service!) (Done)
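
For the question above, one common approach is to keep (compiled pattern, replacement) pairs in a list and apply them in order. The sketch below only illustrates that idea and is not the code that was contributed. The names RULES and apply_rules are hypothetical, and the three rules are examples built from words listed elsewhere on this page.

```python
import re

# Each entry pairs a compiled pattern with its replacement string.
RULES = [
    (re.compile(r'([Cc])olor'), r'\1olour'),
    (re.compile(r'([Cc])enter'), r'\1entre'),
    (re.compile(r'([Mm])inimize'), r'\1inimise'),
]


def apply_rules(text, rules=RULES):
    """Apply every substitution in turn and return the rewritten text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


example = apply_rules('Center the Color picker and minimize it')
# example == 'Centre the Colour picker and minimise it'
```

Rules run in list order, so a later rule sees the output of the earlier ones.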

Project Details

  • Our script, for now; we will also be posting it on bug 399014. Get our code from the latest release.
  • We are also waiting for some code that dynamis has been working on in Japan. He has been out of contact; we have heard he might be moving to another location, which may be why he has been difficult to reach.
  • Notes from Axel(pike) about the project
  • Team notes - we collect notes related to the project
  • Armen's MozDev process - diary - You can read notes of what Armen has been trying
  • The l10n tools are in mxr.m.o/mozilla/source/testing/l10n
  • How to check out the l10n tools using CVS:
    cvs -z3 -d:pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot co mozilla/tools/l10n mozilla/testing/tests/l10n
  • The file we have been using: Parser.py, specifically the DTDParser class and the PropertiesParser class. Both are derived from the Parser class.
  • mozilla/tools/l10n/l10n.py might give you an idea of what it takes to copy existing data over to a new location
  • Some notes from trying to get en-GB (read more):
    • make -f client.mk l10n-checkout MOZ_CO_PROJECT=browser MOZ_CO_LOCALES=en-GB LOCALES_CO_TAG=HEAD
    • An option for the .mozconfig: mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../en-GB
  • To get en-US:
  • To check the completeness of your localization (takes a long time):
    • make -f tools/l10n/l10n.mk check-l10n
  • To fill in what is missing in the source of your localization:
    • MOZ_CO_PROJECT=browser make -f tools/l10n/l10n.mk create-en-GB

Regular expressions

>> color(s) -> colour(s) -- re.sub(r'([Cc])olor', r'\1olour', instring)
>> dialogue  -> dialog -- re.sub(r'([Dd])ialogue', r'\1ialog', instring)
>> Go forward -> Go forwards -- 
>> Minimize -> Minimise
>> Center -> Centre
>> Organize -> Organise
>> Customize -> Customise
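
Only the first two entries above have a regular expression written out. The remaining single-word entries follow the same shape (capture the first letter, rewrite the rest), so one possibility is to generate the rules from a word table. This is a sketch under that assumption. WORD_PAIRS, make_rule and localize are hypothetical names, and "Go forward" is left out because it is a phrase, not a single word. The rules match substrings, so a word such as "centered" would still need a rule of its own.

```python
import re

# American spelling -> British spelling, taken from the list above.
WORD_PAIRS = [
    ('color', 'colour'),
    ('minimize', 'minimise'),
    ('center', 'centre'),
    ('organize', 'organise'),
    ('customize', 'customise'),
]


def make_rule(source_word, target_word):
    """Build a ([Cc])olor-style rule that preserves the first letter's case."""
    if source_word[0] != target_word[0]:
        raise ValueError('both words must start with the same letter')
    first = source_word[0]
    pattern = re.compile('([%s%s])%s' % (
        first.upper(), first.lower(), re.escape(source_word[1:])))
    return pattern, r'\1' + target_word[1:]


RULES = [make_rule(source, target) for source, target in WORD_PAIRS]


def localize(text):
    """Apply every generated rule to the text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

For example, localize('Customize the Colors') returns 'Customise the Colours'.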

DTD regular expression analysis

  • An analysis of the regular expression used for DTDs (from Parser.py):
self.key = re.compile('<!ENTITY\s+([\w\.]+)\s+(\"(?:[^\"]*\")|(?:\'[^\']*)\')\s*>', re.S)
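  <!ENTITY - the literal text that opens an entity declaration
  \s+ - one or more whitespace characters (spaces, tabs, newlines and similar)
  ([\w\.]+) - group 1, the key: one or more characters, each of which is alphanumeric, an underscore or a dot
  \s+ - again, one or more whitespace characters
  (\"(?:[^\"]*\")|(?:\'[^\']*)\') - group 2, the quoted value; the difficult part
        the alternatives around '|' are tried left to right; if the left one matches, the right one is not tried
    \"(?:[^\"]*\") - left alternative: a value between " and "
         (?:...) - a non-capturing group: it groups like (...) but its match cannot be referenced afterwards
         [^\"]* - zero or more characters that are not a " (the backslash only escapes the quote; it is not part of the set)
    (?:\'[^\']*)\' - right alternative: the same, for a value between ' and '

  \s* - zero or more whitespace characters
  > - the literal character that closes the declaration
  re.S - the same flag as re.DOTALL; it makes '.' match newlines as well. This pattern contains no '.', so the flag has no effect here; multi-line values already match because [^\"] and [^\'] also match newlines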
Example of matching line:
  "<!ENTITY  colorsDialog.title              "Colors">"

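The analysis above can be checked by running the pattern against the example line. This is only a small demonstration. The pattern is copied from Parser.py but written as a raw string, which matches the same text, and the second input line (about.text) is a made-up entity used to show a multi-line, single-quoted value.

```python
import re

# Pattern from Parser.py, written as a raw string so that the backslashes
# reach the re module unchanged.
KEY = re.compile(
    r'<!ENTITY\s+([\w\.]+)\s+(\"(?:[^\"]*\")|(?:\'[^\']*)\')\s*>', re.S)

line = '<!ENTITY  colorsDialog.title              "Colors">'
match = KEY.search(line)
key = match.group(1)            # the entity key
quoted_value = match.group(2)   # the value, still wrapped in its quotes

# Hypothetical entity with a single-quoted value that spans two lines.
multi_line = "<!ENTITY about.text 'first line\nsecond line'>"
multi_match = KEY.search(multi_line)
```

Here key is 'colorsDialog.title' and quoted_value is '"Colors"', quotes included, so any code that rewrites the value has to account for the surrounding quotes.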
Related regular expressions theory

* (...) - Whatever is inside the parentheses forms a group. The contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence.

* (?...) - This is an extension notation. Extensions usually do not create a new group; (?P<name>...) is the only exception to this rule. Following are the currently supported extensions.

* List of supported extensions: (?iLmsux), (?:...), (?P<name>...), (?P=name), (?#...), (?=...), (?!...), (?<=...), (?<!...), (?(id/name)yes-pattern|no-pattern)

* \number - Matches the contents of the group of the same number. Groups are numbered starting from 1.
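
A few short examples may make these notations more concrete. They use only the standard re module, and the sample strings are invented for illustration.

```python
import re

# (...) with \1: the backreference must repeat what group 1 matched.
doubled = re.search(r'\b(\w+) \1\b', 'the the colour')
doubled_word = doubled.group(1)

# (?:...) groups without capturing, so only one group is reported.
non_capturing = re.match(r'(?:en)-(\w+)', 'en-GB')
captured = non_capturing.groups()

# (?P<name>...) creates a group that can also be read by name.
locale = re.match(r'(?P<lang>[a-z]+)-(?P<region>[A-Z]+)', 'en-IN')
region = locale.group('region')

# (?P=name) matches whatever the named group matched earlier,
# here the same kind of quote that opened the value.
quoted = re.search(r'''(?P<q>["'])(.*?)(?P=q)''', 'title "Colors" end')
inner = quoted.group(2)
```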

Project news

News common to all collaborators is written here rather than split between the collaborators:

  • Nov. 11, 2007 - Updated the contributions page.
  • Nov. 10, 2007 - l10nMerge tool updated from 0.2.2 to 0.2.3 and available for download under the 0.2 Release Functionality & Features section. Added a new feature (the user can directly input a localization, e.g. en-GB), plus lots of code documentation and debug sections.
  • Nov. 9, 2007 - Created new bug (403215) that will guide the discussion about our tool
  • Oct. 31, 2007 - It seems there is going to be a lot of discussion about l10n tools in general ("L10n tools talking" in Google Groups) and specifically in the bug "related" to our project (Bug 399014). We will have to read the code from dynamis (he has finally appeared; his name is Asai Tomoya) and see what new direction our script should take.
We are a little confused, but after the weekend some light might be shed on the project. (Done)
  • Oct. 30, 2007 - Axel has replied on the bug, and it seems that our project shouldn't be called l10n merge; the bug seems to serve a different purpose. Some doubts have arisen about what our tool should really be doing.
  • Oct. 25, 2007 - Meeting with Michal. She explained to us the different tools out there that were created by the community, and by next week she will give us more information on which direction to follow.
  • Oct. 23, 2007 - Added 0.2 Release Functionality & Features section. Team meeting held (6 hours) - our tool is now able to, given a localization directory (or any directory for that matter), walk through all sub-directories and files and make changes based on our translation rules. We've also determined a few other things to do before our 0.2 release and even brainstormed 0.3 ideas (incorporating l10n's setup.py script into our tool like Axel suggested).
  • Oct. 12, 2007 - Updated 0.1 Release Functionality & Features section. A lot of the 0.1 code has been done.
  • Oct. 07, 2007 - Added an 0.1 Release Functionality & Features section to the wiki so we have a clear description of what our project's 0.1 release should be able to do.
  • Oct. 05, 2007 - Python will be our language of choice for this project, which is a great opportunity to learn it thoroughly since this will be our first time using it. Determined some main tasks ahead of us before the 0.1 release (tasks mentioned in Project Description).
  • Sep. 24, 2007 - We are going to have a conference call with Michal from the Toronto office

Bugs


Old want to help

  • We have a problem when matching a pattern inside of the DTDs, look below:

re.sub(r'([Cc])olor', r'\1olour', instring) replaces every occurrence of 'Color' or 'color' with 'Colour' or 'colour' (the Canadian way). Therefore a line like <!ENTITY colorsDialog.title "Colors"> becomes <!ENTITY coloursDialog.title "Colours">. We don't want coloursDialog.title! A better regular expression could avoid this, but we are not yet good with regular expressions (see the DTD regular expression analysis section).

SOLVED: re.sub(r'(.*)([Cc])olor', r'\1\2olour', instring). Because (.*) is greedy, this changes only the last occurrence of 'color' on each line, which leaves the key alone in the example above.
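
The SOLVED expression depends on the value being the last place 'color' appears on a line. An alternative, sketched here under assumptions, is to match the whole entity declaration and substitute only inside the quoted value. The ENTITY pattern is a simplified variant of the one in Parser.py with an extra group around the key part, and the name localize_values is hypothetical.

```python
import re

# Group 1: everything up to the value; group 2: the quoted value;
# group 3: the closing part of the declaration.
ENTITY = re.compile(
    r'''(<!ENTITY\s+[\w.]+\s+)("[^"]*"|'[^']*')(\s*>)''', re.S)
COLOR = re.compile(r'([Cc])olor')


def localize_values(text):
    """Rewrite color -> colour inside entity values, never inside keys."""
    def fix(match):
        head, value, tail = match.groups()
        return head + COLOR.sub(r'\1olour', value) + tail
    return ENTITY.sub(fix, text)


single = localize_values('<!ENTITY colorsDialog.title "Colors">')
# single == '<!ENTITY colorsDialog.title "Colours">'
```

With this shape, a value that contains the word more than once is fully rewritten, while the key stays as it was.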

  • We would like to walk into subfolders and make changes in each subfolder; any good Python example?

SOLVED: Already works in the 0.2 release