Code Indexer

From CDOT Wiki

Latest revision as of 14:19, 16 January 2007

Project Name

Source Code Indexing Service Analysis

Project Description

Mozilla’s source code is enormous: millions of lines of C, C++, JavaScript, Perl, Python, Java, C#, etc. Developers currently use the LXR system to quickly search and browse it online: http://lxr.mozilla.org. Mozilla is planning a move from CVS to Subversion for revision control, and at the same time wants to evaluate other source indexing services. Two BSD students are working to set up, document, and test other potential services (e.g., Fisheye, OpenGrok, MXR) on one of the Seneca-Mozilla servers. In each case this requires configuration changes and some scripting to get the services to integrate properly with Mozilla’s other online tools. When the test services are installed and synced with the live source tree, Mozilla will point its developers to them and gather feedback; the students will help collect and synthesize this feedback.

Project Leader(s)

John Ford (John64)

Project Contributor(s)

Status

I now have some time to resume work on the project. This time I will manage my time more wisely. (Earlier note, struck out: "I have too many assignments. I need to do work on them, but I should have more free time during the Christmas break.")

Options

  • Help with LXR/MXR/Bonsai development.
  • Make a sort-of branch-aware version of OpenGrok, but this would be a very shoddy implementation that doesn't really do what it should.
  • Set up one OpenGrok instance per active branch of the Mozilla project; this would have no version history whatsoever, apart from file dates.
  • [Re]write major portions of how OpenGrok deals with history, changesets and the like; this is my personal preference.
  • Try to fit Fisheye into the current development model, though this might be more like finding a problem for a solution. Fisheye is a very powerful tool, but it is not really like LXR or OpenGrok: it is more useful for analyzing CVS/SVN histories than for searching for functions, files, definitions and the like. Fisheye is also extremely slow. Within my LAN every query takes a long time, and over the Internet it is unusable: one page load took more than 10 minutes on a LAN connection, although in its defence it is still indexing the code. I don't know why, especially since OpenGrok uses the same basic technologies.
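The third option, one OpenGrok per active branch, mostly comes down to giving every branch its own source checkout and its own index. A minimal Perl sketch of that layout, assuming the `/var/mozilla` source root from the pull script and a hypothetical `/var/opengrok/data` index root; `instance_for` is an illustrative helper, and how each instance is actually started depends on your OpenGrok setup and is not shown.

```perl
use strict;
use warnings;

# Hypothetical layout for "one OpenGrok per active branch": every branch
# gets its own checkout and its own index directory, so each indexer
# instance only ever sees one branch (and therefore no cross-branch history).
my $src_root  = "/var/mozilla";          # same source root as the pull script
my $data_root = "/var/opengrok/data";    # assumed index root, not from the original

# Build the per-branch settings for one indexer instance.
sub instance_for {
    my ($branch) = @_;
    return {
        branch => $branch,
        source => "$src_root/$branch",
        data   => "$data_root/$branch",
    };
}

my @instances = map { instance_for($_) } ("HEAD", "MOZILLA_1_8_BRANCH");

# Print a simple table: branch, where its source lives, where its index goes.
printf "%-20s %s -> %s\n", @{$_}{qw(branch source data)} for @instances;
```

The trade-off is visible in the layout itself: each instance is simple and independent, but nothing links a file in one branch to the same file in another.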

Why I like OpenGrok

Apart from the fact that it does not support branches, this is in my opinion the perfect tool. It is fast, open source and, most importantly, it produces well-thought-out pages that are easy to navigate and just work. The lack of branch support comes from the way OpenSolaris versions its code: it does not use branches at all. OpenSolaris uses a linear method of file versioning and treats versions as a sort of branch, much as Office 12 is the "2007 branch" and Office 11 the "2003 branch". Mozilla doesn't work this way, so it would be necessary to implement this feature.

Luckily, OpenGrok is very modular. The OpenGrok page has a more complete explanation, but the basic gist is that there are many "gurus", each with one task. Files are first read by the History Guru, which decides what type of versioning each file uses. Once the versions have been analyzed, the files are passed on to the file analyzer guru, which decides what type of file each one is and passes it on to a file-type analyzer. This allows portions of the code to be changed without changing the whole system: if we wanted to do special things with how XUL/XPCOM symbols are handled, we would write one module that does not depend on any other file analyzer. In the same way, if Mozilla switches to SVN, we would just port the branching support to SVN, and if Mozilla ever switches to something other than CVS or SVN, a HistoryGuru could be written for that type of version history. The OpenGrok project is under the CDDL, which derives from the MPL 1.1.
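The guru pipeline described above boils down to two independent dispatch steps: pick a history handler from the versioning system in use, then pick an analyzer from the file type. The Perl sketch below only illustrates that shape; OpenGrok itself is written in Java and its real classes look different. The handler table, the metadata-directory check (`CVS/` or `.svn/` next to the file) and the extension table are all assumptions for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(dirname basename);

# Hypothetical "history guru": detect the versioning system from the
# metadata directory next to the file (CVS/ for CVS, .svn/ for Subversion).
my %history_handlers = (
    'CVS'  => sub { my ($path) = @_; return "cvs history for $path" },
    '.svn' => sub { my ($path) = @_; return "svn history for $path" },
);

sub history_for {
    my ($path) = @_;
    my $dir = dirname($path);
    for my $meta (sort keys %history_handlers) {
        return $history_handlers{$meta}->($path) if -d "$dir/$meta";
    }
    return;    # no known versioning system: no history
}

# Hypothetical "analyzer guru": choose a file-type analyzer by extension.
# A Mozilla-specific analyzer (e.g. for XUL) is just one more entry.
my %analyzers = (
    c   => 'C/C++',
    cpp => 'C/C++',
    h   => 'C/C++',
    js  => 'JavaScript',
    xul => 'XUL',
    pl  => 'Perl',
);

sub analyzer_for {
    my ($path) = @_;
    my ($ext) = basename($path) =~ /\.([^.]+)\z/;
    return 'plain text' unless defined $ext;
    return exists $analyzers{ lc $ext } ? $analyzers{ lc $ext } : 'plain text';
}
```

Supporting another version control system, or adding XUL-specific symbol handling, means adding one entry to the relevant table without touching the other one, which is the kind of modularity the paragraph above relies on.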

In closing, I really like the OpenGrok project because it is very fast, very powerful and VERY modular!

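Links

  • Official Blurb (https://sparc.senecacollege.ca/portal.php?project&pid=23), just in case I forget what I am doing :P
  • John's School Page (http://matrix.senecac.on.ca/~jhford/)
  • CVS2SVN (http://cvs2svn.tigris.org): tool to convert CVS to SVN. Will be used to test SVN interoperability.
  • Devmo: CVS Checkout (http://developer.mozilla.org/en/docs/Mozilla_Source_Code_Via_CVS)
  • Devmo: Rsyncing the CVS (http://developer.mozilla.org/en/docs/Rsyncing_the_CVS_Repository)
  • Devmo: CVS Tags (http://developer.mozilla.org/en/docs/CVS_Tags), to get the branches to check out
  • Tomcat5 on Ubuntu (http://www.ubuntuforums.org/showthread.php?t=219985)
  • Tomcat Tips (http://www.onjava.com/pub/a/onjava/2003/06/25/tomcat_tips.html)
  • CVS on nongnu.org (http://www.nongnu.org/cvs/)
  • CVS "Guide" (http://durak.org/cvswebsites/doc/cvs.php)
  • Subversion (http://subversion.tigris.org/)
  • Blog Entry on OpenGrok (http://atucker.typepad.com/blog/2005/11/a_new_source_br.html)
  • Java5 on Ubuntu (http://ubuntuforums.org/showthread.php?t=124431): "sudo update-alternatives --config java" and "apt-get remove --purge java-gcj-compat"
  • Mozilla Jargon (http://www.mozilla.org/docs/jargon.html)
  • Misc (http://www.deitel.com/CodeSearchEngines/CodeSearchEngines_ResourceCenter_MerobaseOpenGrokCodeProject.html)
  • CDDL (http://www.sun.com/cddl/CDDL_why_details.html): explanation of the differences between the MPL and CDDL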

Pulling CVS

This script pulls the CVS tree for each branch listed in @branches (or at least it did at some point; your mileage may vary) and then starts the indexer.

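#!/usr/bin/perl
use strict;
use warnings;

# Pull the Mozilla source from CVS, one folder per branch, then start the indexer.

# Where you want the branch folders
my $src_root = "/var/mozilla";

# Where is your run.sh for OpenGrok? (or an equivalent script that starts the indexer)
my $opengroker = "/var/opengrok/run.sh";

# Where is your CVS server?
my $cvsserver = ':pserver:anonymous@hera.senecac.on.ca:/cvsroot';

# Where to write the list of pulled branches (placeholder path; adjust it).
# Keep it outside $src_root, which is wiped below.
my $branchlistpath = "/var/opengrok/branches.txt";

# Branches to be pulled
my @branches = (
  "HEAD",
  "MOZILLA_1_8_BRANCH",
  "MOZILLA_1_8_0_BRANCH",
  "AVIARY_1_0_1_20050124_BRANCH",
  "REFLOW_20061031_BRANCH"
);

# Descriptions for each branch; don't delete old ones just for the sake of deleting them
my %descriptions = (
  "HEAD" => "Trunk - development branch",
  "MOZILLA_1_8_BRANCH" => "Firefox 2.0 - development branch",
  "MOZILLA_1_8_0_BRANCH" => "Firefox 1.5 - maintenance branch",
  "MOZILLA_1_7_BRANCH" => "Firefox 1.0 - maintenance branch",
  "AVIARY_1_0_1_20050124_BRANCH" => "Suite - maintenance branch",
  "REFLOW_20061031_BRANCH" => "Reflow Refactoring"
);

# Open the branch list, or give up
open my $branchlist, '>', $branchlistpath
  or die "Could not open $branchlistpath: $!";

# Clear out whatever source was there
system("rm -rf ${src_root}/*") == 0
  or die "Could not clear ${src_root}";

foreach my $branch (@branches) {
  # Download client.mk for this branch, then check out the tree with it.
  # '&&' stops the sequence for this branch as soon as one step fails.
  system("
    mkdir ${src_root}/$branch &&
    cd ${src_root}/$branch &&
    cvs -d ${cvsserver} co -r $branch mozilla/client.mk &&
    make -f ${src_root}/${branch}/mozilla/client.mk checkout MOZ_CO_PROJECT=all
  ") == 0 or warn "Checkout of $branch failed\n";

  # Record the branch and its description (assumed format: one tab-separated line per branch)
  my $description = exists $descriptions{$branch} ? $descriptions{$branch} : "";
  print {$branchlist} "$branch\t$description\n";
}

close $branchlist or die "Could not close $branchlistpath: $!";

# Start the indexer
system("bash $opengroker") == 0 or die "Indexer failed";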
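Because the script begins by running `rm -rf` on the source root, it can help to look at the exact per-branch commands before running it for real. A minimal dry-run sketch, assuming the same settings as the script above (copied here, with a shortened branch list); `checkout_commands` is a hypothetical helper that returns the commands instead of executing them.

```perl
use strict;
use warnings;

# Dry run of the checkout loop: print the commands instead of executing them.
# Settings copied from the script above; adjust them to your setup.
my $src_root  = "/var/mozilla";
my $cvsserver = ':pserver:anonymous@hera.senecac.on.ca:/cvsroot';
my @branches  = ("HEAD", "MOZILLA_1_8_BRANCH");

# Hypothetical helper: the shell commands the real script runs for one branch.
sub checkout_commands {
    my ($branch) = @_;
    my $dir = "$src_root/$branch";
    return (
        "mkdir $dir",
        "cd $dir",
        "cvs -d $cvsserver co -r $branch mozilla/client.mk",
        "make -f $dir/mozilla/client.mk checkout MOZ_CO_PROJECT=all",
    );
}

print "$_\n" for map { checkout_commands($_) } @branches;
```

Nothing is deleted or downloaded; once the printed commands look right for every branch, the real script can be run.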