Resources: bsterne, dveditz, shaver
== HTML page set sanitizer ==
The Talos performance testing system at Mozilla currently runs on a large set of web pages pulled from the Alexa Top 500. These pages can't be redistributed, because they are mirrors of copyrighted web pages. In addition, many of them contain adult content. This makes it difficult for other people to reproduce the Talos results, or to test changes that are expected to affect performance.
A useful solution to this problem would be a tool that takes a mirrored copy of a website and "sanitizes" it by replacing the page text and image contents with junk or filler data. The caveat is that this ''must not'' change the performance characteristics of the page. For example, replacing a page of Chinese text with "Lorem Ipsum" filler would send the page down different text rendering paths, which would change what is being measured. Similarly, making every JPEG image solid black would likely make the images decode and render much faster. Any solution should include an analysis showing that performance on the sanitized page set does not differ significantly from performance on the original pages.
Resources: ted (but find someone better!)
= Potential OOo Projects =