|
|
Line 6: |
Line 6: |
| [mailto:aasauvageot@myseneca.ca?subject=dps901-dps921 Email All] |
| | | |
− | == Progress == | + | ==Assignment== |
− | | |
− | === Pre-Assignment ===
| |
− | I wrote a new program to test a theory I had been told about.
| |
− | | |
− | A professor told me she believed that, by looking at how a paper was written, she could tell whether it was by the same author as another paper. She also believed that a computer could tell whether two pieces of text were written by the same author by looking at how they were written.
| |
− | | |
− | I decided to create a program that would analyze two pieces of text to try to determine whether the same person wrote both.
| |
− |
| |
− | I decided to look at:
| |
− | # '''average words/sentence'''
| |
− | # '''average word length'''
| |
− | # '''average sentences/paragraph'''
| |
− | # '''average commas/sentence'''
| |
− | # '''average colons/paragraph.'''
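The five averages above can be gathered in a single pass over the text. The following is only a minimal sketch, not the original program: the names <code>StyleStats</code> and <code>analyze</code> are hypothetical, and it assumes that a word is a run of letters or digits, that a sentence ends at <code>.</code>, <code>!</code> or <code>?</code>, and that paragraphs are separated by a blank line.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical container for the five averages listed above.
struct StyleStats {
    double wordsPerSentence = 0.0;
    double avgWordLength = 0.0;
    double sentencesPerParagraph = 0.0;
    double commasPerSentence = 0.0;
    double colonsPerParagraph = 0.0;
};

// Assumed rules (not taken from the original program):
//   word      = run of letters/digits
//   sentence  = ends at '.', '!' or '?' (so "..." counts three times)
//   paragraph = block of text separated from the next by a blank line
StyleStats analyze(const std::string& text) {
    std::size_t words = 0, letters = 0, sentences = 0;
    std::size_t paragraphs = 0, commas = 0, colons = 0;
    bool inWord = false, inParagraph = false;
    int newlineRun = 0;

    for (char raw : text) {
        const unsigned char c = static_cast<unsigned char>(raw);

        // Paragraph tracking: two or more newlines in a row close a paragraph.
        if (c == '\n') {
            if (++newlineRun >= 2) inParagraph = false;
        } else if (!std::isspace(c)) {
            newlineRun = 0;
            if (!inParagraph) { inParagraph = true; ++paragraphs; }
        }

        // Word tracking: count a word at its first letter or digit.
        if (std::isalnum(c)) {
            if (!inWord) { inWord = true; ++words; }
            ++letters;
        } else {
            inWord = false;
        }

        // Punctuation counters.
        if (c == '.' || c == '!' || c == '?') ++sentences;
        else if (c == ',') ++commas;
        else if (c == ':') ++colons;
    }

    StyleStats s;
    if (sentences > 0) {
        s.wordsPerSentence = static_cast<double>(words) / sentences;
        s.commasPerSentence = static_cast<double>(commas) / sentences;
    }
    if (words > 0) s.avgWordLength = static_cast<double>(letters) / words;
    if (paragraphs > 0) {
        s.sentencesPerParagraph = static_cast<double>(sentences) / paragraphs;
        s.colonsPerParagraph = static_cast<double>(colons) / paragraphs;
    }
    return s;
}
```

Calling <code>analyze</code> once per text yields one <code>StyleStats</code> value per piece, which is all a later comparison step needs.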
| |
− | | |
− | | |
− | I then use this information to calculate how different the two pieces are from each other. If their writing styles differ by no more than 5% (a threshold I chose), I suggest the two pieces were written by the same person; otherwise I suggest they were written by two different people.
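The text does not say exactly how the 5% figure is computed, so the following is only one plausible sketch. It assumes the difference is the mean relative difference across the five averages; <code>Features</code>, <code>styleDifference</code> and <code>likelySameAuthor</code> are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

// The five averages for one text, in a fixed order (hypothetical layout).
using Features = std::array<double, 5>;

// Assumed metric: mean relative difference over the five features.
// Returns 0.0 for identical feature sets and approaches 1.0 as they diverge.
double styleDifference(const Features& a, const Features& b) {
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double scale = std::max(std::fabs(a[i]), std::fabs(b[i]));
        if (scale > 0.0) total += std::fabs(a[i] - b[i]) / scale;
    }
    return total / static_cast<double>(a.size());
}

// Suggest "same author" when the difference is within the threshold (5% here).
bool likelySameAuthor(const Features& a, const Features& b,
                      double threshold = 0.05) {
    return styleDifference(a, b) <= threshold;
}
```

Averaging relative differences keeps features with large values, such as words per sentence, from dominating features with small values, such as colons per paragraph. Other weightings are equally possible.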
| |
− | | |
− | To test this, I ran the program on work by Shakespeare, one of my friends, and me.
| |
− | | |
− | The program successfully determined which author wrote each piece of text.
| |
− | | |
− | | |
− | === Assignment ===
| |
− | | |
− | The program I wrote relies on a single loop to run through a piece of text. The loop has no dependencies between iterations, so it can easily be parallelized using the methods discussed in this class.
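As a rough illustration of how such a loop might be parallelized with OpenMP, the sketch below uses a <code>reduction</code> clause so that each thread keeps private counters that are summed at the end. It is not the original code: <code>Counts</code> and <code>countFeatures</code> are hypothetical names, and word starts are detected by looking at the previous character so that no state is carried from one iteration to the next.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type holding the raw counts.
struct Counts {
    long words;
    long commas;
    long colons;
    long sentenceEnds;
};

static bool isWordChar(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) != 0;
}

Counts countFeatures(const std::string& text) {
    long words = 0, commas = 0, colons = 0, sentenceEnds = 0;
    const long n = static_cast<long>(text.size());

    // Each iteration reads only text[i] and text[i - 1], so iterations are
    // independent; the reduction clause combines the per-thread counters.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for reduction(+:words, commas, colons, sentenceEnds)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        const char ch = text[k];
        if (isWordChar(ch) && (k == 0 || !isWordChar(text[k - 1]))) ++words;
        if (ch == ',') ++commas;
        else if (ch == ':') ++colons;
        else if (ch == '.' || ch == '!' || ch == '?') ++sentenceEnds;
    }
    return Counts{words, commas, colons, sentenceEnds};
}
```

With GCC or Clang this would typically be built with <code>-fopenmp</code>; the results are the same with or without it.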
| |
− | | |
− | ====Timing====
| |
− | | |
− | To time the program, I used texts of varying lengths from 3 authors: 2 Shakespeare works (long: 46,956 words, 250,234 characters), 2 assignments I completed for school (medium: 1,885 words, 11,336 characters), and 2 blog posts written by the same author (short: 869 words, 4,997 characters).
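The timing method itself is not described here. One common way to measure a run in milliseconds is with <code>std::chrono</code>, sketched below; <code>elapsedMilliseconds</code> is a hypothetical helper, not part of the original program.

```cpp
#include <chrono>

// Runs the given callable once and returns the wall-clock time it took,
// in milliseconds. steady_clock is used because it never jumps backwards.
template <typename Fn>
double elapsedMilliseconds(Fn&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For short inputs a single run takes only a few milliseconds, so repeating the measurement several times and averaging would likely give steadier numbers.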
| |
− | | |
− | =====Serial=====
| |
− | | |
− | {| class="wikitable"
| |
− | |+ Time for serial program run
| |
− | ! Author!! Character Count !! Time (milliseconds)
| |
− | |-
| |
− | | Shakespeare || 250,234 || 157
| |
− | |-
| |
− | | Adrian Sauvageot|| 11,336 || 7
| |
− | |-
| |
− | | Blog Post|| 4,997 || 3
| |
− | |}
| |
− | | |
− | | |
− | =====OpenMP=====
| |
− | | |
− | {| class="wikitable"
| |
− | |+ Time for OpenMP parallel program run
| |
− | ! Author!! Character Count !! Time (milliseconds)
| |
− | |-
| |
− | | Shakespeare || 250,234 || 72
| |
− | |-
| |
− | | Adrian Sauvageot|| 11,336 || 17
| |
− | |-
| |
− | | Blog Post|| 4,997 || 7
| |
− | |}
| |
− | | |
− | ====Timing Explanation====
| |
− | For the large file (250,234 characters), the parallel program showed a significant speedup: 72 ms versus 157 ms, roughly 2.2 times faster. For the medium and small files, however, the serial program actually ran faster (7 ms versus 17 ms, and 3 ms versus 7 ms). I believe this is due to the overhead of setting up threads.
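If thread setup overhead is indeed the cause, one possible mitigation is OpenMP's <code>if</code> clause, which runs the loop serially when the input is below a cutoff. The sketch below is an assumption-laden illustration: <code>countCommas</code> is a hypothetical function and the default cutoff of 100,000 characters is a guess that would need to be tuned by measurement.

```cpp
#include <cstddef>
#include <string>

// Counts commas, using threads only when the text is long enough for the
// parallel speedup to outweigh the cost of starting the threads.
// parallelThreshold is a hypothetical tuning value, not a measured one.
long countCommas(const std::string& text, long parallelThreshold = 100000) {
    const long n = static_cast<long>(text.size());
    long commas = 0;

    // The if clause makes the region run on a single thread when false.
    #pragma omp parallel for reduction(+:commas) if(n >= parallelThreshold)
    for (long i = 0; i < n; ++i) {
        if (text[static_cast<std::size_t>(i)] == ',') ++commas;
    }
    return commas;
}
```

With the timings above, a cutoff somewhere between 11,336 and 250,234 characters would send the medium and short texts down the serial path and the Shakespeare texts down the parallel one.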
| |