The Real A Team
Revision as of 18:26, 21 March 2016
Determining Author By Style Of Writing
A Team Members
- Adrian Sauvageot, All
- ...
Progress
Pre-Assignment
I decided to write a new program to test a theory that a professor shared with me.
She believed that by looking at how a paper was written, she could tell whether it was by the same author as another paper. Further, she believed that a computer could tell whether two pieces of text were written by the same author by looking at how they were written.
The program analyzes two pieces of text and tries to determine whether the same person wrote both.
For each piece, I decided to look at:
- average words/sentence
- average word length
- average sentences/paragraph
- average commas/sentence
- average colons/paragraph
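The five averages above could be gathered in a single pass over the text. The following is only a minimal sketch, not the program described on this page: the `StyleMetrics` struct, the `analyze` function, and the rules for what counts as a word, sentence, or paragraph are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical container for the five averages; field names are illustrative.
struct StyleMetrics {
    double wordsPerSentence;
    double wordLength;
    double sentencesPerParagraph;
    double commasPerSentence;
    double colonsPerParagraph;
};

// Assumed rules: a word is a run of letters or digits, a sentence ends at
// '.', '!' or '?', and a paragraph is a non-empty run of text between newlines.
StyleMetrics analyze(const std::string& text) {
    long words = 0, letters = 0, sentences = 0, paragraphs = 0;
    long commas = 0, colons = 0;

    auto isWordChar = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    };

    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        // Treat the position before the text as a newline so the first
        // character can start both a word and a paragraph.
        const char prev = (i == 0) ? '\n' : text[i - 1];

        if (isWordChar(c)) {
            ++letters;
            if (!isWordChar(prev)) ++words;           // first character of a word
        }
        if (c != '\n' && prev == '\n') ++paragraphs;  // first character of a paragraph
        if (c == '.' || c == '!' || c == '?') ++sentences;
        if (c == ',') ++commas;
        if (c == ':') ++colons;
    }
    assert(letters >= words);  // every word has at least one character

    // Guard against division by zero for empty or unpunctuated input.
    auto ratio = [](long num, long den) {
        return den == 0 ? 0.0 : static_cast<double>(num) / static_cast<double>(den);
    };
    return StyleMetrics{ratio(words, sentences), ratio(letters, words),
                        ratio(sentences, paragraphs), ratio(commas, sentences),
                        ratio(colons, paragraphs)};
}
```

Each iteration only reads the current and previous character, so the loop carries no state from one iteration to the next.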
I then use these averages to calculate how different two pieces are from each other. If the difference in writing style is within the 5% threshold I chose, the program suggests the two pieces were written by the same person; otherwise it suggests they were written by two separate people.
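The page does not say exactly how the difference is calculated, so the sketch below assumes one plausible measure: the mean relative difference across the five averages, compared against the 5% threshold. The names `Metrics`, `styleDifference`, and `likelySameAuthor` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Five averages per text, in the order listed above (hypothetical layout).
using Metrics = std::array<double, 5>;

// Assumed measure: mean relative difference across the five metrics.
// Returns 0.0 for identical styles; larger values mean more different.
double styleDifference(const Metrics& a, const Metrics& b) {
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double larger = std::max(std::fabs(a[i]), std::fabs(b[i]));
        if (larger > 0.0) {
            total += std::fabs(a[i] - b[i]) / larger;
        }
    }
    return total / static_cast<double>(a.size());
}

// Suggests the same author when the difference is within the threshold
// (5% by default, matching the threshold described in the text).
bool likelySameAuthor(const Metrics& a, const Metrics& b, double threshold = 0.05) {
    assert(threshold >= 0.0);
    return styleDifference(a, b) <= threshold;
}
```

A different distance (for example, requiring every single metric to be within 5%) would be stricter; the averaged form is used here only because it is the simplest reading of "5% different".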
To test this, I ran the program on work by Shakespeare, one of my friends, and myself.
The program was able to determine which author wrote each piece of text.
Assignment
The program I wrote relies on a single loop to run through a piece of text. The loop has no dependencies between iterations, so it can easily be parallelized using the methods discussed in this class.
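As a rough illustration of how such a loop might be parallelized with OpenMP, the sketch below counts words, sentences, and commas using a `parallel for` with a `reduction` clause. It is an assumption about the general approach, not the actual project code; `Counts` and `countParallel` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical result type for the counters gathered in the loop.
struct Counts {
    long words;
    long sentences;
    long commas;
};

// Each iteration reads text[i] and text[i - 1] and only adds to counters,
// so iterations are independent and the counters can be reduction variables.
// Build with -fopenmp (GCC/Clang); without it the pragma is ignored and the
// loop simply runs serially with the same result.
Counts countParallel(const std::string& text) {
    const long n = static_cast<long>(text.size());
    long words = 0, sentences = 0, commas = 0;

    #pragma omp parallel for reduction(+ : words, sentences, commas)
    for (long i = 0; i < n; ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        const unsigned char prev =
            static_cast<unsigned char>(i == 0 ? ' ' : text[i - 1]);

        if (std::isalnum(c) && !std::isalnum(prev)) ++words;  // word start
        if (c == '.' || c == '!' || c == '?') ++sentences;
        if (c == ',') ++commas;
    }

    // Each character contributes to at most one counter.
    assert(words + sentences + commas <= n);
    return Counts{words, sentences, commas};
}
```

The reduction gives every thread its own private copy of each counter and sums them at the end, which avoids locking inside the loop.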
Timing
To time the program, I used texts of varying lengths from 3 authors: 2 Shakespeare works (long: 46,956 words, 250,234 characters), 2 assignments I completed for school (medium: 1,885 words, 11,336 characters), and 2 blog posts written by the same author (short: 869 words, 4,997 characters).
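The page does not describe how the times were measured. One common way to take millisecond timings in C++ is `std::chrono::steady_clock`; the helper below is a sketch under that assumption, and `averageMs` is a hypothetical name. It averages over several runs because a single run of a few milliseconds can be noisy.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` the given number of times and returns the average
// wall-clock time per run in milliseconds.
template <typename Work>
double averageMs(Work work, int runs = 5) {
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < runs; ++r) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

For example, `averageMs([&] { runAnalysis(text); })`, where `runAnalysis` stands in for whatever function processes the text.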
Serial Timing
Time for serial program run

| Author | Character Count | Time (milliseconds) |
|---|---|---|
| Shakespeare | 250,234 | 157 |
| Adrian Sauvageot | 11,336 | 7 |
| Blog Post | 4,997 | 3 |
OpenMP Timing
Time for OpenMP parallel program run

| Author | Character Count | Time (milliseconds) |
|---|---|---|
| Shakespeare | 250,234 | 72 |
| Adrian Sauvageot | 11,336 | 17 |
| Blog Post | 4,997 | 7 |
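To compare the serial and OpenMP timings, the usual measure is speedup: serial time divided by parallel time. The small helper below (a hypothetical `speedup` function, added only for illustration) works through the numbers from the two tables.

```cpp
#include <cassert>

// Speedup = serial time / parallel time.
// Values above 1 mean the parallel run was faster; below 1, slower.
double speedup(double serialMs, double parallelMs) {
    assert(parallelMs > 0.0);
    return serialMs / parallelMs;
}

// Using the measurements reported in the tables:
//   Shakespeare:      speedup(157, 72) is about 2.18
//   Adrian Sauvageot: speedup(7, 17)   is about 0.41
//   Blog Post:        speedup(3, 7)    is about 0.43
```

So the OpenMP version was roughly twice as fast on the long text but slower on the two shorter ones. A likely explanation, which these measurements alone do not confirm, is that the fixed cost of starting threads outweighs the work available in small inputs.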