We're looking for a talented PHP developer (the work must be done in PHP) to write a script that groups similar articles together.
We have a MySQL table that stores the title, URL and content of articles, and new articles are added to it continuously. We need a script that can be run periodically by a cron job to read in the new articles and either add each one to an existing group of articles or start a new group for it. Grouping should be based on the article's content and on the age of the existing articles, so that new articles are only grouped with the same story if that story is no more than X days old. This will let us show listings of articles without repeating similar content. So ideally we'd end up with something like:
Care home ban relatives who complain (BBC) - Grouped: Care home bans (Telegraph) - Grouped: Banning relatives from care homes (Independent)
Two US police killed (BBC) - No grouped articles, as there are no similar ones
We've done some initial research and have found the following post which may be of use: http://stackoverflow.com/questions/3320753/how-to-group-compare-similar-news-articles
Ideally the script would have a configurable match percentage (a confidence threshold), so that if we find the grouping is too strict or too loose we can easily raise or lower it based on user feedback about accuracy.
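To illustrate the sort of behaviour we have in mind, here is a minimal sketch only, not a required design. It assumes a simple word-overlap (Jaccard) score as the similarity measure, which is our own stand-in and may well not be the best technique. The function names (wordSet, similarityPercent, findGroup), the parameters ($minPercent, $maxAgeDays) and the shape of the $groups array are all hypothetical and do not reflect our actual table structure.

```php
<?php
// Illustrative sketch only: every name below is hypothetical.

/** Lower-case the text and return its distinct words of three or more letters. */
function wordSet(string $text): array
{
    preg_match_all('/[a-z]{3,}/', strtolower($text), $matches);
    return array_unique($matches[0]);
}

/** Jaccard similarity as a percentage (0-100): shared words / all distinct words. */
function similarityPercent(string $a, string $b): float
{
    $setA = wordSet($a);
    $setB = wordSet($b);
    $union = count(array_unique(array_merge($setA, $setB)));
    if ($union === 0) {
        return 0.0; // neither text contains a usable word
    }
    $shared = count(array_intersect($setA, $setB));
    return 100.0 * $shared / $union;
}

/**
 * Return the id of the best matching group, or null if the article
 * should start its own group.
 *
 * $groups is assumed to be a list of arrays shaped like
 * ['id' => int, 'content' => string, 'created' => int (Unix timestamp)].
 */
function findGroup(string $content, int $now, array $groups, float $minPercent, int $maxAgeDays): ?int
{
    $bestId = null;
    $bestScore = $minPercent; // a match must reach at least this percentage
    foreach ($groups as $group) {
        if ($now - $group['created'] > $maxAgeDays * 86400) {
            continue; // story is older than the allowed window
        }
        $score = similarityPercent($content, $group['content']);
        if ($score >= $bestScore) {
            $bestScore = $score;
            $bestId = $group['id'];
        }
    }
    return $bestId;
}
```

In this sketch, $minPercent plays the role of the match percentage and $maxAgeDays the role of "X days". Raising $minPercent makes grouping stricter and lowering it makes grouping looser.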
Please get in touch with any questions and quotes.
All applicants must be based in the UK and confirm that they are eligible to live and work in the UK.