CaseWare IDEA® Version 10 introduced an Advanced Fuzzy Duplicate task, which identifies multiple similar records for up to three selected character fields. The output produces databases, including or excluding fuzzy matches with varying degrees of similarity, to detect data entry errors, multiple data conventions for recording information, and fraud. This option can be found within the “Analysis” tab, under “Duplicate Key”.

With the default selections, the example scans for COMPANY names with 80% or greater similarity, including exact duplicates. This is great for smaller files, but users have found the process to be CPU intensive when scanning files with several thousand records. In addition, if users pick a percentage that is too low, they can end up with false positives and must run the process again.

Values such as “JOHN SMITH” and “JOHN J SMITH” will have 95% similarity, while “JAMES SMITH” and “JIM SMITH” have an 82% similarity. You should be familiar with your data to determine what similarity degree is best for the analysis.

Before IDEA 10.0 introduced “Fuzzy Duplicates”, Audimation developers utilized a different approach that supported files of several thousand records.
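To get a feel for how a similarity threshold behaves, here is a minimal Python sketch of threshold-based fuzzy matching. It is not IDEA's algorithm: the scoring uses `difflib.SequenceMatcher` from the Python standard library as a stand-in, so the percentages it produces will differ from the 95% and 82% figures quoted above. The function names `similarity` and `fuzzy_duplicates` and the `include_exact` flag are made up for this illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0-100 score using difflib's ratio (an assumed stand-in metric)."""
    return 100.0 * SequenceMatcher(None, a.upper(), b.upper()).ratio()


def fuzzy_duplicates(values, threshold=80.0, include_exact=True):
    """Return (a, b, score) for every pair scoring at or above the threshold."""
    matches = []
    # Every value is compared with every other value, so the work grows
    # quadratically with the number of records.
    for a, b in combinations(values, 2):
        score = similarity(a, b)
        if score < threshold:
            continue
        if not include_exact and a.upper() == b.upper():
            continue  # skip exact duplicates when only fuzzy matches are wanted
        matches.append((a, b, round(score, 1)))
    return matches


if __name__ == "__main__":
    names = ["JOHN SMITH", "JOHN J SMITH", "JAMES SMITH", "JIM SMITH", "ACME CORP"]
    for a, b, score in fuzzy_duplicates(names, threshold=80.0):
        print(f"{a!r} ~ {b!r}: {score}%")
```

Lowering `threshold` returns more pairs, including more false positives, while raising it can miss genuine variants, so it is worth trying a few values on a sample of your own data. Because this sketch compares every pair of values, it also hints at why fuzzy matching can become slow as files grow.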