ArtsAutosBooksBusinessEducationEntertainmentFamilyFashionFoodGamesGenderHealthHolidaysHomeHubPagesPersonal FinancePetsPoliticsReligionSportsTechnologyTravel
  • »
  • Technology»
  • Computers & Software

How to Find Duplicate Records with Microsoft Access

Updated on March 22, 2013
Microsoft Access is, and always will be, the Mack Daddy of desktop database applications.
Microsoft Access is, and always will be, the Mack Daddy of desktop database applications. | Source

Definition of a Duplicate

The hardest part about finding duplicate records is knowing what the definition of "Duplicate" really is for your situation. For example, suppose you have a contacts list with the following structure:

tblContacts

ContactID - Autonumber
LastName - Text 25
FirstName - Text 25
MI - Text 1
Phone - Text 11

Duplicate records could be any one of the following scenarios:

1. A person with two Phone numbers where one number is obsolete.
2. A person with two Phone numbers where both are valid phone numbers
3. Two people have the same name, but a different phone number


Duplicate Quiz

view quiz statistics

Quiz Explanation

Whether or not you took the quiz, the answer is: Only 1 scenario.

Here is a breakdown of the scenarios and whether or not they are a duplicate:

1. A person with two Phone numbers where one number is obsolete.
True: A person with an obsolete phone number is a duplicate, because the former phone number does not contain any value in the database and the obsolete record needs to be deleted.

2. A person with two Phone numbers where both are valid phone numbers
False: In this table example, unfortunately, the only way to represent a person with multiple valid phone numbers is to enter their information into more than one record. Thus, even if the records were found as a duplicate, there is nothing that can be done to remedy it.

3. Two people have the same name, but a different phone number
False: Just like Scenario 2, with the current structure, two people with the exact same name would not be a duplicate. They are two valid entries with two valid phone number. Without another field, such as Drivers License number, to differentiate one from another, then there is no further action that one can be taken.

More Realistic Example

Knowing that the simple table structure above is not realistic, fleshing it out will give it a little more usefulness for when you attempt to find a duplicate. Assume the new table structure is the following:

tblContacts
ContactID
Company
FirstName
LastName
MI
PhoneOffice
PhoneMobile

With this structure, a person can have two phone numbers, and if there are two people with the same name, then the Company field can be used to help differentiate. (Hopefully you can see that this doesn't really solve all problems, because a person can change companies, get a new mobile phone, or get married and change their last name. The possibilities are completely left to interpretation.)

Let's suppose that the business rules that define our duplicates are:

1. Any two records with the same FirstName, LastName, and Mobile phone
2. To resolve issues found by #1, observe the company name.

Based on these business rules, a query can be created to allow for further inspection.

We're off to see the Wizard

Since the untimely death of Clippy the Paperclip, the great sorcerer known as "The Wizard" has been a permanent fixture of many of the Office programs. They can be used for a wide variety of functionality, and creating queries is a well-used feature of them.

Depending on which version you have, triggering the query wizard can happen different ways. For whatever reason, this machine has Access 2007, and the Wizard is found on the ribbon under:

Create (tab) / Query Wizard ('other' section) / Find Duplicates Query Wizard

The Find Duplicates Query Wizard will prompt you for information about your business rules, and then use them to create a query that can retrieve the information based on the rules. (Despite the name, it can't magically understand your data and determine a definition of a duplicate.)

Question 1: Which table or query do you want to search for duplicate field values?
Notice that it allows either a table or a query. This is important, as you may find that to solve your duplicate problems, you may first need to write a query just to get to the point of being able to write this query. Perhaps you need to find duplicates within a certain date range. Writing a "pre-query", as only I have ever called it, will get you off on the correct foot.

In our case, select your contacts table (if you made the one above.)

Question 2: Which fields might contain the duplicate information?
Based on our business rules, we need to add First Name, Last Name and Mobile Phone.

Question 3: Do you want the query to show fields in addition to those with duplicate values?
Despite the fact that you cannot answer this Yes/No question, the way you do answer it is by adding any additional fields that can help you determine if the records is truly a duplicate. In this example, select the company name.

Question 4: What do you want to name your query?
Give it a name based on your naming convention. qryContacts_FindDupes.

Click Finish to view the results. (Obviously, if you haven't added any data to support this example, then the results will be empty.)

Find Duplicates SQL Breakdown

If you're like me, I don't care for magic, I want to know what the sparkling wand created. So, if you open the query in the SQL view, you will see something like the following:

SELECT tblContacts.FirstName, tblContacts.LastName, tblContacts.PhoneMobile, tblContacts.Company FROM tblContacts
WHERE (((tblContacts.FirstName) In

(
SELECT [FirstName] FROM [tblContacts] As Tmp
GROUP BY [FirstName],[LastName],[PhoneMobile]
HAVING Count(*)>1 And
[LastName] = [tblContacts].[LastName] And
[PhoneMobile] = [tblContacts].[PhoneMobile]
)
)
)

ORDER BY tblContacts.FirstName, tblContacts.LastName, tblContacts.PhoneMobile;

This is what's called a Correlated Subquery, whereas the values in a subquery are compared against values in the main query. In this case, the subquery finds any records where Contacts grouped by FirstName, LastName, and PhoneMobile return more than 1 record. HAVING is a where clause for when a GROUP BY is used. It means after the group by is applied, then find values that match this condition.

Duplicate Resolution Summary

According to the 80/20 Rule, after you've defined your business rules for duplication, and gone through the effort to find those that qualify, you're only 80% of the way there. The remaining 20% of the work starts with defining a methodology for remediating the found duplicates. (For example, can records simply be deleted, or is there logic involved to merge two, or more, records into only one.)

Armed with that plan, you can either employ a programmer to devise a clever way to deal with the records, or hire some minimum-wage temporary employees to slog through fixing each record manually.

Comments

    0 of 8192 characters used
    Post Comment

    No comments yet.

    working

    This website uses cookies

    As a user in the EEA, your approval is needed on a few things. To provide a better website experience, hubpages.com uses cookies (and other similar technologies) and may collect, process, and share personal data. Please choose which areas of our service you consent to our doing so.

    For more information on managing or withdrawing consents and how we handle data, visit our Privacy Policy at: "https://hubpages.com/privacy-policy#gdpr"

    Show Details
    Necessary
    HubPages Device IDThis is used to identify particular browsers or devices when the access the service, and is used for security reasons.
    LoginThis is necessary to sign in to the HubPages Service.
    Google RecaptchaThis is used to prevent bots and spam. (Privacy Policy)
    AkismetThis is used to detect comment spam. (Privacy Policy)
    HubPages Google AnalyticsThis is used to provide data on traffic to our website, all personally identifyable data is anonymized. (Privacy Policy)
    HubPages Traffic PixelThis is used to collect data on traffic to articles and other pages on our site. Unless you are signed in to a HubPages account, all personally identifiable information is anonymized.
    Amazon Web ServicesThis is a cloud services platform that we used to host our service. (Privacy Policy)
    CloudflareThis is a cloud CDN service that we use to efficiently deliver files required for our service to operate such as javascript, cascading style sheets, images, and videos. (Privacy Policy)
    Google Hosted LibrariesJavascript software libraries such as jQuery are loaded at endpoints on the googleapis.com or gstatic.com domains, for performance and efficiency reasons. (Privacy Policy)
    Features
    Google Custom SearchThis is feature allows you to search the site. (Privacy Policy)
    Google MapsSome articles have Google Maps embedded in them. (Privacy Policy)
    Google ChartsThis is used to display charts and graphs on articles and the author center. (Privacy Policy)
    Google AdSense Host APIThis service allows you to sign up for or associate a Google AdSense account with HubPages, so that you can earn money from ads on your articles. No data is shared unless you engage with this feature. (Privacy Policy)
    Google YouTubeSome articles have YouTube videos embedded in them. (Privacy Policy)
    VimeoSome articles have Vimeo videos embedded in them. (Privacy Policy)
    PaypalThis is used for a registered author who enrolls in the HubPages Earnings program and requests to be paid via PayPal. No data is shared with Paypal unless you engage with this feature. (Privacy Policy)
    Facebook LoginYou can use this to streamline signing up for, or signing in to your Hubpages account. No data is shared with Facebook unless you engage with this feature. (Privacy Policy)
    MavenThis supports the Maven widget and search functionality. (Privacy Policy)
    Marketing
    Google AdSenseThis is an ad network. (Privacy Policy)
    Google DoubleClickGoogle provides ad serving technology and runs an ad network. (Privacy Policy)
    Index ExchangeThis is an ad network. (Privacy Policy)
    SovrnThis is an ad network. (Privacy Policy)
    Facebook AdsThis is an ad network. (Privacy Policy)
    Amazon Unified Ad MarketplaceThis is an ad network. (Privacy Policy)
    AppNexusThis is an ad network. (Privacy Policy)
    OpenxThis is an ad network. (Privacy Policy)
    Rubicon ProjectThis is an ad network. (Privacy Policy)
    TripleLiftThis is an ad network. (Privacy Policy)
    Say MediaWe partner with Say Media to deliver ad campaigns on our sites. (Privacy Policy)
    Remarketing PixelsWe may use remarketing pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to advertise the HubPages Service to people that have visited our sites.
    Conversion Tracking PixelsWe may use conversion tracking pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to identify when an advertisement has successfully resulted in the desired action, such as signing up for the HubPages Service or publishing an article on the HubPages Service.
    Statistics
    Author Google AnalyticsThis is used to provide traffic data and reports to the authors of articles on the HubPages Service. (Privacy Policy)
    ComscoreComScore is a media measurement and analytics company providing marketing data and analytics to enterprises, media and advertising agencies, and publishers. Non-consent will result in ComScore only processing obfuscated personal data. (Privacy Policy)
    Amazon Tracking PixelSome articles display amazon products as part of the Amazon Affiliate program, this pixel provides traffic statistics for those products (Privacy Policy)