Urban75 Home About Offline BrixtonBuzz Contact

Excel People help! Duplicates based on multiple columns

newme

Giant in Pastyland now
I have a table, with a bunch of personal and business info in, sometimes people asked the same thing on the same day and this has to be reported as a single incident but cannot be removed. So adding another column to flag this.

So we have [Name] and [Date] as column headers to look at, if they both match then I want to return well anything at all really to [Duplicate] so I can filter this on a pivot table.

I have come across using countif, xlookup and filter for this but am screwing up somewhere and its driving me bonkers, I am not even supposed to be working today.
Anyone?
 
sorry,

pivot tables here





my level of understanding of excel here.

hope you get it sorted

although from this and something you've posted elsewhere, part of me wonders whether your organisation is really trying to use excel as a database, and whether that's an entirely good idea. although it may be too late to ask this...
 
My excell ninja days are a few years back. I'd probably go for countif but with multiple fields I think you are pushing excel to do something that access or another database would be more appropriate for.
 
sorry,

pivot tables here





my level of understanding of excel here.

hope you get it sorted

although from this and something you've posted elsewhere, part of me wonders whether your organisation is really trying to use excel as a database, and whether that's an entirely good idea. although it may be too late to ask this...
Pretty much, there is a CRM coming but I heard that before and leadership have too so want an intermin solution where it all just generates the pretty charts when I redirect to the new file dump.

Honestly looking at what they did before I don't think they were removing these duplicates at all, they just dumped it all together, filtered by date (ignoring all the ones that were things like cheese or wombat or whatever bloody nonsense someone put instead of a date). Then chucked it into a power point they had already by looking at various numbers when you filtered by other columns too and manually editing them.
 
My excell ninja days are a few years back. I'd probably go for countif but with multiple fields I think you are pushing excel to do something that access or another database would be more appropriate for.
Quite likely, sure we will be using Powerbi at some point as its the new shiny so thats probably next on the list to learn. I am supposed to be doing public procurement..... but given the last guy was counting I think they just went ah you can make a pivot chart lets see if you can do this.
 
Pretty much, there is a CRM coming but I heard that before and leadership have too so want an intermin solution where it all just generates the pretty charts when I redirect to the new file dump.

Honestly looking at what they did before I don't think they were removing these duplicates at all, they just dumped it all together, filtered by date (ignoring all the ones that were things like cheese or wombat or whatever bloody nonsense someone put instead of a date). Then chucked it into a power point they had already by looking at various numbers when you filtered by other columns too and manually editing them.
How many entries a month? One of the things I learnt along the way (along with All You Can Eat Buffett niot being a challenge) is that with smaller data sets sometimes it's just quicker to do manual filter and sort than waste hours trying to get a clever soultion.
 
I reckon you will probably need a macro to do this.

Chat GPT has given me this as a macro to use:

Sub MarkDuplicatesInColumnC()
Dim ws As Worksheet
Dim lastRow As Long
Dim i As Long, j As Long

' Set the worksheet where you want to mark duplicates
Set ws = ThisWorkbook.Sheets("Sheet1") ' Change "Sheet1" to your sheet's name

' Find the last row with data in column A
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

' Loop through each row
For i = 2 To lastRow
' Check if this row has already been marked as a duplicate
If ws.Cells(i, 3).Value <> "Duplicate" Then
' Loop through the rows below the current row
For j = i + 1 To lastRow
' Check if both column A and column B are duplicates
If ws.Cells(i, 1).Value = ws.Cells(j, 1).Value And ws.Cells(i, 2).Value = ws.Cells(j, 2).Value Then
' Mark both rows as "Duplicate"
ws.Cells(i, 3).Value = "Duplicate"
ws.Cells(j, 3).Value = "Duplicate"
End If
Next j
End If
Next i
End Sub
I asked it:

Write me an excel macro that will write "Duplicate" in column C if both Column A and Column B are duplicates of another row's column A and Column B entry

Of course I bet you won't have that data in those columns but it looks like this bit is the bit to change with a number instead of letter:

If ws.Cells(i, 1).Value = ws.Cells(j, 1).Value And ws.Cells(i, 2).Value = ws.Cells(j, 2).Value Then
' Mark both rows as "Duplicate"
ws.Cells(i, 3).Value = "Duplicate"
ws.Cells(j, 3).Value = "Duplicate"
 
is this data robust enough to find duplicates properly? or are the customer names going to all over the shop (where you have different staff typing different things for the same customer)
So we have [Name] and [Date] as column headers to look at, if they both match then I want to return well anything at all really to [Duplicate] so I can filter this on a pivot table.
 
How many entries a month? One of the things I learnt along the way (along with All You Can Eat Buffett niot being a challenge) is that with smaller data sets sometimes it's just quicker to do manual filter and sort than waste hours trying to get a clever soultion.
Maybe 1-200 but also needs quarterly done, which of course they started Q1 as October....
The auto thing is more that we need to do 3 different reports so its like 60 odd charts on everything you can imagine, x per borough, sex, gender, same for quarter, then as a table for another, some pie charts, different pie charts, etc. So if I can just get the base data accurate the rest should fill. Did a successful switch to new data source for the PQ and that worked. It is more this is supposed to be one and done for me, then they can stop getting the head of department (associate director level) on insane wages to stop spending two days a month on this.
is this data robust enough to find duplicates properly? or are the customer names going to all over the shop (where you have different staff typing different things for the same customer)
It is now I fixed it, it was a fucking mess before, ever seen someone parse it wrong and have ethnicity come up for disability status? Holy shit that would have been bad.
 
I reckon you will probably need a macro to do this.

Chat GPT has given me this as a macro to use:


I asked it:

Write me an excel macro that will write "Duplicate" in column C if both Column A and Column B are duplicates of another row's column A and Column B entry

Of course I bet you won't have that data in those columns but it looks like this bit is the bit to change with a number instead of letter:

If ws.Cells(i, 1).Value = ws.Cells(j, 1).Value And ws.Cells(i, 2).Value = ws.Cells(j, 2).Value Then
' Mark both rows as "Duplicate"
ws.Cells(i, 3).Value = "Duplicate"
ws.Cells(j, 3).Value = "Duplicate"
I am very basic with macros, hell most of what I learnt was literally basic or Turbo Pascal lol. Totally forgot about chatgpt thanks!
In my befuzzled brain that does look right? Had to step outside as I had missed way too much sun trying to stop us from being seriously questioned by the investor if this is automatically sent out and not manually checked cos people can't fill in the right boxes and we insulted an entire ethnicity and various other similar problems.
 
I think the way I’d do this is using a vba dictionary object, which is a keyed array (like eg a hash in Perl). Loop through the rows creating a dictionary element keyed
“NameDate”, if one exists already mark the current row as duplicate and continue.

E2a: this is the dictionary object and its methods: Dictionary object
 
I am very basic with macros, hell most of what I learnt was literally basic or Turbo Pascal lol. Totally forgot about chatgpt thanks!
In my befuzzled brain that does look right? Had to step outside as I had missed way too much sun trying to stop us from being seriously questioned by the investor if this is automatically sent out and not manually checked cos people can't fill in the right boxes and we insulted an entire ethnicity and various other similar problems.

Oh, I can't do it at all, but I can probably bugfix the code that ChatGPT writes for me... that said, the exact prompt is important soooo based on the next post

That's basically all VBA is.



That would only appear to work if the duplicates are on adjacent rows. :hmm:

I changed the prompt to:

Write me an excel macro that will write "Duplicate" in column C if both Column A and Column B are duplicates of another row's column A and Column B entry. The other row should be any other row in the sheet not just adjacent rows

and got thi sback instead:

Sub MarkDuplicatesInColumnC()
Dim ws As Worksheet
Dim lastRow As Long
Dim i As Long, j As Long

' Set the worksheet where you want to mark duplicates
Set ws = ThisWorkbook.Sheets("Sheet1") ' Change "Sheet1" to your sheet's name

' Find the last row with data in column A
lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

' Loop through each row
For i = 2 To lastRow
' Check if this row has already been marked as a duplicate
If ws.Cells(i, 3).Value <> "Duplicate" Then
' Loop through all rows in the worksheet
For j = 2 To lastRow
' Skip the current row
If j <> i Then
' Check if both column A and column B are duplicates
If ws.Cells(i, 1).Value = ws.Cells(j, 1).Value And ws.Cells(i, 2).Value = ws.Cells(j, 2).Value Then
' Mark both rows as "Duplicate"
ws.Cells(i, 3).Value = "Duplicate"
ws.Cells(j, 3).Value = "Duplicate"
End If
End If
Next j
End If
Next i
End Sub

Now the question is whether that's any different to the one it spat out before!
Anyway, you can try it and see if it works and if not maybe see how it fails and edit it to make it work, or make the prompt more precise/different to get what you need.
 
I have a table, with a bunch of personal and business info in, sometimes people asked the same thing on the same day and this has to be reported as a single incident but cannot be removed. So adding another column to flag this.

So we have [Name] and [Date] as column headers to look at, if they both match then I want to return well anything at all really to [Duplicate] so I can filter this on a pivot table.

I have come across using countif, xlookup and filter for this but am screwing up somewhere and its driving me bonkers, I am not even supposed to be working today.
Anyone?
Late to the party


Quick and dirty use countifs()

First you want a helper column in a new column in the table put
=[@name}&"-"&[@date]
the will combine the dates and names into one cell. Name the column "helper"

then you want to count for repeats
In another column ([duplicate?]"
=countifs([helper],[@helper])


you could combine the above into 1 column for neatness if you want

Now filter on this count column. - anything with a count of >1 will be repeats

you could put all of this into a separate table using the filter() function (assuming you are on Office 365) - but the above should do for you

or, Guessing you want to count all the individual incidents - so you can enter somewhere else =countifs([name],[duplicate?],1) : which should chuck out a count of all names where the duplicate count = 1


note sumifs, countifs etc are basically the same as countif and sumif but they let you choose more than one condition. however, they put the range to be summed/counted at the start of the formula and not the end. Ignore sumif and countif: instead always use the plural form, then you don't have to worry about the differing syntax order
 
I have a table, with a bunch of personal and business info in, sometimes people asked the same thing on the same day and this has to be reported as a single incident but cannot be removed. So adding another column to flag this.

So we have [Name] and [Date] as column headers to look at, if they both match then I want to return well anything at all really to [Duplicate] so I can filter this on a pivot table.

I have come across using countif, xlookup and filter for this but am screwing up somewhere and its driving me bonkers, I am not even supposed to be working today.
Anyone?
There is an option to highlight duplicate values in a column. (Not got excel open in front of me to tell you exact menu option).

I'm not sure if that option works on more than one column but if not then you could do a formula in a third column to join the values of the first 2 columns and then do the highlight duplicates thing on that column.

If i have time at work i'll have a go and see if it works.
 
Select a column then on "home" tab there is an icon "conditional formatting". Then select "highlight cells rules" and then "duplicate values" to highlight the duplicates in that column.

To combine 2 columns values in a third column then if first 2 values are in col a and col b and start at row 2 (i'm assumming headings in row 1) then in cell c2 you type =a2&" "&b2 (i put a space in middle)

Then copy that cell and paste into rest of col c

Then you can do the highlight duplicates on col c

You might need to hide the joined up column to stop people typing in it and fucking up the formulas and then unhide it when you need it.
 
Ok so if col a is name and col b is date then formula in row 2 of col c should be

=a2&" "&text(b2,"dd/mm/yy")
 
My excell ninja days are a few years back. I'd probably go for countif but with multiple fields I think you are pushing excel to do something that access or another database would be more appropriate for.
Yeah, this. It's a classic database use case. And, of course, another classic thing is people using spreadsheets where databases would be more appropriate.

There will be some kind of incantation you could use to achieve the same thing in a spreadsheet, but it will be a) complex, b) slow, c) unmaintainable.
 
Something like this in C2 should do it =if(countifs(A$2:A2,A2,B$2:B2,B2) >1,"Duplicate","Not a duplicate") should do it (and drag down for all rows)
 
Something like this in C2 should do it =if(countifs(A$2:A2,A2,B$2:B2,B2) >1,"Duplicate","Not a duplicate") should do it (and drag down for all rows)

Thanks all, unfortunately I now have a new problem. Made a helper column with concatenate, duplicate detection worked great, unfortunately I can't filter on it as if there is a 3 duplicate then it flags all of them including the first one as duplicates. So filtering them on the pivot removes all entries but I need to report 1 of them.

Need to mark one of them somehow the original, then the rest as duplicates.
 

Thanks all, unfortunately I now have a new problem. Made a helper column with concatenate, duplicate detection worked great, unfortunately I can't filter on it as if there is a 3 duplicate then it flags all of them including the first one as duplicates. So filtering them on the pivot removes all entries but I need to report 1 of them.

Need to mark one of them somehow the original, then the rest as duplicates.
I think the formula has an error...the >1 should be ">1" (I.e. surrounded by double quotes). Then filtering on 'not a duplicate' should give you only the originals.

Make sure the $'s are in the right place, it's crucial for it to work
 
Currently I have

=if(countif(dg$2:DG4000,DG2)>1,"Duplicate",DG2)

DG holds the concatenated id, formula is in DH
I know I should be using countifs but its newish to me and this at least produced something closer

I can't work out what the A column in your countif is for? That appears to be aiming at the column header?
 
This is such a ridiculous pain in the ass but I now have a macro to take the data when its imported from the new file and the tables appended, fix all the formatting issues and date columns, then add the helper column and duplicate detection.

Is there a way to automate getting new tables from another workbook, selecting the 6 ones I want and appending them in place of the current data? So far its very manual which works fine for me and I documented it. I don't think its likely whoever is supposed to be using this is going to be all that good at it, being as they were pasting everything and counting manually with a filter.
 
Back
Top Bottom