You don't need to be an 'investor' to invest in Singletrack: 6 days left: 95% of target - Find out more
(just sexing up the forum for our Guardian readers 😀 )
I have a long spreadsheet that has a number of columns, one of which contains an identifier number code which is unique to an individual person
Each number may appear any number of times within the sheet
I need to anonymise the data so want to assign new, unrelated values instead of each identifier but same one every time it appears in the sheet. I can't see how I now automatically assign new identifers (I'm sure there must be a way of numbering, maybe sequentially) I haven't sorted the data by this number but I suspect I need to.
so if I have 2 students, 123 and 234 who took tests A,B,C, they might have
pass 123 test A
fail 234 test A
fail 123 test B
fail 234 test B
pass 123 test C
pass 234 test C
I want it to read:
pass 1 test A
fail 2 test A
fail 1 test B
fail 2 test B
pass 1 test C
pass 2 test C
(it's not possible to truncate their identifier just to the 1st digit - that's just how my example works. The real ones have almost random 4-7 digit numbers)
Anyone ?? 🙁
ooh will I be fast enough. Hang on
Get a list of all the identifiers and put them in the first column on a new spreadsheet. In the next column put the identifier you want to replace it with.
On the original spreadhseet, add a column to the right of the one with the identifiers. Use the Vlookup function to search for the identifier in the list on sheet 2 and bring back the value in the second column.
This is the formula you need:
=VLOOKUP(Sheet1!E1,Sheet2!A1:B2,2,FALSE)
I can't - or rather I'd have to manually delete several thousand duplicates to get itGet a list of all the identifiers
WAIT !!!!
(I know how to deduplicate; that'll do it, hey ?!)
I'm sure there's an easy way to deduplicate; but if there isn't then just do it the more complex way:
Put all the raw identifiers in the first column of a sheet.
Sort them so like values are next to each other.
In the second column, add a formula to compare each value to the one above it. If it's the same return 1; if different then return 0. Copy n paste that formula down the whole column.
Voila, the first instance of each identifier is marked with a 1. Filter the list to show only zeros. |Select all the visible rows and delete them. Then take the filter off again and you have a deduped list.
dedup's easy* - thanks thegen !
(* in 2007 anyway, just a button on the menu bar)
Right. I've tidied it up a bit:
These are the entries in the first four columns.
pass| 123| =VLOOKUP(B1,Sheet2!A$1:B$2,2,FALSE)| test a
fail| 234| =VLOOKUP(B2,Sheet2!A$1:B$2,2,FALSE)| test a
fail| 123| =VLOOKUP(B3,Sheet2!A$1:B$2,2,FALSE)| test b
Assuming this data exists on the second tab:
123| bob
234| bill
Then the first tab will actually display:
pass| 123| bob| test a
fail| 234| bill| test a
fail| 123| bob| test b
All done - thanks
What the hell are you on about? 😕
