I have two dataframes. I want to use the elements of one dataframe to search through a column of the other, narrow that second dataframe down to the sets that match, and then keep narrowing it down element by element. The sample data below should explain this better.
df1

         col1
    1  apples
    2 oranges
    3  apples
    4  banana
    5  grapes
    6 mangoes
    7 oranges
    8  banana
df1 has only one column. df2 has two columns: setID and col1.
df2

       setID    col1
    1      1  apples
    2      1 oranges
    3      1 oranges
    4      1 mangoes
    5      1  grapes
    6      1  banana
    7      1  banana
    8      1  apples
    10     2  apples
    11     2 oranges
    12     2  apples
    13     2  banana
    14     2  grapes
    15     2 mangoes
    16     2  banana
    17     2 oranges
    18     3  apples
    19     3  banana
    20     3 oranges
    21     3  apples
    22     3  grapes
    23     3 mangoes
    24     3 oranges
    25     3  banana
    26     4  apples
    27     4 oranges
    28     4  apples
    29     4  grapes
    30     4  grapes
    31     4 oranges
    32     4  banana
    33     4  banana
As you can see, the setIDs repeat. Each setID marks one set, and the order of the elements within a set is important. Please note that df1$col1 does not have to be the same length as a set from df2, nor does it have to match a set exactly. It only has to be a close enough match. In this case df1$col1 is the closest match to df2$setID = 2, with only the last two elements out of order.

The reason an exact match is not required is that I want to use a "search as you type" approach. I do not want to match df1$col1 as a whole against a setID in df2. I want to narrow down the possible sets by going through df1$col1 element by element. Assume that you get the elements of df1 one by one and not as a complete dataframe. For example:
Find a match for df1$col1[1] in df2 and save every set that contains the match to a tempdf. It doesn't matter if df1$col1[1] is found more than once in the same set. If it is found at least once, that set is added to tempdf.
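This first step could be sketched roughly as follows. It assumes the sample df2 is rebuilt with `data.frame()` (so the row names run 1 to 32 rather than skipping 9), and `first_item` and `matching_sets` are names made up for illustration.

```r
# Rebuild the sample df2 (four sets of eight items each).
df2 <- data.frame(
  setID = rep(1:4, each = 8),
  col1 = c(
    "apples", "oranges", "oranges", "mangoes", "grapes", "banana", "banana", "apples",
    "apples", "oranges", "apples", "banana", "grapes", "mangoes", "banana", "oranges",
    "apples", "banana", "oranges", "apples", "grapes", "mangoes", "oranges", "banana",
    "apples", "oranges", "apples", "grapes", "grapes", "oranges", "banana", "banana"
  ),
  stringsAsFactors = FALSE
)

# First element received from df1 (hypothetical variable name).
first_item <- "apples"

# setIDs of every set that contains the item at least once.
matching_sets <- unique(df2$setID[df2$col1 == first_item])

# Keep all rows that belong to those sets.
tempdf <- df2[df2$setID %in% matching_sets, ]
```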
What needs to be retrieved at the end is the setID of the set that matches df1 most closely. In this case tempdf will be the same as df2, because all the sets include "apples". The next step is to match df1$col1[2] against tempdf, given that the first element already matched. In other words, I guess, to look for df1$col1[1:2] in tempdf. This results in:
tempdf

       setID    col1
    1      1  apples
    2      1 oranges
    3      1 oranges
    4      1 mangoes
    5      1  grapes
    6      1  banana
    7      1  banana
    8      1  apples
    10     2  apples
    11     2 oranges
    12     2  apples
    13     2  banana
    14     2  grapes
    15     2 mangoes
    16     2  banana
    17     2 oranges
    26     4  apples
    27     4 oranges
    28     4  apples
    29     4  grapes
    30     4  grapes
    31     4 oranges
    32     4  banana
    33     4  banana
Basically, setID = 3 is omitted. As this continues with the 3rd element from df1, the new tempdf will contain only setIDs 2 and 4. The loop (my idea for solving this) would end once only one setID remains, in this case setID = 2. Therefore setID = 2 would be considered a close match for df1.
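To make the loop concrete, here is a rough sketch of what I have in mind. It rests on one assumption about what "match" means: a set stays a candidate if the elements received so far appear in it as a consecutive run, in order, anywhere in the set. The names `incoming`, `contains_run` and `narrow_sets` are made up for illustration.

```r
# Rebuild the sample df2 (four sets of eight items each).
df2 <- data.frame(
  setID = rep(1:4, each = 8),
  col1 = c(
    "apples", "oranges", "oranges", "mangoes", "grapes", "banana", "banana", "apples",
    "apples", "oranges", "apples", "banana", "grapes", "mangoes", "banana", "oranges",
    "apples", "banana", "oranges", "apples", "grapes", "mangoes", "oranges", "banana",
    "apples", "oranges", "apples", "grapes", "grapes", "oranges", "banana", "banana"
  ),
  stringsAsFactors = FALSE
)

# Elements of df1$col1, in the order they would arrive.
incoming <- c("apples", "oranges", "apples", "banana",
              "grapes", "mangoes", "oranges", "banana")

# TRUE if `prefix` occurs as a consecutive run somewhere in `items`.
contains_run <- function(items, prefix) {
  n <- length(prefix)
  if (n > length(items)) return(FALSE)
  starts <- seq_len(length(items) - n + 1)
  any(vapply(starts,
             function(s) all(items[s:(s + n - 1)] == prefix),
             logical(1)))
}

# Narrow the candidate setIDs one incoming element at a time.
narrow_sets <- function(df2, incoming) {
  sets <- split(df2$col1, df2$setID)  # one character vector per setID, order kept
  candidates <- names(sets)
  for (k in seq_along(incoming)) {
    prefix <- incoming[seq_len(k)]
    keep <- vapply(sets[candidates], contains_run, logical(1), prefix = prefix)
    if (!any(keep)) break             # nothing matches the longer run: keep the previous candidates
    candidates <- candidates[keep]
    if (length(candidates) == 1) break
  }
  as.integer(candidates)
}

result <- narrow_sets(df2, incoming)
```

With the sample data, the candidates shrink from all four sets to 1, 2 and 4 after the second element, to 2 and 4 after the third, and to setID 2 after the fourth, at which point the loop stops. If the input runs out before a single set remains, the function returns every setID that is still a candidate.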
Of course, feel free to advise on a better approach than this one.