"how to join two pandas dataframes on trailing part of path / filename" Code Answer


You can build a regex alternation from the F_NAME values, anchor it to the end of the string with $, use str.extract to pull the matching trailing part out of each PATH, and merge on that extracted key:

import re

pattern = f"({'|'.join(df2['F_NAME'].apply(re.escape))})$"

df3 = df1.merge(df2, left_on=df1['PATH'].str.extract(pattern, expand=False),
                right_on='F_NAME', how='left')

Output:

                    PATH  VALUE            F_NAME  VALUE_X  CORDS
0    C:\FODLER\Test1.jpg     45  FODLER\Test1.jpg       12      1
1  C:\A\FODLER\Test2.jpg     23  FODLER\Test2.jpg       25      2
2  C:\A\FODLER\Test3.jpg     45  FODLER\Test3.jpg       33      4
3  C:\A\FODLER\Test4.jpg      2  FODLER\Test4.jpg      123      5

Generated pattern (re.escape escapes the backslashes and dots in each name):

(FODLER\\Test1\.jpg|FODLER\\Test2\.jpg|FODLER\\Test6\.jpg|FODLER\\Test3\.jpg|FODLER\\Test4\.jpg|FODLER\\Test9\.jpg)$
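
To try the approach end to end, here is a minimal sketch. It assumes Windows-style paths with backslash separators and rebuilds the sample frames from the output shown above; the VALUE_X and CORDS values for the unmatched Test6 and Test9 rows are made-up placeholders. Unlike the snippet above, it stores the extracted key in an F_NAME column before merging, which is a small variation rather than the answer's exact call.

```python
import re
import pandas as pd

# Sample data rebuilt from the output table; rows for Test6 and Test9
# use placeholder numbers (assumption, not from the original answer).
df1 = pd.DataFrame({
    "PATH": [r"C:\FODLER\Test1.jpg", r"C:\A\FODLER\Test2.jpg",
             r"C:\A\FODLER\Test3.jpg", r"C:\A\FODLER\Test4.jpg"],
    "VALUE": [45, 23, 45, 2],
})
df2 = pd.DataFrame({
    "F_NAME": [r"FODLER\Test1.jpg", r"FODLER\Test2.jpg", r"FODLER\Test6.jpg",
               r"FODLER\Test3.jpg", r"FODLER\Test4.jpg", r"FODLER\Test9.jpg"],
    "VALUE_X": [12, 25, 0, 33, 123, 0],
    "CORDS": [1, 2, 3, 4, 5, 6],
})

# Escape each name so backslashes and dots match literally, join the
# names into one alternation, and anchor it at the end of the string.
pattern = "(" + "|".join(df2["F_NAME"].map(re.escape)) + ")$"

# Extract the trailing part of each path and use it as the merge key.
key = df1["PATH"].str.extract(pattern, expand=False)
df3 = df1.assign(F_NAME=key).merge(df2, on="F_NAME", how="left")

print(df3)
```

Paths whose tail matches none of the names get NaN as the key, so with how='left' they stay in the result with missing values in the df2 columns.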


Alternatively, if F_NAME always consists of the last two components of PATH (folder\filename.ext), you can extract that trailing part into an F_NAME column and merge on it:

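df3 = (df1
    # keep the last two backslash-separated components of each path
    .assign(F_NAME=df1['PATH'].str.extract(r'([^\\]+\\[^\\]+)$', expand=False))
    # with no explicit key, merge joins on the shared F_NAME column
    .merge(df2, how='left')
)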
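
As a quick check of what that extraction produces, the sketch below runs the same regex on two of the sample paths. The variable names `paths` and `tail` are only for illustration, and backslash separators are assumed.

```python
import pandas as pd

paths = pd.Series([r"C:\FODLER\Test1.jpg", r"C:\A\FODLER\Test2.jpg"])

# [^\\]+ matches a run of non-backslash characters, so the group captures
# "folder\file" anchored at the end of the string.
tail = paths.str.extract(r"([^\\]+\\[^\\]+)$", expand=False)

print(tail.tolist())
```

Both paths reduce to their last two components (FODLER\Test1.jpg and FODLER\Test2.jpg), regardless of how many parent folders precede them.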

By Jimbot on May 26 2023
