Ticket #897 (closed enhancement: fixed)

Opened 2 years ago

Last modified 6 months ago

Implement UID management in RecMatch GUI

Reported by: sgrannis
Assigned to: nribeka
Priority: major
Milestone:
Component: Patient Matching Module
Keywords: RecMatch GUI UID management
Cc: jegg, nribeka
Introductory Ticket: 0
Code Review Status:

Description

UIDs are important for de-duplication and grouping

For patient matching, each distinct record must possess a field that uniquely identifies that particular record. This unique ID (uid) field is necessary for two chief functions: (1) to eliminate redundant pairs when deduplicating a data source (e.g., when linking a source against itself), and (2) to create a transitively related group of records that are believed to be the same entity.

Select UID Field

On the Data Session tab (Tab 1) the user should select the UID field (if already present in the data) by right-clicking on the column and selecting "UID field" from the right-click context menu. The GUI should accommodate the following 4 scenarios:

Flat file:

  1. UID field already present - user selects which field contains the UID so the system can use it
  2. No UID field - in this case a UID shall be added at the end of each flat file record after data sources are configured

Database:

  3. UID field already present - user selects which field contains the UID so the system can use it
  4. No UID field - cannot proceed; user is alerted that they must either select a UID (if present), or first add it to the database

For cases 1) and 3), the user is simply identifying the pre-existing field. In these cases an "id_field" attribute will be added to the <datasource> element of the config file. The "id_field" attribute points to the field name in the data source that contains the UID.

For case 2), a new auto-increment field shall be appended to the end of each record in the existing file; the field will be given an appropriate name (e.g., "uid"), and the "id_field" attribute will reference this newly created field. The UID appending code still needs to be written.
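Since the UID appending code for case 2) is not yet written, the following is only a minimal sketch of one way it could work. The class and method names (UidAppender, appendUids) are hypothetical and not part of RecMatch, and the sketch assumes one record per line, no header row, and UIDs starting at 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a RecMatch class): appends a sequential UID to each record. */
final class UidAppender {
    private UidAppender() {}

    /**
     * Returns a copy of the records with an auto-increment UID appended as the
     * last field. Assumes one delimited record per list element and no header row.
     */
    static List<String> appendUids(List<String> records, char delimiter) {
        List<String> out = new ArrayList<>(records.size());
        long uid = 1; // assumption: UIDs start at 1 and increase by 1 per record
        for (String record : records) {
            out.add(record + delimiter + uid);
            uid++;
        }
        return out;
    }
}
```

After the file is rewritten this way, the "id_field" attribute would reference the new last column (e.g., "uid").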

For case 4), the linkage process cannot proceed without a UID field being added to the database table. The user is warned that they must either select a UID (if present), or must first manually add it to the database and then select it on the data sources tab.
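The four scenarios can be summarized as a small decision function. This is an illustrative sketch only: the enum and class names are invented for the example and do not correspond to existing RecMatch classes.

```java
/** Hypothetical sketch of the four UID scenarios; names are illustrative only. */
final class UidScenarios {
    enum SourceType { FLAT_FILE, DATABASE }

    enum Action { USE_SELECTED_FIELD, APPEND_UID_FIELD, BLOCK_AND_WARN }

    private UidScenarios() {}

    /** Maps the data source type and the user's UID selection (may be null) to an action. */
    static Action resolve(SourceType type, String selectedUidField) {
        boolean hasUid = selectedUidField != null && !selectedUidField.trim().isEmpty();
        if (hasUid) {
            return Action.USE_SELECTED_FIELD;   // cases 1) and 3): write id_field
        }
        if (type == SourceType.FLAT_FILE) {
            return Action.APPEND_UID_FIELD;     // case 2): append an auto-increment UID
        }
        return Action.BLOCK_AND_WARN;           // case 4): cannot proceed without a UID
    }
}
```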

The following items will need to be changed:

  • GUI
  • ColumnSwitcher.class
  • XMLTranslator (to write the uid attribute in the datasource)
  • MatchingConfig (to make it aware of the UID field)
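For the XMLTranslator change, writing the "id_field" attribute might look roughly like the sketch below, which uses the standard Java DOM API. The IdFieldWriter class is hypothetical, and how XMLTranslator actually builds the <datasource> element is not shown in this ticket, so treat this as an assumption about the shape of the change.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch (not the real XMLTranslator): sets id_field on a datasource element. */
final class IdFieldWriter {
    private IdFieldWriter() {}

    /** Builds a bare <datasource> element whose id_field attribute names the UID column. */
    static Element datasourceWithIdField(String idField) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element datasource = doc.createElement("datasource");
            datasource.setAttribute("id_field", idField); // points at the UID field name
            doc.appendChild(datasource);
            return datasource;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser configuration", e);
        }
    }
}
```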

Indicate De-duplication

Additionally, the user shall indicate whether the current linkage process is intended to de-duplicate a data source or to link two separate files. To indicate this, a check box labeled "de-duplicate" should be present on the data sources tab (Tab 1). When this box is checked, the program should automatically fill in data source B with the data source A information. We assume that the user configures data source A first, although they are not prohibited from starting with data source B. Perhaps we can eventually implement a mechanism to lock data source B configuration until data source A has been selected.

Clicking the "de-duplicate" check box shall alert FormPairs that de-duplication is occurring and the "eliminate redundant pairs using UID" rule is in effect. Thus, clicking the "de-duplicate" check box shall set a global configuration parameter, e.g. dedupe="true", in RecMatchConfig.

When de-duplicating, there will be fewer record pairs because of the "eliminate redundant pairs using UID" rule. For example, if a file having 3 records with UIDs {1,2,3} is linked to itself, then these are all the possible pairs:

{1,1},{1,2},{1,3} {2,1},{2,2},{2,3} {3,1},{3,2},{3,3}

However, {1,1}, {2,2} and {3,3} each pair a record with itself. So if UIDa = UIDb, then discard.

Further, {2,1} == {1,2}, {3,2} == {2,3} and {3,1} == {1,3}: each is the same pair in reverse order, so only one of the two needs to be kept. So if UIDa < UIDb, then discard.

We're left with the following unique pairs:

{2,1} {3,1},{3,2}

The combined rule is [if (UIDa <= UIDb) then discard] *but only in the case of de-duplication*, which is why we need the "de-duplicate" check box. The "de-duplicate" check box preferably should go on the data sources tab, which is where users will be thinking about de-duplication.
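The combined rule can be sketched as follows. This is not the actual FormPairs code; DedupePairFilter and its methods are hypothetical names, and the sketch assumes UIDs are numeric so that they can be compared with <=.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the "eliminate redundant pairs using UID" rule. */
final class DedupePairFilter {
    private DedupePairFilter() {}

    /** A pair is discarded only when de-duplicating and UIDa <= UIDb. */
    static boolean keep(long uidA, long uidB, boolean dedupe) {
        return !(dedupe && uidA <= uidB);
    }

    /** Forms all pairs {a, b} from the two UID lists and applies the rule above. */
    static List<long[]> formPairs(long[] uidsA, long[] uidsB, boolean dedupe) {
        List<long[]> pairs = new ArrayList<>();
        for (long a : uidsA) {
            for (long b : uidsB) {
                if (keep(a, b, dedupe)) {
                    pairs.add(new long[] {a, b});
                }
            }
        }
        return pairs;
    }
}
```

Calling formPairs with UIDs {1,2,3} on both sides and dedupe set to true leaves {2,1}, {3,1} and {3,2}; with dedupe set to false all nine pairs are kept.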

Explicitly Declare when Data Sources are Configured

Finally, we need to add a "Done" or "Data Source Configuration Complete" check box on the data sources tab, to indicate when the data sources have been completely specified. At startup the user only has access to the data sessions menu: the session and analysis panels are unavailable, and the user cannot leave the data sources tab until they check "Done".

When the user checks "Done", tabs 2 and 3 become activated and the data sources tab becomes locked. Once the user checks "Done", if all is in order (e.g., proper UID status), the switching/preprocessing can begin. The UID field status can be checked, and the rules for UID scenarios 1) - 4) applied, after "Done" is checked. For example, in case 4), if "Done" is checked and no UID is selected, a warning message can be displayed, saying effectively "A UID field must be present and selected."

The user can uncheck "Done", which again locks all tabs except Tab 1.
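The "Done" behavior can be modeled independently of the Swing widgets. The sketch below is an assumption about how the state could be tracked, not the module's actual code; DataSourceTabGate is a hypothetical name, and tabs are numbered 1 to 3 as in this ticket.

```java
import java.util.Optional;

/** Hypothetical model of the "Done" check box and the tab locking it controls. */
final class DataSourceTabGate {
    static final String UID_WARNING = "A UID field must be present and selected.";

    private boolean done = false; // at startup only Tab 1 is available

    /**
     * Called when the user checks "Done". Returns a warning and leaves the state
     * unchanged when no UID is available; otherwise marks configuration complete.
     */
    Optional<String> checkDone(boolean uidAvailable) {
        if (!uidAvailable) {
            return Optional.of(UID_WARNING);
        }
        done = true;
        return Optional.empty();
    }

    /** Called when the user unchecks "Done": only Tab 1 is usable again. */
    void uncheckDone() {
        done = false;
    }

    /** Tab 1 is editable until "Done"; tabs 2 and 3 are enabled only after it. */
    boolean isTabEnabled(int tab) {
        if (tab == 1) {
            return !done;
        }
        return done && (tab == 2 || tab == 3);
    }
}
```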

Change History

08/12/09 15:40:32 changed by nribeka

  • status changed from new to closed.
  • intro_ticket changed.
  • resolution set to fixed.
  • review_status changed.

The code for the above workflow is in the module now. Closing the ticket.