
root/openmrs-modules/patientmatching/doc/readme.txt

Revision 2143, 5.1 kB (checked in by scentel, 1 year ago)

In patient matching module, updated documentation for new XML config format

CONTENTS
        PATIENT MATCHING ATTRIBUTE - Describes the format of the PersonAttributeType the module uses

        STRING COMPARATORS - Lists the comparators available when matching

        CONFIGURATION FILE - Describes the requirements of the configuration file

        CONFIGURATION FILE TAGS - Explains the elements in the configuration file


PATIENT MATCHING ATTRIBUTE
The module prefers to use a special matching PersonAttributeType named "Other Matching Information".  Its value is a list of demographic-value pairs in the form "<demographic1>:<value1>;<demographic2>:<value2>; . . . ".  If a demographic has no value, it can either be omitted from the string or be given an empty value.

If a person has no attribute of that type, the module falls back to whatever basic information it can obtain.  Currently this fallback is very minimal and will not produce good matches.

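The attribute format described above can be illustrated with a short Python sketch.  This is not the module's own parser; the function name parse_matching_attribute and the sample values are hypothetical, and the handling of stray whitespace and trailing semicolons is an assumption.

```python
def parse_matching_attribute(value):
    """Split "<demographic>:<value>;..." into a dict.

    A demographic with an empty value maps to "" and a demographic that is
    absent from the string is simply missing from the result.
    """
    pairs = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # skip empty segments, e.g. after a trailing ";"
        # Split on the first ":" only, so values may contain further colons.
        name, _, val = item.partition(":")
        pairs[name.strip()] = val.strip()
    return pairs


# Hypothetical attribute value; "zip" is present but has no value.
example = "mrn:1234;ln:Smith;zip:"
parsed = parse_matching_attribute(example)
print(parsed)
```

Here "zip" is kept with an empty string, while a demographic that never appears in the attribute would have to be looked up with a default, for example parsed.get("yb", "").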
STRING COMPARATORS
The string comparators that can be used are:
Exact Match - case-sensitive comparison of the whole string; similarity is either 0 or 1
Levenshtein - Levenshtein edit distance divided by the length of the longer string
Longest Common Substring - Regenstrief algorithm; compares the strings case-insensitively
Jaro Winkler - Jaro-Winkler similarity, as implemented in the Simmetrics library

The implementations of the Levenshtein and Jaro Winkler comparators come from the Simmetrics library at http://www.dcs.shef.ac.uk/~sam/simmetrics.html.  The threshold for Jaro Winkler and Longest Common Substring is a score of 0.8.  The threshold for Levenshtein is 0.7.


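As a rough illustration of how a thresholded comparator behaves, the following Python sketch implements Exact Match and a normalized Levenshtein score.  It is not the Simmetrics code.  It assumes the similarity is 1 minus (edit distance / longest string length), so that identical strings score 1, and it assumes two values agree when the score is at or above the 0.7 threshold.  All function names are hypothetical.

```python
def levenshtein_distance(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Assumed convention: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


def exact_match(a, b):
    """Case-sensitive comparison of the whole string: 1.0 or 0.0."""
    return 1.0 if a == b else 0.0


LEVENSHTEIN_THRESHOLD = 0.7  # value taken from the text above


def levenshtein_agrees(a, b):
    """Assumption: agreement means a score at or above the threshold."""
    return levenshtein_similarity(a, b) >= LEVENSHTEIN_THRESHOLD


print(levenshtein_agrees("Smith", "Smyth"))     # one edit in five characters
print(levenshtein_agrees("kitten", "sitting"))  # three edits in seven characters
print(exact_match("Smith", "smith"))            # differs only in case
```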
CONFIGURATION FILE
The default name for the configuration file is "link_config.xml", and it is read from the current working directory.  For an OpenMRS module, this is the Tomcat directory, such as "C:\Program Files\Apache Software Foundation\Tomcat 6.0" on Windows.  If the link table is stored in a database other than PostgreSQL or MySQL, the JDBC driver for that database must be on the classpath when the program runs.

An excerpt of a valid configuration file is:
<?xml version="1.0" encoding="UTF-8" ?>
<Session>
        <datasource name="link_test" type="DataBase" access="<JDBC driver>,<database URL>,<user>,<passwd>" id="3">
                <column include_position="0" column_id="mrn" label="mrn" type="string"/>
                <column include_position="1" column_id="ln" label="ln" type="string"/>
                . . .
                <column include_position="17" column_id="openmrs_id" label="openmrs_id" type="string"/>
        </datasource>
        <analysis type="scaleweight">
                <init>JDBCdriver,databaseURL,user,passwd</init>
        </analysis>
        <run estimate="true" name="conversion">
                <row name="yb">
                        <BlockOrder>1</BlockOrder>
                        <BlckChars>40</BlckChars>
                        <Include>false</Include>
                        <TAgreement>0.9</TAgreement>
                        <NonAgreement>0.1</NonAgreement>
                        <ScaleWeight lookup="TopN" N="100.0" buffer="500">true</ScaleWeight>
                        <Algorithm>Exact Match</Algorithm>
                </row>
                . . .
                <row name="zip">
                        <BlockOrder>null</BlockOrder>
                        <BlckChars>40</BlckChars>
                        <Include>true</Include>
                        <TAgreement>0.9</TAgreement>
                        <NonAgreement>0.1</NonAgreement>
                        <ScaleWeight>null</ScaleWeight>
                        <Algorithm>Exact Match</Algorithm>
                </row>
        </run>
</Session>


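To show how the "run" section of such a file can be read, here is a minimal Python sketch using the standard library's xml.etree.ElementTree.  It is not how the module itself loads the file.  The sample below is a shortened, well-formed stand-in for the excerpt (placeholder connection string, two rows only), the function name read_rows is hypothetical, and treating a BlockOrder of "null" as "not a blocking field" is an assumption based on the excerpt.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the excerpt, with a placeholder access string.
SAMPLE_CONFIG = """<Session>
    <datasource name="link_test" type="DataBase" access="driver,url,user,passwd" id="3">
        <column include_position="0" column_id="yb" label="yb" type="string"/>
        <column include_position="1" column_id="zip" label="zip" type="string"/>
    </datasource>
    <run estimate="true" name="conversion">
        <row name="yb">
            <BlockOrder>1</BlockOrder>
            <BlckChars>40</BlckChars>
            <Include>false</Include>
            <TAgreement>0.9</TAgreement>
            <NonAgreement>0.1</NonAgreement>
            <ScaleWeight lookup="TopN" N="100.0" buffer="500">true</ScaleWeight>
            <Algorithm>Exact Match</Algorithm>
        </row>
        <row name="zip">
            <BlockOrder>null</BlockOrder>
            <BlckChars>40</BlckChars>
            <Include>true</Include>
            <TAgreement>0.9</TAgreement>
            <NonAgreement>0.1</NonAgreement>
            <ScaleWeight>null</ScaleWeight>
            <Algorithm>Exact Match</Algorithm>
        </row>
    </run>
</Session>"""


def read_rows(xml_text):
    """Return one dict per <row>, with the text values converted to Python types."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.iter("row"):
        block_order = row.findtext("BlockOrder")
        rows.append({
            "name": row.get("name"),
            # Assumption: "null" marks a field that is not used for blocking.
            "block_order": None if block_order == "null" else int(block_order),
            "block_chars": int(row.findtext("BlckChars")),
            "include": row.findtext("Include") == "true",
            "t_agreement": float(row.findtext("TAgreement")),
            "non_agreement": float(row.findtext("NonAgreement")),
            "algorithm": row.findtext("Algorithm"),
        })
    return rows


rows = read_rows(SAMPLE_CONFIG)
# Blocking fields in BlockOrder sequence, and fields that are compared.
blocking = sorted((r for r in rows if r["block_order"] is not None),
                  key=lambda r: r["block_order"])
compared = [r["name"] for r in rows if r["include"]]
print([r["name"] for r in blocking], compared)
```

The excerpt itself cannot be fed to an XML parser as printed, because of the ". . ." elisions and the angle-bracket placeholders inside the access attribute, which is why the sketch uses its own sample.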
CONFIGURATION FILE TAGS
The elements and attributes of the XML configuration file are:
Session - the root element
datasource - a source of Record objects
        name - for a file source, the path; for a database, the table name
        type - the type of datasource: CharDelimFile, DataBase, or Vector
        access - how to access the datasource.  For a character-delimited file, it's the delimiter.  For a database, it's a string holding the connection information
        id - a unique numeric identifier for the datasource
        column - one column of fields in the datasource
                include_position - if the column is part of the analysis, its position in the ordering.  Zero-indexed
                column_id - identifies the column.  For a character-delimited file, it's an index.  For a database table, it's the column name
                label - the name used by the linkage program and in the "run" section.  It should be one of the demographics that appear in the matching person attribute
                type - either "string" or "numeric"; used in sorting and comparisons
run - a set of link options to use with the datasources
        estimate - whether to use EM (expectation maximization) to modify values
        name - a label for this configuration
        row - the options for one field in the Record
                name - the name of the field; must match a label in the datasource element
                BlockOrder - if the field is a blocking field, a unique number, starting with 1
                BlckChars - the number of characters to block on if the field is a blocking field
                Include - indicates whether the field will be compared between records
                TAgreement - the true agreement value
                NonAgreement - the non-agreement value
                ScaleWeight - true to enable weight scaling, null to disable it
                        lookup - determines which tokens are loaded into the lookup table.  Possible values are TopN, TopNPercent, AboveN, BelowN, BottomNPercent, and BottomN
                        N - defines the size of the lookup table.  Must be a decimal number; use a number between 0.0 and 1.0 for percentages
                        buffer - the number of records held in memory during analysis (no need to exceed the number of unique tokens)
                Algorithm - the comparator to use for this field.  Options are Exact Match, LEV (Levenshtein), LCS (Longest Common Substring), and JWC (Jaro Winkler)
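
Several of the rules above can be checked mechanically before a configuration is used.  The Python sketch below is one possible check, not part of the module.  The names make_config and check_config are hypothetical, the tiny one-row configuration is made up for illustration, and reading "uniquely number this starting with 1" as "the values must be exactly 1, 2, 3, ..." is an assumption.

```python
import xml.etree.ElementTree as ET


def make_config(row_name, block_order, lookup, n):
    """Build a tiny hypothetical config with one column (label "mrn") and one row."""
    return f"""<Session>
    <datasource name="link_test" type="DataBase" access="driver,url,user,passwd" id="3">
        <column include_position="0" column_id="mrn" label="mrn" type="string"/>
    </datasource>
    <run estimate="true" name="conversion">
        <row name="{row_name}">
            <BlockOrder>{block_order}</BlockOrder>
            <ScaleWeight lookup="{lookup}" N="{n}" buffer="500">true</ScaleWeight>
        </row>
    </run>
</Session>"""


def check_config(xml_text):
    """Return a list of human-readable problems; an empty list means none found."""
    root = ET.fromstring(xml_text)
    problems = []
    labels = {col.get("label") for col in root.iter("column")}
    orders = []
    for row in root.iter("row"):
        name = row.get("name")
        # A row name must match a column label in the datasource element.
        if name not in labels:
            problems.append(f"row '{name}' has no matching column label")
        order = (row.findtext("BlockOrder") or "null").strip()
        if order != "null":
            orders.append(int(order))
        scale = row.find("ScaleWeight")
        if scale is not None and (scale.text or "").strip() == "true":
            lookup = scale.get("lookup", "")
            n = float(scale.get("N", "0"))
            # Percentage lookups expect N between 0.0 and 1.0.
            if lookup.endswith("Percent") and not 0.0 <= n <= 1.0:
                problems.append(f"row '{name}': N={n} is not a valid percentage")
    # Assumption: blocking fields are numbered 1, 2, 3, ... with no repeats.
    if sorted(orders) != list(range(1, len(orders) + 1)):
        problems.append("BlockOrder values must be unique and numbered from 1")
    return problems


good = check_config(make_config("mrn", 1, "TopNPercent", 0.5))
bad = check_config(make_config("zip", 2, "TopNPercent", 50.0))
print(good)
for problem in bad:
    print(problem)
```

The second call reports three problems: the row name does not match any column label, N is out of range for a percentage lookup, and the block numbering does not start at 1.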