BioDemo Data

With the exception of names, which have their own table and additional rules, person-matching data is managed in two tables: matching_data and biodemo. The matching_data table records matching information on a per-source basis. Other anticipated uses of matching_data did not materialize, so it is easiest to think of it as just the per-source repository of biodemo data. It is where the registry remembers which source contributed a specific SSN or birthdate, and when. The table therefore holds values both when a source agrees with other sources and when it disagrees, which makes it a valuable record for resolving data conflicts among source systems.
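
As an illustration of how per-source rows help surface conflicts, here is a minimal Python sketch. The field names (person_id, data_type, value, reported_on) and the source codes "hr" and "student" are assumptions made for the example; only source_cd and the table name matching_data come from this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchingDatum:
    """One per-source row, loosely modeled on matching_data (field names are hypothetical)."""
    person_id: int    # hypothetical key for the person
    source_cd: str    # contributing source, e.g. "hr" or "student" (illustrative codes)
    data_type: str    # one of the matching_data types, e.g. "ssn"
    value: str        # value as received from the source
    reported_on: str  # when the source contributed it (ISO date string in this sketch)


def find_conflicts(rows):
    """Return {data_type: {value: [source_cd, ...]}} for types where sources disagree."""
    by_type = defaultdict(lambda: defaultdict(list))
    for row in rows:
        by_type[row.data_type][row.value].append(row.source_cd)
    # Keep only the data types that have more than one distinct value.
    return {t: dict(vals) for t, vals in by_type.items() if len(vals) > 1}


rows = [
    MatchingDatum(1, "hr", "birthdate", "1980-02-01", "2019-01-15"),
    MatchingDatum(1, "student", "birthdate", "1980-01-02", "2018-09-01"),
    MatchingDatum(1, "hr", "ssn", "123-45-6789", "2019-01-15"),
    MatchingDatum(1, "student", "ssn", "123-45-6789", "2018-09-01"),
]
conflicts = find_conflicts(rows)
```

In this sample the two sources agree on the SSN and disagree on the birthdate, so only birthdate appears in conflicts, along with the source behind each competing value.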

The matching_data table records both the data essential to the automated person-matching process (SSN, birthdate and gender) and a handful of biographical values that were thought to be useful when matching people manually, including birth city, visa status and ethnic code. Finally, the "deceased" indicator is important additional information that is factored into the management of services.

The biodemo table is where the registry maintains a single "best" value for each of these data types. It is, to the extent the registry and our business processes can guarantee, the correct information for that individual.

Types of BioDemo Data

matching_data type | biodemo column | Description
birthdate | birthdate | Date of birth. In matching_data the value is stored as a string, as received from the source. In biodemo it is stored as a date (smalldatetime).
birthplace | birthplace | Person's place of birth, populated primarily from student and HR systems. With few exceptions the value is just a city name. It is kept as given by the source.
deceased | deceased_ind | Boolean indication of whether the person is deceased.
ethnic | ethnic_cd | Coded ethnic group (we do not edit the code):

    1 -- White
    2 -- Black
    3 -- Hispanic
    4 -- Asian
    5 -- American Indian
    6 -- Unidentified

gender | gender_cd | One of "female", "male", "unknown".
ssn | ssn | Social security number in the form nnn-nn-nnnn. Partial values with only the last four digits are supported, in the form ###-##-nnnn.
us_citizenship | us_citizenship_cd | Coded U.S. citizenship status (we do not edit the code):

    1 -- US Citizen
    3 -- Perm Resident (alien)
    4 -- Intl, Temp (alien)
    N -- Not Indicated

visa_type | visa_type_cd | 2-character visa type code for foreign nationals (we do not edit the code). Sample codes (from about 35 possible values):

    PR -- Permanent Resident (Immigrant)
    OT -- Other (A1, A2, B1, B2, E1, F2, J2)
    J1 -- Exchange Visitor
    H1 -- Temp Worker: Specialty Occupation
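
The two SSN forms described above (full nnn-nn-nnnn and partial ###-##-nnnn) can be told apart with a simple pattern check. The following Python sketch is illustrative only: the function names are hypothetical, and the registry's actual validation rules are not described on this page.

```python
import re

# Full SSN: nnn-nn-nnnn. Partial SSN: literal "###-##-" followed by the last four digits.
FULL_SSN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")
PARTIAL_SSN = re.compile(r"###-##-[0-9]{4}")


def classify_ssn(value):
    """Return "full", "partial", or "invalid" for a stored SSN string."""
    if FULL_SSN.fullmatch(value):
        return "full"
    if PARTIAL_SSN.fullmatch(value):
        return "partial"
    return "invalid"


def last_four(value):
    """Return the last four digits of a full or partial SSN, or None if invalid."""
    return value[-4:] if classify_ssn(value) != "invalid" else None
```

For example, classify_ssn("###-##-6789") returns "partial", and last_four gives "6789" for both "123-45-6789" and "###-##-6789".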

Data integration rules

The biodemo table contains our "best" biodemo values from contributing sources. It is a single, required row per person containing a separate column for each of the above biodemo data types.

The data in this row is maintained on a per-column basis. So an SSN could come from one source and the birthdate from another. This can happen when neither source has a complete set of values.

Biodemo columns should be set from matching_data as follows. For each type of data:

  • If there is a matching_data row with a source_cd of "admin", copy its value into biodemo if it differs from what is already there;
  • otherwise, update biodemo with the matching_data value from the highest-ranked source that has one (again, only if it differs).
  • Data is never deleted from the biodemo row, only updated with more authoritative values.
  • Any "admin"-sourced row that is no longer needed, either to supply missing data or to override data from the highest-ranking source, should be deleted. This can be done by a maintenance procedure outside of biodemo closeout.
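
The rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the registry's implementation: the dictionary shapes, the function names, the "hr" and "student" source codes, and the convention that a lower rank number means a more authoritative source are all hypothetical. For simplicity the sketch keys both structures by biodemo column name.

```python
ADMIN = "admin"


def best_value(candidates, source_rank):
    """Pick the value for one data type.

    candidates:  {source_cd: value} taken from matching_data.
    source_rank: {source_cd: rank}, lower rank = more authoritative (assumption).
    """
    if ADMIN in candidates:
        return candidates[ADMIN]  # an "admin" row always wins
    ranked = [s for s in candidates if s in source_rank]
    if not ranked:
        return None               # no usable source has this data type
    return candidates[min(ranked, key=source_rank.__getitem__)]


def apply_to_biodemo(biodemo, matching, source_rank):
    """Update biodemo in place, column by column; return the columns that changed."""
    changed = []
    for column, candidates in matching.items():
        new = best_value(candidates, source_rank)
        # Never delete: skip when nothing is available, write only when different.
        if new is not None and biodemo.get(column) != new:
            biodemo[column] = new
            changed.append(column)
    return changed


def admin_row_is_redundant(candidates, source_rank):
    """True when the "admin" row merely repeats what the best other source supplies."""
    if ADMIN not in candidates:
        return False
    others = {s: v for s, v in candidates.items() if s != ADMIN}
    return best_value(others, source_rank) == candidates[ADMIN]


source_rank = {"hr": 1, "student": 2}
biodemo = {"ssn": "###-##-6789", "birthdate": "1980-01-02", "birthplace": "Springfield"}
matching = {
    "ssn": {"student": "123-45-6789"},
    "birthdate": {"hr": "1980-02-01", "student": "1980-01-02", "admin": "1980-01-02"},
}
changed = apply_to_biodemo(biodemo, matching, source_rank)
```

In this sample only the ssn column changes. The birthdate already equals the "admin" value, and birthplace is retained even though no source currently supplies it. The "admin" birthdate row is not redundant here, because it overrides a different value from the highest-ranked source.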

Events

We currently do not post any biodemo-related events because no biodemo values are passed to the directory. We will need to revisit this if we make this information selectively available through the document service.

Last modified December 6, 2019