II. Background

  1. History
    1. Started in 1986 at the National Library of Medicine
  2. Description
    1. Library of mappings of clinical terms and codes to various clinical vocabularies, as well as their organization within a hierarchical tree
    2. Free for use within the U.S.
  3. Features
    1. A Concept Unique Identifier (CUI) code is assigned to each unique concept
      1. Concept codes only exist if they are present in one or more vocabularies
      2. UMLS does not maintain its own hierarchical tree (aside from the trees described by the individual vocabularies)
    2. Concept codes are mapped extensively to other objects
      1. Concept codes are mapped to normalized synonyms, translations, descriptions and definitions
      2. Concept codes are mapped to other vocabularies (e.g. SNOMED CT)
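
The mappings described above can be sketched by reading rows from the Metathesaurus MRCONSO.RRF file, which is pipe-delimited with a trailing pipe on each row. This is a minimal sketch, not an official parser: the column order follows the documented MRCONSO layout, while the sample rows, their identifiers, and the helper names (`parse_mrconso_line`, `names_by_source`) are hypothetical placeholders rather than data copied from a release.

```python
from collections import defaultdict

# Column order of MRCONSO.RRF; every row ends with a trailing "|".
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]

# Hypothetical rows for illustration only (placeholder identifiers and names).
SAMPLE_ROWS = [
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||MSH|MH|D000001|Example Disorder|0|N||",
    "C0000001|ENG|S|L0000002|PF|S0000002|Y|A0000002||||SNOMEDCT_US|PT|100000001|Example disorder (disorder)|0|N||",
    "C0000001|SPA|P|L0000003|PF|S0000003|Y|A0000003||||MSHSPA|MH|D000001|Trastorno de ejemplo|0|N||",
]


def parse_mrconso_line(line):
    """Split one MRCONSO row into a dict keyed by column name."""
    values = line.rstrip("\n").split("|")
    # zip() stops at 18 columns, dropping the empty item after the trailing pipe.
    return dict(zip(MRCONSO_COLUMNS, values))


def names_by_source(lines, cui):
    """Group (language, source code, name) tuples for one CUI by source vocabulary."""
    grouped = defaultdict(list)
    for line in lines:
        row = parse_mrconso_line(line)
        if row["CUI"] == cui:
            grouped[row["SAB"]].append((row["LAT"], row["CODE"], row["STR"]))
    return dict(grouped)


if __name__ == "__main__":
    for source, names in names_by_source(SAMPLE_ROWS, "C0000001").items():
        print(source, names)
```

Each source vocabulary (SAB) contributes its own code and name for the same CUI, which is how one concept code links synonyms, translations, and other vocabularies.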

III. Types: UMLS Components

  1. Metathesaurus
    1. Cross-mapping of clinical terms and codes across many vocabularies (SNOMED CT, MeSH, ICD-10, RxNorm, LOINC)
  2. Semantic network
    1. Organizational tree structure of concepts
  3. Specialist Lexicon
    1. Natural language processing tools
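
As a rough illustration of the Semantic Network's tree structure, each concept's semantic type in MRSTY.RRF carries a tree number (STN) such as B2.2.1.2.1. The sketch below assumes the usual numbering convention, in which dropping the last dotted component (and finally the digit after the root letter) gives the parent type. The sample row uses a placeholder CUI and ATUI, and the helper names are hypothetical.

```python
# Column order of MRSTY.RRF; every row ends with a trailing "|".
MRSTY_COLUMNS = ["CUI", "TUI", "STN", "STY", "ATUI", "CVF"]

# Hypothetical row: placeholder CUI and ATUI, with the semantic type
# "Disease or Syndrome" (T047) at tree number B2.2.1.2.1.
SAMPLE_ROW = "C0000001|T047|B2.2.1.2.1|Disease or Syndrome|AT00000001||"


def parse_mrsty_line(line):
    """Split one MRSTY row into a dict keyed by column name."""
    return dict(zip(MRSTY_COLUMNS, line.rstrip("\n").split("|")))


def stn_ancestors(stn):
    """List ancestor tree numbers from nearest parent up to the root letter.

    Assumes the convention that "B2.2" is the parent of "B2.2.1" and that
    the single letter "B" is the parent of "B2".
    """
    parts = stn.split(".")
    ancestors = []
    while len(parts) > 1:
        parts = parts[:-1]
        ancestors.append(".".join(parts))
    head = parts[0]
    if len(head) > 1:
        ancestors.append(head[0])  # e.g. "B2" -> "B"
    return ancestors


if __name__ == "__main__":
    row = parse_mrsty_line(SAMPLE_ROW)
    print(row["STY"], stn_ancestors(row["STN"]))
```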

IV. Approach: Installing UMLS (on Windows)

  1. Background
    1. This is what I do to install UMLS on Windows in MS SQL Server
    2. Note that this is a time-consuming process (days) that I only perform every 2-3 years
  2. Obtain UMLS License
    1. http://www.nlm.nih.gov/databases/umls.html
  3. Download UMLS
    1. http://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html
    2. Use the terminology_download_script
      1. Edit the curl-uts-download.bat (based on the README.txt)
    3. Create a PowerShell file to download all files from the above URL
      1. Example: c:\umls\curl-uts-download.bat "
    4. Total download size is under 4 GB and takes a couple of hours to download on a fast connection
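
As an alternative to the batch file, the download step can be sketched in Python. This is a sketch under stated assumptions, not the NLM script: it assumes the UTS download endpoint takes the release file URL and an API key as the query parameters `url` and `apiKey`, which should be confirmed against the README.txt in the download script package. The file URL and key in the usage lines are placeholders.

```python
import shutil
import urllib.request
from urllib.parse import urlencode

# Assumption: UTS download endpoint; verify against the README.txt.
UTS_DOWNLOAD_ENDPOINT = "https://uts-ws.nlm.nih.gov/download"


def build_download_url(file_url, api_key):
    """Build the request URL; urlencode percent-encodes the nested file URL."""
    query = urlencode({"url": file_url, "apiKey": api_key})
    return UTS_DOWNLOAD_ENDPOINT + "?" + query


def download(file_url, api_key, dest_path):
    """Stream one release file to disk without loading it all into memory."""
    request_url = build_download_url(file_url, api_key)
    with urllib.request.urlopen(request_url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # Placeholder values: substitute a real release file URL and your own API key.
    print(build_download_url("https://example.invalid/umls-release.zip", "YOUR_API_KEY"))
```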
  4. Create a subset of the UMLS data using MetamorphoSys
    1. Unzip the mmsys.zip file
    2. Run MetamorphoSys (either run.bat or run64.bat)
    3. Select data sources to include (default is that excluded sources are highlighted)
    4. Select output directory and other options and click the Done button
    5. Subsetting will take several hours to run
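
Because subsetting runs for hours, it can help to confirm the output before starting the database import. The sketch below lists the .RRF files in the META output directory and reports any that are missing. The list of expected files is an assumption about a typical subset, and the directory in the usage lines is a hypothetical example path.

```python
from pathlib import Path

# Assumption: core Metathesaurus files that a typical subset should contain.
EXPECTED_FILES = ("MRCONSO.RRF", "MRSTY.RRF", "MRREL.RRF", "MRDEF.RRF", "MRSAB.RRF")


def summarize_meta(meta_dir):
    """Return ({file name: size in bytes}, [expected files that are missing])."""
    sizes = {p.name: p.stat().st_size for p in Path(meta_dir).glob("*.RRF")}
    missing = [name for name in EXPECTED_FILES if name not in sizes]
    return sizes, missing


if __name__ == "__main__":
    # Hypothetical location; use the output directory chosen in MetamorphoSys.
    sizes, missing = summarize_meta(r"c:\umls\output\2014AB\META")
    for name, size in sorted(sizes.items()):
        print(f"{name}: {size / 1e6:.1f} MB")
    if missing:
        print("Missing:", ", ".join(missing))
```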
  5. Import the data into MySql
    1. Prefer to import into MS SqlServer, but UMLS import scripts exist only for MySql and Oracle
      1. I have migrated UMLS from MySql to SqlServer but this is a drawn out process
      2. Better approach is to access UMLS MySql database directly from Visual Studio
    2. Start MySql Workbench
      1. Create a new connection (e.g. localinstance) if not already created
      2. http://dev.mysql.com/doc/workbench/en/wb-getting-started-tutorial-create-connection.html
    3. UMLS MySql database instructions
      1. http://www.nlm.nih.gov/research/umls/implementation_resources/scripts/index.html
      2. Create a new database schema for umls (e.g. umls2014)
      3. Modify the my.ini performance settings for the database (in e:\ProgramData\MySql\MySql Server)
      4. Stop and start the MySql server in Task Manager services for the my.ini changes to take effect
      5. Navigate to the META output directory (e.g. output\2014AB\META)
    4. Scripting the database creation and indexing
      1. Option 1: Batch file
        1. Edit the "populate_mysql_db.bat" file
        2. Double click the modified bat file (or run in powershell)
          1. Generates and populates the database and applies database table indexes
        3. This will run for a very long time with little or no screen output
          1. Will output to mysql.log
      2. Option 2: Import the SQL file into MySQL workbench (backup plan)
        1. I resorted to this, as I could not get the batch file to work
        2. Load/Run the SQL table creation file into MySQL workbench (umls database)
          1. Edit the file to include the full file paths (escaped with \\)
        3. Load/Run the SQL table index file into MySQL workbench (umls database)
          1. Also edit the file to include the full file paths (escaped with \\)
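
The manual path edits in Option 2 can be sketched as a small rewrite script. This assumes the table creation file loads each table with a statement that references a bare file name, such as `infile 'MRCONSO.RRF'`. The regular expression, the function name, and the directory in the usage lines are illustrative assumptions, so check the output against the actual SQL file before running it.

```python
import re
from pathlib import PureWindowsPath

# Matches infile '<name>.RRF' only when <name> has no directory part,
# so paths that are already fully qualified are left untouched.
INFILE_PATTERN = re.compile(r"(infile\s+')([^'\\/]+\.RRF)(')", re.IGNORECASE)


def qualify_infile_paths(sql_text, meta_dir):
    """Prefix bare .RRF names with meta_dir, doubling each backslash for MySQL."""
    base = PureWindowsPath(meta_dir)

    def replace(match):
        full_path = str(base / match.group(2)).replace("\\", "\\\\")
        return match.group(1) + full_path + match.group(3)

    # A function replacement avoids backslash-escape handling in re.sub.
    return INFILE_PATTERN.sub(replace, sql_text)


if __name__ == "__main__":
    statement = "load data local infile 'MRCONSO.RRF' into table MRCONSO fields terminated by '|';"
    # Hypothetical directory; use the real META output directory.
    print(qualify_infile_paths(statement, r"c:\umls\output\2014AB\META"))
```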
  6. Set up MySql to use with Visual Studio
    1. Install MySql for Visual Studio and .Net Connector
      1. http://www.mysql.com/why-mysql/windows/visualstudio/
    2. Create a Data connection to the umls MySql database
    3. Add Entity Framework to the current project using package manager console
      1. As of 2014, MySql Connector only worked with Entity Framework 5.0
    4. Add an ADO data entity to the project using the umls data connection
    5. Create a second database inside SqlServer for output of transforming the UMLS data
    6. MRXNW_ENG will need to be copied using SQL Server Migration Assistant for MySql
      1. Entity Framework will not connect to this table (it has no primary key fields and no unique column combinations)

V. Resources

  1. NIH National Library of Medicine UMLS
    1. http://www.nlm.nih.gov/research/umls/
  2. UMLS Terminology Services (data source)
    1. https://uts.nlm.nih.gov/home.html
