II. Background
- History
- Started in 1986 at the National Library of Medicine
- Description
- Library of mappings of clinical terms and codes to various clinical vocabularies, as well as their organization within a hierarchical tree
- Free for use within the U.S.
- Features
- A Concept Unique Identifier (CUI) is assigned to each unique concept
- Concept codes only exist if they are present in one or more vocabularies
- UMLS does not maintain its own hierarchical tree (aside from the trees described by the individual vocabularies)
- Concept codes are mapped extensively to other objects
- Concept codes are mapped to normalized synonyms, translations, descriptions and definitions
- Concept codes are mapped to other vocabularies (e.g. SNOMED CT)
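- The mappings above can be seen in the Metathesaurus concept file, MRCONSO.RRF, a pipe-delimited file in which each row ties a CUI to one term (STR) in one language (LAT) from one source vocabulary (SAB), together with that vocabulary's own code (CODE). A minimal Python sketch, assuming the standard 18-column MRCONSO layout; the function name, vocabulary names and sample rows are made up for illustration and are not real UMLS data:

```python
from collections import defaultdict

# Column order of MRCONSO.RRF (each row also ends with a trailing "|").
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def group_by_cui(lines):
    """Group MRCONSO rows by CUI into lists of (SAB, CODE, LAT, STR)."""
    concepts = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        # zip() stops at 18 names, dropping the empty field after the final "|".
        row = dict(zip(MRCONSO_COLUMNS, fields))
        concepts[row["CUI"]].append(
            (row["SAB"], row["CODE"], row["LAT"], row["STR"])
        )
    return dict(concepts)


# Made-up rows for illustration only (hypothetical identifiers and codes).
SAMPLE = [
    "C9999999|ENG|P|L0000001|PF|S0000001|Y|A0000001||||VOCAB_A|PT|A-001|Example disorder|0|N||",
    "C9999999|ENG|S|L0000002|PF|S0000002|Y|A0000002||||VOCAB_B|PT|B-42|Example disease|0|N||",
    "C9999999|SPA|P|L0000003|PF|S0000003|Y|A0000003||||VOCAB_B|PT|B-42|Trastorno de ejemplo|0|N||",
]

if __name__ == "__main__":
    for cui, terms in group_by_cui(SAMPLE).items():
        print(cui, terms)
```

- Grouping by CUI shows the single concept collecting its synonyms, a translation, and the codes from each source vocabulary. For a real file, pass an open file handle (UTF-8) instead of SAMPLE.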
III. Types: UMLS Components
- Metathesaurus
- Cross-mapping of clinical terms and codes across many vocabularies (e.g. SNOMED CT, MeSH, ICD-10, RxNorm, LOINC)
- Semantic network
- Organizational tree structure of semantic types (the broad categories assigned to each concept)
- Specialist Lexicon
- Natural language processing tools
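- The Semantic Network tree can be read from the tree number (STN) that accompanies each semantic type, for example in MRSTY.RRF (columns CUI, TUI, STN, STY, ATUI, CVF). A minimal Python sketch that derives a type's ancestors from its tree number; the function name is hypothetical, and the tree number used in the example (B2.2.1.2.1, Disease or Syndrome) should be checked against your own release:

```python
def stn_ancestors(stn):
    """Return the ancestor tree numbers of stn, nearest parent first.

    Tree numbers are dotted below the second level ("B2.2.1"), while a
    second-level node such as "A1" sits directly under the root letter "A".
    """
    ancestors = []
    current = stn
    while True:
        if "." in current:
            # Drop the last dotted component: "B2.2.1" -> "B2.2"
            current = current.rsplit(".", 1)[0]
        elif len(current) > 1:
            # "B2" -> "B" (the root letter)
            current = current[0]
        else:
            break  # reached a root ("A" or "B")
        ancestors.append(current)
    return ancestors


if __name__ == "__main__":
    print(stn_ancestors("B2.2.1.2.1"))
```

- Matching these ancestor tree numbers against the STN column gives the chain of broader semantic types for any concept's type.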
IV. Approach: Installing UMLS (on Windows)
- Background
- This is what I do to install UMLS on Windows in MS SQL Server
- Note that this is a time-consuming process (days) that I perform only every 2-3 years
- Obtain UMLS License
- Download UMLS
- http://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html
- Use the terminology_download_script
- Edit curl-uts-download.bat (following the README.txt)
- Create a PowerShell file to download all files from the above URL
- Example: c:\umls\curl-uts-download.bat ""
- Total download size is <4 GB and takes a couple of hours on a fast connection
- Create a subset of the UMLS data using MetamorphoSys
- Unzip the mmsys.zip file
- Run MetamorphoSys (either run.bat or run64.bat)
- Select data sources to include (by default, the highlighted sources are the ones excluded)
- Select output directory and other options and click the Done button
- Subsetting will take several hours to run
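- Once subsetting finishes, a quick check that the Rich Release Format (.RRF) files actually landed in the META output directory can save a failed import later. A minimal Python sketch; the function name, the example path, and the list of expected files are assumptions to adapt to your own subset:

```python
from pathlib import Path

# Files the later import is assumed to need (adjust to your subset).
EXPECTED = ["MRCONSO.RRF", "MRREL.RRF", "MRSTY.RRF", "MRSAB.RRF"]


def check_meta_dir(meta_dir, expected=EXPECTED):
    """Return (sizes, missing).

    sizes   maps each .RRF file name found in meta_dir to its size in bytes.
    missing lists expected files that are absent or empty.
    """
    meta = Path(meta_dir)
    sizes = {p.name: p.stat().st_size for p in sorted(meta.glob("*.RRF"))}
    missing = [name for name in expected if sizes.get(name, 0) == 0]
    return sizes, missing


if __name__ == "__main__":
    # Example path only; point this at your own MetamorphoSys output directory.
    sizes, missing = check_meta_dir(r"output\2014AB\META")
    for name, size in sizes.items():
        print(f"{name:20s} {size / 1_000_000:10.1f} MB")
    print("Missing or empty:", missing or "none")
```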
- Import the data into MySql
- Prefer to import into MS SqlServer, but UMLS import scripts exist only for MySql and Oracle
- I have migrated UMLS from MySql to SqlServer, but this is a drawn-out process
- Better approach is to access UMLS MySql database directly from Visual Studio
- Start MySql Workbench
- Create a new connection (e.g. localinstance) if not already created
- http://dev.mysql.com/doc/workbench/en/wb-getting-started-tutorial-create-connection.html
- UMLS MySql database instructions
- http://www.nlm.nih.gov/research/umls/implementation_resources/scripts/index.html
- Create a new database schema for umls (e.g. umls2014)
- Modify the my.ini performance settings for the database (in e:\ProgramData\MySql\MySql Server)
- Stop and restart the MySql server (Task Manager, Services tab) so that the my.ini changes take effect
- Navigate to the META output directory (e.g. output\2014AB\META)
- Scripting the database creation and indexing
- Option 1: Batch file
- Edit the "populate_mysql_db.bat" file
- Double-click the modified bat file (or run it in PowerShell)
- Generates and populates the database and applies database table indexes
- This will run for a very long time with little or no screen output
- Will output to mysql.log
- Option 2: Import the SQL file into MySQL workbench (backup plan)
- I resorted to this, as I could not get the batch file to work
- Load/Run the SQL table creation file into MySQL workbench (umls database)
- Edit the file to include the full file paths (escaped with \\)
- Load/Run the SQL table index file into MySQL workbench (umls database)
- Also edit the file to include the full file paths (escaped with \\)
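- Editing every file path by hand is error-prone, so the path edit for Option 2 can be scripted. A minimal Python sketch, assuming the SQL script references each data file as load data local infile 'NAME.RRF' with a bare file name (check your copy of the script first); the function name and example directory are hypothetical:

```python
import re

# Matches: load data local infile 'MRCONSO.RRF'  (bare name, no directory part)
INFILE = re.compile(r"(load\s+data\s+local\s+infile\s+')([^'\\/]+)(')", re.IGNORECASE)


def add_full_paths(sql_text, meta_dir):
    """Prefix each bare infile name with meta_dir, doubling every backslash.

    MySQL treats a single backslash inside a quoted string as an escape
    character, so C:\\umls\\META must be written with doubled backslashes.
    """
    prefix = meta_dir.rstrip("\\/").replace("\\", "\\\\") + "\\\\"

    def repl(match):
        # A function replacement avoids re.sub interpreting the backslashes.
        return match.group(1) + prefix + match.group(2) + match.group(3)

    return INFILE.sub(repl, sql_text)


if __name__ == "__main__":
    sample = "load data local infile 'MRCONSO.RRF' into table MRCONSO fields terminated by '|';"
    print(add_full_paths(sample, r"C:\umls\output\2014AB\META"))
```

- Names that already contain a slash or backslash are left alone, so running the rewrite twice does not double the prefix. Write the result to a new file and keep the original script as a backup.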
- Set up MySql to use with Visual Studio
- Install MySql for Visual Studio and .Net Connector
- Create a Data connection to the umls MySql database
- Add Entity Framework to the current project using package manager console
- As of 2014, MySql Connector only worked with Entity Framework 5.0
- Add an ADO data entity to the project using the umls data connection
- Create a second database inside SqlServer for output of transforming the UMLS data
- MRXNW_ENG will need to be copied using SQL Server Migration Assistant for MySql
- Entity Framework will not connect to this table (it has no primary key and no unique combination of columns)
V. Resources
- NIH National Library of Medicine UMLS
- UMLS Terminology Services (data source)