II. Background
- History
- Started in 1986 at the National Library of Medicine
- Description
- Library of mappings of clinical terms and codes to various clinical vocabularies, as well as their organization within a hierarchical tree
- Free for use within the U.S.
- Features
- Concept Unique Identifiers (CUI) codes are assigned to each unique concept
- Concept codes only exist if they are present in one or more vocabularies
- UMLS does not maintain its own hierarchical tree (aside from the trees described by the individual vocabularies)
- Concept codes are mapped extensively to other objects
- Concept codes are mapped to normalized synonyms, translations, descriptions and definitions
- Concepts codes are mapped to other vocabularies (e.g. SNOMED CT)
- Concept Unique Identifiers (CUI) codes are assigned to each unique concept
III. Types: UMLS Components
- Metathesaurus
- Cross-mapping of clinical terms and codes across many vocabularies (SNOMED CT, MESH, ICD-10, RxNorm, LOINC)
- Semantic network
- Organizational tree structure of concepts
- Specialist Lexicon
- Natural Language processing tools
IV. Approach: Installing UMLS (on Windows)
- Background
- This is what I do to install UMLS on Windows in MS SQL Server
- I use SQL Server Developer Edition (free)
- This previously took considerable time, but MSSql Scripts have helped
- Obtain UMLS License
- Download UMLS
- http://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html
- Use the teminology_download_script
- Edit the curl-uts-download.bat (based on the README.txt
- Create a powershell file to download all files from the above URL
- Example: c:\umls\curl-uts-download.bat ""
- Total download size is <4 GB in size and downloads in a couple of hours with a fast connection
- Create a subset of the UMLS data using Metamorphosys
- Unzip the mmsys.zip file
- Run Metamorphosys (either run.bat or run64.bat)
- Select data sources to include (default is that excluded sources are highlighted)
- Select output directory and other options and click the Done button
- Subsetting will take several hours to run
-
MS-SQL database
- Create a database in MS-SQL and save mdb to drive with considerable space (starting db is >10 gb)
- Use the scripts from Victor Castro on Github (thank you Victor!!)
-
MS-SQL Views
- vMrConsoDistinctEnglishTerms
- SELECT CUI, STR, count_big(*) as nTerms FROM dbo.MRCONSO where (ISPREF='Y') AND (LAT='ENG') Group by CUI, STR
- vMrConsoDistinctEnglishTermsSingle
- SELECT a.CUI, MIN(a.STR) as str FROM dbo.vMrConsoDistinctEnglishTerms AS a INNER JOIN
- (SELECT CUI, MAX(nTerms) as maxTerms FROM vMrConsoDistinctEnglishTerms GROUP BY CUI) as b
- ON a.CUI = b.CUI AND a.nTerms=b.maxTerms
- GROUP BY a.CUI
- MRCONSO_ENG_FPN (create table from view)
- Select * into dbo.MRCONSO_ENG_FPN from dbo.vMrConsoDistinctEnglishTerms
- MRCONSO_ENG_FPN_Distinct (table from view)
- Select * into dbo.MRCONSO_ENG_FPN_Distinct from dbo.vMrConsoDistinctEnglishTermsSingle
- vMrHier_EngStr
- SELECT h.CUI, h.AUI, h.CXN, h.PAUI, h.SAB, h.RELA, h.PTR, h.HCD, h.CVF, c.str
- FROM dbo.MRHIER AS h INNER JOIN
- dbo.MRCONSO_ENG_FPN_Distinct AS c ON h.CUI = c.cui
- vMrMap_EngStr
- SELECT m.MAPSETCUI, m.MAPSETSAB, m.MAPSUBSETID, m.MAPRANK, m.MAPID, m.MAPSID, m.FROMID, m.FROMSID, m.FROMEXPR, m.FROMTYPE, m.FROMRULE, m.FROMRES, m.REL, m.RELA, m.TOID, m.TOSID, m.TOEXPR, m.TOTYPE, m.TORULE, m.TORES, m.MAPRULE, m.MAPRES, m.MAPTYPE, m.MAPATN, m.MAPATV, m.CVF, c2.str AS FROMID_str
- FROM dbo.MRMAP AS m LEFT OUTER JOIN
- dbo.MRCONSO_ENG_FPN_Distinct AS c2 ON m.FROMID = c2.cui
- vMrRel_EngStr
- SELECT r.CUI1, r.AUI1, r.STYPE1, r.REL, r.CUI2, r.AUI2, r.STYPE2, r.RELA, r.RUI, r.SRUI, r.SAB, r.SL, r.RG, r.DIR, r.SUPPRESS, r.CVF, c1.str AS str1, c2.str AS str2
- FROM dbo.MRREL AS r INNER JOIN
- dbo.MRCONSO_ENG_FPN_Distinct AS c1 ON r.CUI1 = c1.cui INNER JOIN
- dbo.MRCONSO_ENG_FPN_Distinct AS c2 ON r.CUI2 = c2.cui
- vMrSat_EngStr
- SELECT s.CUI, s.LUI, s.SUI, s.METAUI, s.STYPE, s.CODE, s.ATUI, s.SATUI, s.ATN, s.SAB, s.ATV, s.SUPPRESS, s.CVF, c1.str
- FROM dbo.MRSAT AS s INNER JOIN
- dbo.MRCONSO_ENG_FPN_Distinct AS c1 ON s.CUI = c1.cui
- vMrSty_EngStr
- SELECT s.CUI, s.TUI, s.STN, s.STY, s.ATUI, s.CVF, c1.str
- FROM dbo.MRSTY AS s INNER JOIN
- dbo.MRCONSO_ENG_FPN_Distinct AS c1 ON s.CUI = c1.cui
- vMrConsoDistinctEnglishTerms
V. Resources
- NIH National Library of Medicine UMLS
- UMLS Terminology Services (data source)