II. Background

  1. History
    1. Started in 1986 at the National Library of Medicine
  2. Description
    1. Library of mappings of clinical terms and codes to various clinical vocabularies, as well as their organization within a hierarchical tree
    2. Free for use within the U.S.
  3. Features
    1. Concept Unique Identifiers (CUI) codes are assigned to each unique concept
      1. Concept codes only exist if they are present in one or more vocabularies
      2. UMLS does not maintain its own hierarchical tree (aside from the trees described by the individual vocabularies)
    2. Concept codes are mapped extensively to other objects
      1. Concept codes are mapped to normalized synonyms, translations, descriptions and definitions
      2. Concepts codes are mapped to other vocabularies (e.g. SNOMED CT)

III. Types: UMLS Components

  1. Metathesaurus
    1. Cross-mapping of clinical terms and codes across many vocabularies (SNOMED CT, MESH, ICD-10, RxNorm, LOINC)
  2. Semantic network
    1. Organizational tree structure of concepts
  3. Specialist Lexicon
    1. Natural Language processing tools

IV. Approach: Installing UMLS (on Windows)

  1. Background
    1. This is what I do to install UMLS on Windows in MS SQL Server
    2. I use SQL Server Developer Edition (free)
    3. This previously took considerable time, but MSSql Scripts have helped
  2. Obtain UMLS License
    1. http://www.nlm.nih.gov/databases/umls.html
  3. Download UMLS
    1. http://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html
    2. Use the teminology_download_script
      1. Edit the curl-uts-download.bat (based on the README.txt
    3. Create a powershell file to download all files from the above URL
      1. Example: c:\umls\curl-uts-download.bat "

        "

    4. Total download size is <4 GB in size and downloads in a couple of hours with a fast connection
  4. Create a subset of the UMLS data using Metamorphosys
    1. Unzip the mmsys.zip file
    2. Run Metamorphosys (either run.bat or run64.bat)
    3. Select data sources to include (default is that excluded sources are highlighted)
    4. Select output directory and other options and click the Done button
    5. Subsetting will take several hours to run
  5. MS-SQL database
    1. Create a database in MS-SQL and save mdb to drive with considerable space (starting db is >10 gb)
    2. Use the scripts from Victor Castro on Github (thank you Victor!!)
      1. https://github.com/vcastro/umls_mssql_load
  6. MS-SQL Views
    1. vMrConsoDistinctEnglishTerms
      1. SELECT CUI, STR, count_big(*) as nTerms FROM dbo.MRCONSO where (ISPREF='Y') AND (LAT='ENG') Group by CUI, STR
    2. vMrConsoDistinctEnglishTermsSingle
      1. SELECT a.CUI, MIN(a.STR) as str FROM dbo.vMrConsoDistinctEnglishTerms AS a INNER JOIN
      2. (SELECT CUI, MAX(nTerms) as maxTerms FROM vMrConsoDistinctEnglishTerms GROUP BY CUI) as b
      3. ON a.CUI = b.CUI AND a.nTerms=b.maxTerms
      4. GROUP BY a.CUI
    3. MRCONSO_ENG_FPN (create table from view)
      1. Select * into dbo.MRCONSO_ENG_FPN from dbo.vMrConsoDistinctEnglishTerms
    4. MRCONSO_ENG_FPN_Distinct (table from view)
      1. Select * into dbo.MRCONSO_ENG_FPN_Distinct from dbo.vMrConsoDistinctEnglishTermsSingle
    5. vMrHier_EngStr
      1. SELECT h.CUI, h.AUI, h.CXN, h.PAUI, h.SAB, h.RELA, h.PTR, h.HCD, h.CVF, c.str
      2. FROM dbo.MRHIER AS h INNER JOIN
    6. dbo.MRCONSO_ENG_FPN_Distinct AS c ON h.CUI = c.cui
    7. vMrMap_EngStr
      1. SELECT m.MAPSETCUI, m.MAPSETSAB, m.MAPSUBSETID, m.MAPRANK, m.MAPID, m.MAPSID, m.FROMID, m.FROMSID, m.FROMEXPR, m.FROMTYPE, m.FROMRULE, m.FROMRES, m.REL, m.RELA, m.TOID, m.TOSID, m.TOEXPR, m.TOTYPE, m.TORULE, m.TORES, m.MAPRULE, m.MAPRES, m.MAPTYPE, m.MAPATN, m.MAPATV, m.CVF, c2.str AS FROMID_str
      2. FROM dbo.MRMAP AS m LEFT OUTER JOIN
    8. dbo.MRCONSO_ENG_FPN_Distinct AS c2 ON m.FROMID = c2.cui
    9. vMrRel_EngStr
      1. SELECT r.CUI1, r.AUI1, r.STYPE1, r.REL, r.CUI2, r.AUI2, r.STYPE2, r.RELA, r.RUI, r.SRUI, r.SAB, r.SL, r.RG, r.DIR, r.SUPPRESS, r.CVF, c1.str AS str1, c2.str AS str2
      2. FROM dbo.MRREL AS r INNER JOIN
    10. dbo.MRCONSO_ENG_FPN_Distinct AS c1 ON r.CUI1 = c1.cui INNER JOIN
    11. dbo.MRCONSO_ENG_FPN_Distinct AS c2 ON r.CUI2 = c2.cui
    12. vMrSat_EngStr
      1. SELECT s.CUI, s.LUI, s.SUI, s.METAUI, s.STYPE, s.CODE, s.ATUI, s.SATUI, s.ATN, s.SAB, s.ATV, s.SUPPRESS, s.CVF, c1.str
      2. FROM dbo.MRSAT AS s INNER JOIN
    13. dbo.MRCONSO_ENG_FPN_Distinct AS c1 ON s.CUI = c1.cui
    14. vMrSty_EngStr
      1. SELECT s.CUI, s.TUI, s.STN, s.STY, s.ATUI, s.CVF, c1.str
      2. FROM dbo.MRSTY AS s INNER JOIN
    15. dbo.MRCONSO_ENG_FPN_Distinct AS c1 ON s.CUI = c1.cui

V. Resources

  1. NIH National Library of Medicine UMLS
    1. http://www.nlm.nih.gov/research/umls/
  2. UMLS Terminology Services (data source)
    1. https://uts.nlm.nih.gov/home.html

Images: Related links to external sites (from Bing)

Related Studies