- Previous topic: 6.3. Machine Readable Concept ModelNext topic: Importing SNOMED CT release dataDraft Guidance Terminology Service RF2 Page reference:
www.snomed.org/tig?t=tsg2_tsg

7. Terminology Services Guide

7.1. Representing SNOMED CT resources

7.1.1. Choosing a terminology server view

SNOMED CT Release Format 2 is designed to enable the distribution and use of a full historical view of SNOMED CT from its first release in 2002 up to its most recent release. This allows terminology servers to provide a range of different views of SNOMED CT. However, it does not require that all terminology servers support the full range of views.

Table 1 identifies three options for the views that a SNOMED CT terminology server may support. The simplest of these is the single snapshot view which provides access to a single release version. This closely matches the view provided by the original SNOMED CT release format ( RF1). The most powerful full view which allows the server to provide access to any selected version of SNOMED CT from a single representation of the SNOMED CT resource. This makes full use of the version features in RF2. Alternatively a server may provide a selected set of snapshots representing versions of known interest to its users.

People designing a terminology server need to decide whether their server will only provide access to a single current view of the SNOMED CT resource or will also support retrospective views of earlier versions of the terminology. The single snapshot view is simplest to implement and matches the service most vendors offered with original SNOMED CT release format ( RF1). A more complete view is now possible using Release format 2 and this offers several significant advantages. It supports incremental updates allowing smoother transition as new versions become available. It also allows changes between versions to be detected more easily and can be used to evaluate queries against an earlier version for comparative purposes.

People choosing a terminology server need to consider whether a server that only supports a single snapshot view of the current version meets their requirements. If they require access to previous versions a server that supports the full view is likely to be the best long term solution. A server that allows access to multiple discrete snapshots may provide a reasonable interim solution but may be less flexible and less easy to maintain.

Table 1. SNOMED CT views that may be supported by terminology servers

View

Description

Snapshot view

A snapshot view terminology service provides access to the content of the current state of all the components of the International Release and any chosen Extension Releases.

Multi- snapshot view

A "multi- snapshot view " terminology service provides access to:

A "multi- snapshot view " terminology server may provide access to delta views that report the differences between two snapshot views. This is limited to comparisons of specific points represented by the available snapshot views.

Full view

A Full view terminology service provides access to:

A full view terminology server should also provide access to views that show the changes to components between any two specified points in time.

The full view is required to support some SNOMED CT use cases but many requirements can be adequately met by providing access to a current Snapshot view. The multi- snapshot view is an approach that may meet some requirements that are not met by a single snapshot without requiring support for the Full view.

Note: terminology servers that do not support the Full view still need to be able to import from a Full release as Extension providers are not required to provide the snapshot or delta releases ( Importing release types)

7.1.2. Choosing a technical approach

People designing a terminology server need to decide how they will store and access the SNOMED CT resources. This decision depends on a variety of factors including: types of Terminology services required, the technical environment in which development is undertaken and the experience of the developers.

People choosing a terminology server need to know whether the server will meet their requirements and whether it works effectively in their preferred technical environment. They will also wish to be sure it delivers the required functionality and performance. While they may not be directly interested in technical approach to representation of SNOMED CT resources, these design decisions are likely to affect the ability of a server to meet their requirements.

The following sub-sections briefly outline some of the technical options.

7.1.2.1. Direct use of release files in a relational database

The distributed release files can be imported directly into a database schema that matches the distribution file specification. This data then provides the core resource at the heart of a terminology server.

This direct use of distributed files in a relational database has the advantage of allowing simple installation. However, it may not be the most efficient approach in terms of performance or file size. Some terminology services require relatively complex queries with multiple joins, and need to be completed in fractions of a second to provide an acceptable user interface.
Example: To display the set of subtype children of a concept with their preferred terms in a specified language or dialect requires joins several joins between concepts, Relationships, Descriptions and a language refset.

To search for a term matching a supplied pattern in a concept that represents a type of procedure also requires multiple joins to link the Descriptions with matching terms to the relevant concept and test whether it is a subtype of the 71388002 | Procedure (procedure) | concept.

The performance criteria of searches and joins in very large relational databases vary significantly. Therefore, different optimizations may need to be used to achieve acceptable response times according to the nature of the relational database system.

An additional consideration for RF2 implementations is the way in which alternative views are supported since, without optimization these may have a significant impact on performance.

7.1.2.2. Alternative relational structures

There is no requirement to use the data structure as distributed. Other structures can be used provided that they are able to deliver the range of Terminology services required. Options include:

  • Partially denormalized representations that omit direct representation of some components.
    Example: Frequently used information distributed as part of a Refset could be represented by direct inclusion of the added information as additional columns in the table representing the referenced component.
  • Omission of some of the tables where a particular function is not required.
    Example: The Refset tables representing cross maps could be omitted if the intended uses of the terminology server explicitly exclude cross mapping.
  • Replacement of some of the supporting tables with proprietary alternatives that deliver equivalent or enhanced functionality.
    Example: The word search support tables could be replaced by other tables or indices generated by the terminology server when loading the distribution files.

7.1.2.3. Non-relational structures

Although the primary distribution format is relational, this does not require terminology servers to utilize a relational database as the primary or only storage format. The requirements for terminology services may also be met by representing some or all of the distributed data in other forms including object-oriented databases, Extensible Mark-up Language (XML) and/or proprietary data structures. These structures may be used separately or, in some cases, in combination with a relational database.

7.1.3. Example of a Full View Relational Representation

This section outlines an example of a relational approach to representation of a full view of the SNOMED CT Resources. The example has been developed and tested using the Open Source database MySql Community Edition.

The example schema is based closely on the RF2 structure and is used in subsequent discussions of implementation issue and options for addressing those issues.
Note: The approach described here is only an illustrative example. It shows one way to represent the data but should not to be interpreted as a recommended or standard approach.
The general approach is as follows:
  • Each datatype in the RF2 specification is expressed with a common mapping to a database datatype:
    • Alternative implementations following the same general pattern could use a different datatype map but the mapping should be consistent within an implementation. Reasons for different datatype maps include implementer preferences and the capabilities of the database.
  • Each of the main file types specified in RF2 is instantiated as a database table:
    • Each table is named for the component type (e.g. sct2_Concept, sct2_Description, sct2_Relationship, sct2_Identifier).
    • Each field in these tables has column name from the release file
    • Each field is assigned the appropriate datatype (and where appropriate size).
  • Refsets are represented slightly differently from the other files:
    • One table structure for each distinct structure present in the release data:
      • der2_Refset.
      • der2_Reset_c.
      • der2_Refset_cc.
      • der2_Refset_ci.
      • der2_Refset_i.
      • der2_Refset_s.
      • der2_Refset_ss.
      • ... etc as new structures are added.
    • The first six fields in these tables have the common column names from the release file
    • The subsequent fields are named by type and position:
      • sctid1.
      • string1.
      • integer1.
      • ... etc.
    • This polymorphic field approach to column naming is used because column names may varies between release files for different Reference Set patterns, even when column data types are the same.
      Note: Two other approaches could be used here.
      1. A separate table for each type of Refset based on column names rather than on structure. This would require a several tables with similar types of Relationships to other components.
      2. A single general purpose Refset table with multiple polymorphic fields. For example, strings that could be used to represent the other data types. This could cause inefficiencies for sctid type fields as the joins between these and target components would be heterogeneous.

7.1.3.1. Example Datatype Mapping for Relational View

The following table provides example mapping from the SNOMED CT RF2 datatypes to appropriate datatypes supported by MySql.

Table 2. Example Datatype Mappings

RF2 Datatype

MySql Datatype

Comment on Mapping

SCTID

BIGINT

Both these datatypes represent 64-bit integers.

UUID

BINARY(16)

MySql does not have a native datatype for UUID. The BINARY(16) representation is most economical for storage and most efficient for indexing. This requires a transformation on storage or review. The example queries in this guide use the simple transformations functions shown in Table 3.

An alternative is to use CHAR or VARCHAR representations. This does not require the transformations noted above. However, use of VARCHAR (36) costs 38 bytes rather than 16 bytes per UUID and due to use of UTF8 using CHAR (36) consumes a fixed 108 bytes per UID in a MySql table. More importantly the index performance is poorer for these string representations.

Integer

INT

Both these datatypes represent 32-bit integers.

String

VARCHAR (Len)

VARCHAR is used in preference to CHAR as it provides more space efficient storage.

Note that in the UTF8 encoded tables required for the MyISAM database reserves three bytes per character for fixed length strings. In contrast VARCHAR uses the number of bytes actually plus one or two bytes to specify length. Use of VARCHAR does result in some loss of performance but strings are only used in Descriptions, string refsets and Identifier tables. In all these cases strings with a significant range of lengths are used and the space penalty for using CHAR datatypes would be high.

Boolean

TINYINT

MySql treats the datatype name boolean as an alias for TINYINT. In the examples this mapping is made explicit.

Time

DATETIME

This is the full representation of date and time and is used to ensure compatibility with existing data and potential accommodation of time stamped data.

The more compact DATE type could be used with current data as the effectiveTime is currently a date only representation. However, the more flexible DATETIME has been preferred in the examples because this emphasizes the fact that in an International environment the effectiveTime implies the UTC time and thus the date alone is not a precise representation.

Table 3. Example UUID transformation

Action

Load or insert to storage

SET [column-name] = UNHEX(REPLACE(@uid,'-',''))

Select from storage

RenderUid([column-name])

UNHEX

A built in MySql function that converts a hexadecimal string to binary.

RenderUid

FUNCTION `RenderUid`(Uid blob) RETURNS varchar(36) CHARSET utf8
BEGIN
    Set @Tmp = Hex(uid);
    RETURN CONCAT(SUBSTRING(@Tmp,1,8),'-',SUBSTRING(@Tmp,9,4),'-',
        SUBSTRING(@Tmp,13,4),'-',SUBSTRING(@Tmp,17,4),'-',SUBSTRING(@Tmp,21));
END

7.1.3.2. Example Full View Concept Table

CREATE TABLE `sct2_concept` (
  `id` BIGINT NOT NULL DEFAULT 0,
  `effectiveTime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
  `active` TINYINT NOT NULL DEFAULT 0,
  `moduleId` BIGINT NOT NULL DEFAULT 0,
  `definitionStatusId` BIGINT NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`,`effectiveTime`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Figure 1. Create Concept Table
Tip: Some of the approaches to optimization suggested elsewhere in the guide result in changes to this example schema. You may wish to consider these before implementing this schema.
LOAD DATA LOCAL INFILE '[path]sct2_concept_[AdditionalInfo].txt' 
INTO TABLE `sct2_concept`
LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
Figure 2. Import Concept file

7.1.3.3. Example Full View Description Table

CREATE TABLE `sct2_description` (
  `id` BIGINT NOT NULL DEFAULT 0,
  `effectiveTime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
  `active` TINYINT NOT NULL DEFAULT 0,
  `moduleId` BIGINT NOT NULL DEFAULT 0,
  `conceptId` BIGINT NOT NULL DEFAULT 0,
  `languageCode` VARCHAR(3) NOT NULL DEFAULT '',
  `typeId` BIGINT NOT NULL DEFAULT 0,
  `Term` VARCHAR(255) NOT NULL DEFAULT '',
  `caseSignificanceId` BIGINT NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`,`effectiveTime`),
  KEY `sct2_description_concept` (`conceptId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Figure 3. Create Description Table
Tip: Some of the approaches to optimization suggested elsewhere in the guide result in changes to this example schema. You may wish to consider these before implementing this schema.
LOAD DATA LOCAL INFILE '[path]sct2_description_[AdditionalInfo].txt' 
INTO TABLE sct2_description
LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
Figure 4. Import Description file
CREATE INDEX ix_sct2_description_3 ON sct2_description([ConceptId],[typeId],[languageCode])
Figure 5. Index Description Table - Concept

7.1.3.4. Example Full View relationships table

CREATE TABLE `sct2_relationship` (
  `id` BIGINT NOT NULL DEFAULT 0,
  `effectiveTime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
  `active` TINYINT NOT NULL DEFAULT 0,
  `moduleId` BIGINT NOT NULL DEFAULT 0,
  `sourceId` BIGINT NOT NULL DEFAULT 0,
  `destinationId` BIGINT NOT NULL DEFAULT 0,
  `relationshipGroup` INT NOT NULL DEFAULT 0,
  `typeId` BIGINT NOT NULL DEFAULT 0,
  `characteristicTypeId` BIGINT NOT NULL DEFAULT 0,
  `modifierId` BIGINT NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`,`effectiveTime`),
  KEY `sct2_relationship_source` (`sourceId`,`characteristicTypeId`,`typeId`,`destinationId`),
  KEY `sct2_relationship_dest` (`destinationId`,`characteristicTypeId`,`typeId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Figure 6. Create relationships table
Tip: Some of the approaches to optimization suggested elsewhere in the guide result in changes to this example schema. You may wish to consider these before implementing this schema.
LOAD DATA LOCAL INFILE '[path]sct2_relationship_[AdditionalInfo].txt' 
INTO TABLE sct2_relationship
LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
Figure 7. Import Relationship file

7.1.3.5. Example Full View Identifier Table

CREATE TABLE `sct2_identifier` (
  `identifierSchemeId` BIGINT NOT NULL DEFAULT 0,
  `alternateIdentifier` VARCHAR(255) NOT NULL DEFAULT '',
  `effectiveTime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
  `active` TINYINT NOT NULL DEFAULT 0,
  `moduleId` BIGINT NOT NULL DEFAULT 0,
  `referencedComponentId` BIGINT NOT NULL DEFAULT 0,
  PRIMARY KEY (`identifierSchemeId`,`alternateIdentifier`,`effectiveTime`),
  KEY `sct2_relationship_sctid` (`referencedComponentId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Figure 8. Create Identifier Table
Tip: Some of the approaches to optimization suggested elsewhere in the guide result in changes to this example schema. You may wish to consider these before implementing this schema.
LOAD DATA LOCAL INFILE '[path]sct2_identifier_[AdditionalInfo].txt' 
INTO TABLE sct2_identifier
LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
Figure 9. Index Identifier Table - Primary

7.1.3.6. Example Full View Refset Table

CREATE TABLE `sct2_refset_c` (
  `id` binary(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  `effectiveTime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
  `active` TINYINT NOT NULL DEFAULT 0,
  `moduleId` BIGINT NOT NULL DEFAULT 0,
  `refSetId` BIGINT NOT NULL DEFAULT 0,
  `referencedComponentId` BIGINT NOT NULL DEFAULT 0,
  `sctId1` BIGINT NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`,`effectiveTime`),
  KEY `refset_c_id` (`refSetId`,`referencedComponentId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Figure 10. Create Component Refset Table
Tip: Some of the approaches to optimization suggested elsewhere in the guide result in changes to this example schema. You may wish to consider these before implementing this schema.
LOAD DATA LOCAL INFILE '[path]sct_cRefset_[AdditionalInfo].txt'
INTO TABLE `sct2_refset_c` 
LINES TERMINATED BY '\r\n' IGNORE 1 LINES
(@uid, `effectiveTime`, `active`, `moduleId`, `refSetId`, `referencedComponentId`, `sctId1`) 
SET id=UNHEX(REPLACE(@uid,'-',''));
Figure 11. Import Component Refset File

^ Previous topic: 6.3. Machine Readable Concept ModelNext topic: Importing SNOMED CT release data