![]() ![]()
|
Page reference: www.snomed.org/tig?t=tsg2_tsg |
SNOMED CT Release Format 2 is designed to enable the distribution and use of a full historical view of SNOMED CT from its first release in 2002 up to its most recent release. This allows terminology servers to provide a range of different views of SNOMED CT. However, it does not require that all terminology servers support the full range of views.
Table 1 identifies three options for the views that a SNOMED CT terminology server may support. The simplest of these is the single snapshot view which provides access to a single release version. This closely matches the view provided by the original SNOMED CT release format ( RF1). The most powerful full view which allows the server to provide access to any selected version of SNOMED CT from a single representation of the SNOMED CT resource. This makes full use of the version features in RF2. Alternatively a server may provide a selected set of snapshots representing versions of known interest to its users.
People designing a terminology server need to decide whether their server will only provide access to a single current view of the SNOMED CT resource or will also support retrospective views of earlier versions of the terminology. The single snapshot view is simplest to implement and matches the service most vendors offered with original SNOMED CT release format ( RF1). A more complete view is now possible using Release format 2 and this offers several significant advantages. It supports incremental updates allowing smoother transition as new versions become available. It also allows changes between versions to be detected more easily and can be used to evaluate queries against an earlier version for comparative purposes.
People choosing a terminology server need to consider whether a server that only supports a single snapshot view of the current version meets their requirements. If they require access to previous versions a server that supports the full view is likely to be the best long term solution. A server that allows access to multiple discrete snapshots may provide a reasonable interim solution but may be less flexible and less easy to maintain.
|
View |
|
|---|---|
|
A snapshot view terminology service provides access to the content of the current state of all the components of the International Release and any chosen Extension Releases. |
|
|
Multi- snapshot view |
A "multi- snapshot view " terminology service provides access to:
A "multi- snapshot view " terminology server may provide access to delta views that report the differences between two snapshot views. This is limited to comparisons of specific points represented by the available snapshot views. |
|
A Full view
terminology service provides access to:
A full view terminology server should also provide access to views that show the changes to components between any two specified points in time. |
The full view is required to support some SNOMED CT use cases but many requirements can be adequately met by providing access to a current Snapshot view. The multi- snapshot view is an approach that may meet some requirements that are not met by a single snapshot without requiring support for the Full view.
People designing a terminology server need to decide how they will store and access the SNOMED CT resources. This decision depends on a variety of factors including: types of Terminology services required, the technical environment in which development is undertaken and the experience of the developers.
People choosing a terminology server need to know whether the server will meet their requirements and whether it works effectively in their preferred technical environment. They will also wish to be sure it delivers the required functionality and performance. While they may not be directly interested in technical approach to representation of SNOMED CT resources, these design decisions are likely to affect the ability of a server to meet their requirements.
The following sub-sections briefly outline some of the technical options.
The distributed release files can be imported directly into a database schema that matches the distribution file specification. This data then provides the core resource at the heart of a terminology server.
To search for a term matching a supplied pattern in a concept that represents a type of procedure also requires multiple joins to link the Descriptions with matching terms to the relevant concept and test whether it is a subtype of the 71388002 | Procedure (procedure) | concept.
The performance criteria of searches and joins in very large relational databases vary significantly. Therefore, different optimizations may need to be used to achieve acceptable response times according to the nature of the relational database system.
An additional consideration for RF2 implementations is the way in which alternative views are supported since, without optimization these may have a significant impact on performance.
There is no requirement to use the data structure as distributed. Other structures can be used provided that they are able to deliver the range of Terminology services required. Options include:
Although the primary distribution format is relational, this does not require terminology servers to utilize a relational database as the primary or only storage format. The requirements for terminology services may also be met by representing some or all of the distributed data in other forms including object-oriented databases, Extensible Mark-up Language (XML) and/or proprietary data structures. These structures may be used separately or, in some cases, in combination with a relational database.
This section outlines an example of a relational approach to representation of a full view of the SNOMED CT Resources. The example has been developed and tested using the Open Source database MySql Community Edition.
The following table provides example mapping from the SNOMED CT RF2 datatypes to appropriate datatypes supported by MySql.
|
RF2 Datatype |
MySql Datatype |
Comment on Mapping |
|---|---|---|
|
BIGINT |
Both these datatypes represent 64-bit integers. |
|
|
BINARY(16) |
MySql does not have a native datatype for UUID. The BINARY(16) representation is most economical for storage and most efficient for indexing. This requires a transformation on storage or review. The example queries in this guide use the simple transformations functions shown in Table 3. An alternative is to use CHAR or VARCHAR representations. This does not require the transformations noted above. However, use of VARCHAR (36) costs 38 bytes rather than 16 bytes per UUID and due to use of UTF8 using CHAR (36) consumes a fixed 108 bytes per UID in a MySql table. More importantly the index performance is poorer for these string representations. |
|
|
INT |
Both these datatypes represent 32-bit integers. |
|
|
VARCHAR (Len) |
VARCHAR is used in preference to CHAR as it provides more space efficient storage. Note that in the UTF8 encoded tables required for the MyISAM database reserves three bytes per character for fixed length strings. In contrast VARCHAR uses the number of bytes actually plus one or two bytes to specify length. Use of VARCHAR does result in some loss of performance but strings are only used in Descriptions, string refsets and Identifier tables. In all these cases strings with a significant range of lengths are used and the space penalty for using CHAR datatypes would be high. |
|
|
TINYINT |
MySql treats the datatype name boolean as an alias for TINYINT. In the examples this mapping is made explicit. |
|
|
DATETIME |
This is the full representation of date and time and is used to ensure compatibility with existing data and potential accommodation of time stamped data. The more compact DATE type could be used with current data as the effectiveTime is currently a date only representation. However, the more flexible DATETIME has been preferred in the examples because this emphasizes the fact that in an International environment the effectiveTime implies the UTC time and thus the date alone is not a precise representation. |
|
Action |
|
|---|---|
|
Load or insert to storage |
SET [column-name] = UNHEX(REPLACE(@uid,'-','')) |
|
Select from storage |
RenderUid([column-name]) |
|
UNHEX |
A built in MySql function that converts a hexadecimal string to binary. |
|
RenderUid |
FUNCTION `RenderUid`(Uid blob) RETURNS varchar(36) CHARSET utf8
BEGIN
Set @Tmp = Hex(uid);
RETURN CONCAT(SUBSTRING(@Tmp,1,8),'-',SUBSTRING(@Tmp,9,4),'-',
SUBSTRING(@Tmp,13,4),'-',SUBSTRING(@Tmp,17,4),'-',SUBSTRING(@Tmp,21));
END
|
CREATE TABLE `sct2_concept` ( `id` BIGINT NOT NULL DEFAULT 0, `effectiveTime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00', `active` TINYINT NOT NULL DEFAULT 0, `moduleId` BIGINT NOT NULL DEFAULT 0, `definitionStatusId` BIGINT NOT NULL DEFAULT 0, PRIMARY KEY (`id`,`effectiveTime`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8;Figure 1. Create Concept Table
LOAD DATA LOCAL INFILE '[path]sct2_concept_[AdditionalInfo].txt' INTO TABLE `sct2_concept` LINES TERMINATED BY '\r\n' IGNORE 1 LINES;Figure 2. Import Concept file
CREATE TABLE `sct2_description` ( `id` BIGINT NOT NULL DEFAULT 0, `effectiveTime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00', `active` TINYINT NOT NULL DEFAULT 0, `moduleId` BIGINT NOT NULL DEFAULT 0, `conceptId` BIGINT NOT NULL DEFAULT 0, `languageCode` VARCHAR(3) NOT NULL DEFAULT '', `typeId` BIGINT NOT NULL DEFAULT 0, `Term` VARCHAR(255) NOT NULL DEFAULT '', `caseSignificanceId` BIGINT NOT NULL DEFAULT 0, PRIMARY KEY (`id`,`effectiveTime`), KEY `sct2_description_concept` (`conceptId`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8Figure 3. Create Description Table
LOAD DATA LOCAL INFILE '[path]sct2_description_[AdditionalInfo].txt' INTO TABLE sct2_description LINES TERMINATED BY '\r\n' IGNORE 1 LINES;Figure 4. Import Description file
CREATE INDEX ix_sct2_description_3 ON sct2_description([ConceptId],[typeId],[languageCode])Figure 5. Index Description Table - Concept
CREATE TABLE `sct2_relationship` ( `id` BIGINT NOT NULL DEFAULT 0, `effectiveTime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00', `active` TINYINT NOT NULL DEFAULT 0, `moduleId` BIGINT NOT NULL DEFAULT 0, `sourceId` BIGINT NOT NULL DEFAULT 0, `destinationId` BIGINT NOT NULL DEFAULT 0, `relationshipGroup` INT NOT NULL DEFAULT 0, `typeId` BIGINT NOT NULL DEFAULT 0, `characteristicTypeId` BIGINT NOT NULL DEFAULT 0, `modifierId` BIGINT NOT NULL DEFAULT 0, PRIMARY KEY (`id`,`effectiveTime`), KEY `sct2_relationship_source` (`sourceId`,`characteristicTypeId`,`typeId`,`destinationId`), KEY `sct2_relationship_dest` (`destinationId`,`characteristicTypeId`,`typeId`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8Figure 6. Create relationships table
LOAD DATA LOCAL INFILE '[path]sct2_relationship_[AdditionalInfo].txt' INTO TABLE sct2_relationship LINES TERMINATED BY '\r\n' IGNORE 1 LINES;Figure 7. Import Relationship file
CREATE TABLE `sct2_identifier` ( `identifierSchemeId` BIGINT NOT NULL DEFAULT 0, `alternateIdentifier` VARCHAR(255) NOT NULL DEFAULT '', `effectiveTime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00', `active` TINYINT NOT NULL DEFAULT 0, `moduleId` BIGINT NOT NULL DEFAULT 0, `referencedComponentId` BIGINT NOT NULL DEFAULT 0, PRIMARY KEY (`identifierSchemeId`,`alternateIdentifier`,`effectiveTime`), KEY `sct2_relationship_sctid` (`referencedComponentId`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8Figure 8. Create Identifier Table
LOAD DATA LOCAL INFILE '[path]sct2_identifier_[AdditionalInfo].txt' INTO TABLE sct2_identifier LINES TERMINATED BY '\r\n' IGNORE 1 LINES;Figure 9. Index Identifier Table - Primary
CREATE TABLE `sct2_refset_c` ( `id` binary(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0', `effectiveTime` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00', `active` TINYINT NOT NULL DEFAULT 0, `moduleId` BIGINT NOT NULL DEFAULT 0, `refSetId` BIGINT NOT NULL DEFAULT 0, `referencedComponentId` BIGINT NOT NULL DEFAULT 0, `sctId1` BIGINT NOT NULL DEFAULT 0, PRIMARY KEY (`id`,`effectiveTime`), KEY `refset_c_id` (`refSetId`,`referencedComponentId`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8;Figure 10. Create Component Refset Table
LOAD DATA LOCAL INFILE '[path]sct_cRefset_[AdditionalInfo].txt' INTO TABLE `sct2_refset_c` LINES TERMINATED BY '\r\n' IGNORE 1 LINES (@uid, `effectiveTime`, `active`, `moduleId`, `refSetId`, `referencedComponentId`, `sctId1`) SET id=UNHEX(REPLACE(@uid,'-',''));Figure 11. Import Component Refset File
Sub-Topics
Related Information
![]() ![]() |
July 2012 International Release
Copyright 2002-2012 International Health Terminology Standards Development Organisation (www.ihtsdo.org)