TBX Resources

The following files and resources support TBX/ISO 30042 and are provided at no charge as a service to implementers and users of TBX. The integrated RNG schema with embedded Schematron rules referred to in Annex F of the ISO version of TBX (ISO 30042) is available below on this page.

Integrated RNG schema

TBX-format terminology markup languages (TMLs) are defined by the core structure DTD and additional constraints in an XCS file. The format of the XCS file is defined by the XCS DTD. In some instances, it may be desirable to represent TBX TMLs as integrated schemas that combine the core structure with the additional constraints contained in the XCS, so as to allow processing with general-purpose XML tools.

As an example, LISA is providing an integrated schema in Relax NG format with embedded Schematron rules that can be used to validate TBX files against the core structure and default XCS.

This schema is available for download here.

TBX Checker

The TBX Checker is an open-source, cross-platform tool written in Java that validates TBX-format files for conformance to the core structure (TBX DTD) and adherence to the constraints of an external XCS file. It thus goes beyond a general-purpose XML validating parser to deliver TBX-specific functionality.

The current version, 1.1.0, can be downloaded here (1.3 MB) in a zip archive that contains the application, a sample XCS file, and some additional files that can be used to demonstrate the application’s error-checking procedures.

The TBX Checker is provided without warranty for any particular purpose as a public service.

Sample TBX files

  • TBX Sample: TBX sample file extracted from ISO 30042 (DIS.2, February 2008), with all associated files

    General information

    This file is extracted from section 10.1 of the TBX specification. The package also includes the core structure DTD extracted from Annex A, the XCS file extracted from Annex D (with a few error corrections), and the DTD for the XCS file extracted from Annex B. It is ready to use with the TBX Checker. The files are useful for testing because some of them contain deliberate errors.

    Download sample files as one ZIP file.

    File Descriptions

    • sampleTBXfile-badDatcat.tbx - TBX file that should return an error in data category selection
    • sampleTBXfile-badPicklistVal.tbx - TBX file that uses an invalid picklist value
    • sampleTBXfile-coreInvalid.tbx - TBX file that fails core structure validation
    • sampleTBXfile-notWellformed.tbx - TBX file that is malformed XML
    • sampleTBXfile.tbx - Valid TBX file that adheres to the constraints of the default XCS
    • TBXcoreStructV02.dtd - TBX core structure DTD
    • tbxxcsdtd.dtd - DTD for XCS files
    • TBXXCSV02mod.xcs - Default XCS file (identical to the default XCS in Annex E with the exception of some corrections to some contents elements)
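
    The difference between a file such as sampleTBXfile-notWellformed.tbx and the other samples can be illustrated with a plain well-formedness check. The following is a minimal sketch using only the Python standard library; the two inline documents are hypothetical miniatures, not the contents of the sample files, and the function name is an assumption made for illustration. It checks well-formedness only and does not validate against the core structure DTD or an XCS file, which is what the TBX Checker is for.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical miniature documents (assumptions, not the real sample files).
good = "<martif type='TBX'><text><body/></text></martif>"
bad = "<martif type='TBX'><text><body></text></martif>"  # <body> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

    A file that fails this check cannot be validated further, so it is a reasonable first step before running DTD or XCS validation.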

The following sample files are provided to aid TBX implementers. Note that these files need to be updated to conform to changes made in the latest TBX version. They will be replaced soon.

  • IBM Data: Sample data provided by IBM Corporation

    General Information

    The IBM data was received as an SGM file in the TIF format. The conversion code is written in PERL and compiled into an executable. The conversion process accepts at least two command-line arguments: the first must be the file where the output will be saved, and the second must be the input file. Any arguments after the second are processed as additional input files. If multiple input files are supplied, they are all converted to TBX and saved in the same output file. To make the conversion easier to run, a batch file is provided that runs the executable and supplies two arguments. This batch file can be edited to change the output file or input file, or to add more input files.
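
    The argument convention described above (output file first, then one or more input files) can be sketched as follows. This is an illustration in Python, not the PERL converter's actual code; the function name and help texts are assumptions, and only the file names come from the file list for this sample.

```python
import argparse


def parse_args(argv):
    """Parse 'OUTPUT INPUT [INPUT ...]' in the order the converter expects."""
    parser = argparse.ArgumentParser(description="Hypothetical converter front end")
    parser.add_argument("output", help="file that receives the TBX output")
    parser.add_argument("inputs", nargs="+", help="one or more input files")
    return parser.parse_args(argv)


# Output file first, input file second, as in the batch file description.
args = parse_args(["ibm_tbx.xml", "ibm_sample_tif.sgm"])
print(args.output, args.inputs)
```

    Supplying fewer than two arguments makes the parser exit with a usage message, which mirrors the stated minimum of two arguments.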

    Download IBM files as one ZIP/GZIP file

    File Descriptions

    • IBM Data Mapping.doc - Document in Microsoft Word that shows the TBX equivalent to the IBM information.
    • ibm_v2.bat - Batch file that runs ibm_v2.exe and supplies two arguments, with the output file name first and the input file name second.
    • ibm_v2.exe - Executable file compiled from PERL source code (ibm_v2.pl) that should run on any Microsoft Windows system.
    • ibm_v2.pl - PERL source code that converts IBM data to TBX format.
    • ibm_sample_tif.sgm - Sample data from IBM in TIF format.
    • ibm_tbx.xml - TBX output from the conversion process.
    • TBXcdv04.dtd - The schema that defines the structure of a TBX document.

    For Non-Windows Users

    If you run an operating system other than Microsoft Windows, note the following:

    • You should be able to open the Microsoft Word documents in any word processor. They simply contain a heading and a table.
    • If you have PERL installed on your system, you should be able to run the conversion with the file ibm_v2.pl, provided that the XML::Writer module is installed. When you run the PERL file, supply the command-line arguments as described above.
  • Medtronic Data: Sample data provided by Medtronic Corporation

    General Information

    Medtronic supplied three terminological entries, each containing terms in 15 languages, making this a prime example of a truly multilingual data set. The entries were converted to TBX quite easily because their original XML format is very similar to TBX.

    Download Medtronic files as one ZIP/GZIP file

    File Descriptions

    • MDT_*.xml - the three original Medtronic entries.
    • Medtronic Data Mapping.doc - Document in Microsoft Word that shows the TBX equivalent to the Medtronic data format
    • medtronic_tbx.xml - the Medtronic entries after conversion to TBX format.
    • medtronic_v3.bat - Batch file that runs medtronic_v3.exe with two arguments.
    • medtronic_v3.exe - Executable file compiled from PERL source code (medtronic_v3.pl) that should run on any Microsoft Windows based system.
    • medtronic_v3.pl - PERL source code that converts Medtronic data to TBX format.
    • TBXcdv04.dtd - The schema that defines the structure of a TBX document.

    For Non-Windows Users

    If you run an operating system other than Microsoft Windows, note the following:

    • You should be able to open the Microsoft Word documents in any word processor. They simply contain a heading and a table.
    • If you have PERL installed on your system, you should be able to run the conversion with the file medtronic_v3.pl, provided that the XML::Writer module is installed. When you run the PERL file, supply the command-line arguments as described above.
  • Maryland Institute for Technology in the Humanities (MITH): Sample data provided by MITH

    General Information

    Patricia Kosco Cossard, a researcher at the University of Maryland Libraries and 2005-2006 Resident Fellow at MITH, is creating a Multilingual Thesaurus for Medieval Studies. She and Carl Stahmer, Acting Associate Director of MITH, have created a TBX record that can be displayed automatically in a browser, in a very appealing format, through the use of an XSL stylesheet. This very simple sample will be followed by more complex terminological records in the coming months. Many thanks to MITH for this interesting contribution.

    Download MITH files as one ZIP/GZIP file

    File Descriptions

    • template.xml - The TBX entry.
    • thesstyle.xsl - The XSL style sheet.
    • TBXcdv04.dtd - The schema that defines the structure of a TBX document.

    Instructions

    Download the files to a directory on your computer, and open the template.xml file in your browser. You will see the TBX entry formatted for the screen. Select "View Source" and you will see that the source is TBX.

  • Oracle: Sample data provided by Oracle Corporation

    General Information

    The Oracle data was received as a Microsoft Excel spreadsheet and then exported, via "Save As", to a tab-delimited text file. The conversion code is written in PERL and compiled into an executable file. The conversion process accepts two command-line arguments: 1) the tab-delimited input file and 2) the TBX output file. To make the conversion easier to run, a batch file is provided that runs the executable file and supplies the two arguments. To run the conversion, double-click oracleTBX_v3.bat. You can change the command-line arguments by editing the batch file.
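
    The general shape of a tab-delimited-to-TBX conversion can be sketched as follows. This is a minimal illustration in Python, not a port of oracleTBX_v3.pl: the three-column layout (entry ID, language code, term) and the function name are assumptions, since the actual columns of oracle_terms.tsv are not described here. The output is skeletal; it omits the header element and is not expected to validate against the TBX DTD as it stands.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree's spelling of the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tsv_to_tbx(tsv_text: str) -> ET.Element:
    """Build a skeletal TBX tree from rows of (entry id, language, term).

    The column layout is an assumption made for this sketch.
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: "en"})
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    entries = {}
    for entry_id, lang, term in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        entry = entries.get(entry_id)
        if entry is None:
            # First row for this concept: create its termEntry.
            entry = ET.SubElement(body, "termEntry", {"id": entry_id})
            entries[entry_id] = entry
        # Simplification: one langSet per row, even if the language repeats.
        lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
        tig = ET.SubElement(lang_set, "tig")
        ET.SubElement(tig, "term").text = term
    return martif


sample = "c1\ten\tprinter\nc1\tfr\timprimante\n"
root = tsv_to_tbx(sample)
print(ET.tostring(root, encoding="unicode"))
```

    Grouping rows by entry ID keeps the result concept-oriented: all terms for one concept end up in a single termEntry, with one language section per row.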

    Download Oracle files as one ZIP/GZIP file

    File Descriptions

    • Oracle Data Mapping.doc - Document in Microsoft Word that shows the TBX equivalent to the Oracle information.
    • oracleTBX_v3.bat - Batch file that runs oracleTBX_v3.exe with two arguments.
    • oracleTBX_v3.exe - Executable file compiled from PERL source code (oracleTBX_v3.pl) that should run on any Microsoft Windows based system.
    • oracleTBX_v3.pl - PERL source code that converts Oracle data, in tab delimited file, to TBX format.
    • oracle_TBX.xml - TBX output from the conversion process.
    • oracle_terms.tsv - Tab delimited file exported from the original oracle_terms.xls.
    • oracle_terms.xls - Original sample data from Oracle that was exported to a tab delimited file (oracle_terms.tsv).
    • TBXcdv04.dtd - The schema that defines the structure of a TBX document.

    For Non-Windows Users

    If you run an operating system other than Microsoft Windows, note the following:

    • You should be able to open the Microsoft Word documents in any word processor. They simply contain a heading and a table.
    • If you have PERL installed on your system, you should be able to run the conversion with the file oracleTBX_v3.pl, provided that the XML::Writer module is installed. When you run the PERL file, supply the command-line arguments as described above.
  • SDL: Sample data provided by SDL

    General Information

    The SDL data was received in a proprietary database markup format. As a fairly large file (223 concept entries), it presented challenges that can be expected in larger enterprise data sets where several employees have updated the content. The source format itself is well designed with respect to concept orientation, granularity, and data integrity. There were some challenges in the conversion relating to character encoding, non-unique reference IDs, and XML tags found within element attribute values. During the conversion we also discovered some mixed data content due to errors in data entry, for example, more than one term in a single term element. This is a common error among employees, and having a rigorous exchange format helps to discover and correct such errors. Because this error occurred only a few times, we decided to fix the source file directly rather than adjust the conversion routines. Had the error occurred frequently, it might also have been possible to correct it automatically during the conversion, thus upgrading the data.
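
    Two of the problems mentioned above, non-unique IDs and term elements that hold more than one term, lend themselves to a simple pre-conversion check. The following Python sketch is an illustration only: the SDL source format is not documented here, so the element name `term`, the attribute name `id`, the separator characters, and the sample data are all assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_problems(xml_text, separators=(";", ",")):
    """Return (duplicate ids, term texts that look like several terms).

    Element/attribute names and separators are assumptions for this sketch.
    """
    root = ET.fromstring(xml_text)
    id_counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)
    suspect_terms = [
        el.text
        for el in root.iter("term")
        if el.text and any(sep in el.text for sep in separators)
    ]
    return duplicate_ids, suspect_terms


# Hypothetical data, not taken from sdlprintingglossary.ste.
sample = """
<glossary>
  <entry id="e1"><term>ink</term></entry>
  <entry id="e1"><term>toner; toner cartridge</term></entry>
</glossary>
"""
duplicates, suspects = find_problems(sample)
print(duplicates, suspects)
```

    The separator test is a heuristic, since a legitimate term may contain a comma, so flagged items are candidates for manual review rather than confirmed errors.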

    Download SDL files as one ZIP/GZIP file

    File Descriptions

    • sdlprintingglossary.ste - the original SDL glossary.
    • SDL data mapping.doc - Document in Microsoft Word that shows the TBX equivalent to the SDL data format.
    • sdl_tbx.xml - the SDL glossary after conversion to TBX format
    • sdl_v4.bat - Batch file that runs sdl_v4.exe with two arguments.
    • sdl_v4.exe - Executable file compiled from PERL source code (sdl_v4.pl) that should run on any Microsoft Windows based system.
    • sdl_v4.pl - PERL source code that converts SDL data, in proprietary database markup format file, to TBX format.
    • TBXcdv04.dtd - The schema that defines the structure of a TBX document.

    For Non-Windows Users

    If you run an operating system other than Microsoft Windows, note the following:

    • You should be able to open the Microsoft Word documents in any word processor. They simply contain a heading and a table.
    • If you have PERL installed on your system, you should be able to run the conversion with the file sdl_v4.pl, provided that the XML::Writer module is installed. When you run the PERL file, supply the command-line arguments as described above.