Bioinformatics is an emerging discipline of science and technology. The National Center for Biotechnology Information (NCBI, 2001) defines Bioinformatics as “a field of science in which Biology, Computer Science, and Information Technology merge into a single discipline”. Within the framework of Bioinformatics there are three sub-disciplines. The first deals with the development of new algorithms and statistics to assess relationships among members of large data sets. The second deals with the analysis and interpretation of various types of data, viz. nucleotide (DNA/RNA) and amino acid sequences, protein domains and protein structures, that make up biological databases. The third deals with the development and implementation of tools that enable efficient access to, and management of, different types of information.
The origin of this discipline, in a crude form, can be traced to the time when the father of genetics, Gregor Johann Mendel, selected certain traits of pea plants, such as height, flower color, and seed shape and color, and began experimenting with them while keeping records of the experimental data (a primitive biological database). He then converted the observed phenomena into mathematical relations to understand how these traits are passed on from generation to generation, and arrived at the founding principles of classical genetics. Since Mendel, biological record keeping and the understanding of biological phenomena through mathematical relations have come a long way.
The recent history of biological databases begins with the first protein sequence, that of bovine insulin with 51 residues, reported by Frederick Sanger in 1955. For this pioneering work he was awarded his first Nobel Prize in Chemistry in 1958. Later, the first nucleic acid sequence, that of yeast alanine tRNA with 77 bases, was reported by a group led by Robert Holley at Cornell University in 1965. In 1977, Sanger and colleagues introduced the “dideoxy” chain-termination method (the Sanger method) for sequencing DNA, which was eventually used to sequence the entire human genome as well as the genomes of many other model organisms.
Margaret Dayhoff, a pioneer in the application of mathematics and computational methods to biochemistry, gathered all the available sequence data to create the first protein sequence database and published it as the book “Atlas of Protein Sequence and Structure”. She developed many of the tools still used today in database design and use. In 1980, Dr. Dayhoff developed an online database system that could be accessed by telephone line, the first sequence database available for interrogation by remote computers.
The rapid advancement of tools and technology in genetics and molecular biology led to an era of “-omics”, such as Genomics, Proteomics, Transcriptomics and Metabolomics. The high-throughput molecular biology approaches used to study the genome, proteome and so on have, in turn, generated a massive amount of data. Making this data available and accessible was a challenge in itself. The challenge was successfully tackled using information technology resources, also known as Information and Communication Technology (ICT), which provided a flexible, smart and rapid way of storing, managing, querying and retrieving large and complex biological data (viz. sequences and structures) and of handling and maintaining data generated from various biological systems. To maintain the growing biological data in digital form, biological databases were created. They initially acted as storehouses of biological data, with access limited to users such as students, researchers and the pharmaceutical industry. Later, with the arrival of the internet and improvements in technology, biological databases expanded to include many other types of databases related to biological material, such as gene expression, metabolic pathway and disease databases. Since then, the accessibility, sharing and reach of biological databases have increased across the globe. Today, biological databases such as NCBI, EMBL and DDBJ operate under the aegis of the International Nucleotide Sequence Database Collaboration (INSDC), which sets policies on access and use and provides guidance to submitters and user communities.
In 1981, 579 human genes were mapped by in situ hybridization. In 1988, the Human Genome Organization (HUGO), an international organization of scientists, was founded. The first complete genome sequence of a free-living organism, the bacterium Haemophilus influenzae, was published in 1995. Since then, many prokaryotic and eukaryotic genomes have been sequenced, such as Mycoplasma genitalium (1995), Saccharomyces cerevisiae (brewer’s yeast, 1996), Escherichia coli (1997), Caenorhabditis elegans (1998), Arabidopsis thaliana (2000) and the human genome (2001). These projects, along with independent work, generated huge sequence and structural data sets, around which database institutions such as NCBI, EMBL, DDBJ and PIR evolved.
As stated earlier, the rapid growth in biological data generation has brought technical challenges. These include the huge investment of taxpayers’ money, the skilled human resources needed to develop biological databases for storage and management, software and tools to analyze biological data, and the development of advanced tools and technologies to meet future needs and challenges. Such creations fall within the domain of intellectual property and therefore require protection from misappropriation and piracy. Software and databases can be protected by applying the appropriate national intellectual property laws.
Intellectual Property Rights (IPRs) are legal rights that protect creations and/or inventions resulting from intellectual activity in the industrial, scientific, literary or artistic fields. The most common IPRs include patents, copyrights, trademarks and trade secrets. Such protection not only safeguards the investment but also motivates both the investors and the creators of new software and databases. Ownership of the intellectual property in a biological database, and the associated rights, has a significant effect on those who maintain databases. Some of the important intellectual property rights associated with biological or bioinformatics databases are described here:
1. Copyright: Generally considered to be the exclusive legal right granted by national law to the author of a work to disclose it as his own creation, to reproduce it, to distribute or communicate it to the public in any manner or by any means, and to authorize others to use the work in specified ways. Most copyright laws distinguish between economic and moral rights, which together constitute copyright. The law usually places certain limitations on the kinds of works eligible for protection and on the exercise of the authors’ rights comprised in the copyright. For example, in India, databases and software are protected by copyright.
2. Patent: A patent is an exclusive right granted for an invention, which is a product or a process that provides, in general, a new way of doing something, or offers a new technical solution to a problem. To obtain a patent, technical information about the invention must be disclosed to the public in a patent application. For example, patents have been granted for bioinformatics software in the USA.
3. Trademark: A trademark is a sign capable of distinguishing the goods or services of one enterprise from those of other enterprises. Trademarks are protected by intellectual property rights. For example, GenBank® is a registered trademark, as is BLAST®, which is registered to the National Library of Medicine.
4. Trade Secret: Any confidential business information that gives an enterprise a competitive edge may be considered a trade secret. Trade secrets encompass manufacturing or industrial secrets and commercial secrets, for example, the algorithms of proprietary bioinformatics software. Unauthorized use of such information by persons other than the holder is regarded as an unfair practice and a violation of the trade secret. (Source: WIPO)
Biological Databases, Types & Intellectual Property (IP) Protection
A biological database is a large, organized, stable entity associated with information technology and software to store, update, query, and retrieve components of the data stored within the system. Biological databases have been classified as Primary, Secondary, Composite and Integrated databases.
Primary databases are biological databases that contain raw nucleic acid (DNA and RNA) sequences, protein sequences and biochemical reactions. Examples include the databases of the National Center for Biotechnology Information (NCBI), the European Molecular Biology Laboratory (EMBL), the DNA Data Bank of Japan (DDBJ) and the Protein Data Bank (PDB). Primary databases play a major role in bioinformatics research. They are updated regularly and contain a massive amount of experimentally obtained data. They are mostly maintained by public funding, and many are freely accessible to anyone on an open-access basis. There are two reasons for keeping them outside IP protection: first, most of the data are the outcome of work carried out by government-funded or socially funded research bodies and are therefore made freely available to the public; second, restricting access would stifle further development in the field of biology.
Secondary databases are biological databases derived from the information available in primary databases. They are analyzed, upgraded and annotated versions of the primary nucleic acid and protein sequence data. Examples include Swiss-Prot, CATH, KEGG, OMIM, SCOP and PROSITE. Such upgrading makes secondary databases more useful for understanding the structure and function of biomolecules. The improvement requires inputs of labor, skill, technology and capital, which usually come from individuals or enterprises. Some secondary databases are therefore not freely available in the public domain and may be protected under IP laws if they satisfy the criteria of originality and creativity. Such protection secures a return on the investment and keeps investors motivated to continue working in this area.
Composite databases are biological databases that combine information from a variety of primary databases, for example, NCBI. These databases follow the same norms as primary databases.
Integrated databases are biological databases containing data obtained from different related organisms. Integrated data aid comparative studies and provide insight into the evolutionary relationships and synteny between the genomes of different organisms. For example, ATIDB (Arabidopsis thaliana Integrated Database) provides data derived from genome and transcriptome sequences of the model organism Arabidopsis and related Brassica species, which supports comparative studies.
At present, a common standard of IP protection is followed across the globe under the TRIPS Agreement accepted by members of the World Trade Organization (WTO). The European Union (EU) goes further: through its own Database Directive it has established a sui generis regime for database protection, which also covers biological databases.
TRIPS contains provisions for the protection of both databases and computer software. Part II of the TRIPS Agreement protects computer programs and compilations of data under copyright. Article 10.1 clarifies that computer programs, whether in source or object code, are protected as literary works. Correspondingly, Article 10.2 protects compilations of data that, by reason of the selection or arrangement of their contents, constitute intellectual creations, regardless of whether the individual elements of the compilation are themselves protected by copyright. After the TRIPS Agreement entered into force, the World Intellectual Property Organization (WIPO) explicitly regulated the protection of computer programs and data compilations in an auxiliary agreement to the Berne Convention (Arts 4 and 5 of the WIPO Copyright Treaty).
Different strategies for protecting biological databases exist worldwide. The strategies used in the USA, the EU and India are compared below.
(A) Biological Database Protection in the USA
The USA protects biological databases through copyright. Before 1991, copyright law protected databases in the USA largely on the basis of the effort invested in compiling them. In 1991, however, the US Supreme Court made “originality” or “creativity” a requirement for copyright, which excluded databases that did not meet this criterion. Interpreting the copyright protection accorded to compilations, the Court held that facts are not copyrightable but a compilation of facts is, provided there is a sufficient degree of originality in the selection or arrangement of the material (Feist Publications, Inc. v. Rural Telephone Service Co., Inc. (1991)). Thus, biological databases are treated as compilations of facts and can be granted copyright protection if they meet this originality requirement.
(B) Biological Database Protection in the European Union (EU)
The European Union adopted the Database Directive (Directive 96/9/EC) in March 1996, which provides two tiers of legal protection for databases, including biological databases:
1. Copyright system.
2. Sui generis or quasi-copyright system.
Under the directive, copyright protects the database itself where there is originality in the selection or arrangement of its material, while the contents are protected by the sui generis right. That right is based on the justification that a maker who has made a substantial investment in obtaining, verifying or presenting the contents of a database should have an exclusive right over them.
The sui generis (Latin, “of its own kind”) right protects the database maker’s investment in databases that are not original but involved substantial capital, human and material resources. This was the first time databases were protected by a special right of this kind, and a balance was struck by adopting the dual mechanism. Copyright protection is available to nationals of countries that are members of the Berne Convention or the TRIPS Agreement, whereas the sui generis right is available only to makers who are nationals of EU member states.
(C) Biological Database Protection in India
The Copyright Act, 1957 of India protects original expression. Computer programs are treated as a “literary work” under the Act. Section 2(o) of the Act defines ‘literary work’ to include computer programs, tables and compilations, including computer databases (which would also cover biological databases). This provision was inserted in 1999 and came into force in 2000.
In Burlington Home Shopping Private Limited v. Rajnish Chibber & Another (1995 PTC 278), the Delhi High Court considered the protection of databases and held that a compilation of addresses developed by devoting time, money, labour and skill amounts to a “literary work”, even though the sources may be commonly available. Similarly, in Bharat Matrimony Com. P., Ltd., Chennai v. People Interactive (I) Pvt. Ltd., Chennai, 2009 AIHC (NOC) 433 (Mad), it was held that “literary work” includes computer programs and compilations, including computer databases. Although no case on biological database protection has yet been decided in India, similar protection could be obtained for compilations of biological data (bioinformatics databases) derived from the sequencing of nucleic acids and proteins.
Section 13 of the Copyright Act, 1957 lists the categories of work in which copyright subsists, which include original literary works. The author of a work is the first owner of the copyright in it. However, where a work is made in the course of employment under a contract of service or apprenticeship, the employer is the first owner of the copyright in the absence of any contract to the contrary. In India, computer software is protected by copyright; it can be considered for patent protection only if it produces a technical effect and is not a computer program per se. The Copyright Act protects the author’s economic and moral rights in a copyrighted work under Sections 14 and 57 respectively. Although the TRIPS Agreement does not require protection of moral rights, they are protected under the Copyright Act, 1957.
Section 51 of the Copyright Act, 1957 of India defines infringement of copyright: a person infringes another’s copyright if, without authorization, he or she does anything that only the copyright holder has the exclusive right to do. Civil remedies for infringement, including injunctions and damages, are provided in Chapter XII of the Act, and criminal liability is provided in Chapter XIII, under which infringement and its abetment are punishable with imprisonment of up to three years and a fine of up to Rs. 2 lakh.
The difference between the US and EU strategies for protecting biological databases is that the EU places greater emphasis on the economic investment involved in creating a database, while the USA gives primacy to “originality and creativity” in the material. The similarity is that both provide copyright protection to databases. Indian copyright law has tried to strike a balance by considering originality along with the author’s economic investment.
(a) Patent Law and Biological Database Protection
Biological databases are compilations of biological sequences and other data types; if the sequences themselves are unpatentable, the databases are unpatentable as well. To be patentable, they would have to fall within statutory subject matter. Although databases themselves are not patentable, patent protection may be given to database-related inventions. In India, patents are granted to inventions that satisfy the criteria of novelty, inventive step and industrial applicability. Biological databases are not granted patents because they do not fulfil these criteria.
Patent protection for DNA, RNA and protein sequences extends only to the biological and physical compositions, not to the abstract sequence information that describes them. Therefore, a patentee can prevent others only from using the composition itself, not the information within the molecule. In Diamond v. Diehr (1981), the Supreme Court of the United States held that a process does not become unpatentable merely because it uses a mathematical formula or computer program, while reaffirming that abstract ideas are not patentable. Accordingly, to qualify as patentable subject matter, a claim involving a biological sequence has to fall within a statutory category such as a process, machine, manufacture or composition of matter. An idea itself is not patentable, nor is a principle in the abstract.
In State Street Bank & Trust Co. v. Signature Financial Group, Inc. (1998), the US Court of Appeals for the Federal Circuit held that even if information per se is not patentable as a tangible product, a process of producing the information may be patentable. However, such protection would extend only to the process of creating the database, not to the database itself. This limits the value of the patent, because a competitor can simply produce the same database in a non-infringing way.
(b) Trademark/Trade Secret Laws and Biological Database Protection
Neither trade secret nor trademark law protects the contents of primary databases. For secondary databases, trade secret protection is available. This is because the creator of a secondary database studies and analyzes the primary data and adds many important features, which requires considerable capital and skilled human resources. The creator may therefore choose not to make all of it public, or may charge for access to the database. However, it should be noted that trademark law protects only the name or brand under which a database is offered, not its content.
Besides copyright law, a database developer can use contract law to prevent third parties from copying or accessing the contents of a database. For a non-original database, contract law is the only mode of protection through which the developer can prevent breach of faith and infringement. Two types of contract are currently used to protect databases: shrink-wrap and click-wrap/web-wrap/browse-wrap licenses. A shrink-wrap license is used for databases distributed on compact discs (CDs); the license terms are set out in writing in the packaging, and by using the product the user agrees to all its terms and conditions. Click-wrap/web-wrap/browse-wrap licenses are meant for internet users. With a click-wrap license, a user who wants to access the content of the database must click “agree” online, which signifies acceptance of the contract.
Biological databases play an important role in the present era of globalization. They are linked through information and communication technology across the globe and can be accessed, shared and used for the development and promotion of science. Appropriate protection of databases, and awareness of it among people working in this domain, are therefore highly pertinent. In the next issue under this section, other aspects of Bioinformatics and IPR will be taken up and discussed in detail.
A comprehensive list of references and supporting documents is available from the author upon request.