The EDRM in Asia
The Electronic Discovery Reference Model (EDRM), developed in 2005, provides the framework around which all eDiscovery vendors position themselves and, like most US-originating concepts, it almost fits Asia. About 80% of the time the model and the data will behave as expected, but 20% of the time subtle yet significant differences will create exceptions that can undermine the defensibility of the process, cause delays or try the patience of the managing attorney. Management of these exceptions for Asian-sourced data is most frequently required during the collection, processing, analysis and production phases of the EDRM.
Efficient and defensible eDiscovery processes for multi-byte languages cannot be achieved simply through process knowledge and software skills. Technicians collecting and processing the data must know how to identify and handle various multi-byte codepages, unique programs, linguistic quirks, and many other details.
The EDRM model on this page and the descriptions below highlight some of the differences and outline a few details to note when conducting eDiscovery in Japan, China, Korea or other Asian countries. Please note that the list is not comprehensive and merely illustrates a few examples to highlight issues to consider beyond the typical North American experience.
Every modern company practices some form of electronic information management, from sophisticated systems developed by IBM or Symantec to simple email programs and file servers.
- Information Management Development Lag- Asian information management, particularly in preparation for litigation, has not developed as thoroughly and formally as in the USA. Often, data retention policies are simply email gigabyte limitations. In addition, corporations often do not have a unified policy among divisions.
- Microsoft is not as dominant- Microsoft’s slowness in handling Asian languages opened the door for many smaller software companies to introduce products that could accommodate those languages. While the Microsoft Office suite has become a standard for most corporations, many unusual programs (Becky!, Ichitaro, etc.) are encountered in legacy data.
Once litigation is known or even suspected, an entity becomes obligated to identify and preserve data that it may be required to disclose. In this stage attorneys review the IT infrastructure and an organization’s data backup procedures to accurately identify the custodians and the ESI (and paper data) repositories to target for collections. At this stage it is common to determine the budget for the case, decide how many custodians will be collected, and analyze the data. For Asian cases, some data will require conversion before it can be used for the case, which is a non-trivial matter from both a technological and a budgetary standpoint. The types of data to be included and excluded should be available for the meet and confer conference.
- The ‘Ringi’ system: All companies have budget concerns, but the Japanese corporate culture adds another dimension of complexity. In Japan, the ‘ringi’ system requires budgetary exceptions to be approved at multiple levels, with each layer requiring its own set of questions to be answered by the bidding vendors.
- Japanese clients often use encryption and frequently password-protect files. Cracking passwords and recovering deleted files from encrypted drives incur great expense, so they are often negotiated out of the discovery requirements (government investigations may require these files).
- Chinese clients often use encrypted email systems and often have documents that may be considered to be State Secrets. Custodians must be interviewed to determine if a State Secret review needs to be done before attorneys may view the documents.
Attorneys work with the client to develop a preservation policy, typically called legal holds, to protect data from tampering and deletion. All employees relevant to the investigation must be contacted or have their data preserved immediately. The nature of the case (criminal, IP, government investigation, etc.) will determine the extent of the preservation requirements.
Failure to disclose required data can lead to discovery sanctions. However, disclosing a client’s confidential information unnecessarily can lead to unwelcome consequences for the attorneys and their client’s business.
- Privacy and business confidentiality concerns are heightened in Asia. Although incomplete submission may lead to sanctions in the USA, Japanese companies experience culture shock because, in Japan, discovery is optional. In China, data often will be considered a state secret and the client and attorneys must manage the conflicts between the international jurisdictions.
- Clients inexperienced with litigation will require additional explanation and clarification. Although custodians may understand they should not delete relevant data, sometimes that understanding is not enough! As an example, IT departments with tape back-up policies may need to modify those policies if they involve recycling a limited number of tapes. In another example, custodians under government investigation have to understand that deleting SPAM emails may generate expenses to prove that ONLY junk emails were deleted.
- Japanese companies often keep PCs within the local office as employees are transferred. This may mean a custodian's PC contains data from several other people in addition to their own. This may also mean that email archives are located on multiple servers in multiple jurisdictions.
- In addition to the current programs used, an attorney with an extended date range for a matter will also need to be concerned about legacy data systems. It is not uncommon for a custodian to have email from 2-3 different email programs within their data.
After identification and preservation of the data, the data must be collected for use in the litigation from hard disk drives, CD-ROMs, cellular phones, and email and network servers. For some cases targeted data acquisition can reduce the data size for the case, but full-drive images are usually required in cases where deleted files are relevant (such as antitrust, FCPA, etc.). As a practical matter, full-drive images are often collected so that the client's custodians will not need to be bothered if the scope of the case changes. Data may be extracted from a full-drive collection in a targeted manner to achieve similar results.
- Many standard tools can encounter exceptions when collecting data types in Asia. For example, the .jtd and .jaw file types (Ichitaro word-processing documents) may not be recognized by an acquisition tool and the file would be skipped by the tool and may not be included in an exception report. To overcome this problem, Soliton recommends being prepared to use more than one collection tool and checking both the file size and file count before and after collection.
- Many clients in Asia prefer to attempt self-collection to save money and will need extra education about the risks or the proper methods to self-collect effectively. For government investigations, often a 3rd party will be required to perform the collection to eliminate any question of data manipulation.
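The file count and file size check recommended above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`inventory`, `compare_collections`) are hypothetical, the collection is assumed to be a plain directory copy rather than a forensic image, and a real workflow would normally add hash verification on top of counts and sizes.

```python
from pathlib import Path


def inventory(root):
    """Return {relative_path: size_in_bytes} for every file under root."""
    root = Path(root)
    return {
        str(p.relative_to(root)): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_collections(source_root, collected_root):
    """Compare file counts and sizes before and after collection."""
    source = inventory(source_root)
    collected = inventory(collected_root)
    # Files present at the source but absent from the collection,
    # e.g. a .jtd file silently skipped by the acquisition tool.
    missing = sorted(set(source) - set(collected))
    # Files present in both places whose sizes differ.
    size_mismatch = sorted(
        name
        for name in source.keys() & collected.keys()
        if source[name] != collected[name]
    )
    return {
        "source_count": len(source),
        "collected_count": len(collected),
        "source_bytes": sum(source.values()),
        "collected_bytes": sum(collected.values()),
        "missing": missing,
        "size_mismatch": size_mismatch,
    }
```

Any entry in `missing` is a file that the tool skipped without necessarily listing it in its own exception report, which is the situation described for Ichitaro documents.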
In China and Japan, many different code pages other than Unicode and many non-Microsoft programs will be encountered. US-based firms prefer familiar, best-of-breed review tools with modern features, but those tools may not be compatible with this type of data. Ji2 bridges the gap through data conversions and other work-around processes to make the data available, all the while documenting the processes for defensibility. Although not an official stage of the EDRM, this requirement is very common due to the following causes:
- Unicode is only one of the multi-byte code pages used in documents and e-mails in Japan and China. Soliton’s extensive technical knowledge anticipates most processing errors and prevents garbled text from affecting review or search. Beware: it is common for less experienced techs to label data as corrupted when the characters and files are unreadable. Many times this is simply a code-page conflict they cannot resolve and this leaves the document unsearchable and unreviewable for the case.
- Unique file types - A few software programs may dominate the USA, but Japan still uses many different file types and e-mail systems. Soliton rapidly identifies and processes obscure Japanese e-mail systems and file types to reduce delays in processing data correctly.
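To illustrate why "unreadable" does not necessarily mean "corrupted", the sketch below tries a list of candidate code pages in order and reports the first one that decodes the bytes without error. The function name `decode_with_candidates` and the candidate list are assumptions made for this example, not a description of any vendor's tooling. The approach is only a heuristic: several legacy code pages will accept bytes written in another code page and return wrong characters, so the result still needs human QC.

```python
def decode_with_candidates(
    raw,
    candidates=("utf-8", "cp932", "euc_jp", "gb18030", "big5", "euc_kr"),
):
    """Return (text, encoding) for the first candidate that decodes raw.

    Returns (None, None) if every candidate raises an error. The order of
    candidates should reflect where the data was collected, because a wrong
    code page can decode without error and still produce garbled text.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return None, None


# A Japanese string saved in the Shift-JIS family (Windows code page 932)
# is not valid UTF-8, so the second candidate is the one that succeeds.
legacy_bytes = "日本語のメール".encode("cp932")
text, encoding = decode_with_candidates(legacy_bytes)
print(encoding, text)
```

A document that only fails under the first code page tried is a conversion task, not a corrupted file, and it remains searchable once decoded correctly.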
Processing culls unrelated data and makes the information available for document review. Standard deduplication can be enhanced by judicious pre-culling of data types known to be irrelevant to the case, such as system files (typically through removing files on the NIST list) and, if the nature of the case allows, program files, CAD files, audio files or video files. A list of file types is provided by Soliton to the managing attorneys prior to processing to allow an opportunity to reduce fees for the client.
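The pre-culling step can be pictured as a hash lookup plus an extension filter. The following is a minimal sketch, assuming the known-file hashes have already been loaded into a Python set of upper-case SHA-1 strings; how that set is built from the NIST list is outside the example, and the names `sha1_of_file` and `cull_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 of a file in chunks, as an upper-case hex string."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def cull_files(paths, known_hashes, excluded_extensions=()):
    """Split paths into (kept, culled).

    culled is a list of (path, reason) pairs so that every exclusion is
    documented and can be reported to the managing attorneys.
    """
    excluded = {ext.lower() for ext in excluded_extensions}
    kept, culled = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in excluded:
            culled.append((path, "excluded extension"))
        elif sha1_of_file(path) in known_hashes:
            culled.append((path, "known hash"))
        else:
            kept.append(path)
    return kept, culled
```

Recording a reason for each culled file keeps the step defensible: the list of excluded extensions is the part that attorneys agree to in advance, while the hash match is independent of file name or location.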
- Your processing tool matters- Processing ‘double-byte’ data often generates errors or ‘garbling’ (see pre-processing) but the selection of a robust processing tool can minimize issues and data conversions. Soliton has extensive experience using different platforms and recommends those that have been thoroughly tested. However, even on the most robust tools, garbled data will be encountered, and must be resolved by an iterative process of reprocessing, testing and QC.
- Processing location matters- Data collected from overseas (including Europe, etc.) introduces a wealth of other concerns regarding privacy, security, jurisdiction, etc. Selecting a solution that processes the data in the local area minimizes many of these concerns and helps make the end client more comfortable.
- Paper difficulties- In the western world, scanning and applying optical character recognition (OCR) can be taken for granted because of its reliability. However, language detection and the fine strokes in each kanji character degrade overall OCR accuracy and can limit its effectiveness. Soliton understands the limitations and has developed work flows to ensure results.
Analysis can provide the key role in determining the data set for review, although the use of technology assisted review (aka: predictive coding, etc.) can remove some of the heavy lifting for this stage of the process. Typical roles within this stage are search and sampling. Search often uses search terms agreed between adverse parties during the meet and confer sessions, but can also include internally generated searches to explore the data by concept, custodian or data type. Sampling also explores the data, but instead of using a search category, sampling pulls documents from the document pool in some fashion, either arbitrarily or intentionally (statistical sampling for example).
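For the statistical flavor of sampling, one simple approach is a reproducible simple random sample of document identifiers. The sketch below is an assumption-laden illustration rather than a prescribed method: `draw_sample` is a hypothetical name, and the choice of sample size (which depends on the confidence level the parties agree on) is left to the caller.

```python
import random


def draw_sample(doc_ids, sample_size, seed):
    """Draw a simple random sample of document IDs without replacement.

    Sorting the population and fixing the seed makes the draw repeatable,
    so the same sample can be regenerated if the method is challenged.
    """
    population = sorted(doc_ids)
    rng = random.Random(seed)
    return rng.sample(population, min(sample_size, len(population)))


# Example: pull 5 of 1,000 hypothetical document IDs.
sample = draw_sample(range(1, 1001), 5, seed=2024)
print(sample)
```

Recording the seed alongside the sample is a small step that supports the defensibility of the process.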
Predictably, linguistics causes issues for Analysis in many different ways:
- Search Terms = more than translation- Literal translation of keywords into an Asian language can be dangerous and ineffective. Different regions use different words (even within a similar dialect), and slang, internal jargon, spelling variants and idioms can render a strict translation useless.
- Normalization- Beyond the linguistic issues, some characters (katakana in Japanese, for example) can come in both full-width and half-width representations that look similar on a computer screen, yet are stored as different character codes. Some tools normalize or search both versions, but others do not, so testing can be critical to determine if additional search terms need to be developed.
- Garbled display = No Effective Search- Although garbled display means a human cannot read the data, a more significant issue is that garbled data does not index and cannot be searched. To avoid false negatives, these documents should be located and corrected.
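The full-width and half-width issue is easy to demonstrate with Python's standard `unicodedata` module. In this sketch, Unicode NFKC normalization is used to fold half-width katakana into the full-width form before comparison; the helper names `normalize_width` and `contains_term` are hypothetical, and whether a given review platform applies an equivalent normalization is exactly what needs to be tested. Note that NFKC also folds other compatibility characters (full-width Latin letters, for example), which may or may not be desirable for a given search.

```python
import unicodedata


def normalize_width(text):
    """Fold half-width and full-width variants to one form using NFKC."""
    return unicodedata.normalize("NFKC", text)


def contains_term(document_text, term):
    """Search for term in document_text after normalizing both sides."""
    return normalize_width(term) in normalize_width(document_text)


half_width = "ｶﾀｶﾅ"      # half-width katakana
full_width = "カタカナ"    # full-width katakana

print(half_width == full_width)                   # the raw strings differ
print(normalize_width(half_width) == full_width)  # equal after normalization
print(contains_term("契約書のｶﾀｶﾅ表記", full_width))
```

A tool that does not normalize would need both spellings as separate search terms to find the same word.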
Document review often provides the foundation of evidence for both parties in the case, but also generates the bulk of the expenses. With ever-increasing data sizes, the number of documents attorneys must process also grows, as do the hours of billing. Soliton's best-in-class indexing and review platforms provide options for reviewers to logically organize and search for data to reduce review times. Once the data has been indexed and processed, Soliton provides platforms that offer non-linear review and review-accelerating features such as e-mail threading and near-duplicate document search. For more details see our use cases.
- Predictive coding - Soliton has run tests on several predictive coding tools and determined that while the technology can provide a reasonable substitution for a first round (Relevant Yes/No) review, Asian documents will continue to require significant QC during the process. Since Japanese, Simplified Chinese and Traditional Chinese share characters, any case with a mix of documents from each language will require additional QC to compensate for erroneous language detection. Predictive coding tools that rely upon language context may be less reliable than the more basic character/phrase tools due to the multiple uses for the same characters- particularly with Japanese documents.
- Mixed Code Page Content- Despite using automated detection and other robust tools, some documents will have a mixture of code page information, such as when a custodian cuts a paragraph out of a Shift-JIS email and pastes it into a UTF-8 email. These mixed documents defy detection, but will be obvious to reviewers when chunks of the document appear garbled. Soliton can convert these documents to a single code page. While annoying for review, these documents also undermine the validity of predictive coding, because the garbled text fails to be recognized and cannot contribute to the assisted review algorithms.
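Converting a mixed document to a single code page can be approximated by decoding it segment by segment instead of as a whole. The sketch below is a simplified illustration, not the conversion process used in practice: it assumes the pasted text aligns with line boundaries, it only considers UTF-8 and the Shift-JIS family (`cp932`), and `decode_mixed` is a hypothetical name. Real documents may mix code pages within a line and need manual QC.

```python
def decode_mixed(raw, candidates=("utf-8", "cp932")):
    """Decode raw bytes line by line, trying each candidate per line.

    Returns (text, encodings_used), where encodings_used has one entry per
    line. A line that no candidate can decode is decoded with replacement
    characters and recorded as None so it can be routed to manual review.
    """
    lines, used = [], []
    for chunk in raw.split(b"\n"):
        for encoding in candidates:
            try:
                lines.append(chunk.decode(encoding))
                used.append(encoding)
                break
            except UnicodeDecodeError:
                continue
        else:
            lines.append(chunk.decode(candidates[0], errors="replace"))
            used.append(None)
    return "\n".join(lines), used


# A UTF-8 email with one paragraph pasted in as Shift-JIS bytes.
mixed = "件名: 契約\n".encode("utf-8") + "本文は旧メールから引用\n".encode("cp932")
text, used = decode_mixed(mixed)
print(used)
print(text)
```

Once the text is held as a single Unicode string it can be re-saved in one code page, which makes the whole document indexable and available to assisted review.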
Once review of the documents is complete, the parties send relevant documents to the opposing party. Traditionally, TIFF images and accompanying text files are provided, except for file types such as Excel spreadsheets and PowerPoint files, which tend to be produced in native format.
- Garbled produced text- Although some document incompatibilities will be resolved during the processing phase, additional documents may fail to convert properly to PDF due to file errors, font incompatibility or other issues. This is particularly prevalent among Asian documents so extra QC of the production documents is required.
- Translation- Although translation is not required for all cases, some cases require documents to be produced along with translated versions. Full translation by humans presents an enormous burden to the end client because of the potentially huge expense, but through negotiation and the use of technology, the translation process can be managed. Soliton recommends mixing machine translation (using enhanced glossaries) and human translators to dramatically reduce the cost and increase the speed.
As the case moves to the court room, the attorneys present the required data as evidence. If all of the other exceptions have been managed well up to this point, there will be no remaining issues inherent to the data other than handling the linguistic challenges.