Court Addresses Production of Metadata in Great Detail and Grants Production of Some but Not All Data Sought
Aguilar v. Immigration & Customs Enforcement Div. of U.S. Dep’t of Homeland Sec., 255 F.R.D. 350 (S.D.N.Y. 2008)
In this class action alleging unlawful searches and seizures of plaintiffs’ homes, a discovery dispute arose regarding the production of metadata. The court granted in part and denied in part plaintiffs’ request for the production of metadata for several types of electronically stored information (“ESI”), including email, Word and Excel documents, and databases.
On January 18, 2008, the parties agreed to undertake some discovery despite defendants’ pending motion to dismiss. About that time, defendants began to collect relevant materials from their employees. Plaintiffs served their first requests for production on February 15, 2008, but failed to address the form of production or metadata. Plaintiffs first mentioned the issue on March 18, 2008, but only “in passing.” By this time, defendants had completed most of their collection efforts. On March 22, 2008, plaintiffs requested that emails and other ESI be produced in Tagged Image File Format (“TIFF”) with corresponding load files containing metadata fields and extracted text, and that spreadsheets and databases be produced in native format. The parties conferred on July 1, 2008, to discuss the format of production of ICE’s hierarchical databases. On July 14, 2008, defendants objected to producing ESI in the forms plaintiffs requested on the grounds of relevance and burden, and proposed production in searchable PDF instead. Defendants also stated they would provide metadata for a particular document only where plaintiffs could demonstrate its relevance to their claims. Despite several attempts, the parties were unable to reach agreement. Thus, it fell to the court to address plaintiffs’ requests.
The court first addressed the more general topic of what exactly metadata is and its general uses in litigation. In its discussion, the court identified three types of metadata: substantive metadata, system metadata, and embedded metadata. Substantive metadata is “created as a function of the application software used to create the document or file” and reflects modifications to the document such as prior edits or editorial comments. System metadata “reflects information created by the user or by the organization’s information management system” and includes data concerning author, date and time of creation and the date a document was modified. Embedded metadata consists of “text, numbers, content, data, or other information that is directly or indirectly inputted into a [n]ative [f]ile by a user and which is not typically visible to the user viewing the output display.” Examples of embedded metadata include spreadsheet formulas, hyperlinks, and database information.
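To make the category of system metadata more concrete, the following is a minimal Python sketch that reads a few values the operating system records about a file. It is an illustration only and is not drawn from the opinion: the function name `system_metadata` and the dictionary keys are hypothetical, and the sketch assumes that file size and last-modified time are reasonable stand-ins for the kind of information the court described. Author and creation date are often stored elsewhere (for example, inside the document itself or in a document management system) and are not shown here.

```python
import os
from datetime import datetime, timezone


def system_metadata(path):
    """Return a few file-system values for one file (hypothetical field names)."""
    st = os.stat(path)
    return {
        # Name of the file without its directory
        "file_name": os.path.basename(path),
        # Size of the file in bytes
        "size_bytes": st.st_size,
        # Last-modified time as an ISO 8601 string in UTC
        "last_modified_utc": datetime.fromtimestamp(
            st.st_mtime, tz=timezone.utc
        ).isoformat(),
    }
```

Note that merely copying or opening a file can change some of these values, which is one reason collection methods matter when metadata is later requested.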
Turning to the topic of the discoverability of metadata, the court first analyzed how metadata is addressed by the Federal Rules of Civil Procedure. Initially, the court noted that “[m]etadata is not addressed directly in the Federal Rules of Civil Procedure but is subject to the general rules of discovery. Metadata thus is discoverable if it is relevant to the claim or defense of any party and is not privileged.” The court went on to point out that the discovery of metadata would be subject to the balancing test of Rule 26(b)(2)(C) requiring a court to weigh the probative value against the potential burden of production.
Turning to Rule 34, the court indicated the importance of the rule’s requirements for the production of ESI and the resulting implications for metadata. The court highlighted the instruction in the advisory committee notes that ESI kept in an electronically searchable form “should not be produced in a form that removes or significantly degrades this feature” and cited case law holding that “documents stripped of metadata allowing searches do not comply with Rule 34(b).”
The court then addressed The Sedona Principles and their advice regarding metadata. According to the court, The Sedona Principles indicate that when selecting a form of production (as addressed by Rule 34) “the two ‘primary considerations’ should be the need for and the probative value of the metadata, and the extent to which the metadata will ‘enhance the functional utility of the electronic information’” (i.e., whether the metadata is relevant and whether it will enhance the utility of the documents, including the ability to search them). The Sedona Principles also address the various production options and conclude that “even if native files are requested, it is sufficient to produce [ESI] in PDF or TIFF format accompanied by a load file containing searchable text and selected metadata.”
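For readers unfamiliar with load files, the following Python sketch shows the general idea: each produced image set is paired with one row of selected metadata and a pointer to its extracted text, so the receiving party can search and sort the production. This is a simplified illustration under stated assumptions, not the format used in the case. The field names (`BegDoc`, `EndDoc`, `Author`, `DateModified`, `TextPath`), the comma-separated layout, and both function names are hypothetical; commercial review platforms typically use their own delimiters and field lists.

```python
import csv
import io

# Hypothetical set of metadata fields selected for production
FIELDS = ["BegDoc", "EndDoc", "Author", "DateModified", "TextPath"]


def build_load_file(records):
    """Write one delimited row of selected metadata per produced document."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # Fields missing from a record are left blank rather than guessed
        writer.writerow({name: rec.get(name, "") for name in FIELDS})
    return buf.getvalue()


def search_by_author(load_file_text, author):
    """Return the beginning document numbers whose Author field matches exactly."""
    reader = csv.DictReader(io.StringIO(load_file_text))
    return [row["BegDoc"] for row in reader if row["Author"] == author]
```

A production of images alone, without rows like these, would leave the receiving party unable to run this kind of field-based search, which is the functional concern the quoted passages describe.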
Turning, finally, to case law, the court outlined the holdings of several cases and concluded that “[t]here is a clear pattern in case law concerning motions to compel the production of metadata. Courts generally have ordered the production of metadata when it is sought in the initial document request and the producing party has not yet produced documents in any form.”
Noting that, in light of these principles, plaintiffs faced “an uphill battle,” the court then addressed each type of ESI for which metadata was requested. Beginning with email, the court found that due to plaintiffs’ delay in their request and in light of the small volume of emails at issue, it would not require defendants to search for metadata in all the files they had gathered. However, because defendants previously indicated a willingness to provide the metadata for emails that had been collected with metadata intact, the court ordered them to do so.
Regarding email metadata from back-up tapes, the court relied upon Rule 26(b)(2)(B), which allows parties to avoid production of information “not reasonably accessible” due to burden or cost. The court determined that the cost of such discovery was “unquestionably high” and the likely benefit low, and declined to order production.
The court then addressed plaintiffs’ request for system metadata for all Word and PowerPoint documents. Plaintiffs argued that the data was both necessary for efficient searching of the documents and relevant. The court rejected the efficiency argument in light of the small volume of documents at issue. As to relevancy, the court concluded that the potential value was “likely outweighed” by the burden of production. Nonetheless, the court granted the motion on the condition that plaintiffs bear all costs associated with the production.
Regarding Excel spreadsheets, the court acknowledged the potential usefulness of metadata from a particularly complex spreadsheet but indicated that the spreadsheets at issue did not rise to that level. The court also noted that plaintiffs failed to show any indication of fraudulent modification of the spreadsheets and doubted the relevance of the metadata sought. Nonetheless, based upon defendants’ prior indication of willingness to produce spreadsheets in their native format, the court ordered them to do so.
Turning finally to plaintiffs’ request for “meaningful information” regarding defendants’ hierarchical databases, the court denied in part and granted in part plaintiffs’ request. The court denied access to one of defendants’ databases but ordered defendants to provide plaintiffs with a live demonstration of two others, containing “subject records” and incident reports, for which plaintiffs sought to determine what changes were made to the information and when. The court had previously ordered such a demonstration, but it was delayed by plaintiffs’ difficulties in retaining a consulting expert. While recognizing the importance of having an expert to guide the inquiry, the court found that failure to retain such an expert would not warrant further extension and ordered that the demonstration take place by December 12, 2008.