PROTECT Act


Our group works with the Massachusetts and Pennsylvania Internet Crimes Against Children (ICAC) Task Forces in several ways. Our primary work has been research that assists in investigations of the sharing and distribution of images of child sexual exploitation. Through a collaborative National Institute of Justice grant with Clay Shields (Georgetown Univ.), we are designing the next generation of technologies that address critical aspects of forensic soundness, evidence collection, and statistical analysis.

Our program has been highlighted in the Dept. of Justice's annual report to Congress on child exploitation on the Internet.

Our research contributions have focused specifically on the challenges of peer-to-peer networking. Relevant papers include:

  • Forensic Investigation of the OneSwarm Anonymous Filesharing System.
    Swagatika Prusty, Brian Neil Levine, and Marc Liberatore. To appear in Proc. ACM Conference on Computer and Communications Security (CCS 2011), October 2011. [PDF]

    Abstract: OneSwarm is a system for anonymous p2p file sharing in use by thousands of peers. It aims to provide Onion Routing-like privacy and BitTorrent-like performance. We demonstrate several flaws in OneSwarm's design and implementation through three different attacks available to forensic investigators. First, we prove that the current design is vulnerable to a novel timing attack that allows just two attackers attached to the same target to determine if it is the source of queried content. When attackers comprise 15% of OneSwarm peers, we expect over 90% of remaining peers will be attached to two attackers and therefore vulnerable. Thwarting the attack increases OneSwarm query response times, making them longer than the equivalent in Onion Routing. Second, we show that OneSwarm's vulnerability to traffic analysis by colluding attackers is much greater than was previously reported, and is much worse than Onion Routing. We show for this second attack that when investigators comprise 25% of peers, over 40% of the network can be investigated with 80% precision to find the sources of content. Our examination of the OneSwarm source code found differences with the technical paper that significantly reduce security. For the implementation in use by thousands of people, attackers that comprise 25% of the network can successfully use this second attack against 98% of remaining peers with 95% precision. Finally, we show that a novel application of a known TCP-based attack allows a single attacker to identify whether a neighbor is the source of data or a proxy for it. Users that turn off the default rate-limit setting are exposed. Each attack can be repeated as investigators leave and rejoin the network. All of our attacks are successful in a forensics context: Law enforcement can use them legally ahead of a warrant. Furthermore, private investigators, who have fewer restrictions on their behavior, can use them more easily in pursuit of evidence for such civil suits as copyright infringement.
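The "15% of peers implies over 90% vulnerable" figure follows from a simple combinatorial observation: if each of a peer's neighbors is independently an attacker with probability f, the chance of having at least two attacker neighbors grows quickly with degree. The sketch below illustrates the arithmetic under a binomial model; the neighbor count of 26 is an assumed degree for illustration, not a figure from the paper.

```python
def frac_with_two_attackers(f_attacker, n_neighbors):
    """Probability a peer has at least two attacker neighbors, assuming each
    of its n_neighbors is independently an attacker with probability f."""
    f = f_attacker
    p0 = (1 - f) ** n_neighbors                          # zero attacker neighbors
    p1 = n_neighbors * f * (1 - f) ** (n_neighbors - 1)  # exactly one
    return 1 - p0 - p1

# With 15% attackers and an assumed degree of 26 neighbors per peer,
# most peers end up attached to at least two attackers.
print(round(frac_with_two_attackers(0.15, 26), 2))  # → 0.92
```

Under these assumptions the model reproduces the "over 90%" scale of the result; the paper's own analysis is of course specific to OneSwarm's actual topology.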

  • Strengthening Forensic Investigations of Child Pornography on P2P Networks.
    Marc Liberatore, Brian Neil Levine, and Clay Shields. Proc. ACM Conference on emerging Networking EXperiments and Technologies (CoNEXT 2010), November 2010. [PDF]

    Abstract: Measurements of the Internet for law enforcement purposes must be forensically valid. We examine the problems inherent in using various network- and application-level identifiers in the context of forensic measurement, as exemplified in the policing of peer-to-peer file sharing networks for sexually exploitative imagery of children (child pornography, or CP). First, we present a characterization of measurements of these networks, including large-scale measurements performed in the law enforcement context. We then show how the identifiers in these measurements can be unreliable, and propose the tagging of remote machines. Our proposed tagging method marks remote machines by providing them with application- or system-level data that is valid, but covertly has meaning to investigators. This tagging allows investigators to link network observations with physical evidence in a legal, forensically strong, and valid manner. We present a detailed model and analysis of our method, show how tagging can be used in several specific applications, discuss the general applicability of our method, and detail why the tags are strong evidence of criminal intent and participation in a crime.
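The core idea of tagging can be pictured with a small sketch: the investigator derives covert, keyed data that looks like ordinary application- or system-level content to the remote machine but is later recognizable only to the key holder. Everything below (the key, case identifier, and tag construction) is a hypothetical illustration, not the paper's actual scheme.

```python
import hmac, hashlib

def make_tag(key: bytes, case_id: str) -> str:
    """Derive a covert tag from a case identifier. To the remote machine the
    tag is just plausible-looking data; only the keyed investigator can later
    recognize it and link seized hardware to the network observation."""
    return hmac.new(key, case_id.encode(), hashlib.sha256).hexdigest()[:16]

def is_our_tag(key: bytes, case_id: str, observed: str) -> bool:
    """Check, during examination of seized media, whether an observed value
    is one of our planted tags."""
    return hmac.compare_digest(make_tag(key, case_id), observed)

key = b"investigator-secret"
tag = make_tag(key, "case-2010-0042")
print(is_our_tag(key, "case-2010-0042", tag))  # → True
```

A keyed construction like this is what makes the tag strong evidence: without the key, a third party cannot plausibly have produced it by accident.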

  • Forensic Investigation of Peer-to-Peer File Sharing Networks.
    Marc Liberatore, Robert Erdely, Thomas Kerle, Brian Neil Levine, and Clay Shields. In Proc. DFRWS Annual Digital Forensics Research Conference, August 2010. [PDF]

    Abstract: The investigation of peer-to-peer (p2p) file sharing networks is now of critical interest to law enforcement. P2P networks are extensively used for sharing and distribution of contraband. We detail the functionality of two p2p protocols, Gnutella and BitTorrent, and describe the legal issues pertaining to investigating such networks. We present an analysis of the protocols focused on the items of particular interest to investigators, such as the value of evidence given its provenance on the network. We also report our development of RoundUp, a tool for Gnutella investigations that follows the principles and techniques we detail for networking investigations. RoundUp has experienced rapid acceptance and deployment: it is currently used by 52 Internet Crimes Against Children (ICAC) Task Forces, who each share data from investigations in a central database.

This work is sponsored in part by the National Institute of Justice (award 2008-CE-CX-K005) and the National Science Foundation (CNS-1018615).

Mobile Systems Forensics

We have a few projects addressing the problems mobile systems pose to forensic investigation. Our most recent effort, a system called DEC0DE, attempts to quickly extract important information from mobile phones even if the particular phone model has not been seen before. DEC0DE uses a combination of machine learning techniques to locate and extract records, such as call logs and address book entries, in the raw internal storage of the phone. Prior to DEC0DE, we developed a novel reverse engineering technique, Ares, for the analysis of data recovered from mobile and embedded systems. We use a data-driven approach that incorporates natural language processing techniques to infer the layout of input data that has been created according to some unknown specification. Our work contrasts with past reverse engineering techniques based on instrumentation of executables, which offer high accuracy but are hard to apply to custom phone architectures.
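One way to picture this kind of inference is as an optimal segmentation problem: given raw bytes in an unknown layout, find the labeling of byte ranges by field type that scores best against flexible descriptions of typical structures. The toy sketch below uses dynamic programming over made-up field types and scores; it is illustrative only and is not DEC0DE's actual grammar or scoring.

```python
def score(field_type, chunk):
    # Reward chunks that look like the claimed type; penalize mismatches.
    if field_type == "digits":
        return 2 * len(chunk) if chunk.isdigit() else -100
    if field_type == "letters":
        return 2 * len(chunk) if chunk.isalpha() else -100
    return -len(chunk)  # "unknown" filler: always legal, low score

def segment(data, types=("digits", "letters", "unknown"), max_len=12):
    """Best-scoring segmentation of `data` into typed fields, by DP."""
    n = len(data)
    best = [(-10**9, None)] * (n + 1)  # best[i] = (score, backpointer)
    best[0] = (0, None)
    for i in range(n):
        if best[i][1] is None and i != 0:
            continue  # position i is unreachable
        for j in range(i + 1, min(n, i + max_len) + 1):
            for t in types:
                s = best[i][0] + score(t, data[i:j])
                if s > best[j][0]:
                    best[j] = (s, (i, t))
    # Recover the labeled segmentation by walking backpointers.
    out, i = [], n
    while i > 0:
        start, t = best[i][1]
        out.append((t, data[start:i]))
        i = start
    return list(reversed(out))

print(segment("5551234Alice"))  # → [('digits', '5551234'), ('letters', 'Alice')]
```

The appeal of this data-driven formulation is exactly the point made above: it needs no executable to instrument, only descriptions of plausible field layouts.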

  • Forensic Triage for Mobile Phones with DEC0DE. Robert J. Walls, Erik Learned-Miller, and Brian Neil Levine. In Proc. USENIX Security Symposium, August 2011. [PDF]

    Abstract: We present DEC0DE, a system for recovering information from phones with unknown storage formats, a critical problem for forensic triage. Because phones have myriad custom hardware and software, we examine only the stored data. Via flexible descriptions of typical data structures, and using a classic dynamic programming algorithm, we are able to identify call logs and address book entries in phones across varied models and manufacturers. We designed DEC0DE by examining the formats of one set of phone models, and we evaluate its performance on other models. Overall, we are able to obtain high performance for these unexamined models: an average recall of 97% and precision of 80% for call logs; and average recall of 93% and precision of 52% for address books. Moreover, at the expense of recall dropping to 14%, we can increase precision of address book recovery to 94% by culling results that don't match between call logs and address book entries on the same phone.
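The recall and precision figures above follow the standard definitions; the quick sketch below spells them out (the example counts are made up for illustration, not the paper's data):

```python
def recall(tp, fn):
    # Fraction of the true records that were recovered.
    return tp / (tp + fn)

def precision(tp, fp):
    # Fraction of the recovered records that are real.
    return tp / (tp + fp)

# e.g. 97 true call-log entries recovered, 3 missed, 24 spurious hits:
print(round(recall(97, 3), 2), round(precision(97, 24), 2))  # → 0.97 0.8
```

The trade-off the abstract describes is visible in these definitions: culling doubtful results removes false positives (raising precision) at the cost of discarding some true records (lowering recall).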

  • Reverse Engineering for Mobile Systems Forensics with Ares.
    John Tuttle, Robert J. Walls, Erik Learned-Miller, and Brian Neil Levine. In Proc. ACM Workshop on Insider Threats, October 2010. [PDF]
    Abstract: We present Ares, a reverse engineering technique for assisting in the analysis of data recovered for the investigation of mobile and embedded systems. The focus of investigations into insider activity is most often on the data stored on the insider’s computers and digital devices — call logs, email messaging, calendar entries, text messages, and browser history — rather than on the status of the system’s security. Ares is novel in that it uses a data-driven approach that incorporates natural language processing techniques to infer the layout of input data that has been created according to some unknown specification. While some other reverse engineering techniques based on instrumentation of executables offer high accuracy, they are hard to apply to proprietary phone architectures. We evaluated the effectiveness of Ares on call logs and contact lists from ten used Nokia cell phones. We created a rule set by manually reverse engineering a single Nokia phone. Without modification to that grammar, Ares parsed most phones’ data with 90% of the accuracy of a commercial forensics tool based on manual reverse engineering, and all phones with at least 50% accuracy even though the endianness for one phone changed.

Anonymous Communication Systems

Since 2000, we have studied several attacks that break protocols designed to provide Internet anonymity, such as Tor. For example, we have examined in great detail the predecessor, intersection, and Sybil attacks. These methods use only statistical information about other proxies in the network, not the traffic they send. Notably, our work on the predecessor attack was the first analysis to quantitatively compare all known protocols for anonymous communication.
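The intuition behind the predecessor attack can be sketched in a few lines: each time anonymous paths are rebuilt, an attacker-controlled proxy logs which node handed it the stream. Intermediate predecessors vary randomly, but the true initiator keeps reappearing, so its count eventually stands out. The simulation below is a hypothetical toy model (fixed path length, uniform proxy selection), not the protocols analyzed in our papers.

```python
import random

def predecessor_counts(initiator, proxies, rounds, path_len=3, seed=1):
    """Count how often each node is observed as a predecessor across
    many path reformations, from the attacker's point of view."""
    rng = random.Random(seed)
    counts = {p: 0 for p in proxies}
    counts[initiator] = 0
    for _ in range(rounds):
        # The initiator always heads its own path; proxies are random.
        path = [initiator] + [rng.choice(proxies) for _ in range(path_len)]
        # An attacker at a random position on the path logs its predecessor.
        pos = rng.randrange(1, len(path))
        counts[path[pos - 1]] += 1
    return counts

counts = predecessor_counts("alice", [f"p{i}" for i in range(20)], 5000)
print(max(counts, key=counts.get))  # prints "alice"
```

This is why the attack needs only statistical observations about who precedes whom, and no inspection of the traffic itself.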

More recently, we have examined attacks on encrypted traffic and VoIP streams. We have shown that it is possible to identify the source of encrypted Web traffic (whether via WPA, SSL, SSH, or Tor) with reasonable accuracy when a limited number of Internet destinations are involved. Moreover, our approach is robust to traffic shaping that attempts to defend against such analysis. Similarly, we have examined attacks on VoIP streams over anonymity systems, based on actual traffic measurements from PlanetLab.
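The fingerprinting idea can be sketched simply: encryption hides payloads but not packet sizes, and a site's distribution of sizes acts as a signature that can be matched against profiles built from prior visits. The sites, profiles, traces, and scoring rule below are made-up toy data for illustration, not our actual classifier or measurements.

```python
from collections import Counter

def profile(traces):
    """Normalized histogram of packet sizes over several visits to a site."""
    c = Counter(size for trace in traces for size in trace)
    total = sum(c.values())
    return {s: n / total for s, n in c.items()}

def classify(trace, profiles):
    """Guess which site produced `trace`: pick the profile whose size
    distribution best overlaps the observed packet sizes."""
    def overlap(p):
        return sum(p.get(s, 0.0) for s in trace)
    return max(profiles, key=lambda site: overlap(profiles[site]))

profiles = {
    "site-a": profile([[1500, 1500, 640, 52], [1500, 640, 640, 52]]),
    "site-b": profile([[120, 120, 52, 96], [120, 96, 96, 52]]),
}
print(classify([1500, 640, 52, 52], profiles))  # → site-a
```

A defense by traffic shaping must distort exactly these distributions, which is why simple padding schemes leave enough residual signal for a classifier to exploit.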

Selected works:

Past Projects

Database Forensics and Transparency

We have examined several database systems to determine the extent of digital artifacts that remain as trace evidence during normal operation, and we have proposed principles for keeping databases transparent, that is, free of unexpected trace artifacts.

Digital Evidence Provenance

Brian Neil Levine and Marc Liberatore. In Proc. of DFRWS Annual Conference, August 2009. [PDF]

The current standard and open formats for forensic data describe whole disk and memory image properties, but do not describe the products of detailed investigations. In this project, we propose a simple canonical description of digital evidence provenance that explicitly states the set of tools and transformations that led from acquired raw data to the resulting product. Our format, called Digital Evidence Exchange (DEX), is independent of the forensic tool that discovered the evidence, which has a number of advantages. Using a DEX description and the raw image file, evidence can be reproduced by other tools with the same functionality. Additionally, DEX descriptions can identify differences between two separate investigations of the same raw evidence. Finally, as a standard product of tools, DEX can allow quick fabrication of tool chains, either as best-of-breed amalgams or for tool testing. We have implemented DEX as an open source library.
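The idea of a tool-independent provenance chain can be pictured as below. This is a hypothetical illustration only: the field names and layout are invented for this sketch and are not the actual DEX schema, although the tools named are among those DEX wraps.

```python
import json

# Hypothetical provenance chain: each step records the tool and
# transformation that led from the acquired raw image toward the
# final evidence product, independent of any one forensic tool.
provenance = {
    "source": {"image": "disk.dd", "sha1": "<hash of acquired image>"},
    "steps": [
        {"tool": "mmls",     "action": "list partitions",   "output": "partition table"},
        {"tool": "istat",    "action": "inspect inode 42",  "output": "metadata record"},
        {"tool": "icat",     "action": "extract inode 42",  "output": "photo.jpg"},
        {"tool": "exiftool", "action": "read EXIF",         "output": "camera metadata"},
    ],
}

# A different tool with the same functionality could replay these steps
# against the raw image and reproduce the same evidence product.
print(json.dumps(provenance, indent=2)[:60])
```

The point of such a record is exactly what the paragraph above describes: reproducibility by other tools, comparison of independent investigations, and assembly of tool chains.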

The DEX source code is available in tar.bz2 or zip archive format.

DEX is written in Java and is released under a BSD-like license. Currently, we provide a core library and wrappers for fdisk, mmls, istat, icat, exiftool, and JHead. We welcome your comments and feedback!

This work was supported in part by National Science Foundation award DUE-0830876 and in part by National Institute of Justice Award 2008-CE-CX-K005.
