Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. [...] each task. Our results demonstrate excellent classifier performance, with area under the curve of up to 0.90 and 0.94 in each category. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.

1 Introduction

1.1 E-cigarettes

The use of e-cigarettes has been increasing rapidly since their introduction onto the market a few years ago. Sales of e-cigs and refillable vaporizers more than doubled to $1.7 billion in 2013.[1] Indeed, the trend has become so popular that 'vape' was voted word of the year for 2014 by the Oxford Dictionaries.[2] A limited yet growing body of literature suggests that e-cigarettes and vaporizers can create potentially harmful byproducts, including heavy metals [3] and formaldehyde,[4] and that product failure can result in severe injury and burns.

Very little is known, however, regarding the prevalence and characteristics of e-cigarette use. Two surveys among youth have indicated rapid increases in use since 2011,[5] and recent results from the 2014 Monitoring the Future survey indicated that 17% of 12th graders had used an e-cigarette in the past 30 days, surpassing the number who used combustible cigarettes.[6] Even less information exists on adult use: the only national data come from one consumer-research web survey,[7] which indicated that 8.5% of adults have tried e-cigarettes, with a rate of 36% among combustible cigarette users. No large-scale surveys have yet assessed more in-depth opinions about e-cigarette use, such as reasons for use or beliefs about harm.

1.2 Surveillance

Survey results are necessary to understand usage trends, establish national and regional health goals, and inform regulations and prevention campaigns.
These surveys, while excellent in many ways, have several limitations. First, there is a time lag before new products of abuse are incorporated into the surveys.[8] For example, neither the Behavioral Risk Factor Surveillance System (BRFSS),[9] the National Health Interview Survey,[10] nor the National Survey on Drug Use and Health (NSDUH) [11] asks about e-cigarette use yet. Second, the time lag in collection and analysis may delay timely policy interventions. Third, the surveys are sized to capture general trends across demographics and may lack focus on specific populations. Fourth, surveys have limitations in detecting usage by minors, as most minors are not allowed to take the surveys. Fifth, surveys may contain limited content on any specific topic, as every additional question competes against other questions for time and space in the survey. Sixth, surveys capture only high-level geographic information about use.

Continuing use of high-quality national surveys to inform prevention and treatment services is critical, yet new technologies may address some of these limitations. An ideal surveillance solution could capture new drugs of abuse, collect data in real time, focus on populations of interest, include populations unable to take the survey, allow a breadth of questions to be answered, and enable geo-location analysis.

We believe that social media streams may provide one solution. Social media, in this case specifically Twitter, may include up-to-date vernacular for drugs of abuse; is inherently real time in how tweets are broadcast; includes many potential populations of interest and their demographic characteristics; includes populations, such as minors, who may not qualify for surveys; contains tweets that indicate other potentially risky behaviors; and includes geo-locations.

To realize surveillance through social media, a foundational question is whether we can detect drug use at all. This work addresses this foundational concern and reports two pilot tasks for e-cigarettes.
In the first, we automatically identify e-cigarette tweets that indicate e-cigarette use. In the second, we automatically identify tweets that indicate e-cigarette use for smoking cessation.

1.3 Our Contribution

This feasibility paper explores state-of-the-art, machine-learning-based text classification methodologies for identifying tweets that indicate e-cigarette use. This paper makes several key contributions:

Defines a novel classification task for identifying e-cigarette use.
Defines a novel classification task for identifying e-cigarette use for smoking cessation.
Defines a process for labeling tweets to identify e-cigarette use and use for.