DARKFLEECE: Probing the Dark Side of Android Subscription Apps

Chang Yue(1,2), Chen Zhong, Kai Chen(1,2,*), Zhiyu Zhang(1,2), and Yeonjoon Lee

Institute of Information Engineering, Chinese Academy of Sciences, China

School of Cyber Security, University of Chinese Academy of Sciences, China

University of Tampa, USA

Hanyang University, Ansan, Republic of Korea

{yuechang, chenkai, zhangzhiyu1999}@iie.ac.cn, [email protected], [email protected]

Abstract

Fleeceware, a novel category of malicious subscription apps,

is increasingly tricking users into expensive subscriptions,

leading to substantial ﬁnancial consequences. These apps’

ambiguous nature, closely resembling legitimate subscrip-

tion apps, complicates their detection in app markets. To

address this, our study aims to devise an automated method,

named DARKFLEECE, to identify ﬂeeceware through their

prevalent use of dark patterns. By recruiting domain ex-

perts, we curated the ﬁrst-ever ﬂeeceware feature library,

based on dark patterns extracted from user interfaces (UI).

A unique extraction method, which integrates UI elements,

layout, and multifaceted extraction rules, has been devel-

oped. DARKFLEECE boasts a detection accuracy of 93.43%

on our dataset and utilizes Explainable Artiﬁcial Intelligence

(XAI) to present user-friendly alerts about potential ﬂeece-

ware risks. When deployed to assess Google Play’s app land-

scape, DARKFLEECE examined 13,597 apps and identiﬁed

an alarming 75.21% of 589 subscription apps that displayed

different levels of ﬂeeceware, totaling around 5 billion down-

loads. Our results are consistent with user reviews on Google

Play. Our detailed exploration into the implications of our

results for ethical app developers, app users, and app market

regulators provides crucial insights for different stakeholders.

This underscores the need for proactive measures against the

rise of ﬂeeceware.

1 Introduction

The integration of subscription models into mobile apps has witnessed significant growth in recent years. Subscription revenue has grown from around $1.5 billion in 2015 to $17.1 billion in 2022 [12]. However, the popularity of subscriptions has led to an increase in their abuse. Deceptive mobile apps, referred to as fleeceware [52], have been a pervasive issue in the mobile app market. These apps incorporate dark design patterns, such as unclear terms, to deceive users into subscribing to overly expensive services. Users may subscribe to apps without their consent, fail to realize they will be charged after a free trial, or struggle to cancel a subscription. Recent reports have highlighted subscription scams [14, 30, 47, 50, 58]. According to Avast, 69 fleeceware apps generated $38.5 million in revenue in 2019 [4].

* The corresponding authors.

Given its widespread prevalence, automated detection of

ﬂeeceware is imperative to minimize its harmful effects.

However, these apps typically do not contain malicious code

and exhibit coding patterns that closely resemble legitimate

apps, making it difﬁcult to detect them using traditional

methods such as malicious code detection. Presently, app

markets largely lean on monetization requirements as a reg-

ulatory tool to combat ﬂeeceware [28]. However, due to the

absence of automated detection methods, the process is man-

ual, which is neither timely nor scalable. This work is ded-

icated to investigating ﬂeeceware at scale through the devel-

opment of an automatic ﬂeeceware detection system.

Current observations reveal that ﬂeeceware often employs

dark patterns [10, 50], which are deceptive interface designs

intended to manipulate users into paying subscription fees

beyond their initial intention. Therefore, we mainly focus on

identifying these dark patterns in UIs for ﬂeeceware detec-

tion. An app with dark pattern features doesn’t automatically

qualify as ﬂeeceware, so we utilize a model that combines

these features for detection. Our results are presented in a

user-friendly manner, making it easy for users to understand.

To effectively pinpoint ﬂeeceware, we developed DARK-

FLEECE, a Dark Pattern-based Fleeceware Detector, which

meticulously scrutinizes the UI design of subscription-based

apps to uncover any embedded dark patterns. To realize

DARKFLEECE, we mainly addressed two challenges.

C1: Constructing a Fleeceware Feature Library. Imple-

menting an automatic ﬂeeceware detection system requires

recognizing the distinct features of ﬂeeceware. There is a

lack of research on identifying detectable features that can

accurately describe ﬂeeceware, and the availability of a well-

labeled dataset for ﬂeeceware is limited. Despite extensive

research on dark patterns, the automated detection of ﬂeece-

ware remains elusive due to the complexities and variabilities

intrinsic to ﬂeeceware patterns. Drawing a clear line between

ﬂeeceware and legitimate apps is far from straightforward.

To address C1, we created a feature collection for ﬂeece-

ware detection. This endeavor required us to draw heavily

from the extensive knowledge presented in dark pattern lit-

erature, predominantly in the domains of human-computer

interaction (HCI) and user experience (UX) design, as well

as platform-speciﬁc requirements. We invited ten experts in

Android and front-end development. They analyzed 1,486

user comments on ﬂeeceware, identiﬁed seven common com-

plaints, and executed 79 ﬂeeceware samples, uncovering var-

ious phenomena in subscription UI. Finally, we extracted en-

tities and attributes from the phenomena and identiﬁed 19 UI

features for ﬂeeceware detection.

C2: Extracting Features in the form of Natural Lan-

guage. The most intricate aspect of the detection is extract-

ing the values of the identiﬁed features from an app’s UI.

Most of our features are based on the information in the form

of natural language, i.e., text describing a type of subscrip-

tion information. However, the subscription information in a

UI is usually fragmented and expressed in multiple formats

due to the vast variability inherent in UI design, encompass-

ing aspects like layouts, nested elements, and non-textual

components. The extraction process requires a deep under-

standing of the relationship between the UI elements and the

overall layout, as well as the capability to handle subscrip-

tion information displayed in diverse formats.

To address C2, we developed a method that factors in both

the UI elements and their layout information, incorporating

multiple extraction rules. This method allows us to derive

necessary information from the UI. Speciﬁcally, we devel-

oped a novel layout-based approach to link related subscrip-

tion information by identifying neighboring widgets, which

are descendant widgets of a parent widget like “ViewGroup”

or “RelativeLayout” in the layout ﬁle. Additionally, we

adopted a multi-rule-based approach to extract target infor-

mation, avoiding potential semantic misunderstandings as-

sociated with machine learning. This approach is inspired

by our analysis of 145 subscription UIs, where we observed

developers consistently using speciﬁc keywords or symbols

to describe subscription information, with numerical values

consistently positioned within the information. We then con-

structed a collection of regular expressions to extract sub-

scription details in various forms. This combined approach

accurately extracts the target information for each UI feature.
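The two steps described above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical `Widget` tree rather than the actual layout dump, treats only direct children of a container as neighbors, and uses three small illustrative regular expressions; the real rule collection is larger and is not reproduced here.

```python
import re
from dataclasses import dataclass, field
from typing import List

@dataclass
class Widget:
    """Hypothetical, simplified node of a dumped layout tree."""
    cls: str
    text: str = ""
    children: List["Widget"] = field(default_factory=list)

# Container classes whose children are treated as neighbors (assumed list).
CONTAINERS = ("ViewGroup", "RelativeLayout", "LinearLayout", "FrameLayout")

# Illustrative rules only: a currency symbol followed by a number,
# a billing-cycle keyword, and a "<n> day(s) free trial" phrase.
PRICE_RE = re.compile(r"([$€£])\s?(\d+(?:\.\d{1,2})?)")
CYCLE_RE = re.compile(r"\b(week|month|year)(?:ly|s)?\b", re.I)
TRIAL_RE = re.compile(r"(\d+)[\s-]*(day|week|month)s?\s+(?:free\s+)?trial", re.I)

def linked_groups(root):
    """Join the texts of sibling widgets that share a container parent."""
    groups, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.cls.split(".")[-1] in CONTAINERS:
            texts = [c.text.strip() for c in node.children if c.text.strip()]
            if texts:
                groups.append(" ".join(texts))
        stack.extend(node.children)
    return groups

def extract(text):
    """Apply the rules to one linked text group."""
    info = {}
    trial = TRIAL_RE.search(text)
    if trial:
        info["trial"] = (int(trial.group(1)), trial.group(2).lower())
        # Remove the trial phrase so "1 week free trial" is not read as a cycle.
        text = text[:trial.start()] + text[trial.end():]
    price = PRICE_RE.search(text)
    if price:
        info["price"] = (price.group(1), float(price.group(2)))
    cycle = CYCLE_RE.search(text)
    if cycle:
        info["cycle"] = cycle.group(1).lower()
    return info

# Example modeled on the annotations of Figure 1.
plan = Widget("android.widget.RelativeLayout", children=[
    Widget("android.widget.TextView", "$19.99"),
    Widget("android.widget.TextView", "1 month"),
])
root = Widget("android.view.ViewGroup", children=[
    plan,
    Widget("android.widget.TextView", "3 Days free trial"),
])
results = [extract(group) for group in linked_groups(root)]
```

Linking before extraction is what lets the price "$19.99" and the cycle "1 month" be read as one plan even though they live in separate text widgets.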

With C1 and C2 addressed, we built a decision tree model

for ﬂeeceware detection. We chose this model because of its

ability to incorporate our domain knowledge during feature

engineering and its inherent model interpretability. To make

it more intuitive for ordinary users to identify which inter-

face features may present potential ﬂeeceware risks, we em-

ployed SHAP, an Explainable Artiﬁcial Intelligence (XAI)

technique, to provide and visualize the explanations.

Fleeceware Measurement. Owing to our automated detec-

tion capabilities, we are able to quantify the presence of

fleeceware in the wild. We downloaded 13,597 apps from Google Play, spanning all app categories, and identified 589 subscription-based apps. Taking approximately 10.12 minutes per app, we classified 443 of these apps (75.21%) as suspected fleeceware, which may cause unanticipated subscription costs

for users. These apps have collectively garnered over 5 bil-

lion downloads. Even popular apps like YouTube Music, with

over 1 billion downloads, contain ﬂeeceware subscription

UIs, indicating that the issues are signiﬁcant. Our detec-

tion results are consistent with users’ reviews on Google Play.

With the detection results, we reported our ﬁndings through

Google’s app policy violation reporting platform to bring the

problems to their attention.

Contributions. To sum up, the contributions are three-fold.

• We construct the ﬁrst UI feature library for ﬂeeceware

detection, which is the result of collaboration with domain

experts who possess in-depth knowledge of user interac-

tions with ﬂeeceware behaviors. They meticulously analyzed

ﬂeeceware samples and integrated insights from studies on

dark patterns and platform-speciﬁc UI design requirements.

• We develop a novel technique that merges layout-based in-

formation linking with a multi-rule-based information extrac-

tion method for efﬁcient subscription information extraction.

This innovation stems from an extensive examination of sub-

scription UIs, unveiling a consistent placement of relevant

information within speciﬁc layout patterns and summarizing

diverse presentation formats into common patterns.

• We assess the prevalence of ﬂeeceware throughout the app

market and uncover its extensive distribution. Moreover, we

investigate the developers, evolution, and app user percep-

tions of ﬂeeceware, offering valuable insights to ethical de-

velopers, app users, and app market managers to mitigate its

detrimental impact.

2 Background and Motivation

2.1 Subscription and Fleeceware

A subscription is a type of in-app purchase product that allows app developers to sell content, services, or features through their apps and to charge users automatically at specified regular intervals, e.g., every week or month. To attract users, developers often set up a free trial period, allowing users to try out the subscription before purchasing. Users can cancel a subscription before the billing cycle ends [27]. Figure 1 shows a subscription UI example, which typically displays information such as the subscription price, billing cycle, and free trial details.

Between 2015 and 2022, subscription revenue increased from around $1.5 billion to $17.1 billion [12]. As subscriptions gain popularity, they also become vulnerable to abuse by developers. Fleeceware [14] is a type of subscription app that deceives users into incurring unclear or hidden charges. Users may subscribe to apps without their consent, fail to realize they will be charged after a free trial, or struggle to cancel a subscription. Avast reported that the 204 fleeceware apps (70 on Android and 134 on iOS) it identified in 2021 have brought in $403.5 million in revenue [4].

[Figure 1: An example of a subscription interface. Annotated elements: Subscription Plan with ① Price ($19.99) and ② Billing Cycle (1 month); ③ Free Trial (3 Days free trial); ④ Fee After Free ($74.99 yearly after trial); ⑤ Auto-renewed (subscription will be auto-renewed ...); ⑥ Subscription Management (You can manage or turn off subscription at …).]

Table 1: Google requirements for subscription UI design

No. | Requirements for Preventing Subscription Abuse
R1 | Be transparent about the offer, including the offer terms, the cost of the subscription, the frequency of the billing cycle, and whether a subscription is required to use the app.
R2 | Users should not have to perform any additional action to review the subscription information.
R3 | The content should accurately convey the meaning of the subscription. An example of a violation is “Free Trial” or “Try Premium membership - 3 days for free” for a subscription with an auto-recurring charge.
R4 | Show how and when a free trial will convert to a paid subscription, how much the subscription will cost, and whether a user can cancel if they do not want to convert to a paid subscription.

2.2 App Markets Regulatory Mechanisms

Ensuring effective oversight of apps on platforms like Google Play and the App Store poses significant challenges. The typical strategy is to employ a rating and review system to compile feedback from users. App reviews provide valuable information, including bug reports and feature requests [43]. App markets also use various vetting mechanisms, such as analyzing information uploaded by the app developer [49, 55], static scanning [60, 64], and dynamic execution [11, 25], to detect common malware. In addition, researchers have developed machine learning-based methods to detect new types of malware [17, 35, 41, 61]. However, the majority of these methods are ineffective at detecting fleeceware, because fleeceware mainly uses hidden or contradictory content on the UI to confuse users and exhibits coding patterns that closely resemble legitimate apps, without resorting to traditional malicious behaviors, such as stealing confidential data and causing system crashes.

Table 2: Dark patterns

Category | Definition
Nagging | Redirection of expected functionality that persists beyond one or more interactions.
Obstruction | Making a process more difficult than it needs to be, with the intent of dissuading certain action(s).
Sneaking | Attempting to hide, disguise, or delay the divulging of information that is relevant to the user.
Interface Interference | Manipulation of the user interface that privileges certain actions over others.
Forced Action | Requiring the user to perform a certain action to access (or continue to access) certain functionality.

Currently, app markets rely primarily on monetization re-

quirements as the regulatory mechanism to combat ﬂeece-

ware. For example, Google mandates that subscription app

developers must not mislead users about subscription ser-

vices or content offered within the app and provides spe-

ciﬁc guidelines for the development of subscription inter-

faces [28], as outlined in Table 1. However, relying solely

on regulations is not sufﬁcient to prevent the occurrence of

ﬂeeceware. In addition, due to the absence of automated de-

tection methods, these markets are dependent on manual app

reviews, a process that is neither timely nor scalable. It is

crucial to identify the features which can indicate ﬂeeceware,

and build an effective tool for detecting them.

2.3 Dark Patterns Observed in Fleeceware

Fleeceware is a form of subscription fraud that is found to

make use of dark patterns. To develop an automated ﬂeece-

ware detection system, it is critical to employ the knowledge

from dark pattern studies. Dark patterns have been exten-

sively studied [10, 19, 21, 29, 40] in the human-computer

interaction (HCI) and user experience (UX) ﬁelds. They are

carefully crafted, deceptive interface design patterns that ma-

nipulate users into taking actions or making decisions they

did not intend to make [10]. Gray et al. [29] have proposed

a comprehensive categorization of dark patterns, including

nagging, obstruction, sneaking, interface interference, and

forced action, as shown in Table 2.

In ﬂeeceware, several dark patterns are observed. For ex-

ample, the interface intentionally hides subscription-related

information (Sneaking), causing users to unknowingly com-

plete a subscription. The interface prominently highlights

the “free trial” while making the subsequent payment infor-

mation difﬁcult to notice using small font sizes and incon-

spicuous colors (Interface Interference), which gives the im-

pression that the app will not automatically charge fees. In

sum, ﬂeeceware leverages dark patterns to orchestrate sub-

scription app scams, with most of these dark patterns observ-

able in the subscription UI, leading to users’ ﬁnancial detri-

ment. This observation motivates us to detect ﬂeeceware by

identifying these dark patterns within the subscription UI.

[Figure 2: Framework of the fleeceware detection system. Stage 1, Fleeceware Feature Library Construction: recruiting domain experts (screening based on knowledge of dark patterns and mobile app UI), dark pattern literature review, review of platform requirements, collection and review of user complaints, and dynamic analysis of fleeceware samples, leading to the knowledge-based feature construction of the Fleeceware Feature Library. Stage 2, Fleeceware Detection: for an app to detect, the UI Collector (automated UI collection, preference, keywords-based filter) produces screenshots and layout files; the Feature Extractor applies layout-based information linking and multi-rule-based information extraction; the Detector (feature selection, classifier, XAI) returns alerts and explanations to the user; user feedback and results go to the Evaluator, which sends updates to the feature library.]

3 DARKFLEECE

3.1 Threat Model

Our system is speciﬁcally designed to address a growing con-

cern in the landscape of subscription apps: the use of decep-

tive UI designs (i.e., dark patterns) by app developers. These

practices exploit gaps in user awareness and cognitive biases,

leading to unwanted subscriptions and ﬁnancial losses for

users. To address these ﬂeeceware problems, our work fo-

cuses on such deceptive design detection, and subscription

UIs with such design are labeled as ﬂeeceware UIs. Our aim

is to highlight unethical design practices resembling ﬁnan-

cial fraud, contributing to a safer digital environment.

3.2 Overview

We introduce DARKFLEECE (Dark Pattern-based Fleece-

ware Detector), which methodically examines the UI of sub-

scription apps to detect any existing dark patterns. The sys-

tem framework is depicted in Figure 2. Feature Library

is a key component, as the effectiveness of the system can

greatly depend on its quality. This library is constructed by

domain experts, drawing from observations of user interac-

tions with ﬂeeceware behaviors and a review of ﬂeeceware

samples, and integrates insights from dark pattern studies

and platform-speciﬁc app requirements (address C1). DARK-

FLEECE then utilizes a UI Collector to collect subscription

UIs from an app, and uses this feature library in its Feature

Extractor to draw out pertinent feature values from the UIs.

Within the extractor, we address the second challenge (C2)

using a novel layout-based information linking technique

and a multi-rule-based information extraction method to har-

vest target subscription information and obtain the feature

values. Then a Detector module applies a trained classiﬁer

to detect potential fleeceware and utilizes an Explainable Artificial Intelligence (XAI) technique to provide explanations for the outcomes to make it easier for users to notice and understand the issues on the UI. As users provide feedback, an Evaluator module continually optimizes and updates the feature library to meet user needs. We provide a detailed description of the specific design and implementation of each component below.

Table 3: Typical fleeceware behaviors users complain about

No. | Behaviors | Reviews from Google Play
B1 | Claim to be free but actually not. | It says 3 days free trial, but when you press “continue” it makes you pay.
B2 | Mislead users to subscribe without their knowledge. | I cannot believe I did not subscribe to this and they took £80.00 from my account.
B3 | Make users confused about the charge plan. | Said it was $6.99/mo. Then it charged me $79.99 for a year.
B4 | Do not state the recurring charge. | They billed me in June and again in July.
B5 | Make users confused about canceling subscription. | I had uninstalled and canceled the subscription. I was still charged!
B6 | Hard to close subscription UI. | The developer blended in the “X” in the corner so you can barely see it.
B7 | The price is unreasonable. | I would never pay almost $100 A WEEK.

3.3 Knowledge-Based Feature Construction

Fleeceware identiﬁcation necessitates a meticulous examina-

tion of ﬂeeceware behaviors along with a solid understand-

ing of dark patterns. Our ﬁrst step involves gathering data on

ﬂeeceware behavior by accumulating user observations. Us-

ing these observations as a basis, we enlist domain experts to

analyze user-observed behaviors. Additionally, experts an-

alyze and execute a set of ﬂeeceware to unearth any poten-

tial ﬂeeceware patterns that may not have been observed by

users. To accumulate user observations, we collected 1,486

pertinent comments on ﬂeeceware reported by Avast [4] from

Google Play. The complaints from users serve as a rich

source of information for documenting ﬂeeceware behaviors.

In the subsequent step, we engaged a panel of ten domain experts. This group was composed of two contributing authors of this study and eight additional volunteers, who were carefully selected through our professional networks to ensure a diverse yet highly specialized skill set was represented. The group consisted of four researchers and six engineers, each bringing over four years of substantial practical experience in both Android and front-end design and development. These experts individually reviewed fleeceware samples alongside the documented fleeceware behaviors and met twice to discuss them. During the initial meeting, the experts summarized the documented fleeceware behaviors using the dark patterns in Table 2 to gain an understanding of the distribution of dark patterns in user observations. After the meeting, to discover potentially unobserved dark patterns, the experts further examined and ran 79 fleeceware samples, comprising 34 iOS apps and 45 Android apps. The results were discussed in the second meeting, and in cases of differing results, the experts engaged in discussions to reach an agreement. Remarkably, these fleeceware samples did not exhibit any notable anomalies or irregular behaviors at run time, with the exception of certain apps that frequently displayed advertising links. Nonetheless, the experts’ analysis did uncover various dark patterns and breaches of platform requirements within these fleeceware subscription UIs. These findings, presented in Table 4, signify potential risk areas where users could inadvertently miss or misinterpret essential subscription information, mirroring problems outlined in Table 3.

Table 4: Phenomena occurring in fleeceware subscription UIs

No. | Phenomena | Fleeceware Behaviors | Violation of Requirements | Category of Dark Patterns
1 | The entire interface lacks price information. | B1, B2 | R1, R2 | sneaking
2 | The entire interface lacks billing frequency information. | B1, B2, B3 | R1, R2 | sneaking
3 | If there is a free trial, the entire interface does not inform about the trial period. | B1, B2 | R2, R4 | sneaking
4 | The entire interface does not indicate whether the subscription will auto-renew. | B4 | R1, R2, R4 | sneaking
5 | If there is a free trial, it does not specify the charges after the trial. | B1, B2 | R4 | sneaking
6 | The total cost for a complete billing cycle is not provided. | B3 | R1 | sneaking
7 | The total charge is not highlighted when multiple price formats are presented. | B3 | R1 | interface interference
8 | A contradiction between the trial duration and the timing of the charge or cancellation of the subscription (e.g., claim a 3-day free trial but charge on the last day of the trial). | B2, B5 | R4 | interface interference
9 | The interactive buttons do not convey the meaning of subscription (e.g., lack the keyword "subscribe" or do not display the pricing information), and there is a significant distance between subscription information and the buttons. | B2 | R1, R3 | sneaking, interface interference
10 | The subscription information is not prominent (e.g., small font size, similar color to the background, lack of emphasis in font style, buried within lengthy paragraphs). | B1, B2, B3 | R1 | sneaking, interface interference
11 | The close button is not clearly visible. | B6 | R1 | obstruction, forced action
12 | The subscription charges are unreasonable after the trial (e.g., charges after the trial significantly surpass market prices or prices with a free trial are higher than those without). | B7 | - | interface interference

Through expert analysis of the ﬂeeceware samples, we

have identiﬁed several crucial subscription details that may

display characteristics of dark patterns. These details in-

clude the subscription cost, billing cycle, free trial period,

and whether the subscription will automatically renew. It is

imperative that these critical pieces of information be explic-

itly stated as required by the platforms (refer to the require-

ments in Table 1), as they play a crucial role in enabling

users to make informed decisions regarding their subscrip-

tion choices. The lack of these subscription details is iden-

tiﬁed in the subscription interface of the ﬂeeceware samples

(No.1-No.4 in Table 4).


Figure 3: Examples of fleeceware. (a) The charge information is not easily discernible due to the small font size and color. (b) The UI claims a 3-day trial, but users must cancel it 24 hours before the trial ends. (c) The price after the free trial is unreasonably higher than the price for a one-time payment, which reflects the characteristic of fleeceware, i.e., excessive charging. (d) The UI displays a monthly price for an annual subscription, which violates Google’s policies and may lead to user misunderstandings about the charges incurred.

Some ﬂeeceware samples employ tactics to conceal or

redirect users’ attention away from critical subscription in-

formation (No.5-No.10). This can result in users inadver-

tently overlooking or misunderstanding the subscription de-

tails provided. Figure 3(d) showcases an interface that only

presents a monthly price for a 12-month subscription (No.6),

potentially misleading users regarding the long-term costs

they will incur upon subscribing. Similarly, if a UI high-

lights the weekly price rather than the total price of the cur-

rent subscription (No.7), it potentially causes users to under-

estimate the actual charges involved. Figure 3(b) claims a

3-day free trial, and users can cancel the subscription at any

time. However, the ﬁne print reveals that users must cancel

one day before the trial ends to avoid automatic conversion

to a paid plan and charges (No.8), which increases the risk

of users missing the cancellation deadline and unintention-

ally incurring charges for the subscription. Interactive but-

tons are essential elements within a UI, serving as guides for

user actions and triggering speciﬁc functionalities or naviga-

tion. Thus, the information displayed on these buttons signif-

icantly impacts users’ understanding of their current actions.

However, some interfaces show only “Continue to trial” on the interactive button, while the information about automatically transitioning to an annual subscription after the trial period is displayed elsewhere (No.9). This design can mislead

users into believing they are engaging in a trial period with-

out realizing the subsequent charges. Font size, style, color,

and positioning of informational text on an interface greatly

inﬂuence users’ ability to quickly and clearly perceive cru-

cial information. In Figure 3(a), ﬂeeceware employs small

font sizes and colors that closely resemble the background,

diverting users’ attention from the billing information while

emphasizing the “3-Days Free Trial” aspect (No.10).
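The contradiction in Phenomenon No.8 comes down to simple arithmetic on two durations: the advertised trial length and the lead time by which the user must cancel. The following sketch assumes both values have already been extracted from the UI text; the function names and the unit table are illustrative and not part of the described system.

```python
from datetime import timedelta

# Supported time units (assumed set, for illustration).
UNIT = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}

def effective_trial(trial_len, trial_unit, cancel_lead=0, cancel_unit="hour"):
    """Time the user actually has to cancel without being charged."""
    return trial_len * UNIT[trial_unit] - cancel_lead * UNIT[cancel_unit]

def trial_is_consistent(trial_len, trial_unit, cancel_lead=0, cancel_unit="hour"):
    """True when the advertised trial equals the usable cancellation window."""
    advertised = trial_len * UNIT[trial_unit]
    return effective_trial(trial_len, trial_unit, cancel_lead, cancel_unit) == advertised

# Figure 3(b): a "3-day free trial" that must be canceled 24 hours before it ends.
usable = effective_trial(3, "day", 24, "hour")        # two days, not three
consistent = trial_is_consistent(3, "day", 24, "hour")
```

A mismatch between the two durations is what a consistency feature would flag.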

Additionally, we found subscription interfaces that allow

users to close the UI, implying that users can access the app

without subscribing. However, in certain cases, the icon to

close the UI is obscured or shares a color too similar to the

background, making it difﬁcult for users to notice (No.11),

potentially leading them to mistakenly believe that subscrip-

tion is the only way to continue using the app. Furthermore,

some ﬂeeceware samples demonstrate unreasonable pricing

practices on their subscription interfaces (No.12). One such

example is the app Magic icon changer, a wallpaper design

app, which charges users a staggering $129.9 per week fol-

lowing a free trial period. Figure 3(c) illustrates an instance

where the price after the free trial surpasses the cost of a one-

time payment, further highlighting the exorbitant and unjus-

tiﬁed pricing strategies employed by ﬂeeceware apps.
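Comparing prices across different billing cycles requires putting them on a common basis. The sketch below normalizes a displayed price to the total charged per billing cycle and to a yearly cost; it assumes 52 weeks and 12 months per year, and the function names are illustrative. The example figures are taken from the text and from the user review in Table 3.

```python
# Number of periods per year for each unit (assumed conversion table).
PER_YEAR = {"week": 52, "month": 12, "year": 1}

def yearly_cost(price, unit):
    """Cost over one year if `price` is charged once per `unit`."""
    return price * PER_YEAR[unit]

def total_per_cycle(shown_price, shown_unit, billing_unit):
    """Total charged per billing cycle when the UI shows a per-unit price.

    For example, a monthly figure shown for an annual plan, as in Figure 3(d).
    """
    return shown_price * PER_YEAR[shown_unit] / PER_YEAR[billing_unit]

# $129.9 per week (the wallpaper app mentioned above) over a full year.
weekly_app_per_year = yearly_cost(129.9, "week")        # about 6754.80

# "$6.99/mo" shown for a plan that is billed yearly (review B3 in Table 3).
annual_charge = total_per_cycle(6.99, "month", "year")  # about 83.88
```

Seen this way, a small per-week or per-month figure on the screen can correspond to a charge one or two orders of magnitude larger, which is why the total for the full billing cycle matters.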

Finally, we consolidate the observed phenomena identified by experts, extract the relevant entities and attributes, and formalize them into 19 features, as outlined in Table 6 in Appendix C. Each feature serves as an indicator for specific aspects of the fleeceware subscription UIs. Specifically, Features F–F and F–F are utilized to determine the presence of key subscription information for phenomena No.1-No.5 and No.9, respectively. Feature F is derived from Phenomenon No.6 and signifies the observation that these interfaces often display more text describing the billing frequency than the actual price. Phenomenon No.7 is assessed using Feature F, which calculates the ratio of the font size of the total price to other prices and examines whether the font is bold, indicating whether the total price information is adequately highlighted. To determine the presence of Phenomenon No.8, we utilize F, which checks whether the trial duration and the timing of the charge or cancellation are consistent. To evaluate the proximity between the subscription information and the interactive button in Phenomenon No.9, we employ Feature F, which measures the relative vertical location of these elements on the interface. For Phenomenon No.10, Feature F takes into account the relative length, relative font size, and font style of the price information compared to other text, assessing the visibility and noticeability of the information. Furthermore, Features F–F utilize optical character recognition (OCR) techniques to evaluate the clarity and perceptibility of the information presented on the interface. For Phenomenon No.11, Feature F employs edge detection techniques to determine whether an icon is clearly discernible. Lastly, we extract relevant price information and calculate Features F to represent Phenomenon No.12.

In total, we identiﬁed 19 features and will train a classiﬁ-

cation model based on them to detect ﬂeeceware. In the next

section, we will outline the process of extracting the values

of these features from the UI.
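To make the presentation-related features more concrete, the sketch below computes three illustrative quantities: the font-size ratio of the total price to other prices (Phenomenon No.7), the relative vertical distance between subscription information and the button (Phenomenon No.9), and a text/background contrast ratio (Phenomenon No.10). The contrast ratio follows the WCAG definition and is a stand-in chosen for this example; the features described above rely on OCR instead. All function names and input formats are assumptions.

```python
def font_ratio(total_price_size, other_price_sizes):
    """Font size of the total price relative to the largest other price."""
    return total_price_size / max(other_price_sizes)

def vertical_gap(info_bounds, button_bounds, screen_height):
    """Vertical distance between two widgets as a fraction of screen height.

    Bounds are (left, top, right, bottom) in pixels.
    """
    info_center = (info_bounds[1] + info_bounds[3]) / 2
    button_center = (button_bounds[1] + button_bounds[3]) / 2
    return abs(button_center - info_center) / screen_height

def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb):
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """WCAG contrast ratio between a text color and a background color."""
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# A 12sp total price next to 30sp and 18sp per-week prices.
ratio = font_ratio(12, [30, 18])
# Price text near the top, subscribe button near the bottom of a 2000px screen.
gap = vertical_gap((0, 100, 1080, 200), (0, 1700, 1080, 1900), 2000)
# Light gray text on a white background.
faint = contrast_ratio((200, 200, 200), (255, 255, 255))
```

A ratio well below 1, a large gap, or a low contrast value would each suggest that the charge information is harder to notice than the surrounding content.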

3.4 UI Collector

The features we focus on are mainly related to UI entities

and attributes that are displayed on the subscription UIs, e.g.,

the text contents and position of visual elements. These el-

ements are static after the UI has been fully loaded and do

not change dynamically during the user’s interaction with the

app. Therefore, we ﬁrst need to obtain the fully-loaded sub-

scription UIs. We utilize DroidBot [34], an automated tool

capable of capturing screenshots and layout information of

subscription UIs (including text, widget visibility, and lay-

out details) at runtime, to extract UIs from apps.

However, it is not easy to capture subscription UIs within a limited time using DroidBot, because the long search path with multiple branches leads to timeouts.

To address this issue, we implement additional guided search

strategies in DroidBot based on our prior experience. These

strategies consider common patterns observed in subscrip-

tion apps and adjust the priority of interactive events accord-

ingly. (1) Handling introductory screens: In some apps,

users are initially presented with a series of screens highlight-

ing the app’s features. To progress to the next page leading

to the subscription UI, users need to click on buttons like

“continue” or “start”. We enable DroidBot to detect and in-

teract with these buttons ﬁrst. (2) Scrolling down the inter-

face: During the introductory screens or other app UIs, users

may need to scroll down to reveal the button or content neces-

sary to proceed. To ensure that all contents are loaded before

interacting, we prioritize the “ScrollEvent” in DroidBot’s in-

teractive events, allowing it to scroll down the page if neces-

sary. (3) Handling user input: Some app UIs require users

to ﬁll in or select speciﬁc information before proceeding to

the subscription UI. To handle this, we give higher priority to

the “SetTextEvent” that ﬁlls text boxes in DroidBot, which

ensures that DroidBot enters the required information before

attempting to interact with buttons. Additionally, we pre-set

fields such as nickname, birthday, email, and age to facilitate input verification, and we use pre-configured Google accounts for apps that require login. (4) Prioritizing relevant keywords: To effectively trigger the subscription UIs, we add

relevant keywords to the “preferred lists” in DroidBot. These

keywords include terms like “subscribe”, “upgrade”, “VIP”,

“premium”. By prioritizing widgets associated with these

keywords, DroidBot can efﬁciently reach the subscription UI.

Finally, we utilize DroidBot on the crawled apps to generate

screenshots and layout ﬁles for further analysis.
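The four strategies amount to re-ranking the candidate events that the explorer can fire on a screen. A minimal sketch of such a ranking follows. The `Candidate` type, the `priority` function, and the relative order of the four rules are assumptions made for illustration; they are not DroidBot's actual API or the exact implementation used in this work.

```python
from dataclasses import dataclass

# Hypothetical event model; DroidBot's real event classes are richer.
@dataclass
class Candidate:
    kind: str        # "touch", "scroll", or "set_text"
    text: str = ""   # visible text of the target widget, if any

INTRO_WORDS = ("continue", "start")
PREFERRED_WORDS = ("subscribe", "upgrade", "vip", "premium")

def priority(event: Candidate) -> int:
    """Lower number means the event is tried earlier (assumed ordering)."""
    label = event.text.lower()
    if event.kind == "touch" and any(w in label for w in PREFERRED_WORDS):
        return 0  # widgets that likely open the subscription UI
    if event.kind == "set_text":
        return 1  # fill in required fields before pressing buttons
    if event.kind == "touch" and any(w in label for w in INTRO_WORDS):
        return 2  # step through introductory screens
    if event.kind == "scroll":
        return 3  # reveal content that is below the fold
    return 4      # everything else keeps DroidBot's default order

def order_events(events):
    """Stable sort, so events with equal priority keep their original order."""
    return sorted(events, key=priority)

events = [
    Candidate("touch", "Settings"),
    Candidate("scroll"),
    Candidate("touch", "Continue"),
    Candidate("set_text"),
    Candidate("touch", "Upgrade to Premium"),
]
ordered = order_events(events)
```

Because the sort is stable, any event that none of the four rules covers falls back to the order in which the explorer originally proposed it.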

DroidBot saves the information of all the UIs encountered

during the runtime. To improve efﬁciency, we use a keyword-

based method to pre-ﬁlter UIs that are unrelated to subscrip-

tions. Speciﬁcally, we collect all the texts in a UI, preprocess

the texts by converting all words to lowercase, expanding

abbreviations, removing non-ASCII characters and punctua-

tion (keeping the currency symbols), removing stop words,

and lemmatizing the words. Then, we ﬁlter out irrelevant

UIs by checking for the presence of subscription-related key-

words (e.g., “free”, “trial”, “$”, “month”, “subscription”),

which are collected by counting the word frequencies in the

subscription UIs previously gathered.
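The pre-filter can be sketched as follows. The stop-word list, the keyword set, the `min_hits` parameter, and the crude plural stripping that stands in for real lemmatization are all assumptions for illustration; abbreviation expansion is omitted.

```python
import re

# Illustrative lists only; the actual keywords come from word frequencies
# in previously gathered subscription UIs.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "for", "your", "is", "in"}
KEYWORDS = {"free", "trial", "$", "month", "subscription"}
CURRENCY = "$€£¥"
_CUR = re.escape(CURRENCY)

def preprocess(text):
    """Lowercase, keep ASCII letters/digits and currency symbols, drop stop words."""
    text = text.lower()
    # Surround currency symbols with spaces so they become separate tokens.
    text = re.sub(f"([{_CUR}])", r" \1 ", text)
    # Replace punctuation and other non-ASCII characters with spaces.
    text = re.sub(f"[^a-z0-9\\s{_CUR}]", " ", text)
    tokens = []
    for tok in text.split():
        if tok in STOP_WORDS:
            continue
        # Crude stand-in for lemmatization: strip a trailing plural "s".
        if len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]
        tokens.append(tok)
    return tokens

def is_subscription_related(text, min_hits=1):
    """Keep a UI if at least `min_hits` distinct keywords appear in its text."""
    return len(KEYWORDS.intersection(preprocess(text))) >= min_hits
```

Note that removing punctuation splits decimal prices such as "9.99" into two tokens, which is harmless for keyword filtering but means price extraction has to work on the original text.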

3.5 Feature Extractor

Each feature is composed of speciﬁc UI information, such as

the presence of certain subscription details, text length, font

size, etc. For example, to calculate F

, we need to locate

all subscription price information on the UI and extract the

length or font size of the information and the whole sentence.

However, accurately obtaining the information is difﬁcult

(challenge C2). The complexity primarily arises from two

key aspects: (1) Fragmentation of Information: For exam-

ple, in Figure 4, “Yearly” and “$29.99” are separated, and

only when presented together as “Yearly $29.99”, the mean-

ing (i.e., subscription charge information) becomes clear to

users. We need to design a method to effectively locate and

concatenate these discrete pieces of information from the en-

tire UI. (2) Variable Display Formats: For example, the

charge information can be presented as “6 Months: $59.99”

or “59.99 USD/6-Months”. The diversity of information ex-

pression forms and the lack of dataset make it challenging for

machine learning-based natural language processing meth-

ods to accurately differentiate between different subscription

information and extract the values from them.

Figure 4: An example of neighboring widgets. w

, w

are

neighboring widgets, and they should be considered together.

Addressing the Fragmentation of Information: We design

a novel layout-based approach to link related subscription

information. Our idea is based on an observation that in a

layout ﬁle, each piece of information is deﬁned in a widget

(a UI component that draws what users can see or interact

with), and all the related information is deﬁned in neighbor-

ing widgets. These widgets are descendants of a parent wid-

get whose widget class is “ViewGroup”, “RelativeLayout”,

“LinearLayout”, etc. For example, in Figure 4, the three wid-

gets containing charge information are neighboring widgets,

which are children of a “ViewGroup” widget (i.e., the green

box). Therefore, by searching for all the neighboring widgets

of each widget, we can link all the related information.

Addressing Variable Display Formats: We adopt a multi-

rule-based approach to extract target information, avoiding

the potential semantic misunderstandings that could arise

from machine learning approaches. This idea comes from

the analysis of all the texts about subscription information

from 145 subscription UIs obtained from the 79 ﬂeece-

ware. We discover that the developer usually uses certain

ﬁxed keywords or symbols when describing each type of

subscription information, and the numerical values appear

in speciﬁc positions within the information. For example,

the price and billing cycle information always appear to-

gether, within which, there will always be numerical values,

currency symbols (e.g., “$”, “USD”), and time units (e.g.,

“week”, “month”). These elements appear in a speciﬁc or-

der. Therefore, we construct a collection of regular expres-

sions (available on our website[44]) to extract subscription

details expressed in various forms. We detail how to gener-

ate the regular expressions and how to use the layout-based

approach and the regular expressions to locate and extract

target subscription information (including textual and visual

information) as follows.
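To illustrate the rule-based idea, the sketch below handles the two display formats mentioned above plus the common "$9.99/week" form. These three patterns are hypothetical examples written for this illustration; they are not the regular expressions published on the project website, which cover many more variants.

```python
import re

UNIT = r"(?P<unit>day|week|month|year)s?"
AMOUNT = r"(?P<amount>\d+(?:\.\d{1,2})?)"
FLAGS = re.IGNORECASE

PATTERNS = [
    # Billing cycle first, e.g. "6 Months: $59.99"
    re.compile(rf"(?P<count>\d+)?\s*-?\s*{UNIT}\s*:?\s*(?P<currency>\$|usd)\s*{AMOUNT}", FLAGS),
    # Amount, currency code, then cycle, e.g. "59.99 USD/6-Months"
    re.compile(rf"{AMOUNT}\s*(?P<currency>\$|usd)\s*(?:/|per)\s*(?P<count>\d+)?\s*-?\s*{UNIT}", FLAGS),
    # Currency symbol, amount, then cycle, e.g. "$9.99/week"
    re.compile(rf"(?P<currency>\$)\s*{AMOUNT}\s*(?:/|per)\s*(?P<count>\d+)?\s*-?\s*{UNIT}", FLAGS),
]

def extract_charge(text):
    """Return (amount, currency, cycle_count, cycle_unit) or None if nothing matches."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return (
                float(match["amount"]),
                match["currency"].upper(),
                int(match["count"] or 1),  # a missing count means one unit
                match["unit"].lower(),
            )
    return None
```

Normalizing every variant to the same tuple is what lets later features compare prices and billing cycles regardless of how the UI phrased them.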

Layout-based textual information extraction. We ﬁrst

generate regular expressions using Regex Generator++ [6, 7]

(an automated tool that creates text extraction patterns from

given examples) as the initial reference. We manually ﬁne-

tune and expand these regular expressions based on speciﬁc

cases encountered during the analysis. To extract target in-

formation, we need to combine all the related subscription

information in a UI. For each widget, we ﬁnd its closest par-

ent widget with the target class, i.e., if a widget’s parent w

does not fall into the speciﬁed class, we continue searching

for the parent of w

. Once the parent widget is found, all the

descendant widgets are considered as neighboring widgets.

We link all the texts of neighboring widgets together and ap-

ply the regular expressions to get the target information.
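The search for neighboring widgets can be sketched on a simplified layout tree. The dictionary format below (`class`, `parent`, `text` keyed by an integer id) is an assumption for illustration and not DroidBot's actual layout file schema.

```python
CONTAINER_CLASSES = ("ViewGroup", "RelativeLayout", "LinearLayout")

def closest_container(widget_id, widgets):
    """Walk up the parent chain until a widget of a container class is found."""
    parent_id = widgets[widget_id]["parent"]
    while parent_id is not None:
        if widgets[parent_id]["class"].endswith(CONTAINER_CLASSES):
            return parent_id
        parent_id = widgets[parent_id]["parent"]
    return None

def descendants(root_id, widgets):
    """Return the ids of all descendants of root_id, in ascending id order."""
    found, stack = [], [root_id]
    while stack:
        current = stack.pop()
        for wid, widget in widgets.items():
            if widget["parent"] == current:
                found.append(wid)
                stack.append(wid)
    return sorted(found)

def neighboring_text(widget_id, widgets):
    """Link the texts of all widgets that share the closest container ancestor."""
    container = closest_container(widget_id, widgets)
    if container is None:
        return widgets[widget_id]["text"]
    ids = descendants(container, widgets)
    return " ".join(widgets[w]["text"] for w in ids if widgets[w]["text"])

# Toy layout: "Yearly" and "$29.99" are siblings inside one ViewGroup.
widgets = {
    0: {"class": "android.widget.FrameLayout", "parent": None, "text": ""},
    1: {"class": "android.view.ViewGroup", "parent": 0, "text": ""},
    2: {"class": "android.widget.TextView", "parent": 1, "text": "Yearly"},
    3: {"class": "android.widget.TextView", "parent": 1, "text": "$29.99"},
    4: {"class": "android.widget.Button", "parent": 0, "text": "Continue"},
}
```

In this toy layout the two price fragments are linked into one string, while the button, which has no container ancestor of the target classes, is left on its own.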

However, directly applying the regular expressions to the

combined texts may lead to many mismatches. This is be-

cause irrelevant information can add noise to the original

texts and impact the accuracy of extraction. Moreover, when

multiple useful pieces of information appear together, it may

cause either information loss or incorrect information com-

bination. To prevent mismatches, we consider performing

real-time target information extraction during the process of

linking the texts of neighboring widgets. Speciﬁcally, for a

widget, if we cannot extract any useful information from its

text using the regular expressions, we search for its neighbor-

ing widgets and link their texts according to the layout, prior-

itizing widgets with smaller view IDs. The view ID uniquely

identiﬁes a widget and can be used to locate the widget. The

widget with a small view ID comes ﬁrst in a UI. We link the

text in widgets in sequence, one by one. Assuming the pre-

viously concatenated text is t, then for a new neighboring

widget, if its text does not contain any relevant information

(i.e., numbers, words, and symbols mentioned in our regular

expressions), we skip it. In contrast, if it matches our regu-

lar expressions, we extract the useful information, record its

view ID, and link the remaining text to t. Once t matches

the regular expressions, we extract the matched information

and keep the remaining texts as t

′

. By doing so, we can accu-

rately obtain all the necessary textual information and their

location indicated by view IDs.
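A simplified sketch of this incremental linking is given below. The two regular expressions and the `(view_id, text)` input format are assumptions for illustration, and the bookkeeping of leftover text is deliberately coarser than in a full implementation.

```python
import re

# Hypothetical patterns: one full charge expression and one "relevance" test.
CHARGE = re.compile(r"(?:weekly|monthly|yearly)\s*\$\s*\d+(?:\.\d{1,2})?", re.IGNORECASE)
RELEVANT = re.compile(r"\d|\$|weekly|monthly|yearly", re.IGNORECASE)

def link_and_extract(widgets):
    """widgets: iterable of (view_id, text) pairs for one group of neighbors.

    Returns a list of (matched_text, contributing_view_ids).
    """
    results = []
    t, ids = "", []
    for view_id, text in sorted(widgets):      # smaller view IDs come first
        if not RELEVANT.search(text):
            continue                           # skip noise such as slogans
        t = f"{t} {text}".strip()              # link the text to t
        ids.append(view_id)
        match = CHARGE.search(t)
        if match:
            results.append((match.group(0), tuple(ids)))
            # Keep whatever was not consumed by the match as the new t.
            t = (t[:match.start()] + " " + t[match.end():]).strip()
            if not t:
                ids = []
    return results

group = [
    (3, "$29.99"), (1, "Best value"), (2, "Yearly"),
    (4, "Cancel anytime"), (5, "Weekly"), (6, "$4.99"),
]
charges = link_and_extract(group)
```

Extracting as soon as the running text matches keeps the two plans apart; concatenating all six texts first would risk pairing "Yearly" with the wrong price.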

Visual information extraction. The proposed features in-

clude not only textual information but also visual informa-

tion about the UI elements, such as font size, font style, and

visibility. With the help of DroidBot, we managed to ob-

tain information about widgets on the interface, such as the

position of each widget containing subscription information

according to the boundary ﬁeld in the layout ﬁle indicated

by view IDs, as well as the size of the widget view. However,

we cannot directly obtain font-related information. To obtain

the visual information of target information, we ﬁrst retrieve

the screenshots of related widgets and then utilize Tesseract-

OCR [51], which can output the font size, style, and color of

each word identiﬁed.

Since the text of a widget may contain more than just the

target information, we need to match the target information

with the strings recognized by OCR to obtain the desired vi-

sual information. Because OCR recognition can be inaccu-

rate and does not always align with human perception, we op-

timize the matching process using the Levenshtein distance

and Levenshtein ratio [63] to improve fault tolerance. The

Levenshtein distance measures the minimum number of edit

operations required to convert one string into the other, and

the Levenshtein ratio is calculated by $\frac{\mathit{sum} - \mathit{ldist}}{\mathit{sum}}$, where sum is

the total length of the two strings and ldist is the Levenshtein

distance between them. If the Levenshtein distance is below, or the Levenshtein ratio is above, a predetermined threshold,

we consider the strings to be a match and the target informa-

tion to be clearly visible, and we can get the font size and

style from the outputs of the OCR. If no strings match, we

consider the information invisible.
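The distance and ratio defined above can be computed with a standard dynamic-programming routine. The sketch below follows the formula in the text; the `is_match` helper and its `min_ratio` default of 0.8 are assumptions, since the actual thresholds are not stated.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def levenshtein_ratio(a, b):
    """(sum - ldist) / sum, where sum is the total length of both strings."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - levenshtein(a, b)) / total

def is_match(target, ocr_text, min_ratio=0.8):
    """Tolerate small OCR errors, e.g. '$' recognized as 'S'."""
    return levenshtein_ratio(target.lower(), ocr_text.lower()) >= min_ratio
```

For example, the target "$29.99" still matches the OCR output "S29.99" (one substitution, ratio 11/12), whereas it does not match an unrelated string such as "Continue".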

For the visibility of a target icon, we apply the Canny

Edge Detection technique [5, 22] to detect the icon in the

designated area. If no edges are detected, the icon is con-

sidered invisible. Speciﬁcally, we identify the widget that

represents the target icon by searching for keywords (e.g.,

“close”, “exit”) in the resource id ﬁeld of each widget and get

its screenshot. We then apply a multi-scale template match-

ing method [31, 54] to recognize the icon. This method is

based on the observation that target icons have several ﬁxed

styles, such as the shape of a cross for dismissing icons. The

method is applied as follows: We ﬁrst extract several icons

with typical shapes as templates, and then slide the templates

of different scales on the screenshots processed by the Canny

operator and calculate the matching scores. Finally, we col-

lect the icon detection results based on empirical thresholds

for each template and vote for the ﬁnal decision. If no tem-

plates match, the icon is regarded as invisible.

3.6 Detector

A single feature may not be sufficient to determine which kinds of UI may deceive users and make them overlook or misunderstand certain subscription information. We utilize

machine learning techniques to determine the combinations

of features and their respective weights that are indicative

of problematic UIs. Considering the limited availability of

training data and the fact that many features in our dataset are

boolean values, we opted for a shallow Decision Tree Classi-

ﬁer model [9] to ensure accuracy while avoiding overﬁtting.

To facilitate users’ better understanding of detected issues on

the UI, we need to provide explanations for the detection results. Although the Decision Tree inherently offers a certain level of interpretability, it might not be intuitive enough for ordinary users to understand the detection results. We therefore integrated an additional step: we employed SHAP (SHapley Additive exPlanations)

[37, 38], an XAI technique, to visualize the most signiﬁcant

features contributing to the prediction results for each data

sample. The integration of the XAI technique enables users

to intuitively grasp the rationale behind individual classiﬁca-

tion results.

To create a reliable training dataset, we engaged ten ex-

perts, as detailed in Section 3.3, for three additional weeks

to manually label the samples, dedicating around three hours

each week to meticulous annotation. Additionally, these ex-

perts participated in an hour-long discussion weekly to en-

sure a consistent and accurate labeling process. The experts

were presented with a set of 136 subscription UIs (crawled

from the 45 Android fleeceware apps and another 45 subscription apps from Google Play), and their task was to independently assess whether each UI was a suspected fleeceware UI. If a UI was deemed to be suspected, the experts were instructed to mark the specific dark patterns that contributed to this assessment for the convenience of further discussion. An example of what experts need to label is shown in Figure 9 in Appendix B. In cases where there were discrepancies, the experts engaged in peer discussions to reach a consensus. In the end, we obtained a collection of 76 suspected fleeceware UIs and 60 benign UIs. We then extracted UI features to form input feature vectors for training.

Figure 5: An example of the alert and explanation.

Using the labeled instances, we utilize default parameters

provided by scikit-learn [45] to train a decision tree classiﬁer.

To help users understand the result, we utilize the SHAP tech-

nique, a game theoretic method for explaining machine learn-

ing outputs, to identify the primary features that contribute to

the prediction outcome and visualize the explanations. Lever-

aging these insights, we offer alerts to users. The alert pro-

vided by DARKFLEECE for each identiﬁed suspected ﬂeece-

ware UI highlights the speciﬁc elements or content on the UI

that require attention. Figure 5 shows an example of the alert

and an explanation of the identiﬁcation. The visualization

shows an area on the subscription UI where the user is only

informed about the availability of a free trial, without being

notiﬁed that the trial will automatically convert to a recurring

charge after the trial period ends. Furthermore, the informa-

tion about the fees is concealed in a lengthy paragraph. This

alert assists users in circumventing the overlooking or misin-

terpretation of crucial subscription details.
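To make the explanation step concrete, the sketch below computes exact Shapley values for a hand-written three-feature tree by enumerating all feature coalitions. Everything in it is a toy assumption: the feature names, the tree, and the single baseline sample are invented for illustration, and the SHAP library uses more efficient tree-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def toy_tree(x):
    """A hand-written shallow decision tree; 1 means suspected, 0 means benign."""
    if x["mentions_auto_renew"] == 0:
        return 1 if x["emphasizes_free_trial"] == 1 else 0
    return 1 if x["price_font_ratio"] < 0.5 else 0

def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline sample."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi

sample = {"mentions_auto_renew": 0, "emphasizes_free_trial": 1, "price_font_ratio": 0.3}
benign = {"mentions_auto_renew": 1, "emphasizes_free_trial": 0, "price_font_ratio": 1.0}
phi = shapley_values(toy_tree, sample, benign)
```

The contributions always sum to the difference between the prediction for the sample and the prediction for the baseline, which is what allows an alert to attribute a "suspected" verdict to the specific UI properties that caused it.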

3.7 Evaluator

Due to the lack of a clear deﬁnition for ﬂeeceware and the

variations in how different people perceive the severity of

subscription issues, we aim to continuously adjust the set of

subscription UI features based on diverse security concerns.

On one hand, app markets and advanced users can run DARKFLEECE locally to detect fleeceware within apps. They can analyze their detection results according to their own security

needs and adjust the required features accordingly, such as

adjusting PR_MAX in F

. On the other hand, DarkFleece

can be run by third parties, where users upload apps and

receive the results. Users can provide feedback, based on

which, third parties can adjust the features most relevant and

likely to cause issues for the users. With the support of the

evaluator, DARKFLEECE can be personalized and continu-

ously enhance its capabilities.

4 Evaluation and Findings

4.1 Setting

Between 2021 and 2023, we crawled real-world apps from

Google Play and obtained 13,597 unique apps after remov-

ing duplicates based on their MD5 checksums. These apps

span across all 33 categories on Google Play. To perform

analysis on these apps, we utilized 10 workstations, each

equipped with 4 cores with a 1.80 GHz CPU, 32.0GB mem-

ory, and 1TB hard drive. We set up a virtual machine on each

workstation, with the virtual machine running Google Pixel4,

featuring 2 cores, 2GB memory, and 16GB hard drives. The

Android version used was 11.0.

4.2 Performance Evaluation

Effectiveness. We evaluate the effectiveness of DARK-

FLEECE based on the accuracy of extracting subscription UIs

and identifying suspected ﬂeeceware. For the ﬁrst one, we

randomly selected 100 apps that were successfully executed

by DroidBot and collected 1,148 UIs, out of which 135 were

subscription UIs after a manual check. Using our feature ex-

tractor, DARKFLEECE successfully extracted 140 subscrip-

tion UIs, including all the 135 UIs above. Therefore, the

accuracy of extracting subscription UIs is 99.57%.

For the second one, we re-selected UIs for evaluation con-

sidering the balance of the samples. Based on the results

of DARKFLEECE, we selected 81 (10% of the total sub-

scription UIs we collected) ﬂeeceware UI samples and 81

benign samples. Considering the similarity of subscription

UIs, we speciﬁcally sourced UIs from a diverse range of

app categories to guarantee distinct UI designs. Addition-

ally, we manually removed duplicate UIs to ensure an unbi-

ased performance evaluation. We then asked the ten experts

to label the samples and compared their results to assess the

performance. Our analysis showed that 152 of the 162 subscription interfaces are correctly labeled by DARKFLEECE, while 10 are mislabeled, including 7 false positives and 3 false negatives. Therefore, the accuracy of identifying fleeceware UIs

is 93.83%. Overall, DARKFLEECE can achieve an accuracy

of 93.43% (99.57% × 93.83%) for detecting ﬂeeceware UIs.
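The arithmetic behind the two figures can be reproduced from the counts given above. This is only a worked check under the assumption that the overall figure is the product of the two reported accuracies rounded to two decimal places; the variable names are illustrative.

```python
# Sample counts and errors as stated in the text.
flagged, benign = 81, 81
false_positives, false_negatives = 7, 3

total = flagged + benign
correct = total - false_positives - false_negatives
identification_accuracy = correct / total

# Reported accuracy of extracting subscription UIs.
extraction_accuracy = 0.9957

# Overall accuracy as the product of the two rounded stage accuracies.
overall = round(extraction_accuracy, 4) * round(identification_accuracy, 4)

print(f"identification: {identification_accuracy:.2%}")
print(f"overall:        {overall:.2%}")
```

Multiplying the two stage accuracies treats UI extraction and fleeceware identification as sequential steps whose errors compound.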

We analyzed the reasons behind the errors in these two

processes. The first reason is that some developers do not follow conventional coding practices: they do not use the Button class to represent buttons, the TextView class to represent text, or the RelativeLayout class to structure the UI hierarchically. As a result, the information we extract is inaccurate. The second reason is that advertising content in the UI interferes with the target information.

Runtime Performance. On average, DARKFLEECE requires approximately 10.12 minutes per app for processing,

which includes UI collection, UI pre-ﬁltering, feature extrac-

tion, and ﬂeeceware detection. Although collecting the UIs

of an app could take up to 10 minutes, the subsequent proce-

dures take less time, averaging 7.17 seconds per app. UI pre-

ﬁltering takes about 3.29 seconds per app (37,346.94 seconds

for 13,597 apps), while feature extraction and ﬂeeceware de-

tection take an average of 3.88 seconds per app (4,373.03

seconds for 1,128 apps).

User Study. To demonstrate the effectiveness of DARKFLEECE in helping app users pay attention to problems on

subscription UIs and avoid mistakes, we conducted a survey

study by recruiting participants from universities and col-

lected 37 valid responses. Participants in the survey were

presented with ten UIs from various subscription apps and

asked to make an initial judgment regarding the UI’s poten-

tial deceptiveness, without knowing the ground truth. Fol-

lowing their initial decisions, we asked them to review our

detection results. The survey questions can be accessed on

our website[44].

Speciﬁcally, we asked participants to identify suspicious

UI issues, rate their confidence levels, and provide reasons for each selected UI. Subsequently,

we provided them with explanations generated by DARK-

FLEECE, and asked them to re-select ﬂeeceware UIs and

rate their conﬁdence levels again. 34 participants were un-

dergraduate or graduate students, with one-third of them ma-

joring in non-computer-related disciplines. The other 3 par-

ticipants are family members, all aged 50 or above. Our

study ﬁnds that 30 participants (81.08%) have encountered

subscription problems before, and our explanations signiﬁ-

cantly improve users’ ability to notice issues on ﬂeeceware

UIs. Speciﬁcally, our explanations on the 10 provided sam-

ples all draw more attention to the issues on the UIs, with an

average increase of 5 participants noticing issues on the ten

UIs provided. The most successful explanation resulted in 9

more (from 13 to 22) participants noticing the issues on the

UI. All participants also reported higher conﬁdence levels af-

ter receiving the explanations.

In addition, we observed that participants without a back-

ground in computer science and the three older individuals

were more susceptible to ﬂeeceware. Before we presented

our results, these participants struggled to identify issues on

the UI. Even when the UI employed only a few simple tricks,

they were prone to overlooking crucial information. This

was likely due to their limited familiarity with subscriptions

and relatively less experience using mobile apps. This ﬁnd-

ing further supports that the features we proposed indeed con-

tribute to potential harm. Moreover, through this user study,

we discovered that the constructed features effectively cov-

ered the issues reported by participants, highlighting their

comprehensiveness.

Assessing Flexibility in a Use Case. As outlined in Section 3.7, we recognize users’ diverse perceptions of subscription issues and their differing tolerance levels towards fleeceware. Consequently, we equip users with the capability to

modify the detection rate by customizing the features uti-

lized in DARKFLEECE. We tested the ﬂexibility by evalu-

ating a use case. This use case addresses speciﬁc ﬂeeceware

concerns, particularly those involving the UI that emphasizes

free usage while hiding pricing information, as well as cases

where the claimed trial period is different from the actual

billing time. With this speciﬁc concern, we re-evaluated and

selected features from Table 6. We then re-annotated 100

subscription user interfaces based on these concerns. The

features chosen were F

and F

through F

. Out of these

100 samples, 67 were marked as “suspected”, and 33 as “be-

nign”. DARKFLEECE accurately detected 93 of these sam-

ples, but misclassiﬁed 7, comprising 5 false positives and 2

false negatives. It shows that the accuracy rate for identify-

ing ﬂeeceware user interfaces remained at 93.00%, aligning

closely with our earlier results.

4.3 Findings in the Wild

Landscape. We analyzed 13,597 apps across all 33 Google

Play categories, which were collected based on their popularity. Surprisingly, out of the 589 apps with subscription UIs

(813 in total), DARKFLEECE detected 443 (75.21%) apps as

suspected ﬂeeceware. These suspected apps have been down-

loaded over 5 billion times collectively. Moreover, each sus-

pected ﬂeeceware app had more than one ﬂeeceware UI on

average (629 UIs in 443 apps). Notably, some of the most

popular apps, ranked in the top-200 list, were also identiﬁed

as suspected ﬂeeceware. In July 2022, we assessed the preva-

lence of ﬂeeceware in popular apps ranked in the top 200 of

the free, best-seller, and popularity lists. According to our

ﬁndings, there were 7 suspected apps in the free list, 13 in

the bestseller list, and 5 in the popularity list.

We also observed suspected ﬂeeceware apps across all

categories of apps. Table 5 in the Appendix A presents

the distribution of suspected apps across various categories.

Our investigation revealed that the issue was particularly

widespread in the “Photography” and “Entertainment” cat-

egories, where the suspected ﬂeeceware apps accounted for

almost a quarter of all suspected cases. We reported our findings to Google through its official feedback website (https://support.google.com/googleplay/android-developer/contact/policy_violation_report) to seek its acknowledgment. Within the form, we categorized these apps under the “monetization and ads/subscription” category and specified their policy violation.

Note on app collection: we used the website https://app.diandian.com/rank/googleplay for efficient categorizing and ranking.

Figure 6: A case where two apps developed by the same

developer were found to contain similar ﬂeeceware UIs.

Common Strategies in Fleeceware UIs. Based on the inter-

pretability outputs of DARKFLEECE, which indicate the fea-

tures that contribute to UIs being classiﬁed as ﬂeeceware UIs,

we summarized the common strategies used in these UIs.

The most common strategy observed is the lack of explicit

indication on buttons about the ongoing subscription action

(Phenomenon No.9, 270 in 629 suspected UIs, 42.93%),

while emphasizing the free trial on the interface, e.g., not

informing the user that the free trial will automatically tran-

sition into a paid subscription after the trial (Phenomenon

No.5, 368 UIs, 58.51%), using a light font color and a small

font size for the billing information (Phenomenon No.10,

182 UIs, 28.93%). In addition, 332 UIs (52.78%) do not

clearly indicate whether the subscription will automatically

renew (Phenomenon No.4). 111 UIs (17.65%) mislead users

regarding the cancellation period (Phenomenon No.8). They claim to offer a 3-day free trial but require users to cancel one day in advance; otherwise, the trial automatically converts to a paid subscription on the third day.

Fleeceware Developers. After collecting the names and con-

tact details of developers of suspected ﬂeeceware, we investi-

gated them and found that apps developed by the same devel-

oper often have similar UIs with the same issues. 19 develop-

ers are found engaged in such practices. Figure 6 shows that

the UIs of these two apps are almost the same and only em-

phasize the “free trial” option. Moreover, some developers

are found to try to make more money by using different App

IDs and names for nearly identical apps. For instance, apps

named Bravo cleaner and Bravo Security are developed by

the same developer, and have almost the same functionality

and UIs. We also found that 12 suspected apps had no valid

contact information or website for their developers. We sug-

gest creating a blacklist for these developers and scrutinizing

their apps more closely.

Figure 7: A case in which a UI becomes problematic from 2021 to 2022. The information “1 Month: $19.99” was changed to “Free for new Users”. After the trial, it automatically switches to an annual subscription, but this information has a small font and a light color, making it easy for users to overlook.

Fleeceware Evolution. We conducted a follow-up investigation to gain a better understanding of the evolution of suspected fleeceware over time. We selected 100 subscription apps (75 suspected and 25 benign) that were collected in

June 2021. We downloaded their versions in August 2022

and their latest versions in June 2023. In 2022, 35 suspected

ﬂeeceware apps were taken down. Out of the 60 apps we

were able to collect (39 suspected and 21 benign), only 2 sus-

pected apps improved their subscription UIs and were classi-

ﬁed as benign. However, we also identiﬁed 1 new suspected

ﬂeeceware app, as shown in Figure 7, where the price infor-

mation of “1 Month: $19.99” was changed to “Free for new

users”, potentially misleading users into believing the app is

free, and they may unknowingly be charged for a yearly sub-

scription after the free period. In 2023, another 12 suspected

ﬂeeceware apps were taken down. Among the remaining 48

apps (24 suspected and 24 benign), 4 previously suspected

apps became benign, but we also identiﬁed 1 new suspected

app. Among the 4 apps transitioning to a benign model, we

make a surprising discovery that an app called PictureThis -

Plant Identiﬁer adds a reminder feature, which notiﬁes users

before the trial period ends to remind them about the upcom-

ing subscription fee. In summary, these ﬁndings demonstrate

that app markets are making efforts to address the subscription issues. However, the continued emergence of suspected

ﬂeeceware suggests that developers are still motivated by the

ﬁnancial gains associated with ﬂeeceware practices.

Generalizability for non-English apps. DARKFLEECE is

built on UI features, allowing it to be applied to apps in dif-

ferent languages. However, adjusting keywords and regular

expressions for specific languages is necessary when extracting feature values. On 25 subscription apps in Chinese and 25 in German crawled on September 6th, 2023, we identified 10 and 14 suspected fleeceware apps, respectively.

Figure 8: The fleeceware UI detected in YouTube Music. The button only claims “FREE TRIAL”, and the UI states a 1-month free trial at the top, but users reported being charged 3 days later.

User Perceptions. User reviews are a valuable source for analyzing app issues, and in them we uncovered numerous interesting subscription problems. Many users reported their children subscribing to an app without their consent or knowledge. Some users believed that the app would not automatically renew their subscription or deduct money if they did not have sufficient funds or uninstalled the app. Additionally, many users admitted to forgetting to cancel their subscriptions. These findings suggest that users are vulnerable

to the threat of ﬂeeceware due to their lack of awareness and

caution when it comes to detecting and avoiding such scams.

We utilized user reviews as an indirect method of validat-

ing our experimental results. Speciﬁcally, we sampled the de-

tected results and thoroughly examined all associated user re-

views. Whenever we encountered user complaints about de-

ceptive UI design within these reviews, we cross-referenced

them with our detection results and explanations to determine whether our model had accounted for the issues highlighted by users. For example, the widely used application YouTube Music, which has been downloaded over 1 billion times, was found to contain a fleeceware subscription UI, as shown in Figure 8. We found user complaints related to this issue among the app’s reviews on Google Play (e.g., one said “Signed up for month free trial but google charged me for premium 3 days later”), indicating the potential risks. However, due to some newly published apps

lacking reviews and many users not accurately or comprehen-

sively expressing themselves when writing reviews, reviews

cannot entirely reﬂect the issues existing in apps. Therefore,

we primarily used reviews as a reference for validating the

completeness of our features.

5 Lessons

The results of our investigation offer valuable lessons for app

users, ethical developers, and app markets on how to avoid

ﬂeeceware issues and minimize their impact.

5.1 Ethical Developers

Some ethical developers may unintentionally introduce fea-

tures that resemble ﬂeeceware during development. They

may not be aware that certain design choices could have an

impact on users. Therefore, for these ethical developers, we

provide some development recommendations to help them

be more mindful. When designing subscription interfaces,

developers should prioritize ethical practices by making user-

friendliness a key consideration. This includes presenting

subscription-related information clearly, accurately, and con-

cisely in the UI. Important design considerations may in-

clude: (1) Present subscription information in simple and

concise sentences. This may include stating the subscrip-

tion plan and the cancellation deadline, such as “$9.99/week”

and “cancel before the current period ends”. Providing a

link or brief description of where users can unsubscribe (e.g.,

“Google Play - Settings - Subscriptions”) and clearly indi-

cating if the subscription will automatically renew, using

phrases like “auto-renewed” or “no automatic renewal”, can

also be helpful. (2) Highlight important information, such as

current charge information, with a larger font size to make

it more visible. (3) Place useful information near or on

the interactive buttons, like “subscribe” and “continue”, so

that users can easily access the information before proceed-

ing. (4) Clearly display the dismiss icon in a familiar lo-

cation, such as the upper-left or upper-right corner, so that

users can easily close the UI if they choose not to subscribe.

(5) Incorporate notiﬁcation features to alert users of criti-

cal events, such as subscription conﬁrmations, cancellations,

or approaching deadlines. These notiﬁcations can be in the

form of pop-up messages or emails, and they can help users

stay informed and avoid unexpected charges or renewals. Ad-

ditionally, a simple test (such as a math question) before

conﬁrming a subscription can prevent accidental sign-ups

by children without their guardians’ consent. Another useful suggestion is to use widget classes in a standardized way when designing UIs (especially in XML

ﬁles). For example, use the “Button” class for buttons and

the “TextView” class for text, which can greatly improve the

efﬁciency and accuracy of software testing.
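To illustrate the last recommendation, the following minimal sketch shows how a developer might scan a layout XML file for "TextView" widgets that are made clickable and therefore behave like buttons. It is a heuristic under stated assumptions and not part of DARKFLEECE: the function name `clickable_text_views` is hypothetical, and the check only looks at the standard `android:clickable` and `android:onClick` attributes.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by the android: attribute prefix in layout files.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def clickable_text_views(layout_xml: str) -> List[str]:
    """Return the android:id of every TextView that acts like a button.

    Hypothetical helper: a TextView counts as "button-like" when it sets
    android:clickable="true" or declares an android:onClick handler.
    """
    root = ET.fromstring(layout_xml)
    flagged = []
    for elem in root.iter("TextView"):
        clickable = elem.get(f"{{{ANDROID_NS}}}clickable") == "true"
        has_handler = elem.get(f"{{{ANDROID_NS}}}onClick") is not None
        if clickable or has_handler:
            flagged.append(elem.get(f"{{{ANDROID_NS}}}id", "<no id>"))
    return flagged
```

A clickable "TextView" is sometimes legitimate, so a hit from this check is a prompt for review rather than proof of a problem.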

5.2 App Users

Lack of awareness and caution makes many users vulnerable to the threat of fleeceware. To address this, we have developed a user manual to help users better understand subscriptions [44]. The manual covers crucial information

such as what subscriptions are, what to pay attention to, how

to cancel a subscription, and how to avoid common scams.

For complete details, please visit our website. Some of the suggestions include: (1) Be patient with subscription pop-ups. Some apps block users from accessing their services by displaying subscription pop-ups. If you really need the app, read the subscription terms carefully to avoid subscribing by mistake. (2) Be cautious when navigating subscription UIs to avoid falling into traps. Some apps deliberately hide subscription terms or make false claims about being free. Be careful before clicking any button in a UI, especially when seeing phrases like “use for free” or “free trial”. (3) Pay attention to essential subscription information, such as the price, billing frequency, trial period, cancellation deadline, and auto-renewal terms, when reading a subscription UI. Do not subscribe if this information cannot be found or understood easily. (4) Understand the right way to cancel a subscription. Uninstalling an app does not cancel the subscription. The common ways to cancel subscriptions are given in the user manual, and users can familiarize themselves with them before subscribing. (5) Provide feedback to the app developers or app market managers as soon as possible if you encounter any problems.

5.3 App Markets

As a centralized platform for apps, the app market should strive to provide better services and enforce stricter management of the apps displayed on it to benefit users. (1) Provide a more user-friendly way to manage subscriptions to improve the user experience. The information could be shown on the download pages of all subscription apps to make it easy for users to find. (2) Establish an effective feedback channel for users to report fleeceware and other issues. (3) Use user reviews as a source for regulating apps. In our research, we find that user reviews can reveal many issues with an app. These reviews not only help developers understand areas for improvement but also enable platforms to analyze and identify apps that pose potential threats. Platforms can then conduct further testing and monitoring of such apps to ensure compliance and user safety. (4) Consider using a wider range of tools to detect potential risks in the app market. The issues present in app marketplaces encompass various behaviors that can impact users. Market managers should consider employing different types of tools to discover and address these issues. Our tool can also help to detect suspected fleeceware in a cost-effective way. (5) Consider implementing a reputation credit system for developers and penalizing those who deliver fleeceware and other malicious apps.

6 Discussion

Deployment. DARKFLEECE currently operates as an offline-analysis tool: it analyzes application packages and provides detection results before the user interacts with the app, rather than during usage. App users can run it locally to detect fleeceware designs within apps before usage, averting potential deception. Benign app developers can use it to scan their apps before deployment to preemptively address any unintentional fleeceware issues. App markets can leverage it for large-scale scanning. Third-party app evaluators can also employ it to evaluate apps and share results with others.

DARKFLEECE can enhance detection performance and ef-

ﬁciency based on external information. For example, it can

adjust detection strategies based on user feedback to achieve

more customized and accurate detection. Furthermore, user

reviews can raise alerts about potential ﬂeeceware problems

in the future, helping to reﬁne detection targets and improve

detection efﬁciency. For instance, when relevant user com-

plaints are detected or formal reports are submitted, app mar-

kets can use DARKFLEECE to investigate potential ﬂeece-

ware issues.

“Overcharge” Attribution Detection. Quantifying overcharge is subjective: any charge made without the end user’s knowledge constitutes an overcharge. We therefore emphasize identifying fleeceware’s deceptive characteristics through dark patterns. In addition, we make PR_MAX, the price threshold used in the price feature of Table 6 (No.12), adjustable. End users can set this value based on their acceptable price range, and market regulators can set more appropriate values based on market statistics to achieve a more accurate estimation of the overcharge.
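The adjustable threshold can be pictured with a minimal sketch. It assumes the feature is 1 when every price on the UI stays at or below PR_MAX for its billing period and 0 otherwise, and it uses the default values stated in the note under Table 6 ($15/week and $40/month). The names `Price`, `DEFAULT_PR_MAX`, and `price_within_limit` are hypothetical. Skipping billing periods that have no configured threshold is also an assumption of this sketch, not something the paper specifies.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Defaults from the note under Table 6; callers may override them.
DEFAULT_PR_MAX: Dict[str, float] = {"week": 15.0, "month": 40.0}


@dataclass(frozen=True)
class Price:
    amount: float  # price in USD
    period: str    # billing period, e.g. "week" or "month"


def price_within_limit(
    prices: Iterable[Price],
    pr_max: Optional[Dict[str, float]] = None,
) -> int:
    """Return 1 if every price is at or below PR_MAX for its period, else 0."""
    limits = DEFAULT_PR_MAX if pr_max is None else pr_max
    for price in prices:
        limit = limits.get(price.period)
        # Assumption: periods without a configured limit are not judged.
        if limit is not None and price.amount > limit:
            return 0
    return 1
```

For example, a user who accepts up to $25 per week could call `price_within_limit(prices, {"week": 25.0})`, while a regulator could pass limits derived from market statistics.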

UI Static Analysis. The ﬂeeceware features we focused on

covered all types of dark patterns and platform requirements

except for those related to dynamic interactions (i.e., the cat-

egory nagging and requirement R4). Dynamic interactions

were not considered for three reasons. Firstly, most subscrip-

tion dark patterns and requirements for UI development are

about static elements and layouts on the UIs. Secondly, the

dark patterns related to interaction, such as subscription ads

popping up repeatedly, may primarily affect the user experi-

ence when using the app rather than causing ﬁnancial loss.

Thirdly, static analysis of subscription UIs can be more

widely applicable as it can be conveniently used in any app,

as long as the subscription UI can be obtained. However, sub-

scription UI analysis may be affected by irrelevant elements

on the UIs, such as in-app purchase information and pop-ups,

which may cause DARKFLEECE to mistakenly analyze them

as subscription UIs, resulting in false positives. We can fur-

ther observe the features of these UIs and design methods to

ﬁlter them out.

Features Extraction. DARKFLEECE extracts UI features

by utilizing UI structure information, which may not be able

to handle non-standardized development practices, such as

the misuse of tags (e.g., misusing “TextView” and “Button”),

displaying text within images, customizing view positions,

complex interface hierarchies, and arbitrary naming of wid-

gets. Further research could minimize dependence on the

structure information by directly employing image process-

ing techniques to extract and locate relevant elements within

the subscription UI. This would also enhance the robustness of our

method. If attackers attempt to inﬂuence our extraction by al-

tering the UI displayed to users, users may observe the issue

and avoid being deceived.

In addition, regular expressions are utilized to extract im-

portant textual information. However, the vast number of

apps in the market and the various expressions of informa-

tion make it difﬁcult to create regular expressions that cover

all cases. To address this challenge and improve efﬁciency

and comprehensiveness, an AI-based regular expression gen-

erator could be incorporated. Moreover, we used OCR tech-

niques to extract visual information but encountered limita-

tions that caused some elements to be inaccurately identiﬁed.

Future research could leverage advanced computer vision

techniques to obtain more precise information.
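To make the coverage problem concrete, the sketch below shows one regular expression of the kind described above. It extracts (price, billing period) pairs from UI text. The pattern `PRICE_PER_PERIOD` and the function `extract_prices` are hypothetical illustrations, not the rules used by DARKFLEECE. The pattern only handles dollar amounts written as "$9.99/week", "$9.99 per week", or "$9.99 a week".

```python
import re
from typing import List, Tuple

# Hypothetical pattern: "$<amount>" followed by "/", "per", or "a" and a period.
PRICE_PER_PERIOD = re.compile(
    r"\$\s?(\d+(?:\.\d{1,2})?)\s*(?:/|per\s+|a\s+)\s*(day|week|month|year)",
    re.IGNORECASE,
)


def extract_prices(text: str) -> List[Tuple[float, str]]:
    """Return (amount, period) pairs found in one piece of UI text."""
    return [
        (float(amount), period.lower())
        for amount, period in PRICE_PER_PERIOD.findall(text)
    ]
```

Text such as "only 9.99 USD weekly" is missed by this pattern, which illustrates why hand-written expressions struggle to cover every phrasing found in the market.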

7 Related Work

Existing Android malware detection technologies mainly in-

clude static detection and dynamic detection. Static analy-

sis performs feature extraction by disassembling source code

to detect suspicious code without running the application

[1, 2, 24, 32, 39], but cannot handle code obfuscation and

dynamic code loading [3, 56]. Dynamic analysis can deal

with code obfuscation by checking the characteristics of sus-

picious Android applications at runtime [13], but it will con-

sume more resources and storage space [48], and malware

can use anti-simulation methods to evade dynamic analysis

[23, 57]. Recent deep-learning-based techniques can better

identify unknown malware, but the effect depends on the al-

gorithm design and dataset [59]. However, the existing tech-

niques cannot detect ﬂeeceware, because most ﬂeeceware

does not contain malicious code and exhibits coding patterns

that closely resemble legitimate apps, without resorting to

traditional malicious behaviors, such as stealing conﬁdential

data and causing system crashes.

Analyzing UIs can provide valuable insights for develop-

ers and designers to create effective UI design styles, accel-

erating the process of building UIs for their apps [8, 15, 18].

Moreover, UI analysis can reveal issues within an app,

such as rendering problems and inefﬁcient image display.

DRAW [26] conducts UI rendering analysis to help developers identify and resolve short delays, while TAPIR [33] aims

to detect inefﬁcient image displays in mobile apps. Owl-

Eye [36] can identify display issues, such as text overlap

and missing images, and locate the speciﬁc region of the

problem within a UI. Automated UI testing, a crucial as-

pect of UI analysis, involves dynamically exploring the user

interfaces of an application to obtain relevant information.

Various tools have been developed to facilitate this, such

as Monkey [53], which generates a series of random oper-

ations (e.g., clicking, swiping) to execute an app automati-

cally, and DroidBot [34], which uses a model-based strategy

to enhance exploration efﬁciency. However, these studies

and tools are mainly focused on functional testing and gen-

eral display issues and are not adequate for detecting prob-

lematic subscription UIs. In our work, we utilize and modify

DroidBot’s search strategy to better detect subscription UIs.

Dark patterns are UI designs that deceive users into mak-

ing decisions that do not align with their best interests. Re-

cent research has focused on the negative impact of dark pat-

terns and proposed solutions [10, 16, 19, 21, 29, 40, 42].

Gray et al. [29] categorize dark patterns into nagging, obstruction, sneaking, interface interference, and forced action. Meanwhile, Narayanan et al. reviewed the history

and ethical implications of dark patterns [42]. Additionally,

Yada et al. [62] constructed a dataset for dark pattern detec-

tion, which comprised 1,818 dark pattern texts from shop-

ping sites, and machine learning methods were applied to

detect them. Furthermore, a study by Di Geronimo et al. [21] explored the prevalence and impact of dark patterns in mobile

applications and suggested recommendations for designers

and developers to create more ethical and user-friendly UIs

that avoid the use of dark patterns. Chen et al. [16] use general attributes of UI elements, such as hierarchy, color, and font size, to detect common UI dark patterns like layer overlays, pre-selected options, and hidden text. However, fleeceware relies on unique deceptive practices, such as varying UI layouts and confusing textual expressions that convey ambiguous or contradictory semantics. Because this approach lacks semantic analysis of interface information, it is not effective for automatically detecting dark patterns on subscription UIs. We address this issue by extracting features related

to speciﬁc subscription information and analyzing the over-

all accuracy and reasonableness of subscription information

expression.

8 Conclusion

This work is dedicated to investigating fleeceware at scale through the development of an automatic fleeceware detection system called DARKFLEECE. To identify features for detecting fleeceware, we construct a feature library based on expert knowledge drawn from observations of user interactions with fleeceware behaviors, a review of fleeceware samples, and insights from dark pattern studies and platform-specific UI design requirements. We then design a novel layout-based information linking technique and a multi-rule-based information extraction method to harvest subscription information, which is converted to UI features. Finally, a classifier is applied to determine whether the UI is suspicious. With an accuracy of 93.43%, DARKFLEECE identifies suspected fleeceware UIs and provides easily understandable alerts that help users comprehend the potential risks of fleeceware. We also ran DARKFLEECE in the wild to investigate the landscape, app developers, evolution, and user perception of fleeceware. Our findings offer valuable insights for app users, developers, and app market managers on how to prevent such problems. We have reported our findings to Google to seek their acknowledgment.

Acknowledgements

The IIE authors are supported in part by NSFC (92270204) and Youth Innovation Promotion Association CAS. The HYU author was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant (No. RS-2022-00155885, Artificial Intelligence Convergence Innovation Human Resources Development (Hanyang University ERICA)) and the National Research Foundation of Korea (NRF) grant (NRF-2022R1F1A1074999), both funded by the Korea government (MSIT).

References

[1] Yousra Aafer, Wenliang Du, and Heng Yin. Droidapiminer:

Mining api-level features for robust malware detection in an-

droid. In Security and Privacy in Communication Networks:

9th International ICST Conference, SecureComm 2013, Syd-

ney, NSW, Australia, September 25-28, 2013, Revised Selected

Papers 9, pages 86–103. Springer, 2013.

[2] Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bod-

den, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien

Octeau, and Patrick McDaniel. Flowdroid: Precise context,

ﬂow, ﬁeld, object-sensitive and lifecycle-aware taint analysis

for android apps. Acm Sigplan Notices, 49(6):259–269, 2014.

[3] Ömer Aslan Aslan and Reﬁk Samet. A comprehensive re-

view on malware detection approaches. IEEE Access, 8:6249–

6271, 2020.

[4] Avast. Lists of fleeceware apps. https://github.com/avast/ioc/tree/master/Fleeceware, 2021.

[5] Paul Bao, Lei Zhang, and Xiaolin Wu. Canny edge de-

tection enhancement by scale multiplication. IEEE TPAMI,

27(9):1485–1490, 2005.

[6] Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, and Fabi-

ano Tarlao. Inference of regular expressions for text extraction

from examples. IEEE Transactions on Knowledge and Data

Engineering, 28(5):1217–1230, 2016.

[7] Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, and Fabi-

ano Tarlao. Learning text patterns using separate-and-conquer

genetic programming. In EuroGP, 2015.

[8] Farnaz Behrang, Steven P Reiss, and Alessandro Orso.

Guifetch: supporting app design and development through gui

search. In MOBILESoft ’18, pages 236–246, 2018.

[9] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J.

Stone. Classiﬁcation and regression trees (cart). Biometrics,

40(3):358, 1984.

[10] Harry Brignull. Deceptive design. https://www.deceptive.design/, 2022.

[11] Iker Burguera, Urko Zurutuza, and Simin Nadjm-Tehrani.

Crowdroid: behavior-based malware detection system for an-

droid. In SPSM ’11, pages 15–26, 2011.

[12] BusinessofApps. Global consumer spending in subscription apps reached $17.1 billion in 2022. https://www.businessofapps.com/data/app-revenues/, 2022.

[13] Haipeng Cai, Na Meng, Barbara Ryder, and Daphne Yao.

Droidcat: Effective android malware detection and categoriza-

tion via app-level proﬁling. IEEE Transactions on Informa-

tion Forensics and Security, 14(6):1455–1470, 2018.

[14] Jagadeesh Chandraiah. Fleeceware apps overcharge users for basic app functionality. https://news.sophos.com/en-us/2019/09/25/fleeceware-apps-overcharge-users-for-basic-app-functionality/, 2019.

[15] Chunyang Chen, Ting Su, Guozhu Meng, Zhenchang Xing,

and Yang Liu. From ui design image to gui skeleton: a neural

machine translator to bootstrap mobile gui implementation. In

Proceedings of the 40th ICSE, pages 665–676, 2018.

[16] Jieshan Chen, Jiamou Sun, Sidong Feng, Zhenchang Xing,

Qinghua Lu, Xiwei Xu, and Chunyang Chen. Unveiling the

tricks: Automated detection of dark patterns in mobile appli-

cations. In Proceedings of the 36th UIST, New York, NY,

USA, 2023. Association for Computing Machinery.

[17] Kai Chen, Peng Wang, Yeonjoon Lee, XiaoFeng Wang, Nan

Zhang, Heqing Huang, Wei Zou, and Peng Liu. Finding un-

known malice in 10 seconds: Mass vetting for new threats at

the Google-Play scale. In 24th USENIX Security Symposium, pages 659–674, 2015.

[18] Sen Chen, Lingling Fan, Chunyang Chen, Ting Su, Wenhe Li,

Yang Liu, and Lihua Xu. Storydroid: Automated generation

of storyboard for android apps. In 2019 IEEE/ACM 41st ICSE,

pages 596–607. IEEE, 2019.

[19] Gregory Conti and Edward Sobiesk. Malicious interface design: Exploiting the user. In Proceedings of WWW ’10, pages 271–280, New York, NY, USA, 2010. Association for Computing Machinery.

[20] Alexandre Dewez. Benchmarking the pricing strategy of 100+ subscription based mobile apps. https://alexandre.substack.com/p/-benchmarking-the-pricing-strategy, 2020.

[21] Linda Di Geronimo, Larissa Braz, Enrico Fregnan, Fabio Palomba, and Alberto Bacchelli. Ui dark patterns and where to find them: A study on mobile applications and user perception. In Proceedings of CHI ’20, pages 1–14, New York, NY, USA, 2020. Association for Computing Machinery.

[22] Lijun Ding and Ardeshir Goshtasby. On the canny edge detec-

tor. Pattern recognition, 34(3):721–725, 2001.

[23] Parvez Faruki, Ammar Bharmal, Vijay Laxmi, Vijay Gan-

moor, Manoj Singh Gaur, Mauro Conti, and Muttukrishnan

Rajarajan. Android security: a survey of issues, malware pen-

etration, and defenses. IEEE communications surveys & tuto-

rials, 17(2):998–1022, 2014.

[24] Ali Feizollah, Nor Badrul Anuar, Rosli Salleh, Guillermo

Suarez-Tangil, and Steven Furnell. Androdialysis: Analysis

of android intent effectiveness in malware detection. comput-

ers & security, 65:121–134, 2017.

[25] Pengbin Feng, Jianfeng Ma, Cong Sun, Xinpeng Xu, and

Yuwan Ma. A novel dynamic android malware detection sys-

tem with ensemble learning. IEEE Access, 6:30996–31011,

2018.

[26] Yi Gao, Yang Luo, Daqing Chen, Haocheng Huang, Wei

Dong, Mingyuan Xia, Xue Liu, and Jiajun Bu. Every pixel

counts: Fine-grained ui rendering analysis for mobile applica-

tions. In IEEE INFOCOM 2017-ICCC, pages 1–9, 2017.

[27] Google. Google play billing system overview. https://developer.android.google.cn/google/play/billing, 2022.

[28] Google-Developers. Policy center. https://support.google.com/googleplay/android-developer/answer/9900533?hl=en&ref_topic=9857752, 2022.

[29] Colin M. Gray, Yubo Kou, Bryan Battles, Joseph Hoggatt, and

Austin L. Toombs. The dark (patterns) side of ux design. In

Proceedings of CHI ’18, pages 1–14, New York, NY, USA, 2018.

Association for Computing Machinery.

[30] ITRC. Subscription renewal scams are another way to steal your identity. https://www.idtheftcenter.org/post/subscription-renewal-scams-are-another-way-to-steal-your-identity-itrc/, 2021.

[31] Gábor Kertész, Sándor Szénási, and Zoltán Vámossy. Performance measurement of a general multi-scale template matching method. In 2015 IEEE 19th INES, pages 153–157, 2015.

[32] Li Li, Alexandre Bartel, Tegawendé F Bissyandé, Jacques

Klein, Yves Le Traon, Steven Arzt, Siegfried Rasthofer, Eric

Bodden, Damien Octeau, and Patrick McDaniel. Iccta: De-

tecting inter-component privacy leaks in android apps. In

2015 IEEE/ACM 37th ICSE, volume 1, pages 280–291, 2015.

[33] Wenjie Li, Yanyan Jiang, Chang Xu, Yepang Liu, Xiaoxing

Ma, and Jian Lü. Characterizing and detecting inefﬁcient im-

age displaying issues in android apps. In 2019 IEEE 26th

SANER, pages 355–365. IEEE, 2019.

[34] Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen.

Droidbot: a lightweight ui-guided test input generator for an-

droid. In 2017 IEEE/ACM 39th ICSE-C, pages 23–26, 2017.

[35] Kaijun Liu, Shengwei Xu, Guoai Xu, Miao Zhang, Dawei

Sun, and Haifeng Liu. A review of android malware detec-

tion approaches based on machine learning. IEEE Access,

8:124579–124607, 2020.

[36] Zhe Liu, Chunyang Chen, Junjie Wang, Yuekai Huang, Jun

Hu, and Qing Wang. Owl eyes: Spotting ui display issues via

visual understanding. In 35th IEEE/ACM ASE, pages 398–

409. IEEE, 2020.

[37] Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave,

Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmel-

farb, Nisha Bansal, and Su-In Lee. From local explanations

to global understanding with explainable ai for trees. Nature

Machine Intelligence, 2(1):2522–5839, 2020.

[38] Scott M Lundberg, Bala Nair, Monica S Vavilala, Mayumi

Horibe, Michael J Eisses, Trevor Adams, David E Liston,

Daniel King-Wai Low, Shu-Fang Newman, Jerry Kim, et al.

Explainable machine-learning predictions for the prevention

of hypoxaemia during surgery. Nature Biomedical Engineer-

ing, 2(10):749, 2018.

[39] Alejandro Martín, Héctor D Menéndez, and David Camacho.

Mocdroid: multi-objective evolutionary classiﬁer for android

malware detection. Soft Computing, 21:7405–7415, 2017.

[40] Arunesh Mathur, Mihir Kshirsagar, and Jonathan Mayer.

What makes a dark pattern... dark? design attributes, norma-

tive considerations, and measurement methods. In Proceed-

ings of CHI ’21, New York, NY, USA, 2021. Association for

Computing Machinery.

[41] Niall McLaughlin, Jesus Martinez del Rincon, BooJoong

Kang, Suleiman Yerima, Paul Miller, Sakir Sezer, Yeganeh

Safaei, Erik Trickel, Ziming Zhao, Adam Doupé, et al. Deep

android malware detection. In Proceedings of the seventh

ACM CODASPY, pages 301–308, 2017.

[42] Arvind Narayanan, Arunesh Mathur, Marshini Chetty, and

Mihir Kshirsagar. Dark patterns: Past, present, and future:

The evolution of tricky user interfaces. Queue, 18(2):67–92, May 2020.

[43] Ehsan Noei, Feng Zhang, and Ying Zou. Too many user-

reviews! what should app developers look at ﬁrst? IEEE

TSE, 47(2):367–378, 2019.

[44] Our website. https://sites.google.com/view/study-about-subscription-uis/

[45] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel,

B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss,

V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau,

M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Ma-

chine learning in Python. Journal of Machine Learning Re-

search, 12:2825–2830, 2011.

[46] Google Play. Subscription apps on google play: User insights to help developers win. https://services.google.com/fh/files/misc/subscription_apps_on_google_play.pdf, 2017.

[47] Thomas (TJ) Porter. What to do if you’ve become the victim of a subscription scam. https://www.mybanktracker.com/money-tips/money/subscription-scam-296704, 2022.

[48] Alireza Sadeghi, Hamid Bagheri, Joshua Garcia, and Sam

Malek. A taxonomy and qualitative comparison of program

analysis techniques for security assessment of android soft-

ware. IEEE TSE, 43(6):492–530, 2016.

[49] Borja Sanz, Igor Santos, Carlos Laorden, Xabier Ugarte-

Pedrero, Javier Nieves, Pablo G Bringas, and Gonzalo Ál-

varez Marañón. Mama: manifest analysis for malware detec-

tion in android. Cybernetics and Systems, 44(6-7):469–488,

2013.

[50] Andriy Slynchuk. How to know you’re being scammed by a fleeceware app. https://clario.co/blog/how-to-spot-fleeceware-apps/, 2021.

[51] Ray Smith. An overview of the tesseract ocr engine. In ICDAR

2007, volume 2, pages 629–633. IEEE, 2007.

[52] Sophos. Dissecting fleeceware apps: the million-dollar money-making machine in android and ios. https://vb2020.vblocalhost.com/presentations/dissecting-fleeceware-apps-the-million-dollar-money-making-machine-in-android-and-ios/, 2020.

[53] Android Studio. UI/Application Exerciser Monkey. https://developer.android.com/studio/test/monkey (accessed Sep. 3, 2020), 2017.

[54] Feng Tang and Hai Tao. Fast multi-scale template matching

using binary features. In WACV ’07, pages 36–36. Citeseer,

2007.

[55] Peter Teuﬂ, Michaela Ferk, Andreas Fitzek, Daniel Hein, Ste-

fan Kraxberger, and Clemens Orthacker. Malware detection

by applying knowledge discovery processes to application

metadata on the android market (google play). Security and

communication networks, 9(5):389–419, 2016.

[56] Sitalakshmi Venkatraman, Mamoun Alazab, and R Vinayaku-

mar. A hybrid deep learning image-based analysis for effec-

tive malware detection. Journal of Information Security and

Applications, 47:377–389, 2019.

[57] Timothy Vidas and Nicolas Christin. Evading android run-

time analysis via sandbox detection. In Proceedings of the

9th ASIACCS, pages 447–458, 2014.

[58] Jakub Vávra. How fleeceware apps have earned over $400 million on android and ios. https://blog.avast.com/fleeceware-apps-on-mobile-app-stores-avast, 2021.

[59] Zhiqiang Wang, Qian Liu, and Yaping Chi. Review of an-

droid malware detection based on deep learning. IEEE Access,

8:181102–181126, 2020.

[60] Dong-Jie Wu, Ching-Hao Mao, Te-En Wei, Hahn-Ming Lee,

and Kuo-Ping Wu. Droidmat: Android malware detection

through manifest and api calls tracing. In Seventh Asia joint

conference on information security, pages 62–69. IEEE, 2012.

[61] Ke Xu, Yingjiu Li, Robert H Deng, and Kai Chen. Deepre-

ﬁner: Multi-layer android malware detection system applying

deep neural networks. In EuroSP ’18, pages 473–487. IEEE,

2018.

[62] Yuki Yada, Jiaying Feng, Tsuneo Matsumoto, Nao

Fukushima, Fuyuko Kido, and Hayato Yamana. Dark pat-

terns in e-commerce: a dataset and its baseline evaluations.

In 2022 IEEE International Conference on Big Data, pages

3015–3022, 2022.

[63] Li Yujian and Liu Bo. A normalized levenshtein distance met-

ric. IEEE TPAMI, 29(6):1091–1095, 2007.

[64] Win Zaw Zarni Aung. Permission-based android malware de-

tection. International Journal of Scientiﬁc & Technology Re-

search, 2(3):228–234, 2013.

[65] Sergey Zubkov. Subscription prices have increased by 40%, what’s next? https://adapty.io/blog/subscription-prices-have-increased-by-40-percent/, 2022.

Appendix

A Distribution of Fleeceware across Categories

Table 5: Categories of suspected fleeceware.

Categories                 Number of apps   Ratio
Photography                48               10.84%
Entertainment              48               10.84%
Health & Fitness           28               6.32%
Music & Audio              27               6.09%
Personalization            25               5.64%
Tools                      25               5.64%
Business                   24               5.42%
Communication              21               4.74%
Video Players & Editors    20               4.51%
Education                  18               4.06%
Lifestyle                  18               4.06%
Weather                    15               3.39%
Productivity               15               3.39%
Maps & Navigation          14               3.16%
Art & Design               12               2.71%
Others                     87               19.64%
Total                      443

“Others” contains 18 other categories, e.g., “Comics”, “Sports”, “Events”, “House & Home”, and “Travel & Local”, each of which accounts for less than 2.5%.

B Label Tool

We developed an annotation tool to enhance the labeling efﬁciency of experts. The interface is shown in Figure 9.

Figure 9: An example showing what users need to label.

C Features of Subscription UIs

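Table 6: Features of Subscription UIs ($S$ is a given subscription UI). Each feature is followed by the number of its corresponding phenomenon.

$F_1 = \begin{cases} 1 & \exists t \in T_S,\ T_{PI} \in t \\ 0 & \text{otherwise} \end{cases}$, where $T_S$ is the text set of the UI $S$ and $T_{PI}$ is the text of price information (PI). No.1

$F_2 = \begin{cases} 1 & \exists t \in T_S,\ T_{BF} \in t \\ 0 & \text{otherwise} \end{cases}$, where $T_S$ is the text set of the UI $S$ and $T_{BF}$ is the text of billing frequency (BF) information. No.2

$F_3 = \begin{cases} -1 & \forall t \in T_S,\ T_{FT} \notin t \\ 1 & \exists t \in T_S,\ T_{TD} \in t \\ 0 & \text{otherwise} \end{cases}$, where $T_{FT}$ is the text of free trial information and $T_{TD}$ is the text of trial duration information. No.3

$F_4 = \begin{cases} 1 & \exists t \in T_S,\ T_{AR} \in t \\ 0 & \text{otherwise} \end{cases}$, where $T_S$ is the text set of the UI $S$ and $T_{AR}$ is the text of auto-renewal information. No.4

$F_5 = \begin{cases} 1 & \forall (t \in T_S \text{ and } T_{FT} \in t),\ T_{PI} \in t \\ 0 & \text{otherwise} \end{cases}$, where $T_{FT}$ is the text of free trial information and $T_{PI}$ is the text of price information. No.5

$F_6 = \begin{cases} 1 & \exists t \in T_S,\ N_{BF} \le N_{PI} \\ 0 & \text{otherwise} \end{cases}$, where $N_{BF}$ and $N_{PI}$ are the numbers of BF information and PI information in text $t$, respectively. No.6

$F_7 = \begin{cases} \min\limits_{t \in T_S,\ N_{BF} \ge 2 \text{ and } N_{PI} \ge 2} \left( \dfrac{fnt\_sz(T_{PI}^{1})}{fnt\_sz(T_{PI}^{2})} \times F_{bold}(T_{PI}) \right) & \exists t \in T_S,\ N_{BF} \ge 2 \text{ and } N_{PI} \ge 2 \\ 2 & \text{otherwise} \end{cases}$, where $fnt\_sz(T_{PI}^{1})$ and $fnt\_sz(T_{PI}^{2})$ are the font sizes of the price information with the largest and the second largest billing frequency, respectively; $F_{bold}(T_{PI})$ equals 1 unless $fnt\_sty(T_{PI}) = \text{“bold”}$; $fnt\_sty(T_{PI})$ is the font style of the text of PI; and $k$ denotes the weight parameters, which can be adjusted. No.7

$F_8 = \begin{cases} 0 & \exists T_{CD} \in T_S,\ val(T_{CD}) \ne val(T_{TD}) \\ 1 & \text{otherwise} \end{cases}$, where $T_{CD}$ is the cancellation deadline for the subscription and $val()$ outputs the value of an element. No.8

$F_9 = \begin{cases} 1 & \forall b \in B_S,\ \text{“subscri*”} \in T_b \\ 0 & \text{otherwise} \end{cases}$, where $B_S$ is the button set in the UI $S$ and $T_b$ is the text of the button. No.9

$F_{10} = \begin{cases} 1 & \forall b \in B_S,\ T_{PI} \in T_b \\ 0 & \text{otherwise} \end{cases}$, where $T_b$ is the text of an interactive button in the UI $S$ and $T_{PI}$ is the text of price information. No.9

$F_{11} = \min\limits_{T_{PI} \in T_S,\ b \in B_S} \left\{ \dfrac{dis(ver\_loc(T_{PI}),\ ver\_loc(b))}{H_S} \right\}$, where $ver\_loc()$ outputs the vertical location of an element and $H_S$ is the UI’s height. No.9

$F_{12} = \max\limits_{t \in T_S,\ T_{PI} \in t} \left( \dfrac{len(T_{PI})}{len(t)} \times k\, \dfrac{fnt\_sz(T_{PI})}{fnt\_sz(t)} \times F_{bold}(T_{PI}) \right)$, where $fnt\_sz()$ outputs the font size of the text. No.10

$F_{13} = \begin{cases} -1 & F_1 = 0 \\ 0 & \exists T_{PI} \in T_S,\ obs(T_{PI}) = \text{False} \\ 1 & \text{otherwise} \end{cases}$, where $obs(T_{PI})$ outputs whether the price information can be clearly observed. No.10

$F_{14} = \begin{cases} -1 & F_2 = 0 \\ 0 & \exists T_{BF} \in T_S,\ obs(T_{BF}) = \text{False} \\ 1 & \text{otherwise} \end{cases}$, where $obs(T_{BF})$ outputs whether the BF information can be clearly observed. No.10

$F_{15} = \begin{cases} -1 & F_3 = -1 \text{ or } F_3 = 0 \\ 0 & \exists T_{TD} \in T_S,\ obs(T_{TD}) = \text{False} \\ 1 & \text{otherwise} \end{cases}$, where $obs(T_{TD})$ outputs whether the trial duration information can be clearly observed. No.10

$F_{16} = \begin{cases} -1 & F_4 = 0 \\ 0 & \exists T_{AR} \in T_S,\ obs(T_{AR}) = \text{False} \\ 1 & \text{otherwise} \end{cases}$, where $obs(T_{AR})$ outputs whether the auto-renewal information can be clearly observed. No.10

$F_{17} = \begin{cases} 1 & \forall ic \in IC_S,\ obs(ic) = \text{True} \\ 0 & \text{otherwise} \end{cases}$, where $IC_S$ is the icon set of a UI $S$ and $obs(ic)$ indicates whether $ic$ can be clearly observed. No.11

$F_{18} = \begin{cases} 1 & \forall T_{PI} \in T_S,\ val(T_{PI}) \le PR\_MAX \\ 0 & \text{otherwise} \end{cases}$, where $val(T_{PI})$ is the value of PI and $PR\_MAX$ is the reasonable market price. No.12

$F_{19} = \begin{cases} 1 & \forall T_{PI} \in T_S,\ val(T_{PI}^{w}) \le val(T_{PI}^{wo}) \\ 0 & \text{otherwise} \end{cases}$, where $T_{PI}^{w}$ and $T_{PI}^{wo}$ are the prices with and without a trial. No.12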

Note: According to reports [20, 46, 65], we set PR_MAX as “$15/week” and “$40/month” in our work. The value can be changed according to user’s expectation and the market

price.
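As an illustration of the three button-related features in Table 6 (No.9), the sketch below computes them from a simplified UI model. Everything in it is an assumption made for illustration: the `Element` record with a text and a vertical position, the patterns `PRICE` and `SUBSCRIBE`, the use of the absolute difference as the distance, and the normalization by the UI height. Returning `None` when no button or price text exists is also a choice of this sketch, not a rule taken from the table.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    text: str   # visible text of the widget
    top: float  # vertical location in pixels (assumed representation)


PRICE = re.compile(r"\$\s?\d+(?:\.\d{1,2})?")        # hypothetical price pattern
SUBSCRIBE = re.compile(r"subscri", re.IGNORECASE)   # stands in for "subscri*"


def button_features(
    buttons: List[Element],
    prices: List[Element],
    ui_height: float,
) -> Tuple[int, int, Optional[float]]:
    """Return (all buttons say subscribe, all buttons show a price, distance).

    The distance is the smallest vertical gap between any price text and
    any button, divided by the UI height.
    """
    all_subscribe = int(all(SUBSCRIBE.search(b.text) for b in buttons))
    all_priced = int(all(PRICE.search(b.text) for b in buttons))
    if not buttons or not prices:
        return all_subscribe, all_priced, None
    nearest = min(abs(p.top - b.top) for p in prices for b in buttons)
    return all_subscribe, all_priced, nearest / ui_height
```

A small distance together with a price inside the button text suggests that the charge is visible at the point of interaction, whereas a button labeled only “Continue” far from any price text yields low values for all three features.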