eBay M2M Email Analysis on Hadoop Platform
Forest Su, Daniel Zhang
October 2011
eBay Inc.
The Birth of eBay . . .
Started with a Broken Laser Pointer . . .
. . . sold for $ USD
AuctionWeb was born on the Labor Day weekend in September 1995
Pierre Omidyar
$30 eBay Founder
eBay Fact: 15 Years After…
450+ Million Registered Users
Over 2 Billion Photos
220+ Million Active Item Listings for Sale
50,000 Categories
2 Petabytes Stored
25 Petabytes Processed Daily
300+ Features per Quarter
100,000 Lines of Code Rolled Out Every 2 Weeks
48 Billion SQL Calls per Day
Billion API Calls per Month
> GB Source Code
Global Presence in 33 International Markets
10+ Million New Items Added per Day
$2,000+ USD Trading Value per Second
Velocity of trading
On an average day on eBay…
A diamond ring is sold every two minutes
Velocity of trading
On an average day on eBay…
Over 3,600 MP3 players are sold
Velocity of trading
On an average day on eBay…
Over 300 stamps are sold every hour
Velocity of trading
On an average day on eBay…
An automobile is sold every minute
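The per-period rates above can be normalized to a common daily figure with simple arithmetic. A minimal Python sketch; the rates are copied from these slides and the stats slide (treating "over" and "+" values as lower bounds), and `per_day` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_day(count, period_seconds):
    """Scale 'count events per period_seconds' to events per day."""
    return count * SECONDS_PER_DAY / period_seconds

# (count, period in seconds), taken from the "Velocity of trading" slides
RATES = {
    "diamond rings": (1, 2 * 60),       # one every two minutes
    "stamps": (300, 60 * 60),           # over 300 every hour (lower bound)
    "automobiles": (1, 60),             # one every minute
    "trading value, USD": (2_000, 1),   # $2,000+ per second (lower bound)
}

daily = {name: per_day(count, period) for name, (count, period) in RATES.items()}
for name, value in daily.items():
    print(f"{name}: {value:,.0f} per day")
```

So a diamond ring every two minutes works out to 720 rings a day, and $2,000 per second to at least $172.8 million of trading value a day.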
Hadoop on the Landscape
[Figure: eBay analytics landscape showing Transactional Databases, ETL, Primary/Secondary systems with a Query Director, the Deep System Research Platform, SocPArc, and the Hadoop Data Intensive Platform, connected by dual-load feeds of structured and unstructured data (DataFeed) and Knowledge Feedback loops; consumers are Business Users and Search Analysts.]
Hadoop Data Intensive Platform: Nodes: 500+; Disk Space: Approx.; Component: HDFS/Hive; Hardware: Commodity Hardware
eBay Email Universe
• Members write to each other (“M2M”)
• Members write to eBay
• eBay writes to Members
  - Millions of times per day!
• Where are these messages?
  o In My Messages: messages through eBay
  o In a user’s private system: messages outside of eBay
Single User Study
• What does a typical user send / receive?
• Follow a single user for 50 days during Oct/Nov 2010
• User Profile:
  ― User subscribed to the Daily Deal
  ― User was a member of:
    § eBay Bucks
    § eBay VIP
    § PayPal Advantage
  ― User is mainly a buyer but also a casual seller
Single User Study: Volume
Count | Grouping | Percentage | Percentage excl. Daily Deal | Daily Average
53 | eBay Daily Deal | % | |
35 | eBay | % | % |
16 | PayPal | % | % |
87 | Related | % | % |
8 | Watched item ending | % | % |
9 | Selling | % | % |
3 | Other | % | % |
211 | Total | | |
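The percentage values in this table did not survive extraction, but both percentage columns follow from the Count column. A Python sketch of that recomputation, rounding to one decimal (the slide's own rounding is unknown); it also assumes "Daily Average" means messages per day over the 50-day window, which the slide does not state.

```python
DAYS = 50  # length of the single-user study (Oct/Nov 2010)

# Message counts by grouping, copied from the Volume table
COUNTS = {
    "eBay Daily Deal": 53,
    "eBay": 35,
    "PayPal": 16,
    "Related": 87,
    "Watched item ending": 8,
    "Selling": 9,
    "Other": 3,
}

def shares(counts, exclude=()):
    """Percentage of the total for each grouping, optionally excluding some groupings."""
    kept = {k: v for k, v in counts.items() if k not in exclude}
    total = sum(kept.values())
    return {k: round(100 * v / total, 1) for k, v in kept.items()}

total = sum(COUNTS.values())   # should match the Total row
daily_average = total / DAYS   # messages per day over the study window
with_deal = shares(COUNTS)
without_deal = shares(COUNTS, exclude={"eBay Daily Deal"})
print(total, daily_average)
print(with_deal)
print(without_deal)
```

For example, the Daily Deal accounts for 53 of 211 messages (about 25.1%), Related messages are about 55.1% of the 158 non-Daily-Deal messages, and 211 messages over 50 days is about 4.2 per day.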
Single User Study: Visibility
Count | In MyMessages | Percentage | Percentage excl. Daily Deal
53 | No (Daily Deal) | % |
124 | No | % | %
26 | Inbox | % | %
8 | Sent | % | %
211 | Total | |
Only % are visible from within eBay
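The visible share can be derived from the Visibility table counts, treating the Inbox and Sent rows as the messages present in My Messages (an assumption read from the In MyMessages column). A Python sketch; the result is a recomputation from the counts, not a figure quoted from the slide.

```python
# Counts by where the message appears in My Messages, from the Visibility table
VISIBILITY = {
    "No (Daily Deal)": 53,
    "No": 124,
    "Inbox": 26,
    "Sent": 8,
}
VISIBLE = {"Inbox", "Sent"}  # rows assumed to be present in My Messages

def visible_share(counts, exclude=()):
    """Fraction of messages that are visible in My Messages."""
    kept = {k: v for k, v in counts.items() if k not in exclude}
    visible = sum(v for k, v in kept.items() if k in VISIBLE)
    return visible / sum(kept.values())

overall = visible_share(VISIBILITY)
excl_deal = visible_share(VISIBILITY, exclude={"No (Daily Deal)"})
print(f"{overall:.1%} overall, {excl_deal:.1%} excluding the Daily Deal")
```

Under that assumption, 34 of 211 messages (about 16.1%) are visible, or about 21.5% once the Daily Deal mailings are excluded.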
Email Data: Where is it?
Emails are stored in a mixed system:
- Metadata in the DW
- Message bodies on NAS before 10/2010, and on the Grid (low-cost storage) since 10/2010
Email data is not minable in its current storage:
- Queries on email subjects in Teradata suffer from CPU skew
- Email bodies are stored as individual files, which does not scale at all
eBay Sample Email
Major Challenges
• Data is large
  ― Data is in individual message files:
    § millions of files per day, > 1 billion files per year
• Data contains Personally Identifiable Information (PII)
  ― The current Hadoop release does not support effective:
    § User Access Permissions -> Need Encryption
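The slide concludes that encryption is needed because Hadoop could not enforce effective user access permissions. As an illustration of a related but different technique, the Python sketch below pseudonymizes email addresses with keyed HMAC-SHA256 tokens before data lands on a cluster. This is not encryption (tokens cannot be decrypted), the regex and key are placeholders, and nothing here is taken from the presenters' pipeline.

```python
import hashlib
import hmac
import re

# Deliberately simple address pattern for illustration; real address syntax is broader.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def pseudonymize(text, key):
    """Replace each email address with a keyed, non-reversible token.

    The same address (compared case-insensitively) always maps to the same token
    under a given key, so messages can still be grouped by correspondent.
    """
    def to_token(match):
        normalized = match.group(0).lower().encode("utf-8")
        digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
        return f"<email:{digest[:16]}>"
    return EMAIL_RE.sub(to_token, text)

KEY = b"hypothetical-secret-key"  # placeholder; a real key would come from a key store
print(pseudonymize("Please write to Jane.Doe@example.com about the refund.", KEY))
```

Because the token depends on the key, rotating or withholding the key controls who can link tokens back to known addresses.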
Challenges Answered
This copied over 1 billion files, about 15TB compressed
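The slides do not say which container format received the 1 billion files. Because Hadoop handles a modest number of large files far better than a billion tiny ones, small files are commonly bundled into larger containers (for example Hadoop SequenceFiles or HAR archives). As a stand-in, the Python sketch below bundles message bodies into one gzip-compressed tar archive; the function names are hypothetical. The slide's own numbers imply an average of roughly 15 KB compressed per message.

```python
import io
import tarfile
from typing import Dict

def pack_messages(messages: Dict[str, bytes]) -> bytes:
    """Bundle many small message bodies into one gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, body in messages.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
    return buf.getvalue()

def unpack_messages(blob: bytes) -> Dict[str, bytes]:
    """Read the archive back into a name -> body mapping."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            out[member.name] = tar.extractfile(member).read()
    return out

# Average compressed size implied by the slide, assuming decimal units (1 TB = 10**12 bytes)
AVG_COMPRESSED_BYTES = 15 * 10**12 // 10**9  # 15,000 bytes per message
```

Unlike a tar archive, a SequenceFile is splittable, so MapReduce tasks can process one large container in parallel.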
Hadoop Data Storage Use
Year | Month | Raw M2M (GB) | Clean M2M (GB)
2009 | 9-12 | 4,060 | 242
2010 | 1-12 | 14,959 | 944
2011 | 1-7 | 10,249 | 1,282
Total | | 29,268 | 2,467

2011 | 1 | 1,274 | 85
2011 | 2 | 1,271 | 81
2011 | 3 | 1,650 | 233
2011 | 4 | 1,484 | 219
2011 | 5 | 1,560 | 230
2011 | 6 | 1,506 | 217
2011 | 7 | 1,505 | 218
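A quick cross-check of the storage table: the Python sketch below computes clean volume as a share of raw volume per period and re-sums the monthly 2011 rows. The monthly rows add up to 10,250 GB raw and 1,283 GB clean, one more than the 2011 row in each case, which is consistent with per-month rounding (the slides do not say). The table does not define "clean", so the fraction is only a size ratio.

```python
# (raw GB, clean GB) copied from the table
PERIODS = {
    "2009, Sep-Dec": (4_060, 242),
    "2010, Jan-Dec": (14_959, 944),
    "2011, Jan-Jul": (10_249, 1_282),
}
MONTHLY_2011 = [(1_274, 85), (1_271, 81), (1_650, 233), (1_484, 219),
                (1_560, 230), (1_506, 217), (1_505, 218)]

def clean_fraction(raw_gb, clean_gb):
    """Clean M2M volume as a fraction of raw M2M volume."""
    return clean_gb / raw_gb

for period, (raw, clean) in PERIODS.items():
    print(f"{period}: {clean_fraction(raw, clean):.1%} of raw")

raw_2011 = sum(raw for raw, _ in MONTHLY_2011)
clean_2011 = sum(clean for _, clean in MONTHLY_2011)
print(raw_2011, clean_2011)
```

The ratio is about 6% for 2009 and 2010 and about 12.5% for January to July 2011, with the jump visible from March 2011 onward in the monthly rows.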
eBay M2M Emails / Year
[Chart: M2M email counts, 7-1-2010 to 6-1-2011; y-axis 0 to 9,000,000; series: M2M 1 year]
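The slides do not describe how the counts behind this chart were computed. One common way to count records per month on Hadoop is a Hadoop Streaming job: a mapper emits one key/value line per record, Hadoop sorts by key, and a reducer sums each key's values. The Python sketch below shows that shape; the record layout is hypothetical, and `run_local` only simulates the sort step in-process.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'YYYY-MM<TAB>1' for each message record.

    Assumes a hypothetical record layout: an ISO date (YYYY-MM-DD), a tab, then other fields.
    """
    for line in lines:
        date = line.split("\t", 1)[0]
        yield f"{date[:7]}\t1"

def reducer(sorted_lines):
    """Sum the counts for each key; input must be sorted by key, as Hadoop's shuffle provides."""
    pairs = (line.split("\t") for line in sorted_lines)
    for month, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{month}\t{sum(int(count) for _, count in group)}"

def run_local(lines):
    """Local stand-in for 'map | sort | reduce'."""
    return list(reducer(sorted(mapper(lines))))

records = ["2010-07-01\tmsg-a", "2010-07-15\tmsg-b", "2010-08-02\tmsg-c"]
print(run_local(records))  # ['2010-07\t2', '2010-08\t1']
```

On a cluster, each function would be wrapped in a small script reading lines from standard input and writing lines to standard output.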
eBay System Emails
[Chart: eBay system email counts, 3-1-2011 to 6-1-2011; y-axis 0 to 35,000,000; series: eBay System]
eBay System Emails
[Pie chart: eBay system emails by category. Categories: Agreement Update, Marketing, Offers, Payments, Seller's Return Policy, eBay Bucks, Surveys, UPI, Password / Email / Userid, Cancelling Purchase, Welcome eBay / MyMessages, Miscellaneous. Slice values: 20%, 15%, 14%, 8%, 8%, 5%, 4%, 4%, 4%, 3%, 3%, 12%.]
eBay System Emails
How many emails per conversation?
[Chart: x-axis: messages per conversation (0 to 20); y-axis: percentage]
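A distribution like this one can be derived once each message carries a conversation identifier. A Python sketch, assuming a hypothetical input of one conversation id per message (how eBay threads messages is not described in the slides):

```python
from collections import Counter

def conversation_size_distribution(conversation_ids):
    """Given one conversation id per message, return {messages per conversation: % of conversations}."""
    sizes = Counter(conversation_ids).values()  # message count for each conversation
    size_counts = Counter(sizes)                # how many conversations have each size
    total = sum(size_counts.values())
    return {size: 100 * n / total for size, n in sorted(size_counts.items())}

print(conversation_size_distribution(["c1", "c1", "c2", "c3", "c3", "c3", "c4"]))
```

Here two of the four conversations have a single message, so size 1 gets 50% of conversations.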
eBay Emails
In an M2M conversation, how long for the first reply?
[Chart: percentage distribution of time to first reply; x-axis 0 to 35 (units not shown); series: Pre-sale, Buyer asks Seller; Post-sale, Buyer asks Seller; Post-sale, Seller asks Buyer]
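Time to first reply can be measured per conversation as the gap between the opening message and the first message from a different sender. A Python sketch under that definition (an assumption; the chart's exact definition and x-axis units are not given), using hypothetical (timestamp, sender) tuples; the returned timedelta can be bucketed into hours or days for plotting.

```python
from datetime import datetime, timedelta

def first_reply_delay(messages):
    """Time from a conversation's first message to the first message from a different sender.

    messages: list of (timestamp, sender_id) tuples in any order.
    Returns a timedelta, or None if the conversation is empty or nobody replied.
    """
    if not messages:
        return None
    ordered = sorted(messages)
    start, opener = ordered[0]
    for sent_at, sender in ordered[1:]:
        if sender != opener:
            return sent_at - start
    return None

conversation = [
    (datetime(2011, 3, 1, 14, 0), "seller"),
    (datetime(2011, 3, 1, 9, 0), "buyer"),
    (datetime(2011, 3, 1, 9, 30), "buyer"),  # follow-up from the same buyer is not a reply
]
delay = first_reply_delay(conversation)
print(delay.total_seconds() / 3600)  # 5.0
```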
eBay M2M Emails: Profanity
Profanity detection: Identify messages containing language not becoming an eBay member
Sample messages found with simple filters:
• f@#$ off you mother f@#*#$ and do wot you want your not getin any money you c#%$ you sad prick
• You f@#!#$ c#%$! Stop pissing around! Don't waste my time you son of a b#$%!
• im saying that i want these f@#*#$ blake griffins. i will come and kill you if you do not AGREE!!!!!! so please ship these to me for 42 dollars and a couple south beach lebron replicas. if you do not comply, i will have to kill you. i know where you live.
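The slides do not list the "simple filters" used. A minimal Python sketch of the kind of filter the phrase suggests: a case-insensitive, whole-word match against a term list. The term list is hypothetical and drawn from the samples above, including the threat phrase "kill you".

```python
import re

# Hypothetical, deliberately tiny term list taken from the samples above;
# a real filter would use a maintained list and handle misspellings and masking.
FLAGGED_TERMS = ["prick", "pissing", "kill you"]

# \b keeps terms from matching inside longer words (e.g. "skill you" is not "kill you").
FLAG_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in FLAGGED_TERMS) + r")\b",
    re.IGNORECASE,
)

def find_flagged(message):
    """Return the sorted, lower-cased flagged terms found in a message."""
    return sorted({m.lower() for m in FLAG_RE.findall(message)})

for text in ["Stop PISSING around, you prick!", "Thanks, great seller!", "I admire the skill you showed"]:
    print(find_flagged(text))
```

Whole-word matching trades recall for precision: it avoids false hits inside innocent words but misses creative spellings, which is why such filters are usually a first pass.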
eBay M2M Emails: Next Steps
The study of this data is still in its infancy. Sample future use cases:
• Identify buyers extorting refunds from sellers by threatening to leave negative feedback or to open a case (Ongoing: TnS / Buyer Abuse team)
• Identify sellers creating BBEs before negative feedback is left, as well as in cases where no feedback is left.
• Identify sellers creating more than the average number of emails (ASQs), with the intent of determining how to help these sellers improve our buyers’ experiences.
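For the last use case, a first pass could compare each seller's ASQ count with the average across sellers. A Python sketch, assuming a hypothetical input of one seller id per ASQ email; using the plain mean as the threshold is an illustrative choice, and a real analysis might normalize by listing volume first.

```python
from collections import Counter
from statistics import mean

def sellers_above_average(asq_seller_ids):
    """Sellers whose ASQ count exceeds the mean count per seller, busiest first."""
    counts = Counter(asq_seller_ids)
    if not counts:
        return []
    threshold = mean(counts.values())
    above = [seller for seller, n in counts.items() if n > threshold]
    return sorted(above, key=lambda seller: (-counts[seller], seller))

asqs = ["s1"] * 5 + ["s2"] + ["s3"] * 3 + ["s4"] * 7
print(sellers_above_average(asqs))  # ['s4', 's1']
```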
Q&A