A tired.com frequency count

One of my ongoing projects is tired.com (as described in this Slate article by Paul Boutin). Since I have a fairly large corpus (to use the linguistic geek term) to play with, I occasionally do a little analysis on it. Here are the top 250 words used by the 6000 tired.com authors in 2004, in frequency order. You can probably figure out the gist of many of the letters from this list:

1. tired
2. i
3. and
4. the
5. of
6. a
7. my
8. am
9. it
10. in
11. you
12. because
13. is
14. that
15. for
16. this
17. have
18. me
19. not
20. so
21. at
22. up
23. im
24. sleep
25. all
26. do
27. on
28. with
29. but
30. or
31. just
32. get
33. no
34. be
35. why
36. are
37. was
38. can
39. work
40. like
41. go
42. if
43. out
44. about
45. night
46. may
47. time
48. what
49. now
50. don
51. we
52. day
53. mail
54. really
55. know
56. your
57. too
58. they
59. people
60. any
61. as
62. had
63. by
64. then
65. much
66. life
67. want
68. when
69. been
70. who
71. being
72. he
73. e
74. an
75. one
76. she
77. school
78. more
79. there
80. her
81. will
82. part
83. its
84. hours
85. think
86. only
87. last
88. would
89. got
90. has
91. dont
92. well
93. some
94. going
95. new
96. email
97. how
98. back
99. even
100. please
101. us
102. good
103. ve
104. u
105. enough
106. other
107. very
108. love
109. free
110. bed
111. which
112. them
113. feel
114. home
115. late
116. need
117. our
118. every
119. way
120. job
121. never
122. here
123. things
124. make
125. bored
126. still
127. their
128. morning
129. always
130. could
131. also
132. than
133. today
134. d
135. right
136. information
137. over
138. help
139. old
140. off
141. intended
142. after
143. around
144. take
145. image
146. friends
147. gif
148. site
149. tell
150. having
151. went
152. getting
153. see
154. org
155. should
156. cause
157. two
158. ll
159. long
160. something
161. his
162. years
163. week
164. him
165. use
166. year
167. into
168. little
169. recipient
170. thanks
171. didn
172. unable
173. print
174. myself
175. doing
176. find
177. nothing
178. working
179. hard
180. read
181. maybe
182. where
183. again
184. makes
185. cuz
186. world
187. anything
188. early
189. until
190. house
191. money
192. cant
193. did
194. these
195. wake
196. down
197. best
198. days
199. everything
200. trying

201. friend
202. lot
203. many
204. computer
205. live
206. ever
207. thing
208. before
209. stupid
210. better
211. most
212. say
213. those
214. same
215. yes
216. confidential
217. family
218. keep
219. since
220. person
221. care
222. ru
223. thank
224. college
225. come
226. sick
227. stay
228. virus
229. next
230. shit
231. bad
232. does
233. fucking
234. website
235. were
236. while
237. months
238. hate
239. though
240. able
241. oh
242. hour
243. hi
244. thats
245. girl
246. reason
247. web
248. let
249. first
250. kids
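
For anyone who wants to try this kind of count on their own pile of mail, here is a minimal Python sketch. It is not the script that produced the list above. The function name `top_words`, the `stopwords` parameter and the sample messages are all made up for illustration, and the tokenizing rule (lowercase everything, split on anything that isn't a letter) is only a guess based on fragments like "don", "ve" and "ll" showing up in the list.

```python
import re
from collections import Counter


def top_words(messages, n=250, stopwords=frozenset()):
    """Return the n most frequent lowercase words across all messages.

    Hypothetical helper: the tokenizing rule is an assumption, not the
    method actually used for the tired.com list.
    """
    counts = Counter()
    for text in messages:
        # Splitting on non-letters turns "don't" into "don" and "t",
        # and "e-mail" into "e" and "mail".
        counts.update(re.findall(r"[a-z]+", text.lower()))
    # Drop any words the caller considers non-content.
    for word in stopwords:
        counts.pop(word, None)
    return [word for word, _ in counts.most_common(n)]


# Made-up sample messages, not real tired.com mail.
sample = [
    "I am so tired because I don't sleep.",
    "Tired of e-mail, tired of work.",
]
print(top_words(sample, n=3))  # ['tired', 'i', 'of']
```

Passing something like `stopwords={"confidential", "recipient"}` is one way to strip disclaimer vocabulary before ranking.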

Whoops, I thought I had taken out most of the non-content words, but it looks like "confidential", "recipient" and "information" slipped through. These usually come from disclaimers like this, which makes them especially amusing:


CONFIDENTIALITY NOTICE: This electronic message transmission is intended only for the person or the entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. If you have received this transmission, but are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this information is strictly prohibited. If you have received this e-mail in error, please contact the sender of the e-mail and destroy the original message and all copies.

2 Comments

There are two dimensions in which the differences can be compared: word frequency and popularity. The latter is easier to compare, since it just means alphabetizing the top 250 words in the emails and comparing them to the top 250 words in English (according to a list) with the Unix 'comm' command (which is SO much better than 'diff' for this--I can't believe I never knew about it). They're pretty different, actually. The words that appear only in the tired.com emails are as follows:
able
always
am
anything
around
bad
because
bed
being
best
better
bored
cant
care
college
computer
confidential
cuz
d
days
didn
doing
don
dont
e
early
email
enough
every
everything
family
feel
first
free
friend
friends
fucking
getting
gif
girl
going
got
hate
having
hi
hour
hours
i
im
image
information
intended
into
its
job
kids
ll
lot
love
mail
makes
maybe
money
months
morning
myself
next
not
nothing
oh
org
person
please
print
really
reason
recipient
ru
shit
sick
since
site
sleep
something
stay
stupid
thank
thanks
thats
thing
things
those
tired
today
trying
u
unable
until
ve
virus
wake
web
website
week
working
years
yes
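
The same comparison can be sketched in Python for anyone without 'comm' handy. This is only an illustration of the idea: `only_in_first` is a made-up name and the two short lists are stand-ins, not the real top-250 lists. It does roughly what `comm -23` does with two sorted files, keeping the entries that appear only in the first one.

```python
def only_in_first(first, second):
    """Return the words in `first` that are absent from `second`, sorted.

    Hypothetical helper; the result is alphabetized because sets are
    unordered and 'comm' works on sorted input anyway.
    """
    return sorted(set(first) - set(second))


# Stand-in lists for illustration only.
tired_top = ["tired", "i", "and", "the", "sleep", "bored"]
english_top = ["the", "of", "and", "to", "a", "i"]

print(only_in_first(tired_top, english_top))  # ['bored', 'sleep', 'tired']
```

Swapping the arguments gives the words that are common in English at large but missing from the email list.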

It'd be interesting to compare the order of this list with the order of the same words in the English language at large (from wordcount.org). Obviously "tired" would leap out ahead, but what else?

Recent Comments

  • Mike: There are two dimensions in which differences can be compared: read more
  • Michal Migurski: It'd be interesting to compare the order of this list read more

About this Entry

This page contains a single entry by Mike Kuniavsky published on August 7, 2005 11:26 AM.
