Facebook, hype about privacy. It’s a little late
December 22nd, 2009
The constant media attention around the much publicised changes to Facebook’s privacy policy heightened my interest in the data Facebook holds, which is the reason for this blog post (and the @tracsec tech segment). There is no doubt in my mind that Facebook faces challenges with data that most governments do not have to consider. I suppose the only other companies that spring to mind are the giants: Google, and the makers of Windows Everest, err sorry, I mean Microsoft.
Information is a truly wondrous thing; however, in the wrong hands it can spell disaster. I was once asked what business Google was in, to which I answered advertising. To my dismay I was informed I was wrong, and my error was corrected: Google is a company that specialises in ways to make you give them data, and they then use that data to make money. Information is power, and that is no more aptly proved than by how Google matches Microsoft in the brand awareness stakes while also managing, in the process, to become a byword for searching the internet.
I think it fair I should mention that I am dyslexic, and 7 Windmills was my idea. All joking aside, Facebook is now playing with vast quantities of personal data and a strong, unique understanding of how we interact with each other.
It seems to have worried a large section of the media, but I’m left asking: what really is the difference, for a hacker, between now and last month? Social engineering is an emerging art, but let’s face it, it’s a renaissance, it’s nothing new, and hacking has always been as much about the person as it is about the system.
Having someone’s credentials aids targeted attacks, so it seems logical to target the individuals themselves.
Impersonation isn’t easy when you know nothing about someone; taking a wild guess at someone’s date of birth or which school they went to isn’t easy. That is why they were, for a long time, important details used to verify your identity. Let’s think here, though: we give these away every day. They are part of most registration processes for services, and to most people they represent little or no value.
I set this scenario: a malicious attacker wishing to cause havoc decides that a university would be a good target. There are some great reasons to choose it. It has a lot of public (internet) facing resources and a lot of users; those users have a mix of privileges, from information to technical resources; and universities tend to have good bandwidth and lots of storage, just to name a few reasons. The key aspect here is that there is an abundance of users, so in reality the attacker is playing the numbers game. It seems kind of stupid to jump straight in and randomly guess usernames.
We as individuals are social beings for the most part, and one of the key factors in Facebook’s success is its ability to connect us to networks: where we went to school, who has employed us, where we went to university, where we work, what our hobbies are and, to some extent, how we’re actually feeling on a particular day. Most universities have a Facebook group, and I think it fair to suggest that most members of that group either attend or have attended that university, or have worked there in some capacity.
It seems a good starting point; however, we are all lazy people at heart, and no one wants to go through every single member of a group one by one, copying the data out by hand. We could look at using screen scrapers, but that is not as simple to achieve as you may think: Facebook requires you to have an account, to be logged in and to have a session, and tools like wget or curl need the same. However, Facebook is also famous for its applications, and of course everyone loves them (or not). They can be made for lots of things, and this is down to Facebook’s API (application programming interface). In short, Facebook’s API is really just a set of instructions that can be used to interact with users. Of course, interacting means exchanging a certain amount of information between the parties involved.
A simpler process for our potential attacker is to use Facebook’s API to get information about their target. An example of one of the API calls is groups_getMembers(GROUPID), which requests from Facebook the unique Facebook IDs of all the users who are members of a particular group. Another example API call is users_getInfo(FacebookID, 'first_name, last_name, name, timezone, birthday, sex, locale, profile_url, proxied_email'). I think you can probably see where I’m going with this: we can start to build a very detailed list of the people currently connected to a group or organisation. It’s also worth mentioning at this point that, yes, you do need a Facebook account to use Facebook’s API; however, the people returned by the API calls neither installed an application nor visited a site we controlled. This information was gained completely legally: it was all information that the user willingly gave to Facebook, and Facebook in turn gave us permission to retrieve it. As long as we don’t store the data for more than 24 hours.
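To make the shape of those two calls concrete, here is a minimal Python sketch of the flow: ask for a group’s member IDs, then ask for a few fields per member. It talks to no real service. StubClient, its canned data and the profile_group helper are all hypothetical stand-ins invented for illustration; only the two method names are borrowed from the calls mentioned above, and the real client library may well look different.

```python
# Canned data standing in for what a service might return (entirely made up).
_GROUP_MEMBERS = {1001: [501, 502]}
_USER_FIELDS = {
    501: {"first_name": "Alice", "last_name": "Example", "birthday": "01/02/1985"},
    # This user's privacy settings hide the birthday, so it is simply absent.
    502: {"first_name": "Bob", "last_name": "Sample"},
}


class StubClient:
    """Hypothetical offline stand-in; NOT the real Facebook client library."""

    def groups_getMembers(self, group_id):
        # Returns the list of user IDs belonging to the group.
        return list(_GROUP_MEMBERS.get(group_id, []))

    def users_getInfo(self, user_id, fields):
        # Returns only the requested fields; hidden or unknown ones come back as None.
        record = _USER_FIELDS.get(user_id, {})
        return {name: record.get(name) for name in fields}


def profile_group(client, group_id, fields):
    """Build one row per group member from the two calls above."""
    rows = []
    for user_id in client.groups_getMembers(group_id):
        row = {"uid": user_id}
        row.update(client.users_getInfo(user_id, fields))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in profile_group(StubClient(), 1001, ["first_name", "birthday"]):
        print(row)
```

The point of the sketch is how little work sits between a group ID and a table of people: two calls and a loop, with privacy settings showing up only as missing fields.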
That’s right: if I follow the word of the agreement, I’ll need to delete the data within 24 hours. Bear in mind that so far I have not interacted with anyone. For the most part I have been able to get the name, an organisation they are connected to and, dependent on their exact privacy settings, a wealth of personal information. The benefit of APIs is that it is easy to write applications or scripts against them. Facebook supports a number of programming languages, and a number of others, such as Python, have unofficial support.
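For anyone who does want to stay inside a 24 hour retention rule like the one described, a small expiring store is one way to do it. This is a sketch under my own assumptions rather than anything Facebook prescribes: ExpiringStore and its methods are hypothetical names, and the current time is passed in explicitly so the behaviour is easy to check.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=24)  # retention limit taken from the post


class ExpiringStore:
    """Hypothetical in-memory store that forgets records once they reach max_age."""

    def __init__(self, max_age=MAX_AGE):
        self.max_age = max_age
        self._records = {}  # key -> (time stored, value)

    def put(self, key, value, now):
        self._records[key] = (now, value)

    def purge(self, now):
        """Delete every record that has reached max_age; return how many went."""
        expired = [
            key
            for key, (stored_at, _) in self._records.items()
            if now - stored_at >= self.max_age
        ]
        for key in expired:
            del self._records[key]
        return len(expired)

    def get(self, key, now):
        # Purge first so stale data can never be read back.
        self.purge(now)
        entry = self._records.get(key)
        return entry[1] if entry else None


if __name__ == "__main__":
    start = datetime(2009, 12, 22, 9, 0)
    store = ExpiringStore()
    store.put(501, {"first_name": "Alice"}, now=start)
    print(store.get(501, now=start + timedelta(hours=23)))
    print(store.get(501, now=start + timedelta(hours=24)))
```

Of course an attacker has no reason to run anything like this, which is rather the point of the paragraph above.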
It’s not a particular stretch for an attacker to write a script that gets all the members of a university or company group, builds a list of first and last names and, where possible, their sex, date of birth, Facebook web page and current location, and stores that in a database. A little more internet searching and we may discover a company’s naming convention for company emails. The attacker has very simply gained an advantage without the threat of triggering alarms, while remaining mostly passive. This list could then be used with tools such as Maltego to build a more complete understanding of each person. Once an individual has been selected as a way to gain entry, the attacker could start to make a bespoke list of the words and terms they use, by downloading pages from Facebook or by simple Google hacking, pulling posts from forums and mailing lists, and stripping out all the HTML code and common words (such as the, and, a, it, and so on). Of course the list generated gives the attacker an advantage when brute forcing passwords, as it’s likely to contain things specific to the target such as children’s names, partners’ names and dates of birth. Tools such as Cewl make the process of crawling a site and generating the list a relatively simple task.
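The word list step is easy to picture in code, and it is just as useful for checking what your own public pages give away. The following is a minimal Python sketch using only the standard library: it strips the HTML, drops common words and returns what is left, most frequent first. It is not how Cewl works internally; the STOPWORDS set, the build_wordlist name and the min_length cut-off are my own assumptions for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# A deliberately tiny list of common words; a real one would be much longer.
STOPWORDS = {"the", "and", "a", "it", "so", "on", "forth", "to", "of", "in",
             "is", "for", "my", "with", "we", "at"}


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def build_wordlist(html, min_length=3):
    """Return the distinct words in a page, most frequent first."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"[a-z0-9']+", text)
    counts = Counter(
        word for word in words
        if len(word) >= min_length and word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common()]


if __name__ == "__main__":
    page = ("<html><body><p>Rex the dog and Rex</p>"
            "<script>var hidden = 1;</script>"
            "<p>walks with Molly</p></body></html>")
    print(build_wordlist(page))
```

Run against a handful of someone’s public posts, the top of the list tends to be exactly the pet names, family names and places the paragraph above describes.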
It also seems logical that other social networking sites could be mined for various other information about potential targets. If a potential target has a Twitter account, it may be possible to obtain every tweet, using Twitter’s API to retrieve the target’s Twitter history. This could be a good resource for further expanding the list of everyday words and terms that a potential target may use. It’s fair to say that no one tweet may cause concern about privacy; however, the full list of them may add up to yet another great resource of information. I believe that the Twitter account would have to be public for this to work, in which case the information could also be obtained using tailored Google searches; however, as previously stated, it makes a lot more sense to take the data as supplied.
A potential indicator of this sort of socially inspired attack could be to seed the group with a number of dummy user accounts, using passwords generated from web pages about that dummy user. We could watch the dummy user accounts for access and, although not foolproof, if one of these accounts starts to attract unwelcome attention then someone may have tried to profile our organisation, and careful vigilance should be applied.
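Watching the dummy accounts can be as simple as filtering login events for their usernames, since nobody legitimate should ever try to use them. Here is a minimal Python sketch of that idea. The LoginEvent fields and the decoy_alerts function are hypothetical; in practice the events would come from whatever authentication logs your organisation already keeps.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LoginEvent:
    """Hypothetical shape of one parsed authentication log entry."""
    username: str
    source_ip: str
    timestamp: datetime
    success: bool


def decoy_alerts(events, decoy_usernames):
    """Return every login attempt, failed or not, that touches a decoy account."""
    decoys = {name.lower() for name in decoy_usernames}
    return [event for event in events if event.username.lower() in decoys]


if __name__ == "__main__":
    events = [
        LoginEvent("jsmith", "10.0.0.5", datetime(2009, 12, 22, 9, 0), True),
        LoginEvent("D.Ummy", "203.0.113.7", datetime(2009, 12, 22, 9, 5), False),
    ]
    for alert in decoy_alerts(events, ["d.ummy"]):
        print(alert)
```

Even a failed attempt is worth an alert here, because the only way to learn the decoy’s name is to have harvested it from the group.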
I discussed my ideas and thoughts on this subject with Chris John Riley, Ryan Dewhurst and Tom Mackenzie on the tracSEC podcast technical segment, which should be available for public release when this post hits the blog. It was an interesting chat and very enlightening for everyone involved. I learned a great deal from discussing this with them, and in this case four heads are better than one.
In closing, I think worry about how third parties may abuse changes in Facebook’s privacy policy is warranted, but I would urge you to think about what a bad guy could do without the limitations of business regulations. Some information can’t be put back into the bottle. We as a community need to accept that a certain level of information about us is in the public domain, and mitigate that accordingly. However, asking how much data about us is being held by one commercial organisation, and how other people assimilate that data, is critical.
http://wiki.developers.facebook.com/index.php/How-to_Guides
http://developers.facebook.com/tools.php
http://en.wikipedia.org/wiki/Api
http://www.willmcgugan.com/2008/02/09/writing-a-facebook-application-with-python-pt-i/
http://wiki.developers.facebook.com/index.php/User:PyFacebook_Tutorial
http://www.digininja.org/projects/cewl.php
http://www.theregister.co.uk/2009/12/14/facebook_photo_privacy_snafu/
http://voices.washingtonpost.com/securityfix/2009/12/check_your_facebook_privacy_se.html
http://www.scribd.com/doc/2458/Facebook-Threats-to-Privacy
http://www.spylogic.net/2009/12/new-facebook-privacy-settings-for-better-or-for-worse/
The tracSEC podcast can be downloaded from here