For example, an automated system can access a number of YouTube channels, collecting the username, photo and number of followers of the channel owner. An entire database of these records becomes a privacy issue even though the data itself is public.
Once the data has been collected into a database, it is normally expected to be protected. But TNW reports that a database of 235 million records was found on the web without password protection.
The collected data comprised four main datasets with details of millions of users from the above-mentioned platforms. It contained information such as profile name, first and last name, profile picture, age, gender and follower statistics.
What’s interesting is that, according to the report, security researcher Bob Diachenko, principal investigator at security firm Comparitech, found three identical copies of the database on Aug. 1. According to Diachenko and his team, the data belonged to the now-defunct company Deep Social.
When they contacted the company, the request was forwarded to Hong Kong-based Social Data, which acknowledged the exposure and closed access to the database. However, Social Data denied having any ties to Deep Social. Social Data made the following statement.
“Please note that the negative meaning of data being hacked means that the information was obtained secretly. This is not the case. Anyone with Internet access can use all data for free.”
Comparitech says that each record contains some or all of the following:
Whether the profile belongs to a business or has advertisements
Statistics about follower engagement, including:
Number of followers
Follower growth rate
In addition, approximately 20% of the sampled records contain phone numbers or email addresses. As TNW stated, this type of data can be used for spam or phishing attempts. A service's terms and conditions usually prohibit crawling, but a California court ruled last year that it is not illegal. In many cases, this may be a good thing.
For example, CityMapper is a very popular application that pulls in real-time public transportation data to work out the fastest way to get from A to B in a city. Today, most transit companies provide this data through APIs, but in the early days it was only available on the web. Web crawling allowed early pioneers such as CityMapper to make the data more usable and convenient.
Nowadays, when companies put useful data on the web but not through APIs, web crawling is still useful. For example, price comparison services often still rely on crawling.
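As a rough illustration of how a price comparison service might work without an API, the sketch below pulls prices out of page markup and picks the cheapest offer. It is only a minimal example under stated assumptions: the shop names, the HTML fragments and the "price" class name are all hypothetical, and a real service would also have to fetch the pages and respect each site's terms.

```python
from decimal import Decimal
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text of elements whose class attribute includes 'price'."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current price element.
        self._in_price = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_price and text:
            self.prices.append(Decimal(text.lstrip("$")))


def cheapest(pages):
    """Return (shop, price) for the lowest price found across the pages."""
    offers = []
    for shop, html in pages.items():
        parser = PriceParser()
        parser.feed(html)
        offers.extend((price, shop) for price in parser.prices)
    price, shop = min(offers)
    return shop, price


# Hypothetical page fragments standing in for HTML fetched from two shops.
PAGES = {
    "shop-a.example": '<div><h1>Kettle</h1><span class="price">$24.50</span></div>',
    "shop-b.example": '<div><h1>Kettle</h1><span class="price sale">$19.99</span></div>',
}

print(cheapest(PAGES))
```

The parsing is deliberately tied to one assumed page layout, which is also why crawlers of this kind tend to break whenever a site changes its markup.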
However, capturing personal data is another matter, and courts may need to distinguish between the two types of use.