There is hardly a day that goes by without a story about fake news. It reminds me of a quote from the favorite radio journalist of my youth: "If you don't like the news, go out and make some of your own." OpenAI's breakthrough language model, the 1.5-billion-parameter version of GPT-2, turned out well enough that the group decided it was too dangerous to release publicly, at least for now. However, OpenAI has now released two smaller versions of the model, along with tools for fine-tuning them on your own text. So, without too much effort, and using dramatically less GPU time than it would take to train from scratch, you can create a tuned version of GPT-2 that can generate text in the style of your corpus, or answer questions similar to the ones you train it with.
GPT-2 (Generative Pre-trained Transformer, version 2) is based on a version of the very powerful Transformer neural network architecture. What got the OpenAI researchers so excited was discovering that it could handle a number of language tasks without being directly trained on them. Once pre-trained on its massive corpus of Reddit-sourced data and given the appropriate prompts, it did a passable job of answering questions and translating languages. It's certainly nothing like Watson as far as semantic knowledge goes, but this kind of unsupervised learning is particularly exciting because it removes much of the time and expense needed to label data for supervised learning.
For such a powerful tool, working with GPT-2 is fortunately fairly simple, as long as you are at least a little familiar with Tensorflow. Most of the tutorials I've found also rely on Python, so having at least a basic knowledge of programming in Python or a similar language is very helpful. Currently, OpenAI has released two pre-trained versions of GPT-2. One (117M) has 117 million parameters, while the other (345M) has 345 million. As you might expect, the larger version requires more GPU memory and takes longer to train. You can train on your CPU, but it will be really slow.
The first step is to download one or both of the models. Fortunately, most of the tutorials, including the ones we'll walk through here, use Python code to do that for you. Once downloaded, you can run the pre-trained model either to generate text automatically or in response to a prompt you provide. But there is also code that lets you build on the pre-trained model by fine-tuning it on a data source of your choice. Once you've tuned your model to your satisfaction, it's simply a matter of running it and providing suitable prompts.
There are a number of tutorials on the subject, but my favorite is Max Woolf's. In fact, until the OpenAI release, I was working with his text-generating RNN, which he drew on for his GPT-2 work. He has provided a complete package on GitHub for downloading, training, and running a GPT-2-based model. You can even install it directly as a package from PyPI. The readme walks you through the process, along with some suggestions for how to tweak various parameters. If you happen to have a massive GPU handy, this is a great approach, but since the 345M model needs most of a 16GB GPU's memory for training or tuning, you may need to turn to a cloud GPU.
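For reference, here is a minimal sketch of the basic gpt-2-simple workflow described in the readme. The corpus filename and step counts are placeholders of my own; depending on the version of the package you install, the smaller model may be labeled "124M" rather than "117M".

```python
import gpt_2_simple as gpt2

# Download the pre-trained 117M model into ./models/
gpt2.download_gpt2(model_name="117M")

# Fine-tune on your own text; checkpoints are written to ./checkpoint/run1/
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="my_corpus.txt",  # placeholder: your training text
              model_name="117M",
              steps=1000,               # placeholder: adjust to taste
              sample_every=200,         # print sample output periodically
              save_every=500)           # write a checkpoint periodically

# Generate text from the fine-tuned model, with or without a prompt
gpt2.generate(sess, prefix="The latest GPU from Nvidia")
```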
Fortunately, there is a way to use a powerful GPU in the cloud for free: Google's Colab. It isn't as flexible as an actual Google Compute Engine account, and you have to reload everything each session, but did I mention it's free? In my testing, I got either a Tesla T4 or a K80 GPU when I initialized a notebook, and either one is fast enough to train these models at a reasonable clip. Best of all, Woolf has already authored a Colab notebook that echoes the local gpt2-simple Python code. Much like the desktop version, you can simply follow along, or tweak parameters to experiment. Getting your data in and out of Colab is a bit more complicated, but the notebook will walk you through that as well.
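Woolf's package also includes Google Drive helpers for working around Colab's ephemeral storage, which his notebook uses. A minimal sketch, again assuming the same hypothetical corpus file as above:

```python
import gpt_2_simple as gpt2

# Link the Colab runtime to your Google Drive (prompts for authorization)
gpt2.mount_gdrive()

# Pull the training corpus from Drive into the Colab VM's local storage
gpt2.copy_file_from_gdrive("my_corpus.txt")  # placeholder filename

sess = gpt2.start_tf_sess()
gpt2.finetune(sess, dataset="my_corpus.txt", model_name="117M", steps=1000)

# Push the checkpoint back to Drive so it survives a session reset
gpt2.copy_checkpoint_to_gdrive(run_name="run1")
```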
Now that powerful language models have been released onto the web, and there are plenty of tutorials for using them, perhaps the hardest part of your project is creating the dataset you want to use for tuning. If you want to replicate others' experiments by having GPT-2 generate Shakespeare or write Star Trek dialogue, you can simply grab one that's already online. In my case, I wanted to see how the models would do at generating articles like the ones found on ExtremeTech. I had access to a back catalog of over 12,000 articles from the last 10 years, so I was able to assemble them into a single text file and use that as the basis for fine-tuning.
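Assembling the training file is the easy part. Here's a minimal sketch, assuming a hypothetical folder of plain-text articles; the <|endoftext|> delimiter is a common convention for marking document boundaries when fine-tuning GPT-2, not something specific to this project.

```python
from pathlib import Path

# Hypothetical layout: one plain-text file per article in ./articles/
articles = sorted(Path("articles").glob("*.txt"))

with open("my_corpus.txt", "w", encoding="utf-8") as corpus:
    for article in articles:
        corpus.write(article.read_text(encoding="utf-8").strip())
        corpus.write("\n<|endoftext|>\n")  # GPT-2's document separator token
```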
Once I had my corpus of 12,000 ExtremeTech articles, I started by trying to train the simplified GPT-2 on my desktop's Nvidia 1080 GPU. Unfortunately, the GPU's 8GB of RAM wasn't enough. So I switched to training the 117M model on my 4-core i7. It wasn't impossibly slow, but it would have taken a week or more to make a real dent, even with the smaller of the two models. So I switched to Colab and the 345M model. Training was much, much faster, but needing to deal with session resets and the unpredictability of which GPU I'd get for each session was annoying.
After that, I bit the bullet, signed up for a Google Compute Engine account, and decided to take advantage of the $300 credit Google gives new customers. If you're not experienced at configuring a VM in the cloud, the task can be a little daunting, but there are lots of guides online. It's simplest if you start with one of the pre-configured VMs that already has Tensorflow installed. I picked a Linux version with 4 vCPUs. Even though my desktop system is Windows, the same Python code ran perfectly on both. You then need to add a GPU, which in my case took a request to Google technical support for permission. I assume that's because GPU-equipped machines are more expensive and less flexible than CPU-only machines, so there is some sort of vetting process involved. It only took a couple of hours, and I was able to launch a VM with a Tesla T4. When I first logged in (using the built-in SSH), it reminded me that I needed to install Nvidia drivers for the T4, and gave me the command I needed to do it.
Next, you need to set up a file transfer client such as WinSCP, and get your model set up and running. Once you upload your code and data, create a Python virtual environment (optional), and load the needed packages, you can proceed the same way you would on your desktop. I trained my model in increments of 15,000 steps and downloaded the model checkpoints each time, so I'd have them for reference. Those checkpoints can be particularly important if you have a small training dataset, as too much training can cause your model to overfit and actually get worse. So having checkpoints you can return to is valuable.
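As a sketch of that incremental routine with gpt-2-simple (the step counts and run name here are placeholders): each call resumes from the most recent checkpoint, so you can train a batch of steps, archive the checkpoint directory, and continue.

```python
import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()

# Run one 15,000-step training increment, resuming from the last checkpoint
gpt2.finetune(sess,
              dataset="my_corpus.txt",
              model_name="345M",
              steps=15000,
              restore_from="latest",  # pick up where the previous run stopped
              run_name="run1",
              save_every=1000)        # periodic checkpoints within the run

# Afterwards, zip up ./checkpoint/run1/ and download it for safekeeping
# before kicking off the next increment.
```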
Speaking of checkpoints, like the models, they're large, so you'll probably want to add a disk to your VM. Having the disk be separate also means you can use it for other projects later. The process of getting it to mount automatically is a bit awkward (it seems like it should be a checkbox, but it isn't). Fortunately, you only have to do it once. After I had my VM set up with the code, model, and training data I needed, I let it loose. The T4 was able to run about one step every 1.5 seconds. The VM I had configured cost about $25 a day (remember that VMs don't shut themselves down; you have to stop them if you don't want to be billed, and persistent disk is charged for even when the VM is stopped).
To save some money, I transferred the model checkpoints (as a .zip file) to my desktop. I could then shut the VM down (saving a dollar or two per hour) and work with the model locally. You get the same results either way, since the model and checkpoint are identical. The traditional way to evaluate the success of your training is to hold out a portion of your data as a validation set. If the overall loss keeps decreasing but the loss computed on the validation data starts to rise, your model is overfitting: it has begun to simply "memorize" your input and feed it back to you, which reduces its ability to deal with new information.
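A minimal sketch of holding out a validation slice, assuming the same hypothetical corpus file; the 90/10 split is an arbitrary choice for illustration, not a figure from this project.

```python
# Split the corpus into training and validation files (90/10 by line count)
with open("my_corpus.txt", encoding="utf-8") as f:
    lines = f.readlines()

split = int(len(lines) * 0.9)

with open("train.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[:split])      # fine-tune on this file

with open("validate.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[split:])      # periodically compute loss on this file
```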
After experimenting with different types of prompts, I settled on feeding the model (which I nicknamed The Oracle) the first sentences of actual ExtremeTech articles and seeing what it came up with. After 48 hours (106,000 steps in this case) of training on the T4, here is an example:
The more information the model has about a topic, the more plausible the text it starts to generate. We write about Windows Update a lot, so I figured I'd let the model give it a try:
With something as subjective as text generation, it's hard to know how far to go with training a model. That's especially true because every time you submit a prompt, you'll get a different response. If you want to get some plausible or amusing answers, your best bet is to generate several samples for each prompt and look through them yourself. In the case of the Windows Update prompt, we fed the model the same prompt after a few more hours of training, and it looked like the extra work might have been helpful:
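With gpt-2-simple, generating several candidates per prompt is a single call; the sample count, length, and temperature below are illustrative values of mine, not settings from this project.

```python
import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="run1")  # load the fine-tuned checkpoint

samples = gpt2.generate(sess,
                        run_name="run1",
                        prefix="Microsoft's next Windows Update",  # placeholder
                        nsamples=5,      # several candidates to browse
                        batch_size=5,    # generated together in one batch
                        length=200,
                        temperature=0.8,
                        return_as_list=True)

for i, text in enumerate(samples):
    print(f"--- sample {i + 1} ---\n{text}\n")
```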
I was impressed, but not blown away, by the raw predictive performance of GPT-2 (at least the public versions) compared with simpler solutions like textgenrnn. What I didn't appreciate until later was its versatility. GPT-2 is general-purpose enough to address a wide variety of use cases. For example, if you give it pairs of French and English sentences as a prompt, followed by a sentence in French only, it does a plausible job of generating the translation. Or if you give it question-and-answer pairs, followed by a question, it does a decent job of coming up with a plausible answer. If you generate some interesting text or articles, please consider sharing, because this is definitely a learning experience for all of us.
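That few-shot behavior is driven entirely by the prompt. Here's a sketch of the translation trick using the base 345M model, with example sentence pairs of my own; the prompt format is just one reasonable choice, and loading a base model directly this way assumes a recent version of gpt-2-simple.

```python
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="345M")
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, model_name="345M")  # base model, no fine-tuning

# A few French/English pairs, then a bare French sentence: the model
# tends to continue the pattern by producing an English translation.
prompt = (
    "French: Je voudrais un café. English: I would like a coffee.\n"
    "French: Où est la gare? English: Where is the train station?\n"
    "French: Il fait beau aujourd'hui. English:"
)

gpt2.generate(sess, model_name="345M", prefix=prompt, length=30)
```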
All the noise around the "drunk Pelosi" video has made one thing clear: Facebook wants to have it both ways.
The company would have us believe the platform is fighting misleading "fake news" content. But Facebook won't remove items like the doctored Pelosi clip, preferring to let users decide for themselves what to believe.
That's the takeaway from comments made by Monika Bickert, Facebook's head of Global Policy Management, in an interview with CNN's Anderson Cooper.
"We think it's important for people to make their own informed choices about what to believe," she said. Facebook collaborates with independent fact-checking organizations to identify misleading content and report accordingly. So, the company knows what is wrong. This simply does not remove this content.
That's what happened with the video of House Speaker Nancy Pelosi, which was doctored to make it appear that she was drunk or otherwise impaired. Fact-checkers flagged it as "false," a designation that gives the video a caption warning and a reduced presence in News Feeds.
The video won't be removed entirely, as it doesn't violate community standards. Facebook will actively remove content that incites violence or otherwise breaks the rules. The company has also shown a willingness to ban fake accounts and deplatform problematic figures who repeatedly violate the site's rules.
"We think it's important for people to make their own, informed choice about what to believe."
In the CNN interview, Cooper asks Bickert again and again why Facebook won't simply remove content that's been flagged as fake. She returns again and again to the idea of letting users decide for themselves what to trust. Misleading content is flagged as such, and that's supposed to be enough.
But is it? There's evidence that even flagged and demoted content can still have a significant impact on Facebook. That may be because some people have worked hard to promote the message that the established media is an enemy not to be trusted. People believe what they want to believe in the current climate, to the point that misleading content may read as a positive to some readers and belief systems.
Cooper tries to get at this question in his interview with Bickert, pointing out that "the video is more powerful than anything you put under the video." Bickert deflects, suggesting that the video is acceptable to keep up because the conversation around it has shifted to questions like the one Cooper posed.
"In fact, we see this conversation on Facebook, Twitter and offline as well, about the manipulation of this video," she said. "As evidenced by my appearance today, here is the conversation."
Later in the segment, Cooper presses Bickert on Facebook's responsibility for accuracy as a provider of news. She pushes back, noting that the company is a social media company, not a news company. When Cooper presses again – "you're sharing news... because you're making money out of it," he argues – Bickert draws a dividing line between rule-breaking violent content and political speech.
"If false information is security-related, we can delete it and we delete it, we work with security groups to do it, but when we talk about political discourse and misinformation, we think that good approach is to let people make an informed choice, "she said.
What an amazing stance to take when we're barely a month removed from the release of the nearly 500-page Mueller report, roughly half of which focuses on Russia's efforts to influence the 2016 US presidential election. As we now know, much of that effort involved exploiting social media platforms like Facebook.
It's now proven that political misinformation can have harmful effects. Proven beyond a shadow of a doubt. It's great to see Facebook taking action against the kinds of fake accounts that help spread this stuff, but that's only a half-measure. Plenty of real people fall for the misinformation they encounter on the internet and share it on social media.
This isn't the first time Facebook has leaned on policy to defend the presence of objectionable content on its platform. But it's an increasingly hard pill to swallow, as suspicions about misinformation's harmful impact on political discourse have proven accurate again and again. Maybe it's time for Facebook to change course.
Facebook has a new fact-checking partner, Axios first reported Thursday: CheckYourFact.com, the fact-checking arm of The Daily Caller, a right-wing website co-founded by conspiracy theory peddler Tucker Carlson.
CheckYourFact.com describes itself as "editorially independent" from The Daily Caller. Its funding comes from The Daily Caller's operating budget, advertising revenue, and a grant ultimately funded by conservative political groups, according to Media Matters, an organization that monitors conservative media.
Facebook accepted it as a partner because CheckYourFact.com is accredited by the Poynter Institute's International Fact-Checking Network (IFCN). Accreditation qualifies organizations for Facebook's fact-checking partnership, a program that uses third parties to evaluate the truthfulness of articles posted on Facebook. If an article is deemed misleading, Facebook demotes it in News Feed and search.
Despite Poynter's vetting, skepticism and anger over the new partnership are widespread. Can an organization affiliated with The Daily Caller – a swamp of misinformation and bias – possibly get the facts right?
"Facebook adds The Daily Caller as official fact checker, it's terrifying …"
"Adding Facebook to The Daily Caller as an official examiner is terrifying," said Angelo Carusone, director of Media Matters, at Mashable. "At a time when newsrooms continue to reduce their staff, right-wing media can resist vertical concealment by checking facts and exploiting gaps in the landscape."
Poynter's assessor found that CheckYourFact.com checked all of the IFCN's boxes. Poynter requires "nonpartisanship and fairness; transparency of sources; transparency of funding and organization; transparency of methodology; and an open and honest corrections policy" for IFCN approval.
But as Carusone suggests, even if it technically checks the boxes, what business does The Daily Caller have anywhere near fact-checking? This is a website that has published Charlottesville apologia, climate change denial, and Pizzagate conspiracy theorists; it would be exceedingly generous not to question its reasons for setting up a fact-checking arm. If it were interested in the facts, wouldn't it already, you know, check its facts?
Facebook's fact-checking program has been criticized as an ineffective public relations effort rather than a robust solution to the problem of incendiary fakes spreading online. Including The Daily Caller's fact-checking arm only adds to the perception that the program is a way for Facebook to claim fairness and bipartisanship, while allowing bad-faith actors to game the system.
As with Facebook's fact-checking program as a whole, CheckYourFact.com looks more like a way for The Daily Caller to put up a veneer of legitimacy. It also gives the outlet a way to potentially have a say in what gets ranked up or down on Facebook.
"Facebook continues to be a victim of the work of the right-handed referees gambit and remains far too willing to do counter-productive, and sometimes even strange, things to relieve the critics of the right," Carusone said. "The systems remain inadequate and too susceptible to manipulation."
Most recently, the company has addressed misinformation with new products, including one that lets users fact-check some video search results. Anti-vaccination videos, which are likely to harm the public, have been demonetized. YouTube has even promised product changes in answer to its critics, so that it stops actively promoting extremist and conspiratorial content.
There's no doubt that YouTube is taking platform safety more seriously than ever before, and the company deserves credit for that. However, a report from Bloomberg now highlights how its own employees consistently made YouTube aware of these issues long before it decided to address them. And while YouTube has touted its focus on these problems over the last two years, one such rejected proposal could have helped stifle the proliferation of Parkland shooting conspiracy theories last year.
According to former YouTube employees who spoke to Bloomberg, the company was repeatedly warned about toxic content and misinformation on the service, but dismissed those concerns amid its focus on growing the platform. In February 2018, YouTube employees proposed limiting recommended videos to legitimate news sources in response to conspiracy theories claiming the Parkland shooting victims were crisis actors. According to Bloomberg, the proposal was rejected.
Some former high-level YouTube employees have even pointed to the spread of this type of content to explain their departures from the company.
One early YouTube employee, who worked there before Google acquired the video site in 2006, explained how the site used to moderate and demote problematic videos, using content that promoted anorexia as an example. That employee pointed out how things seemed to have changed since then.
With the push to increase engagement and revenue, toxic videos benefited from the changes. The problem became so well known that, according to Bloomberg, YouTube employees had a nickname for that brand of content: "bad virality."
Concerns about videos skirting the company's hate speech policies, pushing misinformation, and promoting extremist content were ignored. Proposed policy changes to address these issues were rejected as well. The company went so far as to tell staff who weren't on the moderation teams to stop looking for problematic content to flag.
As YouTube notes in its response to the Bloomberg report, the company has begun to take these issues more seriously. YouTube has been particularly aggressive about toxic content involving children. The company has even adopted policies similar to the proposal floated after the Parkland shooting.
The changes recently underway at YouTube are definitely a good thing for the future. But it's clear that so much more could have been done, sooner.
In total, the search giant took down 2.3 billion bad ads last year. That works out to more than 6 million bad ads removed every day.
Google also terminated its relationships with nearly one million bad advertiser accounts and nearly 734,000 publishers and app developers, nearly double the 2017 figure. The company also removed ads from nearly 28 million web pages and 1.5 million apps.
In the report, Google details specific sectors and niches that required new targeted policies in 2018. The company gives examples, such as for-profit bail bond providers, which it banned from its advertising network. Google says that decision was made after evidence emerged that these providers were taking advantage of vulnerable communities. The company removed more than 531,000 ads for bail bonds last year.
In total, Google created 31 new ad policies to stop malicious ads in problem areas, including resellers, bail bonds, and drug treatment facilities. For example, the company now bans ads promoting drug treatment services unless the advertiser is a certified drug treatment provider. It also removed roughly 58.8 million ads for phishing scams from its network.
On misinformation and the political sphere, Google says it verified 143,000 election ads in the United States under the new requirements it introduced last year. The company removed ads from about 1.2 million pages, 22,000 apps, and 15,000 sites for violating its policies on misinformation, hateful content, or low-quality content.
The internet abounds with scam artists and malicious actors looking for targets. Google's plan for deterring bad advertisers and publishers on its networks is pretty simple: remove their economic incentive. The company appears to be succeeding at removing advertising that violates its rules. Judging by Google's updated policies, the real challenge is keeping up with these bad actors' evolving methods.
The Facebook-owned messaging app has a new "search image" feature that lets users easily upload an image found on the platform to Google, according to a report. The image tool was spotted in the latest beta of WhatsApp for Android.
At the tap of a button, WhatsApp will send a photo from its app to the search giant. The messaging app will then direct users to a Google search results page showing "similar or equal images" found elsewhere on the web.
That information can help users determine whether an image is genuine or has been photoshopped, or even learn the backstory of an unaltered photo that has been stripped of its context.
With its enormous base of active users, WhatsApp is the most popular messaging platform in the world. That makes it a prime target for fake news. In India, where the app is hugely popular, misinformation spread over WhatsApp has even turned deadly.
WhatsApp has taken steps over the past year to curb the platform's fake news problem. The company developed an initiative focused on stopping misinformation on the service, and a dedicated group was created within the company to tackle fake news in India.
The messaging app itself has received many updates aimed at fighting the spread of false information. WhatsApp has released new controls for groups, imposed forwarding limits, and proactively bans millions of suspicious accounts each month.
With the new "search image" feature currently in beta, WhatsApp is now looking to combat fake news spread via photos as well.