Can Open Source defend against copyright claims for AI contributions?
If I submit code to ReactOS that was trained on leaked Microsoft Windows code, what are the legal implications?
what are the legal implications?
It would be so fucking nice if we could use AI to bypass copyright claims.
AI is at its most useful in the early stages of a project. Imagine coming to the fucking ssh project with AI slop thinking it has anything of value to add 😂
The early stages of a project are exactly where you should think long and hard about what you want to achieve, what qualities the software should have, what the detailed requirements are, how you test them, and what the UI should look like. And from that, you derive the architecture.
AI is fucking useless at all of that.
In all complex planned activities, laying the right groundwork and foundations is essential for success. Software engineering is no different. You won’t order a bricklayer apprentice to draw the plan for a new house.
And if your difficulty is a lack of detailed knowledge of a programming language, it might be - depending on the case! - the best approach to write a first prototype in a language you know well, so that your head is free to think about the concerns listed in paragraph 1.
the best approach to write a first prototype in a language you know well
Ok, writing a web browser in POSIX shell using yad now.
writing a web browser in POSIX shell
Not HTML but the much simpler Gemini protocol - well, you could have a look at Bollux, a Gemini client written in shell, or at ereandel:
https://github.com/kr1sp1n/awesome-gemini?tab=readme-ov-file#terminal
Microsoft is doing this today. I can’t link it because I’m on mobile. It is in dotnet. It is not going well :)
Yeah, can’t find anything on dotnet getting poisoned by AI slop, so until you link it, I’ll assume you’re lying.
I guess they were referring to this.
OMG, this is gold! My neighbor must have wondered why I am laughing so hard…
The “reverse centaur” comment citing Cory Doctorow is so true it hurts - they want people to serve machines and not the other way around. That’s exactly how Amazon’s warehouses work, with workers being paced by factory floor robots.
It’s not good because it has no context on what is correct or not. It’s constantly making up functions that don’t exist or attributing functions to packages that don’t exist. It’s often sloppy in its responses because the source code it parrots is some amalgamation of good coding and terrible coding. If you are using this for your production projects, you will likely not know what to do when it breaks, and the code will likely have security flaws and errors in it.
Have you used AI to code? You don’t say “hey, write this file” and then commit it as “AI Bot 123 aibot@company.com”.
You start writing a method and get auto-completes that are sometimes helpful. Or you ask the bot to write out an algorithm. Or to copy something and modify it 30 times.
You’re not exactly keeping track of everything the bots did.
yeah, that’s… one of the points in the article
I’ll admit I skimmed most of that train wreak of an article - I think it’s pretty generous saying that it had a point. It’s mostly recounts of people complaining about AI. But if they hid something in there about it being remarkably useful in cases but not writing entire applications or features then I guess I’m on board?
Well, sometimes I think the web is flooded with advertising and spam praising AI. For these companies, it makes perfect sense, because billions of dollars have been spent on this and they are trying to cash in before the tide turns.
But do you know what is puzzling (and you do have a point here)? Many posts that defend AI do not engage in logical argumentation; instead they argue beside the point, appeal to emotions, fall back on the short-circuited claim that “new” always equals “better”, or claim that AI is useful for coding as long as the code is not complex (compare that to saying that mathematics is simple as long as it is not complex, which is a red herring and a laughable argument). So, many thanks for pointing out the above and giving, in a few words, a bunch of examples which underline that one has to think carefully about this topic!
The problem is that you really only see two sorts of articles.
AI is going to replace developers in 5 years!
AI sucks because it makes mistakes!
I actually see a lot more of the latter response on social media to the point where I’m developing a visceral response to the phrase “AI slop”.
Both stances are patently ridiculous though. AI cannot replace developers and it doesn’t need to be perfect to be useful. It turns out that it is a remarkably useful tool if you understand its limitations and use it in a reasonable way.
it’s a car that only explodes once in a blue moon!
No, it’s a car that breaks down once you go faster than 60km/h. It’s extremely useful if you know what you’re doing and use it only for tasks that it’s good at.
Hey @dgerard@awful.systems, care to weigh in on this “train wreak [sic] of an article?”
I asked Github Copilot and it added
import wreak
to .NET, so we’ll get back to you.
Or to copy something and modify it 30 times.
This seems like a very bad idea. I think we just need more lisp and less AI.
“Hey AI - Create a struct that matches this JSON document that I get from a REST service”
Bam, it’s done.
Or
“Hey AI - add a schema prefix to all of the tables and insert statements in the SQL script.”
Yeah, integrating APIs has really become trivial with copilots. You just copy-paste the documentation and all the boring stuff is done in the blink of an eye! I love it.
It’s exactly the sort of “tedious yet not difficult” task that I love it for. Sometimes you need to clean things up a bit but it does the majority of the work very nicely.
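For anyone who hasn't tried it, the struct-from-JSON prompt above tends to produce something along these lines. This is only a rough sketch in Python: the payload, the `Item` class, and `parse_item` are all made up for illustration, not taken from any real REST service.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical payload; the field names are invented for this example.
SAMPLE = '{"id": 42, "name": "widget", "tags": ["a", "b"], "price": 9.5}'


@dataclass
class Item:
    """Mirrors the shape of the sample JSON document above."""
    id: int
    name: str
    tags: List[str]
    price: float


def parse_item(raw: str) -> Item:
    """Decode one JSON document and coerce each field to its declared type."""
    data = json.loads(raw)
    return Item(
        id=int(data["id"]),
        name=str(data["name"]),
        tags=[str(t) for t in data["tags"]],
        price=float(data["price"]),
    )


item = parse_item(SAMPLE)
```

It's the kind of output you still have to read through (optional fields, nulls, and nested objects are where it usually needs cleaning up), but the typing-it-out part is gone.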
If humans are so good at coding, how come there are 8100000000 people and only 1500 are able to contribute to the Linux kernel?
I hypothesize that AI has average human coding skills.
The average coder is a junior, due to the explosive growth of the field (similar to how the average age is very young in some fast-growing nations). Thus what is average is far below what good code is.
On top of that, good code cannot be automatically identified by algorithms. Some very good codebases might look bad at a superficial level. For example, the code base of LMDB is very different from what common style guidelines suggest, but it is actually a masterpiece which is widely used. And vice versa, it is not difficult to make crappy code look pretty.
“Good code” is not well defined and your example shows this perfectly. LMDB’s codebase is absolutely horrendous when your quality criteria for good code are Readability and Maintainability. But it’s a perfect masterpiece if your quality criteria are Performance and Efficiency.
Most modern Software should be written with the first two in mind, but for a DBMS, the latter are way more important.
Average drunk human coding skills
Well, according to Microsoft, mildly drunk coders work better
My theory is not a lot of people like this AI crap. They just lean into it for the fear of being left behind. Now you all think it’s just gonna fail and it’s gonna go bankrupt. But a lot of ideas in America are subsidized. And they don’t work well, but they still go forward. It’ll be you, the taxpayer, that will be funding these stupid ideas that don’t work, that are hostile to our very well-being.
Ask Daniel Stenberg.
AI is just the lack of privacy, an authoritarian dragnet, remote control over others’ computers, web scraping, the complete destruction of America’s art scene, the stupidification of America, and copyright infringement with a sprinkling of baby death.
Who makes a contribution as aibot514? No one. People use AI for open source contributions, but more in a ‘fix this bug’ way, not in a ‘fully automated contribution under the name ai123’ way.
Counter-argument: If AI code was good, the owners would create official accounts to create contributions to open source, because they would be openly demonstrating how well it does. Instead all we have is Microsoft employees being forced to use and fight with Copilot on GitHub, publicly demonstrating how terrible AI is at writing code unsupervised.
Yes, that’s exactly the point. AI is terrible at writing code unsupervised, but it’s amazing as a supportive tool for real devs!
Bingo
Bing. O.
Big O
Mostly closed source, because open source rarely accepts them as they are often just slop. Just assuming stuff here, I have no data.
To be fair, if a competent dev used an AI “auto complete” tool to write their code, I’m not sure it’d be possible to detect those parts as AI code.
I generally dislike those corporate AI tools, but I gave Copilot a try when writing some Terraform scripts and it actually had as many good suggestions as bad ones. However, if I didn’t know the language and the resources I was deploying that well, it’d probably have led me into a deep hole, trying to fix the mess after blindly accepting every suggestion.
They do more than just autocomplete, even in autocomplete mode. These AI tools suggest entire code blocks and logic and fill in multiple lines, compared to a standard autocomplete. And to use it as a standard autocomplete tool, no AI is needed. Using it like that wouldn’t be bad anyway, so I have nothing against it.
The problems arise when the AI takes away the thinking and brain work of the actual programmer. Plus, you as a user get used to it and basically “addicted”. Independent thinking and programming without AI will become harder and harder if you use it for everything.
People seem to think that the development speed of any larger and more complex software depends on the speed at which the wizards can type in code.
Spoiler: This is not the case. Even if a project is a mere 50000 lines long, one is the solo developer, and one has pretty good or even expert domain knowledge, one spends the major part of the time thinking, perhaps looking up documentation, or talking with people, and the key on the keyboard which is used most doesn’t need a Dvorak layout, because it is the “delete” key. In fact, you don’t need to know touch-typing to be a good programmer; what you need is to think clearly and logically and be able to weigh many different options against a variety of complex goals.
Which LLMs can’t.
I don’t think it makes writing code faster, just may reduce the number of key presses required
Creator of curl just made a rant about users submitting AI slop vulnerability reports. It has gotten so bad they will reject any report they deem AI slop.
So there’s some data.
And when they contribute to existing projects, their code quality is so bad, they get banned from creating more PRs.
As a dumb question from someone who doesn’t code, what if closed source organizations have different needs than open source projects?
Open source projects seem to hinge a lot more on incremental improvements and change only for the benefit of users. In contrast, closed source organizations seem to use code more to quickly develop a new product or change that justifies money. Maybe closed source organizations are more willing to accept slop code that is bad but can barely work versus open source which won’t?
Baldur Bjarnason (who hates AI slop) has posited precisely this:
My current theory is that the main difference between open source and closed source when it comes to the adoption of “AI” tools is that open source projects generally have to ship working code, whereas closed source only needs to ship code that runs.
That’s basically my question. If the standards of code are different, AI slop may be acceptable in one scenario but unacceptable in another.
Maybe closed source organizations are more willing to accept slop code that is bad but can barely work versus open source which won’t?
Because most software is internal to the organisation (therefore closed by definition) and never gets compared or used outside that organisation: Yes, I think that when that software barely works, it is taken as good enough and there’s no incentive to put more effort to improve it.
My past year (and more) of programming business-internal applications has been characterised by upper management imperatives to “use Generative AI, and we expect that to make you nerd faster” without any effort spent to figure out whether there is any net improvement in the result.
Certainly there’s no effort spent to determine whether it’s a net drain on our time and on the quality of the result. Which everyone on our teams can see is the case. But we are pressured to continue using it anyway.
I’d argue the two aren’t as different as you make them out to be. Both types of projects want a functional codebase, both have limited developer resources (communities need volunteers, businesses have a budget limit), and both can benefit greatly from the development process being sped up. Many development practices that are industry standard today started in the open source world (style guides and version control strategy, to name two heavy hitters) and there’s been some bleed-through from the other direction as well (tool juggernauts like Atlassian having new open source alternatives made directly in response).
No project is immune to bad code. There’s even a lot of bad code out there that was believed to be good at the time: it mostly worked, in retrospect we learned how bad it was, but no one wanted to fix it.
The end goals and purposes are for sure different between community passion projects and corporate, financially driven projects. But the way you get there is more or less the same, and that’s the crux of the article’s argument: Historically open source and closed source have done the same thing, so why is this one tool usage so wildly different?
Historically open source and closed source have done the same thing, so why is this one tool usage so wildly different?
Because, as noted by another replier, open source wants working code and closed source just wants code that runs.
When did you last decide to buy a car that barely drives?
And another thing, there are some tech companies that operate very short-term, like typical social media start-ups of which about 95% go bust within two years. But a lot of computing is very long term with code bases that are developed over many years.
The world only needs so many shopping list apps - and there exist enough of them that writing one is not profitable.
most software isn’t public-facing at all (neither open source nor closed source), it’s business-internal software (which runs a specific business and implements its business logic), so most of the people who are talking about coding with AI are also talking mainly about this kind of business-internal software.
Does business internal software need to be optimized?
Does business internal software need to be optimized?
Need to be optimised for what? (To optimise is always making trade-offs, reducing some property of the software in pursuit of some optimised ideal; what ideal are you referring to?)
And I’m not clear on how that question is related to the use of LLMs to generate code. Is there a connection you’re drawing between those?
So I was trying to make a statement that the developers of AI for coding may not have the high bar for quality and optimization that closed source developers would have, then was told that the major market was internal business code.
So, I asked, do companies need code that runs quickly on the systems that they are installed on to perform their function. For instance, can an unqualified programmer use AI code to build an internal corporate system rather than have to pay for a more qualified programmer’s time either as an internal hire or producing.
do companies need code that runs quickly on the systems that they are installed on to perform their function.
(Thank you, this indirectly answers one question: the specific optimisation you’re asking about, it seems, is optimised speed of execution when deployed in production. By stating that as the ideal to be optimised, necessarily other properties are secondary and can be worse than optimal.)
Some do pursue that ideal, yes. For example: many businesses seek to deploy their internal applications on hosted environments where they pay not for a machine instance, but for seconds of execution time. By doing this they pay only when the application happens to be running (on a third-party’s managed environment, who will charge them for the service). If they can optimise the run-time of their application for any particular task, they are paying less in hosting costs under such an agreement.
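To put rough numbers on that, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the price per second, the invocation count, and the run times are invented, and real providers bill with more factors than this (memory size, minimum durations, free tiers).

```python
def monthly_cost(invocations: int, seconds_per_run: float, price_per_second: float) -> float:
    """Cost under a simplified pay-per-execution-second agreement."""
    return invocations * seconds_per_run * price_per_second


# Hypothetical workload: one million task runs per month.
INVOCATIONS = 1_000_000
PRICE_PER_SECOND = 0.00002  # invented rate, not any real provider's price

before = monthly_cost(INVOCATIONS, 2.0, PRICE_PER_SECOND)  # unoptimised: 2.0 s per run
after = monthly_cost(INVOCATIONS, 0.5, PRICE_PER_SECOND)   # optimised: 0.5 s per run
savings = before - after

print(f"before: {before:.2f}  after: {after:.2f}  savings: {savings:.2f}")
```

Under this simplified model the bill scales linearly with run time, so cutting the run time to a quarter cuts the hosting cost to a quarter.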
can an unqualified programmer use AI code to build an internal corporate system rather than have to pay for a more qualified programmer’s time either as an internal hire or producing.
This is a question now about paying for the time spent by people to develop and maintain the application, I think? Which is thoroughly different from the time the application spends running a task. Again, I don’t see clearly how “optimise the application for execution speed” is related to this question.
There is commercial open source stuff too.
I created this entirely using mistral/codestral
https://github.com/suoko/gotosocial-webui
Not real software, but it was done by instructing the AI about the basics of the mother app and the fediverse protocol.
I think it’s established genAI can spit out straightforward toy examples of a few hundred lines. Bungalows aren’t simply big birdhouses though.
Still, they’re just birdhouses with some more infrastructure, and you can read instructions on how to build them.