• 1 Post
  • 33 Comments
Joined 1 year ago
cake
Cake day: May 8th, 2023

help-circle
  • A1kmm@lemmy.amxl.comtoPrivacy@lemmy.ml*Permanently Deleted*
    link
    fedilink
    English
    arrow-up
    4
    ·
    19 days ago

    When people say Local AI, they mean things like the Free / Open Source Ollama (https://github.com/ollama/ollama/), which you can read the source code for and check it doesn’t have anything to phone home, and you can completely control when and if you upgrade it. If you don’t like something in the code base, you can also fork it and start your own version. The actual models (e.g. Mistral is a popular one) used with Ollama are commonly represented in GGML format, which doesn’t even carry executable code - only massive multi-dimensional arrays of numbers (tensors) that represent the parameters of the LLM.

    Now not trusting that the output is correct is reasonable. But in terms of trusting the software not to spy on you when it is FOSS, it would be no different to whether you trust other FOSS software not to spy on you (e.g. the Linux kernel, etc…). Now that is a risk to an extent if there is an xz style attack on a code base, but I don’t think the risks are materially different for ‘AI’ compared to any other software.


  • Blockchain is great for when you need global consensus on the ordering of events (e.g. Alice gave all her 5 ETH to Bob first, so a later transaction to give 5 ETH to Charlie is invalid). It is an unnecessarily expensive solution just for archival, since it necessitates storing the data on every node forever.

    Ethereum charges ‘gas’ fees per transaction which helps ensure it doesn’t collapse under the weight of excess usage. Blocks have transaction limits, and transactions have size limits. It is currently working out at about US$7,500 per MB of block data (which is stored forever, and replicated to every node in the network). The Internet Archive have apparently ~50 PB of data, which would cost US$371 trillion to put onto Ethereum (in practice, attempting this would push up the price of ETH further, and if they succeeded, most nodes would not be able to keep up with the network). Really, this is just telling us that blockchain is not appropriate for that use case, and the designers of real world blockchains have created mechanisms to make it financially unviable to attempt at that scale, because it would effectively destroy the ability to operate nodes.

    The only real reason to use an existing blockchain anyway would be on the theory that you could argue it is too big to fail due to legitimate business use cases, and too hard to remove censorship resistant data. However, if it became used in the majority for censorship resistant data sharing, and transactions were the minority, I doubt that this would stop authorities going after node operators and so on.

    The real problems that an archival project faces are:

    • The cost of storing and retrieving large amounts of data. That could be decentralised using a solution where not all data is stored on a chain - for example, IPFS.
    • The problem of curating data and deciding what is worth archiving, and what is a true-to-source archive vs fake copy. This probably requires either a centralised trusted party, or maybe a voting system.
    • The problem of censorship. Anonymity and opaqueness about what is on a particular node can help - but they might in some cases undermine the other goals of archival.

  • A1kmm@lemmy.amxl.comtoPrivacy@lemmy.mlInternet Archive is in danger
    link
    fedilink
    English
    arrow-up
    14
    arrow-down
    2
    ·
    28 days ago

    This is absolutely because they pulled the emergency library stunt, and they were loud as hell about it. They literally broke the law and shouted about it.

    I think that you are right as to why the publishers picked them specifically to go after in the first place. I don’t think they should have done the “emergency library”.

    That said, the publishers arguments show they have an anti-library agenda that goes beyond just the emergency library.

    Libraries are allowed to scan/digitize books they own physically. They are only allowed to lend out as many as they physically own though. Archive knew this and allowed infinite “lend outs”. They even openly acknowledged that this was against the law in their announcement post when they did this.

    The trouble is that the publishers are not just going after them for infinite lend-outs. The publishers are arguing that they shouldn’t be allowed to lend out any digital copies of a book they’ve scanned from a physical copy, even if they lock away the corresponding numbers of physical copies.

    Worse, they got a court to agree with them on that, which is where the appeal comes in.

    The publishers want it to be that physical copies can only be lent out as physical copies, and for digital copies the libraries have to purchase a subscription for a set number of library patrons and concurrent borrows, specifically for digital lending, and with a finite life. This is all about growing publisher revenue. The publishers are not stopping at saying the number of digital copies lent must be less than or equal to the number of physical copies, and are going after archive.org for their entire digital library programme.


  • A1kmm@lemmy.amxl.comtoAsklemmy@lemmy.mlAre you a 'tankie'
    link
    fedilink
    English
    arrow-up
    6
    arrow-down
    3
    ·
    28 days ago

    No

    On economic policy I am quite far left - I support a low Gini coefficient, achieved through a mixed economy, but with state provided options (with no ‘think of the businesses’ pricing strategy) for the essentials and state owned options for natural monopolies / utilities / media.

    But on social policy, I support social liberties and democracy. I believe the government should intervene, with force if needed, to protect the rights of others from interference by others (including rights to bodily safety and autonomy, not to be discriminated against, the right to a clean and healthy environment, and the right not to be exploited or misled by profiteers) and to redistribute wealth from those with a surplus to those in need / to fund the legitimate functions of the state. Outside of that, people should have social and political liberties.

    I consider being a ‘tankie’ to require both the leftist aspect (✅) and the authoritarian aspect (❌), so I don’t meet the definition.


  • The best option is to run them models locally. You’ll need a good enough GPU - I have an RTX 3060 with 12 GB of VRAM, which is enough to do a lot of local AI work.

    I use Ollama, and my favourite model to use with it is Mistral-7b-Instruct. It’s a 7 billion parameter model optimised for instruction following, but usable with 4 bit quantisation, so the model takes about 4 GB of storage.

    You can run it from the command line rather than a web interface - run the container for the server, and then something like docker exec -it ollama ollama run mistral, giving a command line interface. The model performs pretty well; not quite as well on some tasks as GPT-4, but also not brain-damaged from attempts to censor it.

    By default it keeps a local history, but you can turn that off.


  • The government just has to print for the money, and use it for that

    Printing money means taxing those that have cash or assets valued directly in the units of the currency being measured. Those who mostly hold other assets (say, for example, the means of production, or land / buildings, or indirect equivalents of those, such as stock) are unaffected. This makes printing money a tax that disproportionately affects the poor.

    What the government really needs to do is tax the rich. Many top one percenters of income fight that, and unfortunately despite the democratic principle of one person, one vote, in practice the one percenters find ways to capture the government in many countries (through their lobbying access, control of the media, exploitation of weaknesses of the electoral system such as non-proportional voting and gerrymandering).

    instead of bailing out the capitalists over and over.

    Bailing out large enterprises that are valuable to the public is fine, as long as the shareholders don’t get rewarded for investing in a mismanaged but ‘too big to fail’ business (i.e. they lose most of their investment), and the end result is that the public own it, and put in competent management who act in the public interest. Over time, the public could pay forward previous generations investments, and eventually the public would own a huge suite of public services.


  • Yes, but the information would need to be computationally verifiable for it to be meaningful - which basically means there is a chain of signatures and/or hashes leading back to a publicly known public key.

    One of the seminal early papers on zero-knowledge cryptography, from 2001, by Rivest, Shamir and Tauman (two of the three letters in RSA!), actually used leaking secrets as the main example of an application of Ring Signatures: https://link.springer.com/chapter/10.1007/3-540-45682-1_32. Ring Signatures work as follows: there are n RSA public keys of members of a group known to the public (or the journalist). You want to prove that you have the private key corresponding to one of the public keys, without revealing which one. So you sign a message using a ring signature over the ‘ring’ made up of the n public keys, which only requires one of n private keys. The journalist (or anyone else receiving the secret) can verify the signature, but obtain zero knowledge over which private key out of the n was used.

    However, the conditions for this might not exist. With more modern schemes, like zk-STARKs, more advanced things are possible. For example, emails these days are signed by mail servers with DKIM. Perhaps the leaker wants to prove to the journalist that they are authorised to send emails through the Boeing’s staff-only mail server, without allowing the journalist, even collaborating with Boeing, to identify which Boeing staff member did the leak. The journalist could provide the leaker with a large random number r1, and the leaker could come up with a secret large random number r2. The leaker computes a hash H(r1, r2), and encodes that hash in a pattern of space counts between full stops (e.g. “This is a sentence. I wrote this sentence.” encodes 3, 4 - the encoding would need to limit sentence sizes to allow encoding the hash while looking relatively natural), and sends a message that happens to contain that encoded hash - including to somewhere where it comes back to them. Boeing’s mail servers sign the message with DKIM - but leaking that message would obviously identify the leaker. So the leaker uses zk-STARKs to prove that there exists a message m that includes a valid DKIM signature that verifies to Boeing’s DKIM private key, and a random number r2, such that m contains the encoded form of the hash with r1 and r2. r1 or m are not revealed (that’s the zero-knowledge part). The proof might also need to prove the encoded hash occurred before “wrote:” in the body of the message to prevent an imposter tricking a real Boeing staff member including the encoded hash in a reply. Boeing and the journalist wouldn’t know r2, so would struggle to find a message with the hash (which they don’t know) in it - they might try to use statistical analysis to find messages with unusual distributions of number of spaces per sentence if the distribution forced by the encoding is too unusual.



  • A1kmm@lemmy.amxl.comtoLinux@lemmy.mlopen letter to the NixOS foundation
    link
    fedilink
    English
    arrow-up
    70
    arrow-down
    21
    ·
    2 months ago

    I wonder if this is social engineering along the same vein as the xz takeover? I see a few structural similarities:

    • A lot of pressure being put on a maintainer for reasons that are not particularly obvious what they are all about to an external observer.
    • Anonymous source other than calling themselves KA - so that it can’t be linked to them as a past contributor / it is not possible to find people who actually know the instigator. In the xz case, a whole lot of anonymous personas showed up to put the maintainer under pressure.
    • A major plank of this seems to be attacking a maintainer for “Avoiding giving away authority”. In the xz attack, the attacker sought to get more access and created astroturfed pressure to achieve that ends.
    • It is on a specially allocated domain with full WHOIS privacy, hosted on GitHub on an org with hidden project owners.

    My advice to those attacked here is to keep up the good work on Nix and NixOS, and don’t give in to what could be social engineering trying to manipulate you into acting against the community’s interests.


  • Most of mine are variations of getting confused about what system / device is which:

    • Had two magnetic HDDs connected as my root partitions in RAID-1. One of the drives started getting SATA errors (couldn’t write), so I powered down and disconnected what I thought was the bad disk. Reboot, lots of errors from fsck on boot up, including lots about inodes getting connected to /lost+found. I should have realised at that point that it was a bad idea to rebuild the other good drive from that one. Instead, I ended up restoring from my (fortunately very recent!) backup.
    • I once typed sudo pm-suspend on my laptop because I had an important presentation coming up, and wanted to keep my battery charged. I later noticed my laptop was running low on power (so rushed to find power to charge it), and also that I needed a file from home I’d forgotten to grab. Turns out I was actually in a ssh terminal connected to my home computer that I’d accidentally suspended! This sort of thing is so common that there is a package in some distros (e.g. Debian) called molly-guard specifically to prevent that - I highly recommend it and install it now.
    • I also once thought I was sending a command to a local testing VM, while wiping a database directory for re-installation. Turns out, I typed it in the wrong terminal and sent it to a dev prod environment (i.e. actively used by developers as part of their daily workflow), and we had to scramble to restore it from backup, meanwhile no one could deploy anything.

  • I suggest having a threat model about what attack(s) your security is protecting against.

    I’d suggest this probably isn’t giving much extra security over a long unique password for your password manager:

    • A remote attacker who doesn’t control your machine, but is trying to phish you will succeed the same - dependent on your practices and password manager to prevent copying text.
    • A remote attacker who does control your machine will also not be affected. Once your password database in the password manager is decrypted, they can take the whole thing, whether or not you used a password or hardware key to decrypt it. The only difference is maybe they need slightly more technical skill than copying the file + using a keylogger - but the biggest threats probably automate this anyway and there is no material difference.
    • A local attacker who makes a single entry to steal your hardware, and then tries to extract data from it, is either advantaged by having a hardware key (if they can steal it, and you don’t also use a password), or is in a neutral position (can’t crack the locked password safe protected by password, don’t have the hardware key / can’t bypass its physical security). It might be an advantage if you can physically protect your hardware key (e.g. take it with you, and your threat model is people who take the database while you are away from it), if you can’t remember a sufficiently unique passphrase.
    • A local attacker who can make a surreptitious entry, and then come back later for the results is in basically the same position as a remote attacker who does control your machine after the first visit.

    That said, it might be able to give you more convenience at the expense of slightly less security - particularly if your threat model is entirely around remote attackers - on the convenience/security trade-off. You would touch a button to decrypt instead of entering a long passphrase.



  • As if telling Reddit, Facebook, or Google what to put on their roadmap as an ordinary consumer would actually work.

    At least with FLOSS if you want something, and if it is a good thing the developers like, you can likely get it merged. If not, you can fork and still have the feature locally. Good luck getting that freedom with a closed-source product.

    For software I develop, I do find it is helpful if people making feature suggestions tell developers what is useful for them and why, but that doesn’t entitle them any of my time to demand what features I prioritise. The alternative is “I gave you something you like for free, so now I owe you to make it even better for you”, which is obviously nonsense.


  • I thought the orbs were supposedly open source

    No they are proprietary as a whole. Parts of the hardware design are published, and parts of the software that runs on them, but not the whole thing.

    Fundamentally Worldcoin is about ‘one person, one vote’, and anyone can create millions of fake iris images; the point of the orb is that it is ‘blessed’ hardware using trusted computing (or to use the term coined by the FSF, treacherous computing) and tamper detection to make sure that a central authority (namely Sam Altman’s Worldcoin foundation) has signed off on the orb running the exact secret / proprietary software running on the orb that generates an identity.

    They could have alternatively have built a system that leverages government identity using zero-knowledge proof of possession of a government-signed digital identity document. But I think their fundamental thesis is that they are trustworthy to be a central authority who could create millions of fake identities if they wanted, but that governments are not.


  • One of the key tenets of keeping something computerised secure is ‘Defence in Depth’ - i.e. having multiple layers of defence, so that even if one layer is breached, the next layer (which you thought was redundant and unnecessary) prevents the attack.

    Running a fully patched kernel and services / applications should protect you unless someone has a 0-day (i.e. not disclosed) exploit. Reducing the surface area by minimising what services / applications are running, using software (firejail etc…) and firewalls to limit permissions of applications / services to what is needed, etc… serves as another layer of defence. Disconnecting or physically blocking peripherals that might allow for spying is another layer; it serves its purpose if all the other layers are breached.


  • I think doing a good analysis of strategy here will depend on a lot of factors.

    Firstly, before coming up with a strategy, it is good to have a clear idea of your goals / the strategic problem you are trying to solve. I see or could infer a few possible ones: you want to work in an environment where you don’t feel bullied, you want to ensure others aren’t bullied, you want to see bullies punished, to maintain positives in the company and want to enjoy those without the negatives of being bullied, or perhaps that you believe in the goals of the company or have some stake in it, and want it to succeed. Different goals might lead you to a different course of action.

    Next, you would want to diagnose what’s really going on. Are there just a few bullies, in a company mostly full of professional people, or are the bullies the majority? Are senior leaders in on the bullying, or is it only lower level employees? Why do you think the bullies were hired in the first place - is it because bullying is considered okay in the company, or is it not considered okay but they slipped through? Why do you think the bullying hasn’t been addressed already? Is it because senior managers don’t know? Are the bullies friends / relatives of senior leadership? Are the bullies high performers that the company really would want to keep around, or do they get barely get anything done? Also, are the bullies even aware they are being bullies? Are they unaware they are being insensitive, and likely to change if made aware, or are they actively being malicious and well aware of the impact?

    Next, consider the direction you want to take, and analyse the likely impact on your goals. You could find another job - how easy that is would depend what the job market looks like for your role, and how good the terms of your current job are. It wouldn’t achieve goals around making it better for others. You could try talking to the bullies if you think that they might just be unaware of the impact of their behaviour and that they might change. If that doesn’t work, you could try talking to a manager / HR member, perhaps either to arrange mediation, or for them to take action. You could also just try ignoring the bullying if it isn’t having much impact.

    To choose from the many possible directions, it might help to think from the perspective of the company shareholders, senior leadership, and HR department. What would you do in their shoes if you learned of the bullying? If it is the majority of the company doing the bullying, then something like replacing all the bullying staff is going to be an instant non-starter. The best possible would be to slowly roll out training, policies, and new hiring practices to try to improve the culture over time. If it is a few people who, it now turns out, are the reason for high staff turnover and lower profits, then they might be quite happy to take action. Although probably not if the bullies are the senior leaders.


  • There are a few different types of blockchain, differing by how they stop you just making up your own alternative chain and saying that is the real history:

    • Proof of Work - prove you wasted lots of energy to add to the chain, making it prohibitively expensive to make your fake alternative chain - but also causing lots of emissions / wasting lots of energy.
    • Proof of Stake - adding to the chain requires participation of the people with the most total coins in the cryptocurrency already. Essentially ‘one dollar, one vote’, and ‘the rich get richer’ brought to crypo.
    • Proof of Humanity / Proof of Personhood / Proof of Identity - adding to the chain requires the participation of the most people. Attempts to bring “One person, one vote”, and Universal Basic Income to crypto. There are various attempts - some require submission of photos and videos, and have an adjudication scheme built in to detect duplicates (which might fall to AI-generated faces relatively soon). Others (see Worldcoin) require a trusted central party to produce hardware which scans faces and verifies they are real and unique (and have already had data leaks from participants involved in verification). The other option is to trust governments / other existing infrastructure to verify identities (which is probably the most sensible option if you are trying to genuinely just disrupt banking, but many crypto people hate because they also have a cyberpunk fantasy of accelerating crypto-anarchy, and actually want crypto to be used for tax evasion and without the cooperation of governments).

    So there are alternatives to environmental impact, but there is currently no perfect crypto. Stack that on top of the number of scammers out there riding the crypto buzz, and it is certainly not that hard to see the reasons behind the hate.


  • A lot of AI models are probably already trained on Reddit data. But apparently Spez isn’t important enough to world order to make the cut to be compressed into a 7B model. I asked my Mistral-7B-Instruct (4-bit quantised) local LLM:

    >>> What can you tell me about someone called u/spez, a
    ... nd are they a nice person or horrible?
     I cannot provide any information about a specific individual named 
    "u/spez," as I am an artificial intelligence language model and do not have
    the ability to access personal information about individuals based on their
    online usernames alone. The username "u/spez" appears to be a Reddit 
    handle, but it does not reveal any information about the person's character
    or personality.
    
    Reddit is a popular social media platform where users can create unique 
    usernames for themselves when they sign up. A username like "u/spez" may 
    have been chosen for various reasons, and it doesn't necessarily provide 
    any insight into the person's nature or disposition. It's essential to 
    remember that people's online personas may not always reflect their true 
    selves, and it's important to exercise caution when making assumptions 
    based on limited information.
    
    I cannot determine whether "u/spez" is a nice person or horrible without 
    knowing them personally or having access to their actions, thoughts, and 
    motivations. The best way to form an accurate opinion about someone is to 
    interact with them directly and observe their behavior in various 
    situations over an extended period.
    
    >>> Is it possible u/spez is Steve Huffman? Can you tell me anything based on tha
    ... t?
     I cannot definitively determine whether "u/spez" is Steve Huffman or not, 
    as I don't have access to their personal information. Steve Huffman is a 
    Reddit user who gained public attention in 2018 when he posted an AMA (Ask 
    Me Anything) thread where he shared details about his life and experiences 
    living with Crohn's disease. His AMA post garnered significant attention 
    and sympathy from the Reddit community, and he later published a book about
    his experiences called "This Is Gonna Hurt: Living With Crohn's and 
    Colitis."
    
    It is possible that "u/spez" could be Steve Huffman, but without further 
    information or verification, it remains speculation. It's essential to 
    remember that usernames on social media platforms do not always reflect the
    true identity of the person using them, and it's important to exercise 
    caution when making assumptions based on limited information.
    
    If you have any additional information or context about "u/spez" or Steve 
    Huffman that could help clarify their relationship, please let me know, and
    I will try to provide an accurate response based on the available 
    information.
    

  • Programming is the most automated career in history. Functions / subroutines allow one to just reference the function instead of repeating it. Grace Hopper wrote the first compiler in 1951; compilers, assemblers, and linkers automate creating machine code. Macros, higher level languages, garbage collectors, type checkers, linters, editors, IDEs, debuggers, code generators, build systems, CI systems, test suite runners, deployment and orchestration tools etc… all automate programming and programming-adjacent tasks, and this has been going on for at least 70 years.

    Programming today would be very different if we still had to wire up ROM or something like that, and even if the entire world population worked as programmers without any automation, we still wouldn’t achieve as much as we do with the current programmer population + automation. So it is fair to say automation is widely used in software engineering, and greatly decreases the market for programmers relative to what it would take to achieve the same thing without automation. Programming is also far easier than if there was no automation.

    However, there are more programmers than ever. It is because programming is getting easier, and automation decreases the cost of doing things and makes new things feasible. The world’s demand for software functionality constantly grows.

    Now, LLMs are driving the next wave of automation to the world’s most automated profession. However, progress is still slow - without building massive very energy expensive models, outputs often need a lot of manual human-in-the-loop work; they are great as a typing assist to predict the next few tokens, and sometimes to spit out a common function that you might otherwise have been able to get from a library. They can often answer questions about code, quickly find things, and help you find the name of a function you know exists but can’t remember the exact name for. And they can do simple tasks that involve translating from well-specified natural language into code. But in practice, trying to use them for big complicated tasks is currently often slower than just doing it without LLM assistance.

    LLMs might improve, but probably not so fast that it is a step change; it will be a continuation of the same trends that have been going for 70+ years. Programming will get easier, there will be more programmers (even if they aren’t called that) using tools including LLMs, and software will continue to get more advanced, as demand for more advanced features increases.


  • If you have control of the domain, you can also get an X.509 certificate from any CA (e.g. for free from LetsEncrypt). Then you can put up a new server on that domain with a valid cert. If that server supports ActivityPub, it can provide new public keys for private keys you control for all users on the server, and can use the corresponding private keys to sign messages from any user on that server to any community those users are still subscribed to. In addition, any users on other servers still posting to / interacting with communities on that server would cause their server to send that to the inbox on the new server.

    This means any usernames or communities on queer.af should no longer be trusted.