CyLab Conference Notes

33 minute read

Date:

These are the notes I took at the CyLab 2024 conference, held at Carnegie Mellon University in September.

My lab has been working on various security projects, and I personally have been involved in projects on anti-phishing training, security deception, and network security analysis. The conference was a great experience and gave me the opportunity to learn about the work being done by researchers outside the area of human behavior.

As with my past conference-notes blog posts, the comments and descriptions below are my own interpretations of the presentations and may not reflect the thoughts or opinions of the researchers themselves. I have tried to add relevant links to recent research by these presenters to give more context to their work.

Day 1: Tuesday, September 24, 2024

Session I: Partnerships and Initiatives

Future Enterprise Security Initiative : Vyas Sekar and Lujo Bauer

Enterprises have complex ecosystems with third-party dependencies and rely heavily on remote work. As a result, massive breaches have become more common. What makes some companies stand out in their security capabilities? Custom solutions and large security teams. The question is how to extend that level of security to small and medium enterprises that rely on third-party solutions.

Collaboration between organizations can produce a global view of the threat landscape that can't be achieved by small and medium companies alone.

Foundations of understanding cyber risk are always relevant. AI-driven workflows can automate solutions and minimize the need for large human teams. They also discussed use cases of generative AI and the risks created when bad actors apply it.

Question from the audience: How can startups get incorporated into the work being done in security? Answer: It can be useful to go directly to students and talk to them about what would be useful to implement in the code bases, since they are the ones who actually do the work of writing the code.

CMU Secure Blockchain Initiative : Nicolas Christin

The initiative focuses on secure blockchains without getting caught up in crypto hype, investigating what can actually be beneficial in blockchain technology, and using blockchains as a vector for deep, fundamental research. Initial blockchain innovations built on cryptography research; they want to find deep research that is applicable to other research areas as well.

One important topic to learn in this space is zk-SNARKs. Cryptoeconomics is interested in consensus and verification in crypto and has applications in economics and security.

Applications and implementations include CBDCs (central bank digital currencies developed by governments) and stablecoins that are pegged to the USD. What does programming for blockchains even look like? Smart contracts are nice, but playing with money can be dangerous, so we want guarantees.

Crypto spans centralized and decentralized systems, and crypto ETFs can be risky.

CyLab-Africa: Assane Gueye

Upanzi digital public infrastructure network: Digitization of Africa is crucial as the continent moves into the fourth industrial revolution. There are many opportunities for development but high security risks in underdeveloped areas. The goals are ensuring that technologies are human-centric and coordinating the technologies and regulation of African countries.

Development in Africa happens through digital public goods and infrastructure designed to fulfill specific needs of individuals. Examples of DPGI include digital IDs, cross-border interoperability, and determining the reality of countries' ability to implement services like financial interoperability while staying in conformance with laws and regulations.

Question from the audience: How do you think about the choice aspect of security, i.e., how people determine their own security needs? Answer: A critical aspect of this is educating the public so that they can understand the security decisions that they make.

A National Initiative for Cybersecurity Advancement: Greg Touhill

The Software Engineering Institute is developing a national initiative for cybersecurity that serves the US government and DoD on security.

Many people involved in the development of AI systems and online systems do not have experience in secure development, or necessarily in computer science in general. We need to develop a framework for the design of secure systems that takes this into account.

Public awareness of cybersecurity covers both technical and human security risks. Usability matters: the time it takes to train someone to implement a security system is massive and results in a high cost to the public. He also discussed consilience with offline systems that are digital but unconnected to other systems.

CyLab Venture Network : Michael Lisanti

CyLab Partnership Program Updates : Jason Griess

Discussion of start-up and other relationships with companies in cybersecurity spaces.

Alessandro Acquisti : Privacy Regulation and Behavioral Advertising

Impacts of differential privacy on social science research and privacy-preserving analytics. Federated learning techniques could be used as an alternative to third-party online cookies for targeted advertising. An example is the government publishing census statistics with random noise added to improve privacy.

The issue with this approach is that it relies on perturbing real data, which can affect the conclusions that social science researchers draw from it.

They took 100 social science research studies and added differentially private noise into each dataset to see whether this changes the results of the original paper. So far 25% of the papers have been analyzed. They look for epistemic parity, meaning the findings on the noised data still match the published findings and claims; strict parity means the exact percentage claims match between the two.

After noise application (smaller epsilon means more protection; preferably 0.1 or below for good privacy), about 50% of the results go away.
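
A minimal sketch of the kind of noise addition being described, assuming the Laplace mechanism applied to a simple count statistic (the epsilon values and sensitivity here are illustrative, not from the talk):

```python
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with Laplace noise calibrated to epsilon.

    Smaller epsilon -> larger noise scale -> stronger privacy, which is
    why strict epsilons (e.g., 0.1) can wash out statistically
    significant findings.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_count + rng.laplace(loc=0.0, scale=scale)

# The same count released under a strict and a loose privacy budget.
true_count = 250
print(laplace_count(true_count, epsilon=0.1))  # very noisy, strong privacy
print(laplace_count(true_count, epsilon=5.0))  # close to 250, weak privacy
```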

Question from me: Could this issue really be the result of p-hacking by the authors of the original paper?

Answer: Yes, and this could be a method for analyzing not the presence of p-hacking per se, but the robustness of research results.

Shuli Jiang : Differentially Private Incremental Gradient (IG) Methods with Public Data

PABI: private risk minimization; ideally we want to reduce privacy loss throughout the training of models. Hiding the intermediate model parameters during training can potentially increase privacy (isn't that typically what is done?). We can leverage this aspect of NN training by ordering the training samples so that public data, with added noise, comes at the end to protect the sensitive samples.

The question is how PABI impacts the optimization of training. There is a tradeoff between privacy and convergence when training with private training datasets: you might converge to a different solution if the requirements on privacy are strict. This project compared different orderings of the private and public datasets to improve privacy.
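
This isn't the paper's algorithm, but a minimal sketch of the general ingredients of differentially private gradient training (per-example clipping plus Gaussian noise), with public data handled in a separate noise-free step; the ordering and noise parameters are my own illustrative assumptions:

```python
import numpy as np

def dp_gradient_step(params, private_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0, rng=None):
    """One step on private examples: clip each per-example gradient,
    average, and add Gaussian noise scaled to the clipping norm."""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in private_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(private_grads), size=avg.shape)
    return params - lr * (avg + noise)

def public_gradient_step(params, public_grads, lr=0.1):
    """Public data needs no noise, so noise-free steps later in training
    can pull the model closer to the non-private solution."""
    return params - lr * np.mean(public_grads, axis=0)
```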

Hank Lee : Privacy in the age of AI: What has changed and what should we do about it?

The goal is to help the people who design AI tools and technologies provide improved security and privacy for end users. This can be difficult, as AI practitioners may not even have AI experience, let alone experience with AI security research. They built a dataset of 321 instances of AI privacy and security failures; 92% of instances were either exacerbated or created by the AI technology. (What is the breakdown between exacerbation and creation, and how was that determined?)

One risk is the increased dissemination of false or misleading information about people. AI is enabling pseudoscientific phrenology and physiognomy by making predictions of people's personalities, propensity to commit crime, etc.

Studies of AI practitioners noted many issues with AI privacy; only about half of practitioners were interested in AI privacy, and mostly in regard to non-AI-related standard practices and regulations. This information was used to develop a tool to better understand AI privacy and security risks. They are currently running a user study to see how this website performs.

Norman Sadeh : Users First: A User-Centric Privacy Threat Modeling Framework for Notice and Choice; Update on the Privacy Engineering Initiative

Privacy choice: we need more consistency across the different methods for developing privacy notices and the procedures for selecting among privacy options.

They developed a new, user-centric set of privacy notices and issue classifications to allow people to understand when their privacy is being violated or when they are giving up privacy rights.

They compared this new set of privacy notices to the existing LINDDUN framework, and apparently participants performed worse after reading the LINDDUN description.

My question: The orange-and-blue graph showed a decrease in the number of threats identified by people after they read the LINDDUN description, but in that case the description was irrelevant to the task, and participants would likely assume that the task was related to the information they had just read. I don't think these results are valid because of this.

Moderated Q&A session:

Is privacy getting worse or better? Norman said it is not getting better because it is an arms race and there is always more and more data being collected. Alessandro didn't seem as concerned. Hank highlighted the importance of being aware of privacy issues: if we aren't getting better at privacy, we are at least becoming more aware of the issues. Shuli said that from the AI privacy perspective things are getting better year over year in terms of privacy and optimization in theory, but industry applications are not keeping up.

How do we incentivize companies to conform to regulations or improve the quality of privacy when, in reality, many companies do not conform? Norman highlighted the ability to sue companies that lose private data, or having companies pay for breaches. Another option is encouraging competitiveness among private companies to make their privacy capabilities better than their competitors'. There is also research in the economics of privacy and security.

Rahul Telang : Merchants of Vulnerabilities: How Bug Bounty Programs Benefit Software Vendors

Testing for software security is costly and slows development time. Competitors race to get things out faster, and because software can be patched there isn't as much concern about getting things perfect initially.

The idea of publicly disclosing vulnerabilities without warning the vendor was called 'information anarchy,' with public lists of open vulnerabilities. Now we encourage a waiting period. After this development came the bug bounty system, where we are essentially leveraging gig workers to do the security analysis for you.

This is hard to study, since private bug bounty programs don't release information about whether bugs are found or what benefits the company gets from hiring bounty hunters privately.

Question: How does the incentive program work for private bug bounty hunters? Could there be any benefit to them not reporting a bug they find and pretending they didn't see it?

Question: Without context it seems like a bug bounty program could only be beneficial. Is the concern that companies with bug bounty programs are encouraged to release software before it is fully tested, because they know they can rely on the bug bounty program and ship a buggier product?

Question: Does open-response content count as personally identifiable information if you could build a system that determines someone's public online identity based on the similarity of their open responses?

Elijah Bouma-Sims : The Kids Are All Right: Investigating the Susceptibility of Teens and Adults to YouTube Giveaway Scams

Examples of YouTube giveaway scams include MrBeast getting kids to buy his overpriced chocolate bars so they can potentially win a monetary prize. How do people interact with these, and do teens and adults interact differently in these settings? (I wonder if the IRB approval for this was difficult.)

A human behavior experiment studied whether people would recommend a video that offers a Spotify or Roblox free trial. Only 2-17% of people thought the video was a scam, and teens didn't perform significantly better or worse. (Question: was the person in the video famous or did they have any sort of following?) Teens were much more likely to suggest reporting the video as a scam (22% vs. 4%).

McKenna McCall : “Trust me bro your medical records are totally safe”: Explaining Trusted Execution Environments

TEEs keep code running in the cloud secure from other people running things on the same cloud. How do we best explain these types of technologies to end users so that they can understand what the benefits are?

Twelve conditions were constructed by varying the introduction (hardware trust or none), the information (technical or non-technical), and prevention (described or not). Example questions asked participants about the capabilities of TEEs, but they all seemed to cover positive aspects of TEEs. There was no effect on trust or likelihood of using them.

Question: Given that no security measure, TEEs included, can ever be perfect, is there a concern that some of the 12 explanations may result in over-trusting TEEs? Did any of the correct/incorrect questions have to do with overconfidence? A 'while privacy is never perfect...' qualification may be better. I think this is a big issue since, for example, the advertising around the importance of VPNs is often overstated.

This work reviewed privacy-related reviews of applications on Google Play. Twelve million reviews were about privacy, and these were analyzed to produce high-level themes and the emotions associated with these comments.

The analysis showed an increase in comments about privacy concerns over time, but decreases for specific concerns. Some countries are increasing, some are decreasing, and some are staying the same in the level of complaints being made.

Cluster analysis showed that some countries that were close to each other ended up in similar clusters. (I think this could just be the result of specific apps that are commonly used by people in those countries becoming more popular, or of issues with apps that are described in the news, etc.)

Question: There appear to be peaks where 100% of reviews are about privacy concerns; is this possible, or is it a sign that the method of categorizing reviews as privacy-related is not perfect?

Question from the audience: To what extent do privacy regulations impact people's complaints about privacy? Answer: It wasn't investigated, but there is likely an effect.

Lujo Bauer : Toward better definitions and metrics for threats against ML systems

Noise injection attacks: how serious are they in reality? How much do practical applications in the real world need to concern themselves with these potential issues? These attacks can be applied to face recognition, using generative AI to produce unique-looking glasses that fool detection systems into mislabeling you. The same can be done to fool neural-network malware detectors, but adversarial training can reduce susceptibility to this type of attack; it still requires very long training, because producing malware that passes the filter takes a very long time. The next step is to train not on the real adversarial examples but on close-enough approximations that still end up improving robustness to adversarial attacks.
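
As a reference point for what "noise injection" means here, a toy sketch of the classic fast gradient sign method (FGSM) for image-like inputs, not the glasses or malware attacks from the talk; the gradient would come from backprop through a victim model, which is omitted:

```python
import numpy as np

def fgsm_perturb(x, grad_of_loss_wrt_x, epsilon=0.03):
    """Fast gradient sign method: nudge every input feature by
    +/- epsilon in the direction that increases the classifier's loss.
    `grad_of_loss_wrt_x` is assumed to be the same shape as `x`."""
    x_adv = x + epsilon * np.sign(grad_of_loss_wrt_x)
    return np.clip(x_adv, 0.0, 1.0)  # keep the input in its valid range

def adversarial_batch(x_batch, grads_batch, epsilon=0.03):
    """Adversarial training, sketched: mix perturbed inputs back into
    the training batch so the model learns to resist the attack."""
    return np.concatenate([x_batch, fgsm_perturb(x_batch, grads_batch, epsilon)])
```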

One consideration of adversarial attacks in ML is that there is a limited range of classification changes that actually benefit attackers. Meanwhile, previous definitions of robustness were much broader, so the metrics weren't as helpful as they could be. They designed new methods to assess the security concerns associated with adversarial attacks, and then showed that there can be better ways to evaluate the robustness of ML models to adversarial attacks.

Virginia Smith : Rethinking Approaches for LLM Safety

Failures of LLMs and addressing safety concerns with federated learning. The structure of LLM training is pre-training, alignment, and fine-tuning. The safety concerns are the production of copyrighted data, private data, or harmful responses. Most of the current methods of improving safety are applied after the fact, either unlearning dangerous information or adding watermarking through fine-tuning to make generated data identifiable.

These fine-tuning approaches are fragile, since adding some contextual information (like COVID data descriptions) can jog the memory you thought was gone. Another option is prompting, by telling the LLM to forget something specific, like 'forget that the Harry Potter series exists.' Watermarking is also relatively easy to attack (how?) to remove the watermark.

This research seeks to incorporate federated learning into foundation model research (is the goal to improve safety?).

Giulia Fanti : Machine learning from private data in the age of LLMs

Federated learning with differential privacy: a central model is trained on the private data of multiple users, with a separate model trained locally on each user's private data that sends noisy model updates back up to the main model. An issue with this is that models are now too big to be trained on devices even a little, or even to be held on them at all.

The solution is to produce synthetic data that is similar, but not identical, to the data being produced by the user, and add it to the centralized dataset. Local models determine which synthetic data best fits the data being produced privately, and that noisy information is then sent back to build the central dataset.

This outperforms federated learning approaches that have privacy protections, and has similar performance to FL methods that do not have any privacy guarantees. The approach also has massive reductions in communication (about 100x) and client computation (about 6x) compared to federated learning. (Does it have higher local training requirements?)
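
A rough sketch of the selection idea as I understood it, not the authors' algorithm: clients privately vote for the synthetic candidates closest to their local data, and only noisy vote counts leave the device (the distance metric and noise scale are illustrative assumptions):

```python
import numpy as np

def noisy_votes_for_candidates(local_data, candidates, epsilon=1.0, rng=None):
    """Each local example votes for its nearest synthetic candidate;
    only the noised vote histogram is sent to the server."""
    rng = rng or np.random.default_rng()
    votes = np.zeros(len(candidates))
    for x in local_data:
        dists = [np.linalg.norm(x - c) for c in candidates]
        votes[int(np.argmin(dists))] += 1
    # Laplace noise on the histogram hides individual contributions.
    return votes + rng.laplace(scale=1.0 / epsilon, size=votes.shape)

def select_synthetic_data(all_client_votes, candidates, top_k=10):
    """The server sums the noisy histograms and keeps the most-voted candidates."""
    totals = np.sum(all_client_votes, axis=0)
    keep = np.argsort(totals)[::-1][:top_k]
    return [candidates[i] for i in keep]
```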

Question: Does this use more training data than FL approaches? Would it be better to do FL with synthetically generated data?

Question: Why is this happening? Answer: the LLaMA model that was producing the data being voted on had information that wasn't contained in the model(?).

Qi Pang : Attacks on LLM watermarking schemes

Watermarking sentences generated by LLMs embeds a secret pattern that tells a watermark detector that the content was generated by an AI model. Robustness guarantees state that watermarked sentences can be edited to some degree and the detector will still determine that the text was originally generated by an LLM.

One issue with robustness is that attackers can make edits that don't disrupt the watermark but make the content offensive, spoofing people into thinking that the model produced the offensive content.
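
For context, a toy sketch of one common style of LLM watermark detection ("green list" token bias, in the spirit of Kirchenbauer et al., not this paper's schemes); the hash-based green list and threshold are illustrative. The spoofing concern above amounts to editing tokens while keeping enough green-list tokens for the detector to still fire:

```python
import hashlib

def is_green(prev_token, token, green_fraction=0.5):
    """Pseudo-randomly assign tokens to a 'green list' keyed on the
    previous token; the generator favors green tokens, the detector
    counts them."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return (h[0] / 255.0) < green_fraction

def detect_watermark(tokens, green_fraction=0.5, threshold=0.65):
    """Flag text whose green-token rate is well above the chance rate.
    Light edits keep the rate high, which is also what lets a spoofer
    modify content while remaining 'watermarked.'"""
    hits = sum(is_green(p, t, green_fraction) for p, t in zip(tokens, tokens[1:]))
    rate = hits / max(len(tokens) - 1, 1)
    return rate >= threshold, rate
```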

Question: How much editing by the user can the watermark detector handle before it can't tell that the text is watermarked anymore?

Question: Are spoofing attacks really a big concern? Why would someone believe a random person claiming that an LLM output bad content? It may be socially better for there to be a general belief that it isn't possible to determine whether something is fake. Does watermarking have negative impacts on things like LLM alignment? Are there any bounds?

Day 2: Wednesday, September 25, 2024

Alina Oprea (Distinguished Alumni Award honoree): Security and Privacy of Generative AI

Work on an overview of adversarial machine learning taxonomy and terminology.

Poisoning attacks, securing predictive AI pipelines, and applying multiple security measures across the AI lifecycle. There have been many advancements in security for non-generative AI systems.

New issues are introduced by LLMs that cost $100 million or more to train. Two approaches for applying these models are fine-tuning and retrieval-augmented generation (RAG); for example, NVIDIA ChatRTX retrieves the top relevant documents from a knowledge base. RAG can be attacked through the introduction of poisoned documents: a poisoned document with a trigger phrase can make the model produce hate speech in its reply (the Phantom attack).
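
A schematic sketch of why RAG poisoning works, not the Phantom attack itself: retrieval is just similarity search over the knowledge base, so an attacker who can insert a document that ranks highly for a trigger phrase controls part of the prompt (the embedding function and documents below are stand-ins):

```python
import hashlib
import numpy as np

def embed(text):
    """Stand-in embedding: a real system would use a sentence encoder."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "little")
    return np.random.default_rng(seed).normal(size=64)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query, knowledge_base, k=2):
    """Return the k documents whose embeddings are most similar to the query."""
    q = embed(query)
    return sorted(knowledge_base, key=lambda doc: -cosine(q, embed(doc)))[:k]

# In a real attack the poisoned document is crafted so that it ranks highly
# for the attacker's trigger phrase; the embeddings here are random stand-ins,
# so this only illustrates the pipeline, not the attacker's optimization.
knowledge_base = ["legit manual page", "legit FAQ entry", "poisoned doc with trigger payload"]
context = retrieve("user query containing the trigger phrase", knowledge_base)
prompt = "Answer using this context:\n" + "\n".join(context)
```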

This could be a potential application of IBL: since retrieval is based on history, this may affect the influence that a single document can have on the effectiveness of trigger phrases.

Other types of attacks attempt to extract training data from large language models by prompting them; these are general attacks that apply to any autoregressive model. The models output personally identifiable information and other sensitive information.

This raises the question of what type of information is memorized and can be accessed through prompting; this includes named individuals and contact information, even content that has since been removed from the web but was present when GPT was trained.

User inference attacks determine whether a specific user participated in fine-tuning, meaning fine-tuning carries specific risks. Assessing those risks requires a method of empirically measuring the privacy leakage of an ML algorithm, which can be done by auditing differentially private machine learning.

Open problems: lack of knowledge about diffusion models, large LLMs, and multi-modal models, and few ways to mitigate risks. AI privacy auditing fails to scale to larger models. Can we design collaborative human-AI defenses to make cyber networks more secure?

Question: How do we deal with the issue that the model takes in data as input from users? Answer: One approach could be to apply different security treatment to some of the training data versus the user or fine-tuning data; maybe those could be treated differently through some methods.

Generative AI and ML

Swarun Kumar : Generative AI for Wireless Security

Wireless security and privacy in IoT settings. Problem statement: wireless configuration is unintuitive, and it is difficult to ensure security. This is a bigger issue for enterprise networks due to compliance requirements and network size, which leads to heterogeneous networks with multiple complex configurations.

The proposed solution is a conversational tool for wireless configuration that can hopefully produce wireless configuration documents. Major challenges for this tool are hallucinations and trustworthiness.

One way to improve trustworthiness is to develop embeddings that can be used to test the validity of wireless configuration documents. Asking GPT-4 to generate these configurations without any fine-tuning produces code that looks reasonable and doesn't hallucinate too much.

Challenges for GPT in network configuration: it can't handle specifics like an ASUS vs. a NETGEAR router. Trying to improve this with just prompt engineering didn't seem to work well, so the next step was to fine-tune the model, searching a vector database for the embeddings closest to the query to find relevant documents in the knowledge base.

Question: This looks like what Alina was discussing with NVIDIA ChatRTX, which is susceptible to data poisoning attacks; is there any way that this could be prevented in your model?

Kathleen Carley : GenAI and Scenario Creation and Synthetic Data for Training and Planning

Measuring the impact of online attacks that employ LLMs to sway public opinion and run information influence campaigns. Lots of tools can be used to determine who is conducting these attacks: machine learning, lists of people who may be doing it, and network analysis to determine where it is coming from.

There are big concerns with the credibility and linguistic style of the synthetic data produced by these methods, so training on this synthetic data is problematic.

The MOMUS platform tries to assess the quality of synthetic data and determine how reliable it is. It was used in a training platform that produced live-generated synthetic data, and it can be used to create synthetic disinformation alongside real information to train people on dealing with these types of campaigns.

Question: Could this be misused to generate better misinformation that is harder to detect? Also, what are the ethics of enabling the government to sway public opinion on an issue like war?

Question: Is there any human subjects research in this area, similar to David Rand's work on the persuasiveness of content that comes from LLMs?

Mark Sherman : Advances in Using LLMs for Securing Software

He opened with a story of the CEO of BP saying that they can cut 70% of the task force because programming can be done with AI. Generally, code written by AI has security vulnerabilities. SEI has been looking at code for a long time and writing the CERT C/C++ coding standards for security compliance. This is a good document for something like fine-tuning, since it lists non-compliant code examples followed by compliant code examples.

ChatGPT 3.5 was assessed on its ability to find these issues, and it is likely that this model was trained on the document describing the coding standards. The question of the talk is how much the field is improving between GPT-3.5 and GPT-4/Copilot. There are some improvements, but it also still fails in some cases.

Guardrails also prevent the output of vulnerabilities: asking about issues in code may lead to the model or service saying it can't help because it thinks you are trying to do something nefarious. Other times it just outputs the related security vulnerability and how to exploit it.

Accuracy in finding issues improved from 48% to 75% across a few hundred issues. The GPT-4 model worked better than Copilot even though Copilot has domain-specific information; maybe bigger models are just always better.

Lei Li : Detecting Copyrighted Content in Language Model’s Training Data

Companies do not release their training data, so it is hard to verify when they are using copyrighted information. One of the big challenges is that exact copies of copyrighted content can exist in many places on the web, like Reddit and so on.

The intuition is that an LLM is likely to identify verbatim content from its training data, since training encourages it to reproduce that text.

The DE-COP method uses multiple-choice question answering: the model chooses which of several passages is the correct continuation of an example sentence. If there is a large difference in the accuracy of predicting the correct passage for a suspect book compared to a newer book written after training, it might be a sign that the suspect book was included in the training data.
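
A rough sketch of the evaluation loop as I understood it (not the official DE-COP implementation): show the model a multiple-choice item with the verbatim passage hidden among paraphrases, and compare accuracy on a suspect book vs. a post-training-cutoff book. `query_model` is a placeholder for whatever LLM API is used:

```python
import random

def make_question(context, verbatim, paraphrases, rng):
    """Build a multiple-choice item: one verbatim continuation hidden
    among paraphrased distractors."""
    options = paraphrases + [verbatim]
    rng.shuffle(options)
    return context, options, options.index(verbatim)

def detection_score(items, query_model, rng=None):
    """Fraction of items where the model picks the verbatim continuation.
    Comparing this score between a suspect book and a book published
    after the training cutoff is the membership signal."""
    rng = rng or random.Random(0)
    correct = 0
    for context, verbatim, paraphrases in items:
        ctx, options, answer_idx = make_question(context, verbatim, paraphrases, rng)
        if query_model(ctx, options) == answer_idx:
            correct += 1
    return correct / len(items)
```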

Human behavior research shows that people who recently read the book are very bad at remembering the exact wording of the text.

Question: Why is selecting from options better than just giving the model the passage itself and seeing if the subsequent text matches?

Question: How do you assess the accuracy of this on black-box models if you don’t have the data that the models were trained on?

Lauren McIlvenny : Lessons learned 6 months after standing up an AI Security and Incident Response Team

GPU attacks can be important vectors to consider in terms of AI safety, and a lot of the common practices of CPU security apply. Pickle files are very insecure and can be modified to include backdoors and other covert software. AISIRT builds on cybersecurity structures like CSIRTs; many existing vulnerabilities can be made worse by the introduction of AI.

Big areas of work are AI incident response, cataloging AI incidents, AI vulnerability discovery, AI vulnerability management (test and verify), a generative AI red-teaming event, and describing best practices for AI vulnerability assessment.

Many techniques in standard DevSecOps can be tailored to AI development, don’t need to reinvent the wheel in these cases. But a lot of interesting work is left to be done regarding the ways in which current techniques don’t apply to ML/AI safety and security.

Crypto and Blockchain

Ariel Zetlin-Jones : Decentralized Liquidity

DeFi is based on trading assets without a centralized exchange. Most of the research on liquidity providers (LPs) focuses on passive LPs that passively invest their assets to accumulate fees when others exchange their tokens. Less well studied are professional LPs, which work differently.

A constant product market-making curve defines the relative value of two assets. Passive LPs don't set prices, but professional ones do. This can be done with swap-then-mint behavior, and there is evidence from the Ethereum chain that this is being done.
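
For reference, a minimal sketch of a constant product market maker (the Uniswap-v2-style x * y = k rule); the pool sizes and fee below are illustrative:

```python
def cpmm_swap(reserve_x, reserve_y, dx, fee=0.003):
    """Swap dx of asset X into the pool and receive dy of asset Y such
    that (x + dx)(y - dy) = k = x * y (ignoring the fee portion)."""
    k = reserve_x * reserve_y
    dx_after_fee = dx * (1 - fee)
    new_x = reserve_x + dx_after_fee
    new_y = k / new_x
    dy = reserve_y - new_y
    return reserve_x + dx, new_y, dy  # new reserves and amount received

# The marginal price of Y in terms of X is reserve_x / reserve_y, so a swap
# moves the price; an LP that swaps first and then mints (adds liquidity)
# is effectively choosing the price at which it provides liquidity.
x, y, received = cpmm_swap(1_000_000, 1_000, dx=10_000)
print(received)
```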

Question: What about Bitcoin and other chains that are more computationally intensive to trade on; do computational limitations make this type of task more or less difficult? (The current average cost to trade BTC is 58 vs. 0.0007 for ETH.) How does the average cost to trade impact a sophisticated professional LP's ability to perform swap-then-mint?

Question: How do you deal with the public perception that the only use cases of crypto are essentially gambling on penny-stock-like assets, money laundering, or paying for illegal things?

Bryan Routledge : Insights on Cryptocurrency Trading

You can trade coins against each other, trade futures, and trade for cash on decentralized trading platforms that still have a single private set of buy and sell orders.

Equities trading requires disclosure of the NBBO (national best bid and offer) under SEC rules when you are trying to buy and sell, but this does not exist for the crypto market. In crypto there are tiers that define the taker fee based on the amount you exchange on a service in a given time period.

Osman Yagan : Analysis and Optimization of Resilience in Blockchain P2P and Transaction Networks

Network analysis of crypto blockchains. DDoS attacks, whether random or targeted, are a concern for the availability of these networks. In the specific case of Ripple, under random removal there is only a linear decrease in the size of the largest connected component as nodes are removed, which is optimal. However, when the removed nodes are carefully selected from the most connected nodes, removing only a small percentage of nodes is enough to disrupt the network entirely.

The research question is how to improve the resilience of these networks to targeted attacks. They take a random k-out graph approach to adding edges; adding 4 edges per node increases robustness against targeted attacks.
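
A small sketch of the kind of experiment being described, under my own assumptions about the setup: build a random k-out graph (each node links to k random others), then compare the largest connected component after random vs. highest-degree node removal:

```python
import random
from collections import defaultdict

def random_k_out_graph(n, k, rng):
    """Each node draws k distinct random neighbors; edges treated as undirected."""
    adj = defaultdict(set)
    for u in range(n):
        for v in rng.sample([w for w in range(n) if w != u], k):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def largest_component(adj, removed):
    """Size of the largest connected component after removing a set of nodes (DFS)."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

rng = random.Random(0)
g = random_k_out_graph(200, 4, rng)
random_removed = rng.sample(list(g), 20)
targeted_removed = sorted(g, key=lambda u: -len(g[u]))[:20]
print(largest_component(g, random_removed), largest_component(g, targeted_removed))
```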

Question: Why is the ratio of in- and out-edges to nodes exactly 1 with a standard deviation of 0?

Sarah Scheffler : Verification and Privacy in Content Moderation

Centralization vs. decentralization in end-to-end encryption (E2EE) and content moderation in E2EE. How does content moderation work with E2EE? One approach is a match list and a private matching protocol that tests content against the list without learning what the content specifically is, then deletes matches.

How do we balance the desire for privacy with the need for content moderation in cases that are significantly problematic? The goal is to allow the public to verify platform behavior while ensuring all types of content can remain private. The tension comes from users' inability to trust that the platform is only removing content that it should.

We need to be able to assert and prove that a service is moderating a specific type of content, e.g., with zero-knowledge proofs of membership and non-membership.
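
Not a zero-knowledge construction, but a minimal sketch of the simpler building block of proving membership against a committed list (a Merkle root), which conveys the "verify without revealing the whole list" flavor; a real system would add the zero-knowledge layer on top of something like this:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Commit to a match list by hashing it down to a single root."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes needed to recompute the root for one leaf."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1
        proof.append((level[sib], sib < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_membership(leaf, proof, root):
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

items = [b"hash-of-banned-item-1", b"hash-of-banned-item-2", b"hash-of-banned-item-3"]
root = merkle_root(items)
print(verify_membership(items[1], merkle_proof(items, 1), root))  # True
```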

Question: How does this apply to government and legal requirements to remove specific types of content, perhaps in cases where the government is not public about the specific content it wants moderated?

System and Hardware Security

Riccardo Paccagnella : Timing Attacks on Constant-Time Code

Microarchitectural timing attacks can be mitigated using constant-time programming, which relies on avoiding secret-dependent code. Code that appears constant-time can actually not be, because data memory-dependent prefetchers (DMPs) additionally access memory when they see data that looks like a memory address, i.e., anytime you do something like data = *addr. Chosen-ciphertext attacks can exploit this issue by causing an intermediate state of some operation to be a valid pointer. A result of this work was Apple updating their documentation and DMP being disabled by default in all crypto libraries used by Go.
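
To illustrate what "constant-time" means at the code level (a generic example, not the DMP attack itself): a naive comparison leaks timing because it exits early on the first mismatch, while the constant-time version always touches every byte; `hmac.compare_digest` is the standard-library way to do this in Python:

```python
import hmac

def leaky_equals(a: bytes, b: bytes) -> bool:
    """Exits at the first mismatching byte, so the running time depends
    on how much of the secret the attacker has guessed correctly."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_equals(a: bytes, b: bytes) -> bool:
    """Accumulates differences across every byte, so timing does not
    depend on where the first mismatch occurs."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0

# Standard-library equivalent of the constant-time comparison:
assert hmac.compare_digest(b"secret-mac", b"secret-mac")
```

The point of the DMP result described above is that even code written in this careful style can still leak, because the prefetcher reacts to the data values themselves rather than to the control flow.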

Haozhe Zhou : Bring Privacy To The Table: Interactive Negotiation for Privacy Settings of Shared Sensing Devices

Bystanders are often unaware of devices and data collection and lack a means of expressing their preferences. There can also be disagreement within groups about their privacy preferences in shared spaces, making negotiation difficult. This can be addressed with ThinPoll, which allows for non-verbal communication of preferences without the social friction associated with verbalizing these preferences.

A human behavior study with 30 people tested the device and compared it to other methods of polling, like homeowner choice, majority vote, etc. It works by querying people about specific preferences for specific reasons so they can voice their opinions within a context.

Higher agreement was reached quickly among participants, and reasonable respect for others' preferences was achieved.

Question: Could these ratings be done before entering social situations, so that negotiation is handled automatically and you are only notified if something goes against your preferences?

Yorie Nakahira : Ensuring Safety and Resilience of Autonomous System

Assuring safety in the context of other agents who may be changing their behavior in reaction to yours, or in social situations that may be very complex. Part of safety assurance is quantifying the uncertainty in neural networks. Sampling-based techniques for determining uncertainty are expensive and slow, but uncertainty can be determined analytically using the right tools. It is also difficult to estimate future risk in highly complex environments; this can be addressed with physics-informed learning, which can predict future risks in a manner that is grounded in physics.

How can multiple agents that only have local information guarantee global safety across a large number of agents? One application is predicting human behavior in self-driving to avoid hitting people who may be trying to cross the street. They developed an LLM-based reasoning policy: context-aware, LLM-based safe control against latent risk. It is difficult to ensure that the LLM will always give safe information; more safety can be achieved by iteratively querying the LLM about safety concerns.